Deploying OpenShift smarter: GPU nodes for test clusters
This is a follow-up to my original post from November 2023 and another from Sept 2024. A lot has changed since then — not in how OpenShift installs, but in my AWS environment and in some installer requirements that crept in quietly.
This post covers the problems I hit deploying OCP 4.21 after not doing this since 4.16, and the changes I made to end up with a cheaper, leaner dev/test cluster.
What broke since last time
It had been somewhere between six months and a year since I last installed OpenShift on AWS. Two things had changed in the meantime, and both caused immediate failures.
1. Subnet tagging is now required
The installer now validates subnet tags before proceeding. If your subnets are untagged, you’ll get a fatal error before it even starts. Previously the installer would just tag your subnets itself — now it checks first.
The subnets you hand to the installer, i.e. the ones listed in your install-config.yaml, need to be tagged:
aws ec2 create-tags \
--resources subnet-xxxxxxxxxxxxxxxxx subnet-xxxxxxxxxxxxxxxxx \
--tags Key=kubernetes.io/cluster/ocpcluster,Value=shared
Use your cluster name from metadata.name in place of ocpcluster.
Any other subnets in the same VPC that the installer should ignore need this tag:
aws ec2 create-tags \
--resources subnet-xxxxxxxxxxxxxxxxx \
--tags Key=kubernetes.io/cluster/unmanaged,Value=true
If you skip this step, the installer will fail with an error about untagged subnets in the VPC.
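Before re-running the installer, it's worth confirming the tags actually landed. A quick sanity check, filtering by the tag key the installer validates (the lookup is guarded so it is skipped where the AWS CLI isn't installed):

```shell
# List the subnets that carry the cluster tag the installer checks for.
# CLUSTER_NAME must match metadata.name in install-config.yaml.
CLUSTER_NAME=ocpcluster
TAG_KEY="kubernetes.io/cluster/${CLUSTER_NAME}"

if command -v aws >/dev/null 2>&1; then
  aws ec2 describe-subnets \
    --filters "Name=tag-key,Values=${TAG_KEY}" \
    --query 'Subnets[].SubnetId' \
    --output text
fi
```

If your two install-config subnets come back here, the tagging step is done.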
2. Corporate IT had tightened my IAM permissions
My AWS credentials hadn’t changed, but the policy attached to my IAM user had. The new policy uses a NotAction block that explicitly excludes iam:*User* and iam:*AccessKey* operations (except on my own user). The installer’s default mint mode for the Cloud Credential Operator (CCO) needs to create and delete IAM users for each cluster component — so it now fails hard with:
WARNING Action not allowed with tested creds action=iam:DeleteAccessKey
WARNING Action not allowed with tested creds action=iam:DeleteUser
FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: current credentials insufficient for performing cluster installation
The fix is to add credentialsMode: Passthrough to your install-config.yaml. In passthrough mode, the CCO copies your existing credential to each cluster component instead of minting new scoped IAM users. It never needs iam:CreateUser or iam:DeleteUser.
credentialsMode: Passthrough
This is a supported and documented mode, and for a dev/test cluster it’s a perfectly reasonable trade-off. The main thing to know is that your AWS credential becomes a long-lived dependency of the cluster — don’t rotate or delete it without updating the cluster secret first.
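One practical consequence: if the key ever does have to be rotated, update the copy the cluster keeps before touching the IAM side. On AWS the root credential the CCO passes through lives in the aws-creds secret in kube-system (per the CCO documentation). A sketch, assuming the oc CLI is logged in to the cluster (the lookup is a no-op where no cluster is reachable):

```shell
# Location of the root AWS credential the CCO copies to components.
SECRET=aws-creds
NAMESPACE=kube-system

if command -v oc >/dev/null 2>&1; then
  oc get secret "$SECRET" -n "$NAMESPACE"
fi

# Rotation sketch (NEW_KEY_ID / NEW_SECRET are placeholders): update the
# secret first, then deactivate the old access key in IAM.
# oc set data "secret/$SECRET" -n "$NAMESPACE" \
#   --from-literal=aws_access_key_id=NEW_KEY_ID \
#   --from-literal=aws_secret_access_key=NEW_SECRET
```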
Cost optimizations
The original setup deployed 3 control plane nodes and 3 workers. That’s 6 EC2 instances running at all times. For a dev/test cluster this is overkill. Here’s what I changed.
Drop to 1 master and 1 worker
For dev/test, a single master and single worker is fine. A 3-node control plane gives you etcd quorum tolerance — if you don’t care about that (and for a throwaway lab cluster, I don’t), 1 master works.
Use Spot instances for workers
Workers are safely replaceable. The MachineSet will automatically provision a new one if a Spot interruption occurs. In us-east-1, an m5.xlarge Spot instance typically runs 60-70% cheaper than on-demand.
Important: do not use Spot for the master node. If it gets interrupted, the cluster is dead. Keep the master on on-demand.
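The 60-70% figure moves with the market, and the Spot price feed is queryable from the CLI. A quick look at the most recent m5.xlarge price in the AZ this cluster uses (guarded so it is skipped where the AWS CLI isn't installed):

```shell
INSTANCE_TYPE=m5.xlarge
AZ=us-east-1a

if command -v aws >/dev/null 2>&1; then
  # Most recent Spot price for Linux in this AZ; compare against the
  # on-demand rate (about $0.192/hr for m5.xlarge in us-east-1).
  aws ec2 describe-spot-price-history \
    --instance-types "$INSTANCE_TYPE" \
    --availability-zone "$AZ" \
    --product-descriptions "Linux/UNIX" \
    --max-items 1 \
    --query 'SpotPriceHistory[0].SpotPrice' \
    --output text
fi
```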
Single availability zone
Running across multiple AZs incurs cross-AZ data transfer costs. For a dev/test cluster, just pick one AZ. This also means you only need 2 subnets (one public, one private) instead of 4.
Here’s my updated install-config.yaml:
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: my-f5.com
credentialsMode: Passthrough
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 1
  platform:
    aws:
      type: m5.xlarge
      spotMarketOptions: {} # Spot pricing - ~60-70% cheaper than on-demand
      rootVolume:
        size: 100
        type: gp3
      zones:
      - us-east-1a
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 1
  platform:
    aws:
      type: m5.xlarge # On-demand - spot interruption = dead cluster
      rootVolume:
        size: 100
        type: gp3
      zones:
      - us-east-1a
metadata:
  creationTimestamp: null
  name: ocpcluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    subnets:
    - subnet-xxxxxxxxxxxxxxxxx # Private subnet - us-east-1a
    - subnet-xxxxxxxxxxxxxxxxx # Public subnet - us-east-1a
publish: External
pullSecret: 'my-pull-secret'
sshKey: |
  ssh-rsa xxx...yyy imported-openssh-key
On-demand GPU nodes — deploy when you need them, delete when you don’t
I occasionally need to test an app that requires a GPU. Running a GPU instance full-time is expensive — a g4dn.xlarge (1x NVIDIA T4) runs about $0.53/hr on-demand, or around $0.17/hr on Spot. The right pattern here is to create a MachineSet for the GPU node after the cluster is up, scale it to 1 when you need it, and scale it back to 0 when you don’t.
When scaled to 0, the EC2 instance is terminated — you pay nothing for compute. The only ongoing cost is the EBS root volume that remains (~$10/month for 100GB gp3). If that bothers you, delete the MachineSet entirely and recreate it next time.
Creating the GPU MachineSet
Use an existing MachineSet as your template:
MACHINESET_NAME=$(oc get machineset -n openshift-machine-api -o name | head -1)
oc get "$MACHINESET_NAME" -n openshift-machine-api -o yaml > gpu-machineset.yaml
Edit gpu-machineset.yaml and make these changes:
- Change the name (in both metadata.name and the selector labels) to something like ocpcluster-gpu-us-east-1a
- Set replicas: 0
- Set instanceType: g4dn.xlarge
- Add a GPU label to the node so workloads can target it:
spec:
  template:
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/gpu-worker: ""
Apply it:
oc apply -f gpu-machineset.yaml
Install the NVIDIA GPU Operator
Before scaling up, install the NVIDIA GPU Operator from OperatorHub. It handles driver installation automatically via a DaemonSet that targets GPU nodes. Without this, your GPU node will come up but your app won’t be able to use the GPU.
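Once the operator's driver pods finish on a new node, the scheduler should see an nvidia.com/gpu resource on it. A quick check, assuming the gpu-worker label from the MachineSet above (guarded so it is a no-op without a cluster):

```shell
LABEL='node-role.kubernetes.io/gpu-worker'

if command -v oc >/dev/null 2>&1; then
  # Each GPU node should report a non-empty nvidia.com/gpu allocatable count.
  # The resource key contains dots, so it must be escaped in jsonpath.
  oc get nodes -l "$LABEL" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
fi
```

An empty count usually means the driver DaemonSet is still rolling out.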
Scale up when you need the GPU, scale down when done
# Spin up the GPU node
oc scale machineset ocpcluster-gpu-us-east-1a -n openshift-machine-api --replicas=1
# Wait for node to be Ready
oc get nodes -w
# When finished, terminate the instance (MachineSet definition is preserved)
oc scale machineset ocpcluster-gpu-us-east-1a -n openshift-machine-api --replicas=0
Scaling to 0 is the better habit over deleting the MachineSet — you keep the definition and can scale back up any time without rebuilding the YAML.
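To land a workload on the GPU node once it's up, the pod has to request the nvidia.com/gpu resource that the operator exposes and select the label added earlier. A minimal sketch (the pod name and CUDA image tag are illustrative, not from my actual setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test        # hypothetical test pod
spec:
  nodeSelector:
    node-role.kubernetes.io/gpu-worker: ""
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # example image tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1    # only schedulable on nodes advertising a GPU
```

If the pod sits in Pending with the MachineSet scaled to 0, that's expected: scale to 1 first, then the pod lands once the node is Ready.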
Summary of changes since previous posts
| What | Before | After |
|---|---|---|
| Control plane nodes | 3 | 1 |
| Worker nodes | 3 | 1 |
| Worker pricing | On-demand | Spot |
| Availability zones | 2 (us-east-1a, us-east-1b) | 1 (us-east-1a) |
| Subnets | 4 | 2 |
| Credentials mode | Mint (default) | Passthrough |
| Subnet tagging | Not required | Required |
| GPU nodes | N/A | On-demand via MachineSet scaling |
The result is a cluster that costs a fraction of the original setup to run, handles GPU testing without ongoing GPU spend, and works within the tighter IAM restrictions my org has put in place.