cluster-api: Unexpected behaviour using kubectl scale with a topology managed cluster
When using `kubectl scale machinedeployment` or `kubectl scale kcp` with a topology-managed cluster, the command succeeds but does not have the expected result. Depending on how quickly the topology controller reacts to the change, the resource either doesn't appear to scale at all, or it scales up and then quickly scales back down (in line with the replicas specified in the cluster topology).
There are two options to deal with this, IMO:
- Intercept the scale request and send it to the Cluster Topology instead.
- Block a scale request when the scalable resource is topology managed and replicas is set.
Blocking the request is the more feasible option IMO, as intercepting the request would be control-plane specific, i.e. we could implement it in KCP, but it would have to be repeated in every other control plane provider. It would also mean adding a client to all scalable resources, and it subverts expectations about how the API should work. When blocking, we can return a useful error message explaining how to make a resource independently scalable, i.e. by unsetting replicas.
I haven't done a POC, but I think it would be feasible to block both changes to the replicas field and access to the scale subresource. This would have an impact on autoscaling: the autoscaler would get a useful error when improperly trying to change a managed cluster's scalable resource, rather than a scale operation that appears successful but results in no spec change.
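To make the blocking idea concrete, here's a minimal sketch of the check such a webhook might perform. This is plain Go rather than a real admission webhook, `scalableResource` is a simplified stand-in for the MD/KCP object, and the label name mirrors cluster-api's topology-owned label; treat the details as assumptions, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Label cluster-api puts on resources generated by the topology controller.
// Used here illustratively; the real webhook would inspect the live object.
const topologyOwnedLabel = "topology.cluster.x-k8s.io/owned"

// scalableResource is a simplified stand-in for a MachineDeployment or KCP.
type scalableResource struct {
	labels           map[string]string
	topologyReplicas *int32 // replicas set for this resource in Cluster.spec.topology; nil if unset
}

// validateScale rejects a scale request when the resource is topology managed
// and replicas are controlled by the Cluster topology, since a direct scale
// would just be reverted by the topology controller.
func validateScale(r scalableResource) error {
	if _, owned := r.labels[topologyOwnedLabel]; !owned {
		return nil // independently managed; scaling is fine
	}
	if r.topologyReplicas == nil {
		return nil // topology managed, but replicas deliberately left to the user/autoscaler
	}
	return errors.New("replicas are managed by Cluster.spec.topology; " +
		"unset the replicas field in the topology to scale this resource directly")
}

func main() {
	replicas := int32(3)
	managed := scalableResource{
		labels:           map[string]string{topologyOwnedLabel: ""},
		topologyReplicas: &replicas,
	}
	fmt.Println(validateScale(managed))                   // blocked with a useful error message
	fmt.Println(validateScale(scalableResource{}) == nil) // true: independent resource scales freely
}
```

The same predicate could back both the replicas-field update check and the scale-subresource check, which keeps the two paths consistent.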
This is also, presumably, an issue when a MachineSet is managed by a MachineDeployment, but scaling an MS individually is probably a less established pattern.
/area topology
/cc @jackfrancis @sbueringer
About this issue
- State: open
- Created 2 years ago
- Comments: 19 (18 by maintainers)
Just had a look - it looks like the Cluster Autoscaler uses https://pkg.go.dev/k8s.io/client-go/scale, which operates on the scale subresource. But we can't be sure this is the case for other autoscaler implementations, of course.
I think if we implement either of the solutions it would have to work for both the scale subresource and for updating the replicas field.
I'm definitely not up for proposing it 😆 - just expressing the level of granularity this would need, because of the multiple types of scalable resource and the multiple actual resources in the Cluster Topology.
Correct, they are required and must be set on only the MD or MS.
Hey folks, great discussion - can you help me understand and try to scope down what's the concrete issue / use case here?
To me, the above describes expected and reasonable behaviour. If you express intent for a resource that is managed by another entity (ClusterClass in this case), they will fight. That's kube by design; whether the entities are ClusterClass, kubectl, the autoscaler or anything else is circumstantial here. This is basically a tradeoff of using ClusterClass that should be documented: as a user, you have deliberately chosen to use an authority for managing the underlying resources. Now, if we consider this insufficient/poor UX, I think we should focus on making our semantics/primitives more meaningful and flexible. I don't think we should be implementing non-standard kubectl logic or non-standard API behaviour (I'd find that to be truly unexpected behaviour).
But then, based on @sbueringer's comment, it seems it is already the case that we support this more flexible use case, and that the undesired UX @killianmuldoon describes, which originated this issue, already has a very reasonable alternative, i.e.
So all seems fairly reasonable to me as it is (pending fixing the limitation on annotations Stefan mentioned). Now, there's some discussion around wanting a UX that uses the Cluster as a scalable resource which signals to the underlying scalable resources: can we describe use cases for this? In my mind, you'll usually want to scale a particular scalable resource which allocates a particular group of workloads (this is true for the autoscaler as well; it knows about homogeneous nodeGroups). This is possible today by using the scale subresource on existing scalable resources MD/MS/MP (you just don't set a replicas field in Cluster.spec.topology.workers), or by expressing intent in ClusterClass as the source of truth. Thinking of the Cluster as a first-class scalable resource feels artificial to me.
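As a sketch of that alternative: leaving `replicas` unset for a worker class in `Cluster.spec.topology.workers` means the topology controller does not reconcile the replica count, so the generated MachineDeployment can be scaled directly (by a user or the autoscaler). The field layout follows the cluster-api `v1beta1` Cluster type; the names `my-cluster`, `my-cluster-class` and `md-0` are placeholders.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  topology:
    class: my-cluster-class
    version: v1.27.0
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          # replicas deliberately unset: the topology controller leaves the
          # replica count alone, so `kubectl scale` or the autoscaler can
          # manage it on the generated MachineDeployment directly.
```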
I would definitely prefer not adding the necessary replica fields to Cluster to be able to make `kubectl scale` work.

The first MachineDeployment is probably the 80 percent case, but IMO it's not a great solution to this problem. It's not extensible to cover any other use case, which makes it confusing and frustrating for users. It introduces inconsistency into the management of Clusters and into the API, which could cause further complications down the line.
I think blocking the request is probably a better solution, even if it means losing `kubectl scale`.
Wrt the autoscaler: AFAIK we are not setting replicas in the cluster topology, so everything should work as usual.