cluster-api: Unexpected behaviour using kubectl scale with a topology managed cluster
When using `kubectl scale machinedeployment` or `kubectl scale kcp` with a topology-managed cluster, the command succeeds but does not have the expected result. Depending on how quickly the topology controller reacts to the change, the resource either doesn't appear to scale at all, or it scales up and then quickly scales back down (in line with the replicas specified in the cluster topology).
There are two options to deal with this, IMO:
- Intercept the scale request and send it to the Cluster Topology instead.
- Block a scale request when the scalable resource is topology managed and replicas is set.
Blocking the request is the more feasible option IMO, as intercepting the request would be control-plane specific, i.e. we could implement it in KCP, but it would have to be repeated in every other control plane provider. It would also mean adding a client to all scalable resources, and it subverts expectations about how the API should work. When blocking, we can return a useful error message explaining how to make a resource independently scalable, i.e. by unsetting replicas.
I haven't done a POC, but I think it would be feasible to block both changes to the replicas field and access to the scale subresource. This would have an impact on autoscaling: the autoscaler would get a useful error when improperly trying to change a managed cluster's scalable resource, rather than a scale operation that appears successful but results in no spec change.
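To make the blocking idea concrete, here's a minimal sketch of the check such a webhook might perform. This is plain Go rather than a real admission webhook, `scalableResource` is a simplified stand-in for the MD/KCP object, and the label name mirrors cluster-api's topology-owned label; treat the details as assumptions, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Label cluster-api puts on resources generated by the topology controller.
// Used here illustratively; the real webhook would inspect the live object.
const topologyOwnedLabel = "topology.cluster.x-k8s.io/owned"

// scalableResource is a simplified stand-in for a MachineDeployment or KCP.
type scalableResource struct {
	labels           map[string]string
	topologyReplicas *int32 // replicas set for this resource in Cluster.spec.topology; nil if unset
}

// validateScale rejects a scale request when the resource is topology managed
// and replicas are controlled by the Cluster topology, since a direct scale
// would just be reverted by the topology controller.
func validateScale(r scalableResource) error {
	if _, owned := r.labels[topologyOwnedLabel]; !owned {
		return nil // independently managed; scaling is fine
	}
	if r.topologyReplicas == nil {
		return nil // topology managed, but replicas deliberately left to the user/autoscaler
	}
	return errors.New("replicas are managed by Cluster.spec.topology; " +
		"unset the replicas field in the topology to scale this resource directly")
}

func main() {
	replicas := int32(3)
	managed := scalableResource{
		labels:           map[string]string{topologyOwnedLabel: ""},
		topologyReplicas: &replicas,
	}
	fmt.Println(validateScale(managed))                   // blocked with a useful error message
	fmt.Println(validateScale(scalableResource{}) == nil) // true: independent resource scales freely
}
```

The same predicate could back both the replicas-field update check and the scale-subresource check, which keeps the two paths consistent.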
This is also, presumably, an issue when a MachineSet is managed by a MachineDeployment, but scaling an MS individually is probably a less established pattern.
/area topology
/cc @jackfrancis @sbueringer
About this issue
- State: open
- Created 2 years ago
- Comments: 19 (18 by maintainers)
Just had a look - it looks like the Cluster Autoscaler uses https://pkg.go.dev/k8s.io/client-go/scale, which operates on the scale subresource. But we can't be sure this is the case for other autoscaler implementations, of course.
I think if we implement either of the solutions it would have to work for both the scale subresource and for updating the replicas field.
I'm definitely not up for proposing it 😆 - just expressing the level of granularity this would need, because of the multiple types of scalable resource and the multiple actual resources in the Cluster Topology.
Correct, they are required and must be set on only the MD or MS.
Hey folks, great discussion - can you help me understand and try to scope down what's the concrete issue / use case here?
To me, the above describes expected and reasonable behaviour. If you express intent for a resource that is managed by another entity (ClusterClass in this case), they will fight. That's kube by design; whether the entities are ClusterClass, kubectl, the autoscaler or anything else is circumstantial here. This is basically a tradeoff of using ClusterClass that should be documented: as a user, you have deliberately chosen to use an authority for managing the underlying resources. Now, if we consider this insufficient/poor UX, I think we should focus on making our semantics/primitives more meaningful and flexible. I don't think we should be implementing non-standard kubectl logic or non-standard API behaviour (I'd find that to be truly unexpected behaviour).
But then, based on @sbueringer's comment, it seems it is already the case that we support this more flexible use case, and that the undesired UX @killianmuldoon describes, which originated this issue, already has a very reasonable alternative, i.e.
So all seems fairly reasonable to me as it is (pending fixing the limitation on annotations Stefan mentioned). Now, there's some discussion around wanting a UX that uses the Cluster as a scalable resource which signals to the underlying scalable resources: can we describe use cases for this? In my mind, you'll usually want to scale a particular scalable resource which allocates a particular group of workloads (this is true for the autoscaler as well; it knows about homogeneous nodeGroups). This is possible today by using the scale subresource on existing scalable resources MD/MS/MP (you just don't set a replicas field in Cluster.spec.topology.workers), or by expressing intent in ClusterClass as the source of truth. Thinking of the Cluster as a first-class scalable resource feels artificial to me.
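As a sketch of that alternative: leaving `replicas` unset for a worker class in `Cluster.spec.topology.workers` means the topology controller does not reconcile the replica count, so the generated MachineDeployment can be scaled directly (by a user or the autoscaler). The field layout follows the cluster-api `v1beta1` Cluster type; the names `my-cluster`, `my-cluster-class` and `md-0` are placeholders.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  topology:
    class: my-cluster-class
    version: v1.27.0
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          # replicas deliberately unset: the topology controller leaves the
          # replica count alone, so `kubectl scale` or the autoscaler can
          # manage it on the generated MachineDeployment directly.
```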
I would definitely prefer not adding the necessary replica fields to Cluster to be able to make `kubectl scale` work.

The first MachineDeployment is probably the 80 percent case, but IMO it's not a great solution to this problem. It's not extensible to cover any other use case, which makes it confusing and frustrating for users. It introduces inconsistency into the management of Clusters and into the API, which could cause further complications down the line.
I think blocking the request is probably a better solution, even if it means losing `kubectl scale`.
Wrt the autoscaler: AFAIK we are not setting replicas in the cluster topology, so everything should work as usual.