cluster-api: Mutating a control plane KubeadmConfig triggers indefinite rolling upgrade

What steps did you take and what happened:

  1. Configure a mutating admission webhook that modifies the KubeadmConfig for control plane machines.
  2. Observe that control plane machines will indefinitely try to upgrade.

What did you expect to happen: Since matchInitOrJoinConfiguration checks against the entire KubeadmConfig spec, any mutation applied to a KubeadmConfig (such as adding custom files or preKubeadmCommands) triggers a rolling upgrade of all control plane machines. This does not happen when mutating KubeadmConfig resources created from a KubeadmConfigTemplate, and likewise I would not expect it when mutating a KubeadmConfig derived from a KubeadmControlPlane. As another example, Kubernetes does not perform a rolling upgrade of a Deployment if the Pods derived from it are mutated in any way (e.g. sidecars injected by a service mesh). A rolling upgrade is triggered only when the pod template changes.
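To illustrate why the upgrade never converges, here is a minimal sketch of a whole-spec comparison. It is not the actual matchInitOrJoinConfiguration code: ConfigSpec is a hypothetical, trimmed-down stand-in for KubeadmConfigSpec, and matchesWholeSpec and the file path are made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// ConfigSpec is a hypothetical, simplified stand-in for KubeadmConfigSpec.
type ConfigSpec struct {
	PreKubeadmCommands []string
	Files              []string
}

// matchesWholeSpec is a hypothetical helper that compares every field of
// the desired spec (from the control plane object) with the machine's spec.
func matchesWholeSpec(desired, actual ConfigSpec) bool {
	return reflect.DeepEqual(desired, actual)
}

func main() {
	desired := ConfigSpec{PreKubeadmCommands: []string{"swapoff -a"}}
	actual := ConfigSpec{PreKubeadmCommands: []string{"swapoff -a"}}
	fmt.Println("before webhook mutation:", matchesWholeSpec(desired, actual))

	// A mutating webhook adds a per-machine file to the machine's config.
	actual.Files = append(actual.Files, "/etc/example/machine-specific.conf")
	fmt.Println("after webhook mutation:", matchesWholeSpec(desired, actual))
}
```

Because the webhook would mutate the replacement machine's config as well, a comparison like this keeps reporting a mismatch after every rollout.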

A workaround is to mutate the KubeadmControlPlane in the same way the KubeadmConfig is mutated. However, if any mutations are dynamic or unique per machine, KubeadmControlPlane will continuously try to perform a rolling upgrade of the control plane machines, since it expects KubeadmControlPlane and KubeadmConfig to match exactly.

matchInitOrJoinConfiguration should match only against fields that should trigger a rolling update, instead of all of the fields in KubeadmConfigSpec. More ideally, a hash of the last defined KubeadmConfig spec should be stored in KubeadmControlPlane to know when a rolling update should occur. This is the same mechanism used to determine whether a rolling update of a Deployment should occur.

Anything else you would like to add:

Environment:

  • Cluster-api version:
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

Agreed, I’ll file one

I can see both use cases, although I’m not sure if we can/should change the behavior now that v1alpha3 is going into maintenance mode. Definitely open to suggestions.

Fine with me, just giving my 2 cents that this was pretty unexpected. As @yastij mentioned above, this disallows any mutations that end up being specific to a machine.

As a reference, this issue https://github.com/kubernetes-sigs/cluster-api/issues/3170 triggered our move away from a hash mechanism to the current approach.