cluster-api: KubeadmConfig changes should be reconciled for machine pools, triggering instance recreation
What would you like to be added (User Story)?
As an operator, I want changes to `KubeadmConfig.spec.{files,preKubeadmCommands,...}` to have an effect on MachinePool-created nodes, resulting in server instance recreation.
Detailed Description
Discussed in office hours 2023-06-14 (notes, main points copied into this issue below).
Situation: A MachinePool manifest references an AWSMachinePool and a KubeadmConfig (a very regular machine pool config)
Expectation: Changing `KubeadmConfig.spec.*` should lead to recreating (“rolling”) nodes. With the CAPA infra provider, nothing happens at the moment. Here’s why.
- Problem 1: CAPI’s `KubeadmConfigReconciler` does not immediately update the bootstrap secret once `KubeadmConfig.spec` changes, but only once it rotates the bootstrap token (purpose: new machine pool-created nodes can join the cluster later on). This means several minutes of waiting for reconciliation.
  - Suggestion: Simple bug fix. @AndiDog has a draft implementation that always considers updating the secret, not only if the token must be refreshed. In the meantime, users can work around by creating a new KubeadmConfig object.
- Problem 2: CAPA (and likely all other infra providers) does not watch the bootstrap secret, so it cannot immediately react to `KubeadmConfig.spec` changes either.
  - @AndiDog: Should it even directly watch the secret? What should the CAPI ⇔ CAPx contract be?
  - @fabriziopandini: Watching secrets can blow up memory [of the kubectl client]. Think of the UX and possible solutions first.
  - @CecileRobertMichon: Maybe change `MachinePool.spec.template.spec.bootstrap.dataSecretName` every time, because that triggers reconciliation for the `MachinePool` object (machinepool_controller_phases.go code).
  - @sbueringer: For `MachinePool` support in `ClusterClass` we have to decide what the “ideal” way to roll out BootstrapConfig is.
- Problem 3: The bootstrap secret contains both the “how to set up this server” init data (e.g. cloud-init / Ignition) and the random bootstrap token by which nodes join the cluster. If only the token gets refreshed (`DefaultTokenTTL` is 15 minutes), we don’t want nodes to be recreated, since that would recreate all nodes every few minutes.
  - @AndiDog: CAPA does not trigger a node rollout (AWS ASG instance refresh) right now (https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4071) because it has no logic to detect when the bootstrap config changed. How can CAPA (or another infra provider) tell that something apart from the bootstrap token has changed? We have a CAPA PR (https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4245) that checks “is there a difference apart from only the bootstrap token value”, but that is very hacky and format/provider-specific. Maybe a “checksum without bootstrap token” provided by CAPI could help the infra provider? (See the sketch after this list.)
  - @CecileRobertMichon: Split bootstrap token vs. other init data?
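To make the “checksum without bootstrap token” idea from Problem 3 more concrete, here is a minimal Go sketch of how such a value could be computed. The function name and the placeholder masking are assumptions for illustration only; this is not existing CAPI or CAPA code.

```go
// Hypothetical helper illustrating the "checksum without bootstrap token"
// idea from Problem 3. Names and the masking approach are assumptions for
// this sketch, not existing CAPI or CAPA code.
package bootstrap

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
)

// partialChecksum hashes the rendered bootstrap data (e.g. cloud-init or
// Ignition) with the current bootstrap token value masked out, so that a
// token-only rotation does not change the checksum (and therefore does not
// trigger a node rollout), while any other change does.
func partialChecksum(bootstrapData []byte, bootstrapToken string) string {
	masked := bootstrapData
	if bootstrapToken != "" {
		// Replace every occurrence of the token with a stable placeholder.
		masked = bytes.ReplaceAll(bootstrapData, []byte(bootstrapToken), []byte("<token>"))
	}
	sum := sha256.Sum256(masked)
	return hex.EncodeToString(sum[:])
}
```

A bootstrap provider could expose such a checksum (e.g. in its status) so that infra providers can detect meaningful bootstrap changes without parsing the provider- or format-specific payload.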
Anything else you would like to add?
- CAPA-specific issue: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4071. The problem applies to all infra providers, though.
- This issue asks for mutable KubeadmConfig, while another proposal wants to make it immutable even for machine pool (breaking change): https://github.com/kubernetes-sigs/cluster-api/issues/4910.
Label(s) to be applied
/kind feature /area bootstrap /area machinepool
One idea was confirmed as reasonable by several people in the office hours: consider a change of the bootstrap object reference as the desire to roll out. I had the doubt that both the old and new bootstrap config (with different names) could produce the same output, and the user could be surprised that all nodes get replaced. But then again, @CecileRobertMichon and others rightfully commented “Why would you create a new one with the same content?”, since that seems unusual.
Next, I will try this solution and see whether other opinions appear as to how this can be solved. And @vincepri suggested factoring out a new issue for splitting off the random, long-lived bootstrap token from `KubeadmConfig`, since it’s intermixed here but really a separate problem. I’ll create an issue for that.
Thanks for tagging me @CecileRobertMichon. I’ll try to add my thoughts in general as well as in the context of the WIP proposal for addressing #5294.
Terminology disambiguation
First, I think we have to update our terminology to clearly distinguish between two related yet distinct concerns: bootstrap and provisioning.
Note that the bootstrap process is built on top of provisioning: we currently use cloud-init or Ignition in order to convey kubeadm configuration to a machine as well as to execute the shell commands which trigger the kubeadm init/join process. However, provisioning isn’t the same as bootstrap, because provisioning can also contain OS-level customizations which aren’t related to bootstrap (e.g. tweaking a sysctl parameter). Furthermore, even if provisioning contained only bootstrap-related configuration, IMO we’d still need the separate terminology because kubeadm configuration isn’t the same as cloud-init or Ignition configuration, and it’s important to distinguish between the two given the different stages performed in order to get from a declarative kubeadm config to a VM with a running kubelet.
My thoughts
With that said, here are my thoughts:
Machine recreation on bootstrap config change
Following what @fabriziopandini said, I think that before we figure out how to recreate machines following a change to the bootstrap config, we should decide whether we want to allow doing so in the first place. Personally, I don’t see why not but I don’t have a strong opinion here. Regardless of what gets decided, I agree that this should either be supported and working or explicitly unsupported and therefore documented and rejected by a webhook. In any case, I think we can agree that “supported by accident” isn’t something we want.
Multiple checksums
I don’t think we want to maintain two bootstrap config checksums for the following reasons:
I’d have to dig deeper into the codebase to get a clearer understanding of the problem, but I’m guessing the thing which shouldn’t be hashed (the bootstrap token) should have resided elsewhere if it’s not part of { things we watch when deciding whether to replace a machine }. I don’t have an alternative suggestion at the moment, but chances are we’ll have to address this concern as a byproduct of working on #5294 anyway.
CAPI <> CAPx contract
Great point about the CAPI <> CAPx contract @AndiDog. This should be a core piece of the #5294 work and I’ve already started thinking about it. I’ll try to copy you in relevant discussions around the proposal process.
We already have https://cluster-api.sigs.k8s.io/developer/providers/machine-infrastructure which might be relevant but I need to dig deeper and see if we have an entire contract missing or we just need to update an existing one.
High-level description of #5294
The #5294 proposal is still WIP, but very briefly here is what I envision at the moment:
The proposal needs a lot more fleshing out so I don’t know about a timeline. I’m still gathering requirements all over the place since #5294 touches a very central area of the codebase which affects a lot of use cases. My hope is that we can cleanly separate the effort into manageable tasks and focus on a narrow change set at a given time without losing the bigger picture. Maybe we should break #5294 down to a couple of smaller proposals - we’ll see.
Infra providers watching KubeadmConfig
Following from the above, I don’t think infra providers should watch KubeadmConfig because I think it goes against the separation of concerns we want to eventually achieve. Infra providers should watch a location with bootstrapper-agnostic provisioning config (e.g. a cloud-config document) and pass that config to machines.
Any comments on the above are welcome. Sorry about the 📜🙂
Frankly speaking, what I found confusing in the UX is that we are changing an external reference in place, and this is inconsistent with everything else in CAPI. Additionally, the reference to a bootstrap object is defined inside a machine template, while e.g. in MD/MS we have a similar API modeling but we use a bootstrap template instead of a bootstrap object.
However, while designing a solution for problem 1, IMO we should aim for a solution where changes to external objects surface to the parent objects, no matter if they are user-driven (e.g. rotating template references) or machine-driven (by changing the data secret name).
This approach - changing the parent object - is consistent with everything else in CAPI and provides a clean pattern for all the providers, and IMO this could address problem 2.
WRT problem 3, it seems to me that what we are saying is going in the direction of each machine having its own bootstrap token, so machine provisioning is not affected by concurrent changes to the bootstrap object, and this could be somewhat overlapping with @Jont828’s work (see e.g. https://github.com/kubernetes-sigs/cluster-api/pull/8828#discussion_r1228631689)
Last note, what we are deciding
@AndiDog: In #8842, we’re adding MachinePool Machines to DockerMachinePools. This would affect the ClusterClass rollout e2e test, as we wouldn’t be able to roll out changes to the KubeadmConfig to the MachinePool Machines, so I’ve made a change to exclude them for now. But in the future, we’d want to revert this change. Tagging @willie-yao as well.
One additional thought: we might want to think about how to deal with “orphan” configs (for example, avoid rotating the KubeadmConfig bootstrap token if the config isn’t actually in use by a MachinePool)
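As a rough illustration of that thought, here is a hedged Go sketch of how a controller could check whether a KubeadmConfig is still referenced by any MachinePool before rotating its token. The helper is hypothetical; it assumes the v1beta1 MachinePool API from sigs.k8s.io/cluster-api/exp/api/v1beta1 and a controller-runtime client.

```go
// Hypothetical helper for the "orphan config" thought above: skip token
// rotation if no MachinePool references the KubeadmConfig anymore.
package bootstrap

import (
	"context"

	expv1 "sigs.k8s.io/cluster-api/exp/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// kubeadmConfigInUse returns true if any MachinePool in the given namespace
// references the named KubeadmConfig via spec.template.spec.bootstrap.configRef.
func kubeadmConfigInUse(ctx context.Context, c client.Client, namespace, configName string) (bool, error) {
	machinePools := &expv1.MachinePoolList{}
	if err := c.List(ctx, machinePools, client.InNamespace(namespace)); err != nil {
		return false, err
	}
	for _, mp := range machinePools.Items {
		ref := mp.Spec.Template.Spec.Bootstrap.ConfigRef
		if ref != nil && ref.Kind == "KubeadmConfig" && ref.Name == configName {
			return true, nil
		}
	}
	return false, nil
}
```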
We discussed this in a meeting (@CecileRobertMichon, @fabriziopandini, @vincepri) and decided to try and support the reference rotation (a changed `MachinePool.spec.template.spec.bootstrap.configRef.name`) as described in https://github.com/kubernetes-sigs/cluster-api/issues/8858#issuecomment-1737930898. Other solutions didn’t seem as natural, or are more complex. I’ll give it a shot and report back.
https://github.com/kubernetes-sigs/cluster-api/issues/5294 is a discussion way bigger than this specific issue.
That means that if we put this in the critical path, it will most probably delay the solution to this specific problem and take off the table the more tactical alternatives we were considering until now, like https://github.com/kubernetes-sigs/cluster-api/issues/8858#issuecomment-1719109202 or the idea I was mulling over in https://github.com/kubernetes-sigs/cluster-api/issues/8858#issuecomment-1719700381 about leveraging object rotation.
I just want to make sure everyone agrees on the implications, starting from @AndiDog who reported this issue.
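For illustration, here is a minimal Go sketch of the reference-rotation idea discussed in the comments above: treat a change of `MachinePool.spec.template.spec.bootstrap.configRef.name` as the request to roll out new instances. The annotation key and helper functions are hypothetical, not existing CAPI or CAPA code.

```go
// Minimal sketch of the "reference rotation" approach: a controller records
// the bootstrap config reference it last acted on in an annotation and treats
// any change as a request to roll out new instances. The annotation key and
// helpers are hypothetical, not existing CAPI or CAPA code.
package machinepool

import (
	expv1 "sigs.k8s.io/cluster-api/exp/api/v1beta1"
)

const lastAppliedBootstrapRefAnnotation = "example.cluster.x-k8s.io/last-applied-bootstrap-ref"

// needsRollout returns true when the referenced bootstrap object changed since
// the last reconciliation, i.e. the user rotated configRef to a new object.
func needsRollout(mp *expv1.MachinePool) bool {
	ref := mp.Spec.Template.Spec.Bootstrap.ConfigRef
	if ref == nil {
		return false
	}
	last := mp.GetAnnotations()[lastAppliedBootstrapRefAnnotation]
	return last != "" && last != ref.Name
}

// markApplied records the currently referenced bootstrap object after a
// successful rollout (or on first reconciliation).
func markApplied(mp *expv1.MachinePool) {
	ref := mp.Spec.Template.Spec.Bootstrap.ConfigRef
	if ref == nil {
		return
	}
	annotations := mp.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[lastAppliedBootstrapRefAnnotation] = ref.Name
	mp.SetAnnotations(annotations)
}
```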
Here is the current bootstrap provider contract: https://cluster-api.sigs.k8s.io/developer/providers/bootstrap
If we add the field to MachinePoolStatus based on a bootstrap config status field, we would need to either make that status field part of the bootstrap provider status contract or make it opt-in/optional so it doesn’t break other bootstrap providers that don’t implement the field.
cc @johananl @richardcase who are working on https://github.com/kubernetes-sigs/cluster-api/issues/5294 which might help simplify this
yes, that’s correct
This is a dealbreaker if we’re talking about watching specific Kubeadm* types. Infra providers should be bootstrap-provider agnostic and work with all bootstrap and control plane providers, as kubeadm is not the only one out there. If we’re trying to watch the bootstrap reference, that would be fine, as we can do that without referencing KubeadmConfig specifically. In that case, `status.dataSecretPartialChecksum` would need to become part of the bootstrap provider contract and every bootstrap provider would need to implement it.
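To visualize what that could mean for the contract, here is a hedged sketch of an optional status field on a generic bootstrap provider type, reusing the `dataSecretPartialChecksum` name from this thread. The `FooConfigStatus` type and its JSON tags are illustrative assumptions, not the actual KubeadmConfig API.

```go
// Sketch of an optional contract field on a bootstrap provider's status,
// using the field name from the discussion above. FooConfigStatus is a
// stand-in for any bootstrap provider's status type, not a real CAPI type.
package v1beta1

type FooConfigStatus struct {
	// Ready denotes that the bootstrap data secret is fully generated.
	Ready bool `json:"ready,omitempty"`

	// DataSecretName is the name of the secret that stores the bootstrap data.
	DataSecretName *string `json:"dataSecretName,omitempty"`

	// DataSecretPartialChecksum is a checksum of the bootstrap data with the
	// short-lived bootstrap token excluded. Optional: providers that don't
	// implement it leave it empty, and infra providers then simply don't
	// roll out machines on bootstrap changes.
	// +optional
	DataSecretPartialChecksum string `json:"dataSecretPartialChecksum,omitempty"`
}
```

Making the field optional would keep the contract backwards compatible, as suggested above: bootstrap providers that don’t set it effectively opt out, and infra providers would only act on it when present.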