cluster-api: node.cluster.x-k8s.io/uninitialized taint causes a race condition when creating clusters.

What steps did you take and what happened?

Hi, this new taint node.cluster.x-k8s.io/uninitialized will cause cluster creation to fail for out-of-tree cloud providers, for example cloud-provider-vsphere, since cloud providers only tolerate the existing Kubernetes taints. Example here: https://github.com/kubernetes/cloud-provider-vsphere/blob/master/releases/v1.26/vsphere-cloud-controller-manager.yaml#L218-L230
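For reference, the missing piece is a toleration like the following in the cloud provider's DaemonSet pod spec (a minimal sketch: the taint key and effect come from this issue, the surrounding fields are generic):

```yaml
# Sketch only: add alongside the provider's existing tolerations,
# e.g. next to node.cloud-provider.kubernetes.io/uninitialized.
tolerations:
  - key: node.cluster.x-k8s.io/uninitialized
    effect: NoSchedule
```

Until a provider ships a manifest with this toleration, its CPI pods cannot land on the tainted nodes.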

CPI is crucial for initializing the node: it sets the providerID and externalIP on the node. Now the node will be stuck in an uninitialized state, because CPI can't be deployed without the matching toleration. And CAPI needs the providerID on the node to find the specific node; since it can't find the node's providerID, it will keep erroring out and won't remove the taint node.cluster.x-k8s.io/uninitialized:NoSchedule.
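To make the circular wait concrete, here is a sketch of what the stuck node looks like (the node name and comments are illustrative, not taken from the issue):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-0          # illustrative name
spec:
  # spec.providerID is missing: the CPI pod cannot schedule to set it,
  # and CAPI will not remove the taint below until it can match the
  # node by providerID -- a circular wait.
  taints:
    - key: node.cluster.x-k8s.io/uninitialized
      effect: NoSchedule
```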

This is a breaking change that requires all cloud providers to adopt this toleration.

What did you expect to happen?

Cluster creation succeeds.

Cluster API version

CAPI v1.4.0-rc1

Kubernetes version

1.25

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 18 (14 by maintainers)

Most upvoted comments

We are going ahead with option 2. This fix will be cherry-picked to release-1.4 and will be part of v1.4.0.

@lubronzhan We have discovered that this is an issue with SSA (Server-Side Apply) not being able to apply a patch to labels when there is a duplicate field. This issue is tracked here: #8417

@CecileRobertMichon @willie-yao @lubronzhan it would be great if you could validate the fix we merged on Friday…

We are running into this in the CAPZ PR to bump CAPI to v1.4.0-rc-0: https://kubernetes.slack.com/archives/CEX9HENG7/p1679689897005289?thread_ts=1679521084.692349&cid=CEX9HENG7

The symptom: Calico CNI pods are failing to schedule, each reporting:

  Warning  FailedScheduling  2m36s  default-scheduler  0/4 nodes are available: 4 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.

Full pod describe output: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/3298/pull-cluster-api-provider-azure-e2e/1639072622090129408/artifacts/clusters/capz-e2e-4cl6mc-ipv6/calico-system/calico-kube-controllers-f7574cc46-cvvkp/pod-describe.txt

cc @willie-yao

I think that as long as we're documenting that there are cases where, if you're using inequality-based selection on some label that syncs to control-plane nodes, you could still end up with pods landing on control planes, we should be fine going with option 2. A sketch of such a selector follows.
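For illustration, here is a hedged sketch of the kind of inequality-based selector that can misfire. The label key and values are invented for the example; the point is that a NotIn expression matches any node whose label value differs (or that lacks the label entirely), which can include a control-plane node once the label syncs there:

```yaml
# Hypothetical pod spec snippet; "topology.example.com/pool" is an
# invented label, not from the issue.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.example.com/pool
              operator: NotIn          # inequality-based selection
              values: ["gpu-workers"]
# Any node whose synced label is not "gpu-workers" -- including a
# control-plane node -- satisfies this term, so the pod may land there.
```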

We also might want to broadcast to providers that a change is required for the next CAPI minor release to add the toleration. This should give folks enough soak time to adapt and update their manifests.