kubernetes: CoreDNS does not tolerate node.cloudprovider.kubernetes.io/uninitialized

What happened:

Clusters built with an external cloud provider that must resolve DNS before it can remove the node.cloudprovider.kubernetes.io/uninitialized taint from nodes get stuck in a catch-22.

CoreDNS cannot be scheduled because it does not tolerate the node.cloudprovider.kubernetes.io/uninitialized: true taint:

kubectl --kubeconfig /etc/kubernetes/admin.conf describe pods -n kube-system coredns-84b6ddc6c6-6vlrz | grep FailedScheduling
  Warning  FailedScheduling  <unknown>  default-scheduler  0/10 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/gateway: true}, that the pod didn't tolerate, 8 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
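
One possible workaround is to add a toleration for the uninitialized taint to the CoreDNS Deployment in kube-system. This is sketched here as an assumption and was not confirmed by the reporter or the maintainers. The first two entries below are assumed to be kubeadm's default CoreDNS tolerations for this release. Compare them with your own manifest (kubectl -n kube-system get deployment coredns -o yaml) before applying anything:

```yaml
# Excerpt of the coredns Deployment (kube-system); only the path down to
# the pod template's tolerations is shown.
spec:
  template:
    spec:
      tolerations:
        # Assumed kubeadm defaults for CoreDNS in this release
        - key: CriticalAddonsOnly
          operator: Exists
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        # Added: lets CoreDNS schedule onto nodes that the external cloud
        # provider has not initialized yet (this taint uses effect NoSchedule)
        - key: node.cloudprovider.kubernetes.io/uninitialized
          operator: Equal
          value: "true"
          effect: NoSchedule
```

kubeadm manages the CoreDNS addon, so a hand-edited toleration like this may be overwritten when kubeadm re-applies its addons, for example during an upgrade.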

I am using https://github.com/kubernetes/cloud-provider-vsphere as my cloud provider. It must be able to reach a vCenter server before it can remove the node.cloudprovider.kubernetes.io/uninitialized taint from nodes. However, it cannot resolve the vCenter DNS entry because CoreDNS cannot start.

kubectl --kubeconfig /etc/kubernetes/admin.conf -n kube-system logs -f vsphere-cpi-5kvzj
E0729 13:32:30.313559       1 connectionmanager.go:148] Cannot connect to vCenter with err: Post https://vcenter.local:443/sdk: dial tcp: i/o timeout

[ root@debug-container:/ ]$ nslookup vcenter.local
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'vcenter.local'

What you expected to happen: I expected to be able to bootstrap a cluster, including control-plane kubelets, using --cloud-provider=external.

How to reproduce it (as minimally and precisely as possible):

Create a cluster using kubeadm with --cloud-provider=external. Note that CoreDNS cannot be scheduled.
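
A minimal kubeadm configuration that passes --cloud-provider=external to the kubelet could look like the following. The file name kubeadm-config.yaml is hypothetical. The kubeadm.k8s.io/v1beta2 API version is assumed because it is the kubeadm config API used by the 1.18 releases:

```yaml
# kubeadm-config.yaml (hypothetical file name), used as:
#   kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    # The kubelet registers the node with the
    # node.cloudprovider.kubernetes.io/uninitialized taint; the external
    # cloud controller manager removes it once the node is initialized
    cloud-provider: external
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.18.6
```

Nodes that join later need the same kubelet argument, for example via nodeRegistration.kubeletExtraArgs in a JoinConfiguration.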

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.18.6
  • Cloud provider or hardware configuration: https://github.com/kubernetes/cloud-provider-vsphere
  • OS (e.g: cat /etc/os-release): RHEL7
  • Kernel (e.g. uname -a): 3.10.0-1062.9.1.el7.x86_6
  • Install tools: Kubeadm
  • Network plugin and version (if this is a network-related bug): Calico 3.14
  • Others:

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 20 (12 by maintainers)

Most upvoted comments

Thanks, I will watch for similar requests from users and reconsider adding the toleration.

/close

Will the k8s API be available from a node with the node.cloudprovider.kubernetes.io/uninitialized taint? If not, then even if the CoreDNS deployment tolerates the taint, the pods would remain unready.

Yes.

kube-apiserver has the following toleration:

tolerations:
  - effect: NoExecute
    operator: Exists
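
The following note may help readers less familiar with toleration matching. A toleration with operator: Exists and no key matches every taint that carries the listed effect. A toleration matches only taints whose effect it names, and an empty effect matches all effects. The two forms below are illustrative and are not copied from any manifest in this thread:

```yaml
# Pod spec excerpt showing two toleration styles (illustrative only)
tolerations:
  # Blanket form, as quoted above for kube-apiserver: no key plus
  # operator Exists tolerates every taint with effect NoExecute
  - effect: NoExecute
    operator: Exists
  # Targeted form: tolerates only the cloud-provider initialization taint,
  # whatever its value, for the NoSchedule effect
  - key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Exists
    effect: NoSchedule
```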

Additionally, my Calico CNI has been configured to tolerate node.cloudprovider.kubernetes.io/uninitialized.

I think @frapposelli may be able to point you in the right direction regarding the cloud provider. If not, I'd try asking in the provider-vsphere channel on the Kubernetes Slack.