rancher: Worker node fails to register in a custom RKE cluster

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (fewest steps possible):

  • Deploy a custom cluster on Rancher 2.3.6 with Kubernetes 1.17.4-rancher1-2
  • Upgrade the Rancher server to 2.4.3-rc2
  • Upgrade the Kubernetes version to 1.17.5
  • Add a worker node to the cluster
  • The node gets stuck in the Registering state
  • Logs of the share-mnt container on the node (see the diagnostic sketch after the log):
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
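
The loop above is the share-mnt container repeatedly trying to start a kubelet container that was never created on the node. A minimal way to confirm this from the affected node is sketched below; the container names (kubelet, share-mnt) are the ones RKE normally creates and are assumed here:

# List the RKE-managed containers on the stuck node; kubelet is expected but absent
docker ps -a --filter "name=kubelet" --filter "name=share-mnt"

# Follow the share-mnt container logs to watch the restart loop shown above
docker logs -f share-mnt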

Expected Result: The kubelet container should be deployed on the node and the worker node should register with the cluster.
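
Once the kubelet comes up, successful registration can be verified from any machine with cluster access using the standard kubectl check, shown here for completeness:

kubectl get nodes -o wide   # the new worker should appear and eventually reach Ready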

Other details that may be helpful:

  • This was not reproducible when adding a worker node to an AWS node-driver cluster.
  • Reproduced the issue when adding a second worker node to an upgraded cluster on a single-node install, right after the Rancher server upgrade; no Kubernetes upgrade had been done on that cluster (Kubernetes v1.17.4). The registration command used to add such a node is sketched below.
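
For reference, worker nodes are added to a custom cluster by running the registration command from the cluster's Edit Cluster page on the new host. The general shape is shown below; the image tag, server URL, token, and checksum are placeholders taken from the Rancher UI:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.4.3 \
  --server https://<rancher-server> --token <registration-token> \
  --ca-checksum <checksum> --worker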

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): upgraded from 2.3.6 to 2.4.3-rc2
  • Installation option (single install/HA): HA

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom-rke
  • Kubernetes version (use kubectl version): 1.17.5
  • Docker version (use docker version): not provided
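
For completeness, the requested version information can be collected with the standard CLI commands below (run against the cluster and on the node, respectively):

kubectl version --short                          # client and server Kubernetes versions
docker version --format '{{.Server.Version}}'    # Docker engine version on the node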

gzrancher/rancher#10231

Most upvoted comments

I’m seeing the same issue on 2.4.2 with Kubernetes 1.17.4. It happens completely at random when provisioning a new cluster, and only on 2-3 of the nodes.