rancher: Worker node fails to register in a custom RKE cluster

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (fewest steps possible):

  • Deploy a custom cluster on Rancher 2.3.6 with Kubernetes 1.17.4-rancher1-2
  • Upgrade the Rancher server to 2.4.3-rc2
  • Upgrade the Kubernetes version to 1.17.5
  • Add a worker node to the cluster
  • The node gets stuck in the Registering state
  • Logs of the share-mnt container on the node (see the diagnostic sketch after the log):
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
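
The loop above is the share-mnt container repeatedly trying to start a kubelet container that was never created on the node. A minimal way to confirm this from the affected node is sketched below; the container names (kubelet, share-mnt) are the ones RKE normally creates and are assumed here:

# List the RKE-managed containers on the stuck node; kubelet is expected but absent
docker ps -a --filter "name=kubelet" --filter "name=share-mnt"

# Follow the share-mnt container logs to watch the restart loop shown above
docker logs -f share-mnt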

Expected Result: The kubelet container should be deployed on the node and the worker node should register with the cluster.
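
Once the kubelet comes up, successful registration can be verified from any machine with cluster access using the standard kubectl check, shown here for completeness:

kubectl get nodes -o wide   # the new worker should appear and eventually reach Ready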

Other details that may be helpful:

  • This was not reproducible when adding a worker node to an AWS node-driver cluster.
  • Reproduced the issue when adding a second worker node to an upgraded cluster on a single-node install, right after the Rancher server upgrade; no Kubernetes upgrade had been done on that cluster (Kubernetes v1.17.4). The registration command used to add such a node is sketched below.
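
For reference, worker nodes are added to a custom cluster by running the registration command from the cluster's Edit Cluster page on the new host. The general shape is shown below; the image tag, server URL, token, and checksum are placeholders taken from the Rancher UI:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.4.3 \
  --server https://<rancher-server> --token <registration-token> \
  --ca-checksum <checksum> --worker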

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): upgraded from 2.3.6 to 2.4.3-rc2
  • Installation option (single install/HA): HA

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom-rke
  • Kubernetes version (use kubectl version): 1.17.5
  • Docker version (use docker version): not provided
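
For completeness, the requested version information can be collected with the standard CLI commands below (run against the cluster and on the node, respectively):

kubectl version --short                          # client and server Kubernetes versions
docker version --format '{{.Server.Version}}'    # Docker engine version on the node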

gzrancher/rancher#10231

Most upvoted comments

I’m seeing the same issue on 2.4.2 with Kubernetes 1.17.4. It happens completely at random when provisioning a new cluster, and only on 2-3 of the nodes.