rancher: Worker node fails to register in a custom RKE cluster
What kind of request is this (question/bug/enhancement/feature request): bug
Steps to reproduce (fewest steps possible):
- Deploy a custom cluster on Rancher 2.3.6 with k8s 1.17.4-rancher1-2
- Upgrade the Rancher server to 2.4.3-rc2
- Upgrade the cluster's k8s version to 1.17.5
- Add a worker node to the cluster
- The node is stuck in the registering state
- Logs of the share-mnt container on the node:
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
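The "No such container: kubelet" response suggests the kubelet container was never created on the new node, so the share-mnt loop has nothing to start. As a hedged diagnostic sketch (container names are taken from the log above and from typical Rancher 2.x custom-cluster node deployments; they are assumptions, not confirmed output from this environment), these commands on the affected worker show whether kubelet was ever created and what the agent logged:
# Check whether a kubelet container exists in any state (running or exited)
docker ps -a --filter name=kubelet
# Tail the share-mnt container that keeps retrying "docker start kubelet"
docker logs --tail 50 share-mnt
# Find the registration agent container and review its logs for node-plan errors
docker ps -a | grep rancher-agent
docker logs --tail 100 <rancher-agent-container-id>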
Expected Result: The kubelet should be deployed on the node and the worker node should register.
Other details that may be helpful:
- This was not reproducible when adding a worker node to a node-driver AWS cluster.
- Reproduced this issue when adding a second worker node to an upgraded cluster on a single-node install, right after the Rancher server upgrade. No k8s upgrade was done on that cluster; its k8s version is v1.17.4.
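Since the failure appears right after the Rancher server upgrade, it may be worth comparing the agent image running on the new node against what the upgraded server expects. This is a hedged check, not a confirmed root cause; the cattle-node-agent DaemonSet name used below is the usual one in Rancher 2.x managed clusters and is assumed here:
# On the new worker: which rancher-agent image tag is actually running
docker ps -a --format '{{.Names}}\t{{.Image}}' | grep rancher-agent
# From a machine with kubectl access to the cluster: the agent image the server expects
kubectl -n cattle-system get daemonset cattle-node-agent \
  -o jsonpath='{.spec.template.spec.containers[0].image}'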
Environment information
- Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.3.6 upgraded to 2.4.3-rc2
- Installation option (single install/HA): HA
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom-rke
- Kubernetes version (use kubectl version): 1.17.5
- Docker version (use docker version): (not provided)
gzrancher/rancher#10231
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 15 (12 by maintainers)
I’m seeing this same issue on 2.4.2 with k8s 1.17.4. It happens completely randomly when provisioning a new cluster, and only on 2-3 of the nodes.