cluster-api: [KCP] Fails to upgrade single node control plane
What steps did you take and what happened:
On a single-node control plane, I upgraded the Kubernetes version and the etcd and CoreDNS image tags by modifying the KCP object. A new node with the upgraded Kubernetes version is created and the old node is physically deleted, but the old Node object and all pods that were on it are left dangling.
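For reference, the change was along these lines. This is an illustrative sketch rather than the exact command used: the KCP object name and the image tags are examples, and the field paths (spec.version, spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag, spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag) should be checked against the KubeadmControlPlane API version in use:

# Illustrative only: bump the Kubernetes version plus the etcd and CoreDNS image
# tags on the KubeadmControlPlane object; KCP then rolls out a replacement machine.
# The object name and image tags below are examples, not values from this issue.
kubectl patch kubeadmcontrolplane cluster-tvnb5a-control-plane --type merge -p '
{
  "spec": {
    "version": "v1.17.2",
    "kubeadmConfigSpec": {
      "clusterConfiguration": {
        "etcd": {"local": {"imageTag": "3.4.3-0"}},
        "dns": {"imageTag": "1.6.7"}
      }
    }
  }
}'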
root@cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr:/# kubectl get nodes -A
NAME                                                  STATUS     ROLES    AGE     VERSION
cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj     NotReady   master   7m11s   v1.17.0
cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr     Ready      master   6m20s   v1.17.2
cluster-tvnb5a-cluster-tvnb5a-md-0-66496845f6-kvh9v   Ready      <none>   6m51s   v1.17.0
root@cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr:/# kubectl get pods -A
NAMESPACE     NAME                                                                         READY   STATUS             RESTARTS   AGE
kube-system   coredns-6955765f44-dpwvr                                                     1/1     Terminating        0          7m12s
kube-system   coredns-6955765f44-g4df6                                                     1/1     Terminating        0          7m12s
kube-system   coredns-7987b8d68f-f2d78                                                     1/1     Running            0          4m24s
kube-system   coredns-7987b8d68f-rftvl                                                     1/1     Running            0          4m23s
kube-system   etcd-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj                       0/1     CrashLoopBackOff   3          7m16s
kube-system   etcd-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr                       1/1     Running            0          6m19s
kube-system   kindnet-s6rvq                                                                1/1     Running            0          7m8s
kube-system   kindnet-wmxqx                                                                1/1     Running            0          6m37s
kube-system   kindnet-xvdks                                                                1/1     Running            0          7m12s
kube-system   kube-apiserver-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj             1/1     Running            0          7m16s
kube-system   kube-apiserver-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr             1/1     Running            0          6m36s
kube-system   kube-controller-manager-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj    1/1     Running            1          7m16s
kube-system   kube-controller-manager-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr    1/1     Running            0          6m35s
kube-system   kube-proxy-gfghs                                                             1/1     Running            0          4m22s
kube-system   kube-proxy-lj9lv                                                             1/1     Terminating        0          7m12s
kube-system   kube-proxy-qlgpw                                                             1/1     Running            0          3m57s
kube-system   kube-scheduler-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj             1/1     Running            1          7m16s
kube-system   kube-scheduler-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr             1/1     Running            0          6m35s
root@cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr:/#
What did you expect to happen:
I expected KCP to remove the old Node object and clean up the resources that were on that node.
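As a manual workaround (not a fix for the KCP behavior itself), the stale Node can be deleted from the workload cluster, after which Kubernetes garbage-collects the pods still bound to it. A short sketch, using the names from the output above:

# Remove the stale Node object left behind by the upgrade; the Terminating pods
# bound to it should then be garbage-collected on their own.
kubectl delete node cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj

# If any pod lingers in Terminating, it can be force-removed, for example:
kubectl -n kube-system delete pod kube-proxy-lj9lv --grace-period=0 --force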
/kind bug
/area control-plane
About this issue
- State: closed
- Created 4 years ago
- Comments: 26 (12 by maintainers)
I have seen this consistently in metal3-dev-env, during the Kubernetes version upgrade process, both during scale up and scale down. In order to reach the scale-down phase.
@sedefsavas Do you have any update on this issue, or a timeline?
Can you run a test locally with a custom build, after adding a check that the NodeRef is there for the leaderCandidate, and see if that fixes it?
@vincepri @detiber Sorry for the confusion. I am not scaling down; I changed the hardcoded control plane replica count in the test from 3 to 1.