kubeadm: When running a single control plane node cluster, `kubeadm upgrade` hangs after printing `[addons] Applied essential addon: CoreDNS`
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version
): 1.15.3-00 and 1.16.7-00
Environment:
- Kubernetes version (use
kubectl version
): Already upgraded past what it was, sorry - Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
- Kernel (e.g.
uname -a
): Linux debug-6 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux - Others:
What happened?
Upgraded from 1.15 to 1.16 and 1.16 to 1.17 as per instructions in https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.
The upgrade hung during step (4) of https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrading-control-plane-nodes. The last thing kubeadm upgrade printing
[addons] Applied essential addon: CoreDNS`. I let it sit there for 30 minutes and then skipped ahead to step (6) and uncordoned the control plane node. After uncordoning the node, it completed within a few minutes.
What you expected to happen?
I expected it to complete within half an hour.
How to reproduce it (as minimally and precisely as possible)?
I suspect if you run a single control plane cluster and follow the control plane upgrade steps, it’ll repro. It looks like it will wait until coredns is ready. This will never happen because the single node in the cluster is drained, as per step (2) in https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-the-first-control-plane-node. So it waits until the timeout, which seems to initially be 5 seconds with exponential backoff up to 10 times or so. I think in aggregate that’s over an hour, which would explain why it seemed to me to hang for half an hour.
Anything else we need to know?
We’re running a single control plane cluster as per https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/, which seems to be the primary culprit here.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 7
- Comments: 43 (20 by maintainers)
I am also getting the error, When running a single control plane cluster,
kubeadm upgrade
hangs after printing[addons] Applied essential addon: CoreDNS
.I am trying to upgrade from v1.17.2 to v1.17.4.Drained all nodes and upgrade each individually was successful
I reproduced this using a single node k8s (one node serves as both control plane and worker) when upgrading from v1.15.0 to v1.16.12. I applied the following command to apply upgrade:
apt-get update && \
apt-get install -y --allow-change-held-packages kubeadm=1.16.12-00
kubeadm upgrade plan
kubectl drain test-node --ignore-daemonsets --delete-local-data
kubeadm upgrade apply v1.16.12
After applyting the upgrade apply command, CLI stucked on “[addons] Applied essential addon: CoreDNS”.
Using the upgrade apply command with argument
-v=10
indicates kubeadm is waiting a CoreDNS pod to start. However this is a single node k8s, the only node in it is already drained, therefore no CoreDNS can be started. I wonder if this is the root cause of the issue.I did an experiment to prove my assumption: when the upgrade process stucks on “[addons] Applied essential addon: CoreDNS”, I uncordoned the only node in k8s, then the upgrade process finished successfully.
Not sure why this is marked as fixed since it actually requires
ALL
nodes to be drained not control plane only. That’s what hangs process on single node control plane. I have worker node still schedulable and hence I guess original fix was never run. This fixed it https://github.com/kubernetes/kubeadm/issues/2035#issuecomment-607330985