kubeadm: kubeadm join on control plane node failing: timeout waiting for etcd
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:08:27Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: cluster-api / capz
- OS (e.g. from /etc/os-release): NAME="Ubuntu" VERSION="18.04.5 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.5 LTS" VERSION_ID="18.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic
- Kernel (e.g. uname -a): Linux acse-test-capz-repro-c8cd6-control-plane-9kvrx 5.4.0-1041-azure #43~18.04.1-Ubuntu SMP Fri Feb 26 13:02:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Others: Cluster built using cluster-api from this capz example template:
tl;dr: 3 control plane nodes, 1 node pool with 1 worker node
What happened?
- The first control plane node comes online and reaches Ready (kubeadm init)
- The second control plane node bootstraps (kubeadm join) but never comes online / reaches Ready
From the cloud-init logs, kubeadm reports that it timed out waiting for etcd:
[2021-04-16 22:09:39] [etcd] Announced new etcd member joining to the existing etcd cluster
[2021-04-16 22:09:39] [etcd] Creating static Pod manifest for "etcd"
[2021-04-16 22:09:39] [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[2021-04-16 22:10:12] [kubelet-check] Initial timeout of 40s passed.
[2021-04-16 22:42:38] error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
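For anyone hitting the same timeout, one way to see whether the new member actually registered (or is stuck as an unstarted learner/peer) is to query etcd from the first, healthy control plane node. This is a diagnostic sketch, not from the original report; the pod name and `$(hostname)` suffix are assumptions that depend on your node naming, and the cert paths are the standard kubeadm ones.

```shell
# Run on the first (healthy) control plane node.
# kubeadm names the etcd static pod "etcd-<node-name>"; adjust if yours differs.
ETCD_POD="etcd-$(hostname)"

kubectl -n kube-system exec "$ETCD_POD" -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table

# Endpoint health for every listed member:
kubectl -n kube-system exec "$ETCD_POD" -- etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --cluster endpoint health
```

If the second node shows up in `member list` but as "unstarted", the announce step succeeded and the failure is in the new etcd pod itself, so the next place to look is that node's kubelet and etcd container logs.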
What you expected to happen?
This does not repro in other Kubernetes versions; I've tested 1.19.7 specifically. I expected 1.20.5 to bootstrap the same way 1.19.7 does.
How to reproduce it (as minimally and precisely as possible)?
I have a repro script:
https://github.com/jackfrancis/cluster-api-provider-azure/blob/repro/repro.sh
Anything else we need to know?
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (10 by maintainers)
@neolit123 I see your point that we want to reduce the likelihood of kubelet race conditions.
In the meantime we will continue to investigate how to produce a working 1.20+ kubeadm solution for folks.
I’ll follow the issue you linked and close this one for now, thanks!