kubernetes: Race condition in Kubelet service
@kubernetes/sig-api-machinery-bugs
What happened:

A race condition in the Kubelet causes the Kubernetes cluster to become unstable due to high CPU load when two nodes share the same hostname or `--hostname-override` value.
What you expected to happen:

Since the node hostname or `--hostname-override` value is used to compute the etcd key name (`/registry/minions/[NODE-HOSTNAME]`), the system should either:

- prevent nodes with a duplicate hostname or `--hostname-override` value from joining the cluster, or
- add a prefix/suffix to the etcd key name (e.g. `/registry/minions/[NODE-HOSTNAME]-[SUFFIX]`)
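For illustration, the key collision can be confirmed directly in etcd (a sketch, assuming `etcdctl` v3 run on the control-plane with the default kubeadm etcd certificate paths; adjust endpoints and paths to your setup):

```bash
# List the node keys etcd actually stores. With two nodes both named
# k8s-master, only a single key /registry/minions/k8s-master is present,
# and both kubelets keep overwriting it.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/minions/ --prefix --keys-only
```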
How to reproduce it (as minimally and precisely as possible):

1. Spawn 3 new servers.

2. Set up Kubernetes on all 3 servers, according to the following table:

   | Server ID | Hostname   | Cluster Role |
   |-----------|------------|--------------|
   | 0         | k8s-master | Master       |
   | 1         | k8s-node-1 | Worker       |
   | 2         | k8s-node-2 | Worker       |

3. Start all 3 servers and initialize the cluster.

4. Access the server w/ ID 1 and change its hostname to `k8s-master`.

5. Run `kubeadm join` with the appropriate flags and tokens (see the sketch after this list). You should see the following messages on `stdout`:

   ```
   This node has joined the cluster:
   * Certificate signing request was sent to apiserver and a response was received.
   * The Kubelet was informed of the new secure connection details.

   Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
   ```

6. Access the server w/ ID 0 and run `kubectl get nodes`. The output should have a single entry.

7. Access the server w/ ID 2 and change its hostname to `k8s-master`.

8. Repeat steps 5 and 6.

9. Keep monitoring the load average on the server w/ ID 0.
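Steps 4-5 on a worker boil down to something like the following (a sketch; `<MASTER-IP>`, `<TOKEN>`, and `<HASH>` are placeholders for the values printed by `kubeadm init`):

```bash
# Deliberately reuse the control-plane's hostname, then join the cluster.
sudo hostnamectl set-hostname k8s-master
sudo kubeadm join <MASTER-IP>:6443 \
  --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>

# Step 9: on the server w/ ID 0, watch the load average climb.
watch -n 5 uptime
```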
Anything else we need to know?:

Running `kubeadm join` on nodes sharing the same hostname exits with a success exit code, but `kubectl get nodes` on the control-plane fails to list them (only one is reported). The same applies when changing the hostname of a node that already belongs to the cluster (e.g. `sudo hostnamectl set-hostname DUPLICATE_HOSTNAME`).

In such a scenario, we can observe a severe CPU load increase on the k8s master node, caused by concurrent updates to the etcd key `/registry/minions/[NODE-HOSTNAME]`, which, in turn, generate several etcd events to be handled by Kubernetes components.
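The churn is directly observable by watching the contested key (same `etcdctl` assumptions as in the sketch above):

```bash
# Each kubelet's status heartbeat overwrites the other's, so this prints a
# continuous stream of PUT events that every watcher of the Node object
# (controllers, scheduler) must also process.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  watch /registry/minions/k8s-master
```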
Environment:

- Kubernetes version (use `kubectl version`):

  ```
  Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  ```
- Cloud provider or hardware configuration:

  ```
  # lscpu
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                2
  On-line CPU(s) list:   0,1
  Thread(s) per core:    1
  Core(s) per socket:    1
  Socket(s):             2
  NUMA node(s):          1
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 6
  Model name:            QEMU Virtual CPU version 2.5+
  Stepping:              3
  CPU MHz:               2394.454
  BogoMIPS:              4788.90
  Hypervisor vendor:     KVM
  Virtualization type:   full
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              4096K
  L3 cache:              16384K
  NUMA node0 CPU(s):     0,1
  Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm

  # free -h
                total        used        free      shared  buff/cache   available
  Mem:           990M        552M         68M         14M        369M        255M
  Swap:            0B          0B          0B

  # lshw -class network
    *-network
         description: Ethernet controller
         product: Virtio network device
         vendor: Red Hat, Inc.
         physical id: 3
         bus info: pci@0000:00:03.0
         version: 00
         width: 64 bits
         clock: 33MHz
         capabilities: msix bus_master cap_list rom
         configuration: driver=virtio-pci latency=0
         resources: irq:10 ioport:c0a0(size=32) memory:febd1000-febd1fff memory:fe000000-fe003fff memory:feb80000-febbffff
       *-virtio0
            description: Ethernet interface
            physical id: 0
            bus info: virtio@0
            logical name: eth0
            serial: 9e:a3:f5:c1:6e:36
            capabilities: ethernet physical
            configuration: broadcast=yes driver=virtio_net driverversion=1.0.0 ip=192.168.122.81 link=yes multicast=yes
  ```
- OS (e.g. `cat /etc/os-release`):

  ```
  NAME="CentOS Linux"
  VERSION="7 (Core)"
  ID="centos"
  ID_LIKE="rhel fedora"
  VERSION_ID="7"
  PRETTY_NAME="CentOS Linux 7 (Core)"
  ANSI_COLOR="0;31"
  CPE_NAME="cpe:/o:centos:centos:7"
  HOME_URL="https://www.centos.org/"
  BUG_REPORT_URL="https://bugs.centos.org/"
  CENTOS_MANTISBT_PROJECT="CentOS-7"
  CENTOS_MANTISBT_PROJECT_VERSION="7"
  REDHAT_SUPPORT_PRODUCT="centos"
  REDHAT_SUPPORT_PRODUCT_VERSION="7"
  ```
- Kernel (e.g. `uname -a`):

  ```
  Linux k8s-master 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  ```
- Install tools:

  ```bash
  cat <<EOF > /etc/yum.repos.d/kubernetes.repo
  [kubernetes]
  name=Kubernetes
  baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
  enabled=1
  gpgcheck=1
  repo_gpgcheck=1
  gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
  EOF

  yum install kubeadm --nogpgcheck -y && \
    systemctl restart kubelet && systemctl enable kubelet
  ```
Cheers,
Paulo A. Silva
Hi @neolit123,

Yes I did, and in fact the more nodes with the same hostname you join to the cluster, the faster the cluster becomes unstable.

In our original research, joining two worker nodes with the same hostname led the cluster to crash after a few hours.
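As a stopgap, a crude guard along the lines of the first suggestion above can be run before joining (a sketch, not an official kubeadm preflight check; it assumes `kubectl` is configured with credentials for the target cluster, which a yet-to-join worker normally won't have, so run it from the control-plane or an admin machine):

```bash
#!/usr/bin/env bash
# Refuse to proceed if a node with this hostname is already registered.
set -euo pipefail

NODE_NAME="${1:-$(hostname)}"
if kubectl get node "${NODE_NAME}" --no-headers >/dev/null 2>&1; then
  echo "Node '${NODE_NAME}' already exists in the cluster; refusing to join." >&2
  exit 1
fi
echo "Hostname '${NODE_NAME}' is free; safe to run kubeadm join."
```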