kubernetes: Race condition in Kubelet service

@kubernetes/sig-api-machinery-bugs

What happened:

A race condition in the Kubelet causes the Kubernetes cluster to become unstable, due to high CPU load on the control plane, when two nodes share the same hostname or --hostname-override value.
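
For context, the node name the kubelet registers with is, to my understanding, derived roughly as sketched below; the variable names are illustrative only, not actual kubelet flags:

    # Rough sketch (assumption): --hostname-override wins if set; otherwise the
    # machine hostname is used, lowercased and trimmed.
    NODE_NAME="${HOSTNAME_OVERRIDE:-$(hostname)}"
    NODE_NAME="$(echo "${NODE_NAME}" | tr '[:upper:]' '[:lower:]' | xargs)"
    echo "This kubelet will register the Node object: ${NODE_NAME}"
    echo "Corresponding etcd key: /registry/minions/${NODE_NAME}"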

What you expected to happen:

Since the node hostname or --hostname-override value is used to compute the etcd key name (/registry/minions/[NODE-HOSTNAME]; see the sketch after this list), the system should either:

  1. prevent nodes with a duplicate hostname or --hostname-override value from joining the cluster, or
  2. add a prefix/suffix to the etcd key name (e.g. /registry/minions/[NODE-HOSTNAME]-[SUFFIX]).
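
A quick way to confirm the key naming is to list the node keys directly in etcd. This is just a sketch, assuming a kubeadm-provisioned etcd reachable from the master with the default certificate paths; adjust the endpoint and paths for other setups:

    # Run on the master. etcd keeps one key per node name, so two kubelets
    # sharing a hostname collapse into a single /registry/minions/<name> entry.
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      get /registry/minions/ --prefix --keys-only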

How to reproduce it (as minimally and precisely as possible):

  1. Spawn 3 new servers

  2. Set up Kubernetes on all 3 servers, according to the following table:

    Server ID   Hostname     Cluster Role
    0           k8s-master   Master
    1           k8s-node-1   Worker
    2           k8s-node-2   Worker
  3. Start all 3 servers and initialize the cluster

  4. Access the server w/ ID 1 and change its hostname to k8s-master

  5. Run kubeadm join with the appropriate flags and token.

    You should see the following messages on stdout:

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.
    
    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
    
  6. Access the server w/ ID 0 and run $ kubectl get nodes. The output should have a single entry.

  7. Access the server w/ ID 2 and change its hostname to k8s-master

  8. Repeat steps 5 and 6

  9. Keep monitoring the load average on the server w/ ID 0 (see the monitoring sketch below).
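
As a convenience for steps 6 and 9, the one-liner below (my own helper, not part of any tooling) tracks the load average and the node list together on the master:

    # Run on server ID 0; refreshes every 5 seconds.
    watch -n 5 'uptime; echo; kubectl get nodes -o wide'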

Anything else we need to know?:

Running kubeadm join on nodes sharing the same hostname exits with a success exit code, but kubectl get nodes on the control plane fails to list them all (only one is reported). The same happens when changing the hostname of a node that already belongs to the cluster (e.g. sudo hostnamectl set-hostname DUPLICATE_HOSTNAME).
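
For that in-cluster case, the sketch below is how I would reproduce it; the kubelet restart is my assumption of what is needed for the kubelet to re-register under the new name:

    # Run on a worker that is already part of the cluster.
    sudo hostnamectl set-hostname k8s-master   # duplicate the master's hostname
    sudo systemctl restart kubelet             # re-register under the new name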

In this scenario, we can observe a severe CPU load increase on the k8s master node, caused by concurrent updates to the etcd key /registry/minions/[NODE-HOSTNAME], which in turn generate a stream of etcd events that the other k8s components have to handle.
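
One way to observe the churn without querying etcd directly (my own sketch): the Node object's resourceVersion advances on every write, so a simple loop makes the competing status updates visible:

    # Run on the master. With two kubelets fighting over the same Node object,
    # the resourceVersion climbs noticeably faster than the usual ~10s
    # node-status updates from a single kubelet.
    while true; do
      kubectl get node k8s-master -o jsonpath='{.metadata.resourceVersion}{"\n"}'
      sleep 1
    done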

Environment:

  • Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration:

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                2
    On-line CPU(s) list:   0,1
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             2
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 6
    Model name:            QEMU Virtual CPU version 2.5+
    Stepping:              3
    CPU MHz:               2394.454
    BogoMIPS:              4788.90
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    L3 cache:              16384K
    NUMA node0 CPU(s):     0,1
    Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
    
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:           990M        552M         68M         14M        369M        255M
    Swap:            0B          0B          0B
    
    # lshw -class network
    *-network
         description: Ethernet controller
         product: Virtio network device
         vendor: Red Hat, Inc.
         physical id: 3
         bus info: pci@0000:00:03.0
         version: 00
         width: 64 bits
         clock: 33MHz
         capabilities: msix bus_master cap_list rom
         configuration: driver=virtio-pci latency=0
         resources: irq:10 ioport:c0a0(size=32) memory:febd1000-febd1fff memory:fe000000-fe003fff memory:feb80000-febbffff
       *-virtio0
            description: Ethernet interface
            physical id: 0
            bus info: virtio@0
            logical name: eth0
            serial: 9e:a3:f5:c1:6e:36
            capabilities: ethernet physical
            configuration: broadcast=yes driver=virtio_net driverversion=1.0.0 ip=192.168.122.81 link=yes multicast=yes
    
  • OS (e.g: cat /etc/os-release):

    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
    
  • Kernel (e.g. uname -a):

    Linux k8s-master 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    
  • Install tools:

    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
           https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF
    
    yum install kubeadm --nogpgcheck -y && \
      systemctl restart kubelet && systemctl enable kubelet
    

Cheers, Paulo A. Silva

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 32 (18 by maintainers)

Most upvoted comments

Hi @neolit123, yes I did, and in fact the more nodes with the same hostname you join to the cluster, the faster the cluster becomes unstable.

In our original research, joining two worker nodes with the same hostname to the cluster led it to crash after a few hours:

[root@k8s-master ~]# docker ps --format "table {{.Names}}\t{{.RunningFor}}\t{{.Status}}"
NAMES                                                                                         CREATED             STATUS
k8s_coredns_coredns-fb8b8dccf-zxbhf_kube-system_9a1234d8-56dd-11e9-b96c-1aad9bab15a2_22       22 minutes          Up 22 minutes
k8s_coredns_coredns-fb8b8dccf-xd4w2_kube-system_9a0dcd4e-56dd-11e9-b96c-1aad9bab15a2_22       22 minutes          Up 22 minutes
k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_2ea83c681043b9f023e3799cff75a51d_8   22 minutes          Up 22 minutes
k8s_etcd_etcd-k8s-master_kube-system_23fac8f75b8a758fb5fa22260098dad5_1                       8 hours             Up 8 hours
k8s_POD_coredns-fb8b8dccf-xd4w2_kube-system_9a0dcd4e-56dd-11e9-b96c-1aad9bab15a2_0            19 hours            Up 19 hours
k8s_POD_coredns-fb8b8dccf-zxbhf_kube-system_9a1234d8-56dd-11e9-b96c-1aad9bab15a2_0            19 hours            Up 19 hours
k8s_weave-npc_weave-net-9mhxk_kube-system_9a62434e-56dd-11e9-b96c-1aad9bab15a2_0              19 hours            Up 19 hours
k8s_weave_weave-net-9mhxk_kube-system_9a62434e-56dd-11e9-b96c-1aad9bab15a2_0                  19 hours            Up 19 hours
k8s_kube-proxy_kube-proxy-25jt7_kube-system_9a0d93e7-56dd-11e9-b96c-1aad9bab15a2_0            19 hours            Up 19 hours
k8s_POD_weave-net-9mhxk_kube-system_9a62434e-56dd-11e9-b96c-1aad9bab15a2_0                    19 hours            Up 19 hours
k8s_POD_kube-proxy-25jt7_kube-system_9a0d93e7-56dd-11e9-b96c-1aad9bab15a2_0                   19 hours            Up 19 hours
k8s_POD_kube-scheduler-k8s-master_kube-system_58272442e226c838b193bbba4c44091e_0              19 hours            Up 19 hours
k8s_POD_kube-controller-manager-k8s-master_kube-system_19c98a787281fe5ad8336ddcc184bbce_0     19 hours            Up 19 hours
k8s_POD_kube-apiserver-k8s-master_kube-system_2ea83c681043b9f023e3799cff75a51d_0              19 hours            Up 19 hours
k8s_POD_etcd-k8s-master_kube-system_23fac8f75b8a758fb5fa22260098dad5_0                        19 hours            Up 19 hours