kubernetes: Race condition in Kubelet service

@kubernetes/sig-api-machinery-bugs

What happened:

A race condition in the Kubelet causes the Kubernetes cluster to become unstable, due to high CPU load on the control plane, when two nodes share the same hostname or --hostname-override value.
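
For context, the node name the kubelet registers with is, to my understanding, derived roughly as sketched below; the variable names are illustrative only, not actual kubelet flags:

    # Rough sketch (assumption): --hostname-override wins if set; otherwise the
    # machine hostname is used, lowercased and trimmed.
    NODE_NAME="${HOSTNAME_OVERRIDE:-$(hostname)}"
    NODE_NAME="$(echo "${NODE_NAME}" | tr '[:upper:]' '[:lower:]' | xargs)"
    echo "This kubelet will register the Node object: ${NODE_NAME}"
    echo "Corresponding etcd key: /registry/minions/${NODE_NAME}"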

What you expected to happen:

Since the node hostname or --hostname-override value is used to compute the etcd key name (/registry/minions/[NODE-HOSTNAME]; see the sketch after this list), the system should either:

  1. prevent nodes with a duplicate hostname or --hostname-override value from joining the cluster, or
  2. add a prefix/suffix to the etcd key name (e.g. /registry/minions/[NODE-HOSTNAME]-[SUFFIX]).
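
A quick way to confirm the key naming is to list the node keys directly in etcd. This is just a sketch, assuming a kubeadm-provisioned etcd reachable from the master with the default certificate paths; adjust the endpoint and paths for other setups:

    # Run on the master. etcd keeps one key per node name, so two kubelets
    # sharing a hostname collapse into a single /registry/minions/<name> entry.
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      get /registry/minions/ --prefix --keys-only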

How to reproduce it (as minimally and precisely as possible):

  1. Spawn 3 new servers

  2. Set up Kubernetes on all 3 servers, according to the following table:

    Server ID   Hostname     Cluster Role
    0           k8s-master   Master
    1           k8s-node-1   Worker
    2           k8s-node-2   Worker
  3. Start all 3 servers and initialize the cluster

  4. Access the server w/ ID 1 and change its hostname to k8s-master

  5. Run kubeadm join with the appropriate flags and token.

    You should see the following messages on stdout:

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.
    
    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
    
  6. Access the server w/ ID 0 and run $ kubectl get nodes. The output should have a single entry.

  7. Access the server w/ ID 2 and change its hostname to k8s-master

  8. Repeat steps 5 and 6

  9. Keep monitoring the load average on the server w/ ID 0 (see the monitoring sketch below).
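
As a convenience for steps 6 and 9, the one-liner below (my own helper, not part of any tooling) tracks the load average and the node list together on the master:

    # Run on server ID 0; refreshes every 5 seconds.
    watch -n 5 'uptime; echo; kubectl get nodes -o wide'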

Anything else we need to know?:

Running kubeadm join on nodes sharing the same hostname exits with a success exit code, but kubectl get nodes on the control plane fails to list them all (only one is reported). The same happens when changing the hostname of a node that already belongs to the cluster (e.g. sudo hostnamectl set-hostname DUPLICATE_HOSTNAME).
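
For that in-cluster case, the sketch below is how I would reproduce it; the kubelet restart is my assumption of what is needed for the kubelet to re-register under the new name:

    # Run on a worker that is already part of the cluster.
    sudo hostnamectl set-hostname k8s-master   # duplicate the master's hostname
    sudo systemctl restart kubelet             # re-register under the new name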

In this scenario, we can observe a severe CPU load increase on the k8s master node, caused by concurrent updates to the etcd key /registry/minions/[NODE-HOSTNAME], which in turn generate a stream of etcd events that the other k8s components have to handle.
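
One way to observe the churn without querying etcd directly (my own sketch): the Node object's resourceVersion advances on every write, so a simple loop makes the competing status updates visible:

    # Run on the master. With two kubelets fighting over the same Node object,
    # the resourceVersion climbs noticeably faster than the usual ~10s
    # node-status updates from a single kubelet.
    while true; do
      kubectl get node k8s-master -o jsonpath='{.metadata.resourceVersion}{"\n"}'
      sleep 1
    done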

Environment:

  • Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration:

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                2
    On-line CPU(s) list:   0,1
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             2
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 6
    Model name:            QEMU Virtual CPU version 2.5+
    Stepping:              3
    CPU MHz:               2394.454
    BogoMIPS:              4788.90
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    L3 cache:              16384K
    NUMA node0 CPU(s):     0,1
    Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
    
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:           990M        552M         68M         14M        369M        255M
    Swap:            0B          0B          0B
    
    # lshw -class network
    *-network
         description: Ethernet controller
         product: Virtio network device
         vendor: Red Hat, Inc.
         physical id: 3
         bus info: pci@0000:00:03.0
         version: 00
         width: 64 bits
         clock: 33MHz
         capabilities: msix bus_master cap_list rom
         configuration: driver=virtio-pci latency=0
         resources: irq:10 ioport:c0a0(size=32) memory:febd1000-febd1fff memory:fe000000-fe003fff memory:feb80000-febbffff
       *-virtio0
            description: Ethernet interface
            physical id: 0
            bus info: virtio@0
            logical name: eth0
            serial: 9e:a3:f5:c1:6e:36
            capabilities: ethernet physical
            configuration: broadcast=yes driver=virtio_net driverversion=1.0.0 ip=192.168.122.81 link=yes multicast=yes
    
  • OS (e.g: cat /etc/os-release):

    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
    
  • Kernel (e.g. uname -a):

    Linux k8s-master 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    
  • Install tools:

    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
           https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF
    
    yum install kubeadm --nogpgcheck -y && \
      systemctl restart kubelet && systemctl enable kubelet
    

Cheers, Paulo A. Silva

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 32 (18 by maintainers)

Most upvoted comments

Hi @neolit123, yes I did, and in fact the more nodes with the same hostname you join to the cluster, the faster the cluster becomes unstable.

In our original research, joining two worker nodes with the same hostname to the cluster led it to crash after a few hours:

[root@k8s-master ~]# docker ps --format "table {{.Names}}\t{{.RunningFor}}\t{{.Status}}"
NAMES                                                                                         CREATED             STATUS
k8s_coredns_coredns-fb8b8dccf-zxbhf_kube-system_9a1234d8-56dd-11e9-b96c-1aad9bab15a2_22       22 minutes          Up 22 minutes
k8s_coredns_coredns-fb8b8dccf-xd4w2_kube-system_9a0dcd4e-56dd-11e9-b96c-1aad9bab15a2_22       22 minutes          Up 22 minutes
k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_2ea83c681043b9f023e3799cff75a51d_8   22 minutes          Up 22 minutes
k8s_etcd_etcd-k8s-master_kube-system_23fac8f75b8a758fb5fa22260098dad5_1                       8 hours             Up 8 hours
k8s_POD_coredns-fb8b8dccf-xd4w2_kube-system_9a0dcd4e-56dd-11e9-b96c-1aad9bab15a2_0            19 hours            Up 19 hours
k8s_POD_coredns-fb8b8dccf-zxbhf_kube-system_9a1234d8-56dd-11e9-b96c-1aad9bab15a2_0            19 hours            Up 19 hours
k8s_weave-npc_weave-net-9mhxk_kube-system_9a62434e-56dd-11e9-b96c-1aad9bab15a2_0              19 hours            Up 19 hours
k8s_weave_weave-net-9mhxk_kube-system_9a62434e-56dd-11e9-b96c-1aad9bab15a2_0                  19 hours            Up 19 hours
k8s_kube-proxy_kube-proxy-25jt7_kube-system_9a0d93e7-56dd-11e9-b96c-1aad9bab15a2_0            19 hours            Up 19 hours
k8s_POD_weave-net-9mhxk_kube-system_9a62434e-56dd-11e9-b96c-1aad9bab15a2_0                    19 hours            Up 19 hours
k8s_POD_kube-proxy-25jt7_kube-system_9a0d93e7-56dd-11e9-b96c-1aad9bab15a2_0                   19 hours            Up 19 hours
k8s_POD_kube-scheduler-k8s-master_kube-system_58272442e226c838b193bbba4c44091e_0              19 hours            Up 19 hours
k8s_POD_kube-controller-manager-k8s-master_kube-system_19c98a787281fe5ad8336ddcc184bbce_0     19 hours            Up 19 hours
k8s_POD_kube-apiserver-k8s-master_kube-system_2ea83c681043b9f023e3799cff75a51d_0              19 hours            Up 19 hours
k8s_POD_etcd-k8s-master_kube-system_23fac8f75b8a758fb5fa22260098dad5_0                        19 hours            Up 19 hours