kubeadm: Setting up a control node on an HA cluster hangs at uploading the CRI socket information

The control node does not join the cluster with an external HA etcd; kubeadm hangs at the step "Uploading the CRI Socket information".

What keywords did you search in kubeadm issues before filing this one?

Searched around "Error writing Crisocket information for the control-plane node"

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version:
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:56:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version:
# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: baremetal

  • OS (e.g. from /etc/os-release):

# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.8 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.8"
PRETTY_NAME=RHEL
  • Docker (identical on all nodes):
# docker version
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:46:54 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:45:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
  • Kernel (e.g. uname -a):
# uname -a
Linux control1.node 3.10.0-1127.8.2.el7.x86_64 #1 SMP Thu May 7 19:30:37 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
  • ETCD:
# docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/apiserver-etcd-client.crt  --key /etc/kubernetes/pki/apiserver-etcd-client.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 endpoint health --cluster
https://172.....193:2379 is healthy: successfully committed proposal: took = 16.302685ms
https://172.....195:2379 is healthy: successfully committed proposal: took = 16.987001ms
https://172.....194:2379 is healthy: successfully committed proposal: took = 16.765991ms
https://172.....196:2379 is healthy: successfully committed proposal: took = 17.314884ms
https://172.....197:2379 is healthy: successfully committed proposal: took = 16.864282ms

What happened?

kubeadm init --config /root/kubeadmcfg.yaml --upload-certs --v=10

The procedure hangs at "Uploading the CRI Socket information".

See the output:

I0825 22:30:16.940170   22079 round_trippers.go:449] Response Headers:
I0825 22:30:16.940176   22079 round_trippers.go:452]     Cache-Control: no-cache, private
I0825 22:30:16.940182   22079 round_trippers.go:452]     Content-Type: application/json
I0825 22:30:16.940200   22079 round_trippers.go:452]     Content-Length: 228
I0825 22:30:16.940211   22079 round_trippers.go:452]     Date: Tue, 25 Aug 2020 20:30:16 GMT
I0825 22:30:16.940249   22079 request.go:1068] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"control1.node\" not found","reason":"NotFound","details":{"name":"control1.node","kind":"nodes"},"code":404}
I0825 22:30:17.436902   22079 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.18.6 (linux/amd64) kubernetes/dff82dc" 'https://172.....211:6443/api/v1/nodes/control1.node?timeout=10s'
I0825 22:30:17.439747   22079 round_trippers.go:443] GET https://172.....211:6443/api/v1/nodes/control1.node?timeout=10s 404 Not Found in 2 milliseconds
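The repeated 404 responses above suggest that kubeadm is polling the Node API object and waiting for the kubelet to register it, and that the node object never appears. Conceptually the hang amounts to a retry loop like the following minimal Python sketch (function names and retry counts are illustrative, not kubeadm's actual code):

```python
import time

def wait_for_node(get_node, name, retries=5, delay=0.5):
    """Poll get_node(name) until it stops failing.

    KeyError here stands in for the HTTP 404 ("nodes \"control1.node\"
    not found") seen in the kubeadm output; returns None on timeout.
    """
    for _ in range(retries):
        try:
            return get_node(name)
        except KeyError:
            time.sleep(delay)
    return None
```

If the kubelet never registers the node (e.g. because it cannot reach the API server), a loop like this never terminates successfully, which matches the observed hang.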
# cat /root/kubeadmcfg.yaml
### control1.node == 172.1....211; the real names and IPs are correct, they are only redacted in this output
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "172.1....211:6443"
networking:
    podSubnet: 10.244.0.0/16
etcd:
    external:
        endpoints:
        - https://172....193:2379
        - https://172....194:2379
        - https://172....195:2379
        - https://172....196:2379
        - https://172....197:2379
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
I0825 22:38:35.058163   25434 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/containerd/containerd.sock" to the Node API object "control1.node" as an annotation
[kubelet-check] Initial timeout of 40s passed.
I0824 19:07:59.425173   15940 uploadconfig.go:127] [upload-config] Preserving the CRISocket information for the control-plane node
I0824 19:07:59.425183   15940 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "control1.node" as an annotation
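Note that the two patchnode log lines above come from two different runs (I0825 vs I0824) and report different CRI sockets (`/var/run/containerd/containerd.sock` vs `/var/run/dockershim.sock`). If the socket choice needs to be pinned explicitly, the `kubeadm.k8s.io/v1beta2` API allows it via `InitConfiguration` — a hypothetical config fragment, not a confirmed fix; the socket path below is an assumption and must match the runtime actually in use:

```yaml
# Hypothetical fragment, appended to the existing ClusterConfiguration
# in kubeadmcfg.yaml as a separate YAML document ("---"):
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  criSocket: /var/run/dockershim.sock  # assumption: Docker via dockershim
```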

containerd status

# systemctl status -l containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-08-25 22:38:26 CEST; 10h ago
     Docs: https://containerd.io
 Main PID: 24878 (containerd)
    Tasks: 77
   Memory: 33.5M
   CGroup: /system.slice/containerd.service
           ├─24878 /usr/bin/containerd
           ├─25090 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/f79c073fb0c55ed668cc3db605f9a9ae93a9fa0b9bb2957e268fef66792802e4 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
           ├─25104 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/b977b9415947ac477d1383499276dbfaa9104abd5ff413e17219932a2d4e1a50 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
           ├─25105 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/aa204bdf0f0239f132b31ae6944048ee714441e56cb79e01f93c02834bc75c9d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
           ├─25193 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/248be8c76425ada5ff95cebccb6de735ed0a81b31c1e7cc8cdd81ce503a96e84 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
           ├─25200 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/c1c50fed28697b1f874be411147080c45aeffd0cd3592fb084c892134875722e -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
           └─25207 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/5817bfbf3737767a5e32153881b16dfdda1a967924b15d0da77d8bffad38acd9 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup

Aug 25 22:38:26 control.node containerd[24878]: time="2020-08-25T22:38:26.813233204+02:00" level=info msg="Start streaming server"
# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
        /etc/systemd/system/kubelet.service.d
           └─20-etcd-service-manager.conf
   Active: active (running) since Tue 2020-08-25 22:38:32 CEST; 10h ago
     Docs: https://kubernetes.io/docs/
 Main PID: 25525 (kubelet)
    Tasks: 16
   Memory: 22.9M
   CGroup: /system.slice/kubelet.service
           └─25525 /usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd

Aug 26 09:01:16 control.node kubelet[25525]: I0826 09:01:16.090610   25525 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach

What did you expect to happen?

kubeadm joins the node to the cluster.

How to reproduce it (as minimally and precisely as possible)?

Reproducible: tear down the Kubernetes provisioning on the node, then rerun kubeadm init.

Anything else we need to know?

I have tried many variations, also with the latest Kubernetes version.

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "control1.node:6443" # tested port, reached by nc
# apiServer:
#   extraArgs:
#     anonymous-auth: "false"
#     authorization-mode: Node,RBAC ## tried RBAC,Node variation
# networking:
#     podSubnet: 10.244.0.0/16 ## subnet recommended by Calico
etcd:
    external:
        endpoints:
        - https://172....193:2379
        - https://172....194:2379
        - https://172....195:2379
        - https://172....196:2379
        - https://172....197:2379
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 27 (16 by maintainers)

Most upvoted comments

Also, please share the kube-apiserver container log (as I have requested multiple times).