kubeadm: Setting up a control-plane node in an HA cluster hangs at "Uploading the CRI Socket information"
The control-plane node does not join the cluster with an external HA etcd cluster; kubeadm init hangs at the step "Uploading the CRI Socket information".
What keywords did you search in kubeadm issues before filing this one?
Searched for "Error writing Crisocket information for the control-plane node".
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version:
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:56:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Kubernetes version:
# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: bare metal
- OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.8 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.8"
PRETTY_NAME=RHEL
- Docker (identical on all nodes):
# docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:46:54 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:28 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
- Kernel (e.g. uname -a):
# uname -a
Linux control1.node 3.10.0-1127.8.2.el7.x86_64 #1 SMP Thu May 7 19:30:37 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
- ETCD:
# docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/apiserver-etcd-client.crt --key /etc/kubernetes/pki/apiserver-etcd-client.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 endpoint health --cluster
https://172.....193:2379 is healthy: successfully committed proposal: took = 16.302685ms
https://172.....195:2379 is healthy: successfully committed proposal: took = 16.987001ms
https://172.....194:2379 is healthy: successfully committed proposal: took = 16.765991ms
https://172.....196:2379 is healthy: successfully committed proposal: took = 17.314884ms
https://172.....197:2379 is healthy: successfully committed proposal: took = 16.864282ms
What happened?
kubeadm init --config /root/kubeadmcfg.yaml --upload-certs --v=10
The procedure hangs at "Uploading the CRI Socket information". See the output:
I0825 22:30:16.940170 22079 round_trippers.go:449] Response Headers:
I0825 22:30:16.940176 22079 round_trippers.go:452] Cache-Control: no-cache, private
I0825 22:30:16.940182 22079 round_trippers.go:452] Content-Type: application/json
I0825 22:30:16.940200 22079 round_trippers.go:452] Content-Length: 228
I0825 22:30:16.940211 22079 round_trippers.go:452] Date: Tue, 25 Aug 2020 20:30:16 GMT
I0825 22:30:16.940249 22079 request.go:1068] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"control1.node\" not found","reason":"NotFound","details":{"name":"control1.node","kind":"nodes"},"code":404}
I0825 22:30:17.436902 22079 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.18.6 (linux/amd64) kubernetes/dff82dc" 'https://172.....211:6443/api/v1/nodes/control1.node?timeout=10s'
I0825 22:30:17.439747 22079 round_trippers.go:443] GET https://172.....211:6443/api/v1/nodes/control1.node?timeout=10s 404 Not Found in 2 milliseconds
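The repeated 404 ("nodes \"control1.node\" not found") means the API server has no Node object for this machine; kubeadm keeps polling for that object so it can patch the CRI socket annotation onto it, which is why the step never finishes. A hedged diagnostic sketch for checking whether the kubelet ever registered the node (assumes kubectl and journalctl are available on the node; the guards make it safe to run anywhere):

```shell
#!/bin/sh
# Diagnostic sketch: check whether the kubelet registered the Node object
# that kubeadm is waiting for. Commands are guarded so missing tools are
# skipped rather than aborting the script.

NODE_NAME=$(hostname)
echo "expecting Node object: ${NODE_NAME}"

# Does the API server know about this node at all?
if command -v kubectl >/dev/null 2>&1; then
    kubectl --kubeconfig /etc/kubernetes/admin.conf get node "${NODE_NAME}"
fi

# Look for registration attempts (or bootstrap/TLS errors) in the kubelet log.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u kubelet --no-pager | grep -iE 'register|unable|error' | tail -n 20
fi
```

If `kubectl get node` also returns NotFound while the kubelet is running, the kubelet is not talking to this API server at all, which points at its systemd configuration rather than at kubeadm.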
# cat /root/kubeadmcfg.yaml
### control1.node == 172.1....211; the correct names and IPs are used in the real file and have only been redacted in this output
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "172.1....211:6443"
networking:
podSubnet: 10.244.0.0/16
etcd:
external:
endpoints:
- https://172....193:2379
- https://172....194:2379
- https://172....195:2379
- https://172....196:2379
- https://172....197:2379
caFile: /etc/kubernetes/pki/etcd/ca.crt
certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
I0825 22:38:35.058163 25434 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/containerd/containerd.sock" to the Node API object "control1.node" as an annotation
[kubelet-check] Initial timeout of 40s passed.
I0824 19:07:59.425173 15940 uploadconfig.go:127] [upload-config] Preserving the CRISocket information for the control-plane node
I0824 19:07:59.425183 15940 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "control1.node" as an annotation
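Note that the two runs above picked different sockets: "/var/run/containerd.sock" on one attempt and "/var/run/dockershim.sock" on the other, since both Docker and containerd expose sockets on these hosts. The socket can be pinned explicitly so kubeadm stops guessing; a sketch of an InitConfiguration stanza (v1beta2, matching the ClusterConfiguration above) — choosing dockershim.sock here is an assumption, use whichever runtime the kubelet is actually configured for:

```yaml
# Hedged sketch: pin the CRI socket in the kubeadm config. Place this in the
# same file as the ClusterConfiguration, separated by "---".
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  criSocket: /var/run/dockershim.sock   # assumption: Docker is the intended runtime
```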
containerd status
# systemctl status -l containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2020-08-25 22:38:26 CEST; 10h ago
Docs: https://containerd.io
Main PID: 24878 (containerd)
Tasks: 77
Memory: 33.5M
CGroup: /system.slice/containerd.service
├─24878 /usr/bin/containerd
├─25090 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/f79c073fb0c55ed668cc3db605f9a9ae93a9fa0b9bb2957e268fef66792802e4 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
├─25104 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/b977b9415947ac477d1383499276dbfaa9104abd5ff413e17219932a2d4e1a50 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
├─25105 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/aa204bdf0f0239f132b31ae6944048ee714441e56cb79e01f93c02834bc75c9d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
├─25193 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/248be8c76425ada5ff95cebccb6de735ed0a81b31c1e7cc8cdd81ce503a96e84 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
├─25200 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/c1c50fed28697b1f874be411147080c45aeffd0cd3592fb084c892134875722e -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
└─25207 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/5817bfbf3737767a5e32153881b16dfdda1a967924b15d0da77d8bffad38acd9 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
Aug 25 22:38:26 control.node containerd[24878]: time="2020-08-25T22:38:26.813233204+02:00" level=info msg="Start streaming server"
# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
/etc/systemd/system/kubelet.service.d
└─20-etcd-service-manager.conf
Active: active (running) since Tue 2020-08-25 22:38:32 CEST; 10h ago
Docs: https://kubernetes.io/docs/
Main PID: 25525 (kubelet)
Tasks: 16
Memory: 22.9M
CGroup: /system.slice/kubelet.service
└─25525 /usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd
Aug 26 09:01:16 control.node kubelet[25525]: I0826 09:01:16.090610 25525 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
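The kubelet status above looks like the actual problem: besides kubeadm's `10-kubeadm.conf` there is a second drop-in, `/etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf`, and the kubelet is running with only `--address=127.0.0.1 --pod-manifest-path=... --cgroup-driver=systemd`. That drop-in appears to come from the external-etcd setup guide (it is meant for the etcd-only hosts); it overrides the kubelet command line, so the kubelet runs without a kubeconfig and never registers a Node object — consistent with the 404 above. A hedged cleanup sketch for the control-plane node, using the paths shown in the status output:

```shell
#!/bin/sh
# Sketch: remove the etcd-only kubelet override on the control-plane node so
# kubeadm's own drop-in (10-kubeadm.conf) takes effect again. Run as root;
# guarded so the commands are skipped where they do not apply.

DROPIN=/etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf

if [ -f "$DROPIN" ]; then
    rm "$DROPIN"
    echo "removed override: $DROPIN"
fi

if command -v systemctl >/dev/null 2>&1; then
    systemctl daemon-reload
    systemctl restart kubelet
fi
```

After this the kubelet should be started with the flags from `/var/lib/kubelet/kubeadm-flags.env` and register the node, letting the CRI socket upload step complete.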
What you expected to happen?
kubeadm joins the node to the cluster.
How to reproduce it (as minimally and precisely as possible)?
Reproducible: tear down the Kubernetes provisioning on the node, then rerun kubeadm init.
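The teardown between attempts matters: leftover state from a previous `kubeadm init` (certificates, static-pod manifests, CNI config) can make the next run behave differently. A hedged reset sketch, assuming the standard kubeadm paths:

```shell
#!/bin/sh
# Sketch: tear down a failed kubeadm provisioning before retrying init.
# Guarded so it is a no-op on machines without kubeadm installed.

if command -v kubeadm >/dev/null 2>&1; then
    kubeadm reset -f          # removes /etc/kubernetes manifests and certs
fi

# CNI configuration is not removed by `kubeadm reset`; clear it manually.
CNI_DIR=/etc/cni/net.d
if [ -d "$CNI_DIR" ]; then
    rm -rf "${CNI_DIR:?}"/*
fi

echo "teardown done; rerun: kubeadm init --config /root/kubeadmcfg.yaml --upload-certs"
```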
Anything else we need to know?
I tried many variations, also with the latest Kubernetes version.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "control1.node:6443" # port tested, reachable via nc
# apiServer:
# extraArgs:
# anonymous-auth: "false"
# authorization-mode: Node,RBAC ## tried RBAC,Node variation
# networking:
# podSubnet: 10.244.0.0/16 ## subnet recommended by Calico
etcd:
external:
endpoints:
- https://172....193:2379
- https://172....194:2379
- https://172....195:2379
- https://172....196:2379
- https://172....197:2379
caFile: /etc/kubernetes/pki/etcd/ca.crt
certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 27 (16 by maintainers)
Also, please share the kube-apiserver container log (as I requested multiple times).
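For reference, the requested kube-apiserver container log can be pulled from the Docker runtime on the control-plane node; a hedged sketch (the `k8s_kube-apiserver` container-name prefix is the dockershim naming convention):

```shell
#!/bin/sh
# Sketch: fetch the kube-apiserver container log via Docker. Guarded so the
# script is harmless where docker is absent or the container is not running.

APISERVER_ID=""
if command -v docker >/dev/null 2>&1; then
    APISERVER_ID=$(docker ps --filter name=k8s_kube-apiserver -q | head -n 1)
fi

if [ -n "$APISERVER_ID" ]; then
    docker logs --tail 200 "$APISERVER_ID"
else
    echo "kube-apiserver container not found (is the static pod running?)"
fi
```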