kubernetes: Kubeadm 1.22.x fails to stand up master node
What happened?
I previously installed K8S 1.20.0 on these boxes, but after re-imaging and installing 1.22.0, the K8S services end up in a constant crash loop. The command I’m running to install is just “kubeadm init”; no special switches or anything.
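For reference, this is roughly the full sequence I run on a freshly imaged box (no extra flags are passed to kubeadm init; the reset and kubeconfig-copy steps are the standard ones):

kubeadm reset -f                       # only when retrying on a box that was already initialized
kubeadm init                           # no special switches
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config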
Here is a sample from system logs after I’ve used kubeadm:
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.852682881-06:00" level=info msg="shim disconnected" id=833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.854676545-06:00" level=info msg="TearDown network for sandbox \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" successfully"
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.855540521-06:00" level=info msg="StopPodSandbox for \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" returns successfully"
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.857977203-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\""
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.858231406-06:00" level=info msg="Container to stop \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.858549596-06:00" level=info msg="TearDown network for sandbox \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" successfully"
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.858612995-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" returns successfully"
Nov 16 09:01:14 pz-k8s-node-master kubelet[137954]: E1116 09:01:14.909765 137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: I1116 09:01:15.365616 137954 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: I1116 09:01:15.368956 137954 status_manager.go:601] "Failed to get status for pod" podUID=e8784cf35b1a780ce7c98605ecf1f1fb pod="kube-system/kube-apiserver-pz-k8s-node-master" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-pz-k8s-node-master\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: I1116 09:01:15.369045 137954 scope.go:110] "RemoveContainer" containerID="63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: E1116 09:01:15.370130 137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.371383050-06:00" level=info msg="StopPodSandbox for \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.371606928-06:00" level=info msg="Container to stop \"95446d21f26117b93d11a1252a0167a279980274bb6b196b629bed12fc52157e\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.372661106-06:00" level=info msg="TearDown network for sandbox \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.372772261-06:00" level=info msg="StopPodSandbox for \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" returns successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.374798557-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.375014549-06:00" level=info msg="Container to stop \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.376594183-06:00" level=info msg="TearDown network for sandbox \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.376762150-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" returns successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.379082079-06:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-pz-k8s-node-master,Uid:e8784cf35b1a780ce7c98605ecf1f1fb,Namespace:kube-system,Attempt:2,}"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.381599088-06:00" level=info msg="RemoveContainer for \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.412829229-06:00" level=info msg="RemoveContainer for \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\" returns successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.513696567-06:00" level=info msg="starting signal loop" namespace=k8s.io path=/run/containerd/io.containerd.runtime.v2.task/k8s.io/2c80958a51574be951a0d51f241f554ad1d07591f6ed45b976630ab6eac65362 pid=138936
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.793499364-06:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-pz-k8s-node-master,Uid:e8784cf35b1a780ce7c98605ecf1f1fb,Namespace:kube-system,Attempt:2,} returns sandbox id \"2c80958a51574be951a0d51f241f554ad1d07591f6ed45b976630ab6eac65362\""
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: E1116 09:01:15.800993 137954 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-pz-k8s-node-master_kube-system(e8784cf35b1a780ce7c98605ecf1f1fb)\"" pod="kube-system/kube-apiserver-pz-k8s-node-master" podUID=e8784cf35b1a780ce7c98605ecf1f1fb
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: E1116 09:01:16.379462 137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: I1116 09:01:16.379739 137954 scope.go:110] "RemoveContainer" containerID="95446d21f26117b93d11a1252a0167a279980274bb6b196b629bed12fc52157e"
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: I1116 09:01:16.382880 137954 status_manager.go:601] "Failed to get status for pod" podUID=e8784cf35b1a780ce7c98605ecf1f1fb pod="kube-system/kube-apiserver-pz-k8s-node-master" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-pz-k8s-node-master\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: E1116 09:01:16.384116 137954 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-pz-k8s-node-master_kube-system(e8784cf35b1a780ce7c98605ecf1f1fb)\"" pod="kube-system/kube-apiserver-pz-k8s-node-master" podUID=e8784cf35b1a780ce7c98605ecf1f1fb
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: E1116 09:01:17.383644 137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: I1116 09:01:17.383932 137954 scope.go:110] "RemoveContainer" containerID="95446d21f26117b93d11a1252a0167a279980274bb6b196b629bed12fc52157e"
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: E1116 09:01:17.387155 137954 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-pz-k8s-node-master_kube-system(e8784cf35b1a780ce7c98605ecf1f1fb)\"" pod="kube-system/kube-apiserver-pz-k8s-node-master" podUID=e8784cf35b1a780ce7c98605ecf1f1fb
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: E1116 09:01:17.429157 137954 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.23.1.164:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/pz-k8s-node-master?timeout=10s": dial tcp 10.23.1.164:6443: connect: connection refused
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: I1116 09:01:17.746328 137954 status_manager.go:601] "Failed to get status for pod" podUID=873dc75a62035fcd7566e2cffd107f6b pod="kube-system/kube-scheduler-pz-k8s-node-master" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-pz-k8s-node-master\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: I1116 09:01:17.747313 137954 status_manager.go:601] "Failed to get status for pod" podUID=459c55e3-d1d3-4b1d-b750-2d3342ee3796 pod="kube-system/kube-proxy-q9lq5" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-proxy-q9lq5\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:18 pz-k8s-node-master kubelet[137954]: E1116 09:01:18.282203 137954 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-scheduler-pz-k8s-node-master.16b80ee18141dd18", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-scheduler-pz-k8s-node-master", UID:"873dc75a62035fcd7566e2cffd107f6b", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-scheduler}"}, Reason:"Unhealthy", Message:"Startup probe failed: Get \"https://127.0.0.1:10259/healthz\": net/http: TLS handshake timeout", Source:v1.EventSource{Component:"kubelet", Host:"pz-k8s-node-master"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05d0fc28ecaf918, ext:34592649479, loc:(*time.Location)(0x77a8680)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05d0fc28ecaf918, ext:34592649479, loc:(*time.Location)(0x77a8680)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://10.23.1.164:6443/api/v1/namespaces/kube-system/events": dial tcp 10.23.1.164:6443: connect: connection refused'(may retry after sleeping)
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.556794 137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?resourceVersion=0&timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.559042 137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.560144 137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.561246 137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.562195 137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.562274 137954 kubelet_node_status.go:457] "Unable to update node status" err="update node status exceeds retry count"
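Since the API server keeps going down, kubectl is only intermittently usable, so I have been pulling the crash output straight from the container runtime; roughly these commands (assuming the default containerd socket path):

# list all containers, including the exited kube-apiserver instances
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
# dump the logs of an exited container by ID
crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs <container-id>
# confirm whether anything is listening on the API server port
ss -tlnp | grep 6443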
kube-proxy just goes into an Error state:
NAME READY STATUS RESTARTS AGE
coredns-78fcd69978-gs4t6 0/1 Pending 0 2m16s
coredns-78fcd69978-sdvdg 0/1 Pending 0 2m16s
kube-proxy-q9lq5 0/1 Error 0 2m16s
kube-scheduler-pz-k8s-node-master 0/1 Pending 0 91s
These are the kube-proxy events/logs:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m40s default-scheduler Successfully assigned kube-system/kube-proxy-q9lq5 to pz-k8s-node-master
Warning FailedMount 3m28s (x5 over 3m36s) kubelet MountVolume.SetUp failed for volume "kube-api-access-2f5jg" : failed to fetch token: Post "https://10.23.1.164:6443/api/v1/namespaces/kube-system/serviceaccounts/kube-proxy/token": dial tcp 10.23.1.164:6443: connect: connection refused
Warning FailedMount 3m10s kubelet MountVolume.SetUp failed for volume "kube-api-access-2f5jg" : failed to fetch token: Post "https://10.23.1.164:6443/api/v1/namespaces/kube-system/serviceaccounts/kube-proxy/token": net/http: TLS handshake timeout
Normal Pulled 2m51s (x2 over 2m53s) kubelet Container image "k8s.gcr.io/kube-proxy:v1.22.3" already present on machine
Normal Created 2m51s (x2 over 2m53s) kubelet Created container kube-proxy
Normal Started 2m50s (x2 over 2m53s) kubelet Started container kube-proxy
Normal Killing 88s (x2 over 2m52s) kubelet Stopping container kube-proxy
Normal SandboxChanged 87s (x2 over 2m51s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning BackOff 86s kubelet Back-off restarting failed container
[root@pz-k8s-node-master:/etc]:kubectl -n kube-system logs kube-proxy-q9lq5
I1116 15:03:57.589495 1 node.go:172] Successfully retrieved node IP: 10.23.1.164
I1116 15:03:57.589757 1 server_others.go:140] Detected node IP 10.23.1.164
W1116 15:03:57.589843 1 server_others.go:565] Unknown proxy mode "", assuming iptables proxy
I1116 15:03:57.738149 1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I1116 15:03:57.738268 1 server_others.go:212] Using iptables Proxier.
I1116 15:03:57.738323 1 server_others.go:219] creating dualStackProxier for iptables.
W1116 15:03:57.738386 1 server_others.go:479] detect-local-mode set to ClusterCIDR, but no cluster CIDR defined
W1116 15:03:57.738412 1 server_others.go:528] detect-local-mode: ClusterCIDR , defaulting to no-op detect-local
I1116 15:03:57.740636 1 server.go:649] Version: v1.22.3
I1116 15:03:57.773304 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1116 15:03:57.775772 1 config.go:315] Starting service config controller
I1116 15:03:57.776392 1 shared_informer.go:240] Waiting for caches to sync for service config
I1116 15:03:57.777805 1 config.go:224] Starting endpoint slice config controller
I1116 15:03:57.779140 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I1116 15:03:57.877415 1 shared_informer.go:247] Caches are synced for service config
I1116 15:03:57.884960 1 shared_informer.go:247] Caches are synced for endpoint slice config
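Since the visible log ends cleanly, I also check the previous instance’s output and its termination state (standard kubectl options; the jsonpath below is just one way to pull the last exit code):

kubectl -n kube-system logs kube-proxy-q9lq5 --previous
kubectl -n kube-system get pod kube-proxy-q9lq5 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'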
After about 3 minutes, the other services start crashing:
[root@pz-k8s-node-master:/etc]:kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-78fcd69978-gs4t6 0/1 Pending 0 4m21s
coredns-78fcd69978-sdvdg 0/1 Pending 0 4m21s
kube-apiserver-pz-k8s-node-master 1/1 Running 3 (110s ago) 42s
kube-controller-manager-pz-k8s-node-master 0/1 CrashLoopBackOff 3 (46s ago) 50s
kube-proxy-q9lq5 1/1 Running 3 (49s ago) 4m21s
kube-scheduler-pz-k8s-node-master 1/1 Running 2 (111s ago) 3m36s
Here are the logs from kube-controller-manager:
[root@pz-k8s-node-master:/etc]:kubectl -n kube-system logs kube-controller-manager-pz-k8s-node-master
Flag --port has been deprecated, This flag has no effect now and will be removed in v1.24.
I1116 15:04:24.815115 1 serving.go:347] Generated self-signed cert in-memory
I1116 15:04:28.144401 1 controllermanager.go:186] Version: v1.22.3
I1116 15:04:28.161880 1 secure_serving.go:200] Serving securely on 127.0.0.1:10257
I1116 15:04:28.168418 1 dynamic_cafile_content.go:155] "Starting controller" name="request-header::/etc/kubernetes/pki/front-proxy-ca.crt"
I1116 15:04:28.168936 1 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
I1116 15:04:28.169679 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I1116 15:04:28.173155 1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-controller-manager...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 5m38s kubelet Container image "k8s.gcr.io/kube-controller-manager:v1.22.3" already present on machine
Normal Created 5m38s kubelet Created container kube-controller-manager
Warning Unhealthy 96s (x2 over 106s) kubelet Liveness probe failed: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
Normal Pulled 86s (x3 over 4m46s) kubelet Container image "k8s.gcr.io/kube-controller-manager:v1.22.3" already present on machine
Normal Created 86s (x3 over 4m45s) kubelet Created container kube-controller-manager
Normal Started 85s (x3 over 4m45s) kubelet Started container kube-controller-manager
Normal Killing 77s (x3 over 4m48s) kubelet Stopping container kube-controller-manager
Normal SandboxChanged 76s (x3 over 4m47s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning BackOff 54s (x8 over 2m21s) kubelet Back-off restarting failed container
From looking at other issues that have been filed, I think this message stands out:
Pod sandbox changed, it will be killed and re-created.
The specs of my machines: an Intel® Atom™ CPU D525 @ 1.80GHz with 4GB RAM. The CPU has two separate cores (no HT). I do notice load averages between 2 and 3 while the master is being brought up.
What did you expect to happen?
I should have had a working cluster with no crashing kube-system pods.
How can we reproduce it (as minimally and precisely as possible)?
The only way would be to try it on a lower-specced box. The only published requirement is 2 cores, but the documentation doesn’t say anything about how fast those cores need to be.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux pz-k8s-node-master 5.10.0-8-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03) x86_64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
Hi all, I want to share this: if none of the above fixed your problem, then do the following.
Apparently, newer Linux systems that use cgroup v2 by default (Arch Linux, Debian bullseye, Ubuntu 21.x, etc.) cause problems when deploying a Kubernetes cluster with kubeadm init.
The workaround is to switch back to cgroup v1 by adding the kernel parameter systemd.unified_cgroup_hierarchy=0.
Add that parameter to GRUB_CMDLINE_LINUX_DEFAULT and then run grub-mkconfig -o /boot/grub/grub.cfg (on Arch Linux). On Ubuntu, go to /etc/default/grub.d and find GRUB_CMDLINE_LINUX_DEFAULT in the file with the highest number (in my case 50-cloudimg-settings.cfg), so the line ends up looking like this (note: I only added the last parameter): GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 systemd.unified_cgroup_hierarchy=0"
Then run update-grub (Debian/Ubuntu); for other distros, please check your distro’s documentation. This works around the issue for containerd, but CRI-O still has problems on Arch Linux for some reason.
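A minimal sketch of the Debian/Ubuntu steps, assuming the stock /etc/default/grub layout (adjust the file name if your settings live under /etc/default/grub.d):

# append the parameter to the existing GRUB_CMDLINE_LINUX_DEFAULT line
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
# regenerate the grub config and reboot so the kernel parameter takes effect
sudo update-grub
sudo reboot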
Try it and let me know.
@IgorOhrimenko Thanks, I followed your steps and it finally fixed the issue. I created an Ansible playbook based on your steps, in case anybody else needs one.
@jcpuzic, thanks, it works for me. Debian 11 bullseye, kernel 5.10.0-14-amd64 (5.10.113-1, 2022-04-29), Kubernetes 1.24.1.
Here are the steps for installing a working Kubernetes cluster.
I spent a bit more time on this issue and found that the line that should be added to /etc/containerd/config.toml, referenced here:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
was not in the correct place in my config. Once I moved the line, my pods stayed up. As a note, I did confirm that Debian Bullseye has cgroup v2 enabled by default:
https://www.debian.org/releases/bullseye/amd64/release-notes/ch-whats-new.en.html#cgroupv2
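For anyone else hitting this, the setting in question is SystemdCgroup; a quick way to verify it sits under the runc options table (placement per the containerd docs linked above; adjust the grep if your config layout differs):

# check that SystemdCgroup = true sits under the runc options table,
# not at the top level of the file
grep -n -A 2 'runtimes.runc.options' /etc/containerd/config.toml
# expected output (roughly):
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
#     SystemdCgroup = true
sudo systemctl restart containerd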
I also found that adding the -v (verbosity) flag to the kubelet helped in surfacing potential issues or errors.
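On a kubeadm-managed Debian node, one place that flag can go is the KUBELET_EXTRA_ARGS environment file that the kubeadm systemd drop-in sources (file path per the kubeadm docs; the verbosity level here is just an example):

# bump kubelet log verbosity via the env file sourced by the kubeadm drop-in
# (append or edit instead of overwriting if you already set KUBELET_EXTRA_ARGS)
echo 'KUBELET_EXTRA_ARGS="--v=4"' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
# then follow the kubelet logs
journalctl -u kubelet -f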