kubernetes: Kubeadm 1.22.x fails to stand up master node

What happened?

I’ve installed K8S 1.20.0 on these boxes previously, but when I re-imaged and tried installing 1.22.0, it leaves the K8S services in a constant crashing state. The command I’m running to install is just the “kubeadm init” command; no special switches or anything.

Here is a sample from system logs after I’ve used kubeadm:

Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.852682881-06:00" level=info msg="shim disconnected" id=833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.854676545-06:00" level=info msg="TearDown network for sandbox \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" successfully"
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.855540521-06:00" level=info msg="StopPodSandbox for \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" returns successfully"
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.857977203-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\""
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.858231406-06:00" level=info msg="Container to stop \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.858549596-06:00" level=info msg="TearDown network for sandbox \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" successfully"
Nov 16 09:01:14 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:14.858612995-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" returns successfully"
Nov 16 09:01:14 pz-k8s-node-master kubelet[137954]: E1116 09:01:14.909765  137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: I1116 09:01:15.365616  137954 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: I1116 09:01:15.368956  137954 status_manager.go:601] "Failed to get status for pod" podUID=e8784cf35b1a780ce7c98605ecf1f1fb pod="kube-system/kube-apiserver-pz-k8s-node-master" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-pz-k8s-node-master\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: I1116 09:01:15.369045  137954 scope.go:110] "RemoveContainer" containerID="63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0"
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: E1116 09:01:15.370130  137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.371383050-06:00" level=info msg="StopPodSandbox for \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.371606928-06:00" level=info msg="Container to stop \"95446d21f26117b93d11a1252a0167a279980274bb6b196b629bed12fc52157e\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.372661106-06:00" level=info msg="TearDown network for sandbox \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.372772261-06:00" level=info msg="StopPodSandbox for \"833a6357d0a7e6ef2771fcb5fe9f6988ed312de5ab26c356243ec60a859afaac\" returns successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.374798557-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.375014549-06:00" level=info msg="Container to stop \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.376594183-06:00" level=info msg="TearDown network for sandbox \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.376762150-06:00" level=info msg="StopPodSandbox for \"91705275e88d7c5de9e2ee4d38e10624442f938d6509b90c788bb3c799e9ec6c\" returns successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.379082079-06:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-pz-k8s-node-master,Uid:e8784cf35b1a780ce7c98605ecf1f1fb,Namespace:kube-system,Attempt:2,}"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.381599088-06:00" level=info msg="RemoveContainer for \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\""
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.412829229-06:00" level=info msg="RemoveContainer for \"63a731fcfa96a7e807e3290c0ec6f9934b132f16e1b55bb1b6cd2228708006d0\" returns successfully"
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.513696567-06:00" level=info msg="starting signal loop" namespace=k8s.io path=/run/containerd/io.containerd.runtime.v2.task/k8s.io/2c80958a51574be951a0d51f241f554ad1d07591f6ed45b976630ab6eac65362 pid=138936
Nov 16 09:01:15 pz-k8s-node-master containerd[123432]: time="2021-11-16T09:01:15.793499364-06:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-pz-k8s-node-master,Uid:e8784cf35b1a780ce7c98605ecf1f1fb,Namespace:kube-system,Attempt:2,} returns sandbox id \"2c80958a51574be951a0d51f241f554ad1d07591f6ed45b976630ab6eac65362\""
Nov 16 09:01:15 pz-k8s-node-master kubelet[137954]: E1116 09:01:15.800993  137954 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-pz-k8s-node-master_kube-system(e8784cf35b1a780ce7c98605ecf1f1fb)\"" pod="kube-system/kube-apiserver-pz-k8s-node-master" podUID=e8784cf35b1a780ce7c98605ecf1f1fb
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: E1116 09:01:16.379462  137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: I1116 09:01:16.379739  137954 scope.go:110] "RemoveContainer" containerID="95446d21f26117b93d11a1252a0167a279980274bb6b196b629bed12fc52157e"
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: I1116 09:01:16.382880  137954 status_manager.go:601] "Failed to get status for pod" podUID=e8784cf35b1a780ce7c98605ecf1f1fb pod="kube-system/kube-apiserver-pz-k8s-node-master" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-pz-k8s-node-master\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:16 pz-k8s-node-master kubelet[137954]: E1116 09:01:16.384116  137954 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-pz-k8s-node-master_kube-system(e8784cf35b1a780ce7c98605ecf1f1fb)\"" pod="kube-system/kube-apiserver-pz-k8s-node-master" podUID=e8784cf35b1a780ce7c98605ecf1f1fb
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: E1116 09:01:17.383644  137954 kubelet.go:1701] "Failed creating a mirror pod for" err="Post \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods\": dial tcp 10.23.1.164:6443: connect: connection refused" pod="kube-system/kube-apiserver-pz-k8s-node-master"
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: I1116 09:01:17.383932  137954 scope.go:110] "RemoveContainer" containerID="95446d21f26117b93d11a1252a0167a279980274bb6b196b629bed12fc52157e"
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: E1116 09:01:17.387155  137954 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-pz-k8s-node-master_kube-system(e8784cf35b1a780ce7c98605ecf1f1fb)\"" pod="kube-system/kube-apiserver-pz-k8s-node-master" podUID=e8784cf35b1a780ce7c98605ecf1f1fb
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: E1116 09:01:17.429157  137954 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.23.1.164:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/pz-k8s-node-master?timeout=10s": dial tcp 10.23.1.164:6443: connect: connection refused
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: I1116 09:01:17.746328  137954 status_manager.go:601] "Failed to get status for pod" podUID=873dc75a62035fcd7566e2cffd107f6b pod="kube-system/kube-scheduler-pz-k8s-node-master" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-pz-k8s-node-master\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:17 pz-k8s-node-master kubelet[137954]: I1116 09:01:17.747313  137954 status_manager.go:601] "Failed to get status for pod" podUID=459c55e3-d1d3-4b1d-b750-2d3342ee3796 pod="kube-system/kube-proxy-q9lq5" err="Get \"https://10.23.1.164:6443/api/v1/namespaces/kube-system/pods/kube-proxy-q9lq5\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:18 pz-k8s-node-master kubelet[137954]: E1116 09:01:18.282203  137954 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-scheduler-pz-k8s-node-master.16b80ee18141dd18", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-scheduler-pz-k8s-node-master", UID:"873dc75a62035fcd7566e2cffd107f6b", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-scheduler}"}, Reason:"Unhealthy", Message:"Startup probe failed: Get \"https://127.0.0.1:10259/healthz\": net/http: TLS handshake timeout", Source:v1.EventSource{Component:"kubelet", Host:"pz-k8s-node-master"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05d0fc28ecaf918, ext:34592649479, loc:(*time.Location)(0x77a8680)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05d0fc28ecaf918, ext:34592649479, loc:(*time.Location)(0x77a8680)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://10.23.1.164:6443/api/v1/namespaces/kube-system/events": dial tcp 10.23.1.164:6443: connect: connection refused'(may retry after sleeping)
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.556794  137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?resourceVersion=0&timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.559042  137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.560144  137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.561246  137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.562195  137954 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"pz-k8s-node-master\": Get \"https://10.23.1.164:6443/api/v1/nodes/pz-k8s-node-master?timeout=10s\": dial tcp 10.23.1.164:6443: connect: connection refused"
Nov 16 09:01:20 pz-k8s-node-master kubelet[137954]: E1116 09:01:20.562274  137954 kubelet_node_status.go:457] "Unable to update node status" err="update node status exceeds retry count"

The kube-proxy pod just errors out:

NAME                                READY   STATUS    RESTARTS   AGE
coredns-78fcd69978-gs4t6            0/1     Pending   0          2m16s
coredns-78fcd69978-sdvdg            0/1     Pending   0          2m16s
kube-proxy-q9lq5                    0/1     Error     0          2m16s
kube-scheduler-pz-k8s-node-master   0/1     Pending   0          91s

These are the kube-proxy events/logs:

Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       3m40s                  default-scheduler  Successfully assigned kube-system/kube-proxy-q9lq5 to pz-k8s-node-master
  Warning  FailedMount     3m28s (x5 over 3m36s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-2f5jg" : failed to fetch token: Post "https://10.23.1.164:6443/api/v1/namespaces/kube-system/serviceaccounts/kube-proxy/token": dial tcp 10.23.1.164:6443: connect: connection refused
  Warning  FailedMount     3m10s                  kubelet            MountVolume.SetUp failed for volume "kube-api-access-2f5jg" : failed to fetch token: Post "https://10.23.1.164:6443/api/v1/namespaces/kube-system/serviceaccounts/kube-proxy/token": net/http: TLS handshake timeout
  Normal   Pulled          2m51s (x2 over 2m53s)  kubelet            Container image "k8s.gcr.io/kube-proxy:v1.22.3" already present on machine
  Normal   Created         2m51s (x2 over 2m53s)  kubelet            Created container kube-proxy
  Normal   Started         2m50s (x2 over 2m53s)  kubelet            Started container kube-proxy
  Normal   Killing         88s (x2 over 2m52s)    kubelet            Stopping container kube-proxy
  Normal   SandboxChanged  87s (x2 over 2m51s)    kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  BackOff         86s                    kubelet            Back-off restarting failed container
  
[root@pz-k8s-node-master:/etc]:kubectl -n kube-system logs kube-proxy-q9lq5
I1116 15:03:57.589495       1 node.go:172] Successfully retrieved node IP: 10.23.1.164
I1116 15:03:57.589757       1 server_others.go:140] Detected node IP 10.23.1.164
W1116 15:03:57.589843       1 server_others.go:565] Unknown proxy mode "", assuming iptables proxy
I1116 15:03:57.738149       1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I1116 15:03:57.738268       1 server_others.go:212] Using iptables Proxier.
I1116 15:03:57.738323       1 server_others.go:219] creating dualStackProxier for iptables.
W1116 15:03:57.738386       1 server_others.go:479] detect-local-mode set to ClusterCIDR, but no cluster CIDR defined
W1116 15:03:57.738412       1 server_others.go:528] detect-local-mode: ClusterCIDR , defaulting to no-op detect-local
I1116 15:03:57.740636       1 server.go:649] Version: v1.22.3
I1116 15:03:57.773304       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1116 15:03:57.775772       1 config.go:315] Starting service config controller
I1116 15:03:57.776392       1 shared_informer.go:240] Waiting for caches to sync for service config
I1116 15:03:57.777805       1 config.go:224] Starting endpoint slice config controller
I1116 15:03:57.779140       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I1116 15:03:57.877415       1 shared_informer.go:247] Caches are synced for service config
I1116 15:03:57.884960       1 shared_informer.go:247] Caches are synced for endpoint slice config

After about 3 minutes, the other services start crashing:

[root@pz-k8s-node-master:/etc]:kubectl -n kube-system get pods
NAME                                         READY   STATUS             RESTARTS       AGE
coredns-78fcd69978-gs4t6                     0/1     Pending            0              4m21s
coredns-78fcd69978-sdvdg                     0/1     Pending            0              4m21s
kube-apiserver-pz-k8s-node-master            1/1     Running            3 (110s ago)   42s
kube-controller-manager-pz-k8s-node-master   0/1     CrashLoopBackOff   3 (46s ago)    50s
kube-proxy-q9lq5                             1/1     Running            3 (49s ago)    4m21s
kube-scheduler-pz-k8s-node-master            1/1     Running            2 (111s ago)   3m36s

Here are the logs from kube-controller-manager:

[root@pz-k8s-node-master:/etc]:kubectl -n kube-system logs kube-controller-manager-pz-k8s-node-master
Flag --port has been deprecated, This flag has no effect now and will be removed in v1.24.
I1116 15:04:24.815115       1 serving.go:347] Generated self-signed cert in-memory
I1116 15:04:28.144401       1 controllermanager.go:186] Version: v1.22.3
I1116 15:04:28.161880       1 secure_serving.go:200] Serving securely on 127.0.0.1:10257
I1116 15:04:28.168418       1 dynamic_cafile_content.go:155] "Starting controller" name="request-header::/etc/kubernetes/pki/front-proxy-ca.crt"
I1116 15:04:28.168936       1 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
I1116 15:04:28.169679       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I1116 15:04:28.173155       1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-controller-manager...
Events:
  Type     Reason          Age                  From     Message
  ----     ------          ----                 ----     -------
  Normal   Pulled          5m38s                kubelet  Container image "k8s.gcr.io/kube-controller-manager:v1.22.3" already present on machine
  Normal   Created         5m38s                kubelet  Created container kube-controller-manager
  Warning  Unhealthy       96s (x2 over 106s)   kubelet  Liveness probe failed: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
  Normal   Pulled          86s (x3 over 4m46s)  kubelet  Container image "k8s.gcr.io/kube-controller-manager:v1.22.3" already present on machine
  Normal   Created         86s (x3 over 4m45s)  kubelet  Created container kube-controller-manager
  Normal   Started         85s (x3 over 4m45s)  kubelet  Started container kube-controller-manager
  Normal   Killing         77s (x3 over 4m48s)  kubelet  Stopping container kube-controller-manager
  Normal   SandboxChanged  76s (x3 over 4m47s)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Warning  BackOff         54s (x8 over 2m21s)  kubelet  Back-off restarting failed container

So, from looking at other issues filed, I think this message stands out:

Pod sandbox changed, it will be killed and re-created.

The specs of my machines are an Intel® Atom™ CPU D525 @ 1.80GHz with 4GB RAM. The CPU has two separate cores (no HT). I do notice load averages between 2 and 3 while the master is under load.

What did you expect to happen?

I should have had a working cluster with no crashing kube-system pods.

How can we reproduce it (as minimally and precisely as possible)?

The only way would be to try it on a lower-spec box. The only published requirement is 2 cores, but it doesn’t mention anything about the speed of those cores.

Anything else we need to know?

No response

Kubernetes version

Tested with 1.20.0, but the same behavior occurs with later releases as well (the logs above are from v1.22.3).

Cloud provider

Bare Metal

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux pz-k8s-node-master 5.10.0-8-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03) x86_64 GNU/Linux

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

containerd

Related plugins (CNI, CSI, …) and versions (if applicable)

kube-router (latest as of 11.16.2021)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 31 (11 by maintainers)

Most upvoted comments

Hi all, I want to share this: if none of the above fixed your problem, try the following.

Apparently, newer Linux systems that use cgroup v2 by default (Arch Linux, Debian bullseye, Ubuntu 21.x, etc.) cause problems when deploying a Kubernetes cluster using kubeadm init.

The workaround is to turn cgroup v1 back on by adding the following grub parameter: systemd.unified_cgroup_hierarchy=0

Add the parameter to GRUB_CMDLINE_LINUX_DEFAULT and then run grub-mkconfig -o /boot/grub/grub.cfg (on Arch Linux). On Ubuntu you must go to /etc/default/grub.d and find GRUB_CMDLINE_LINUX_DEFAULT in the file with the highest number, in my case 50-cloudimg-settings.cfg, so the line would look like this (NOTE: I only added the last parameter): GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 earlyprintk=ttyS0 systemd.unified_cgroup_hierarchy=0"

Then run update-grub (Debian/Ubuntu); for other distros, please check the docs. This gives you a workaround for containerd, but CRI-O for some reason still has issues on Arch Linux.
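
A minimal sketch of the Debian/Ubuntu variant, assuming the parameter goes straight into /etc/default/grub (adjust the path if your image keeps its settings under /etc/default/grub.d; "quiet" below is just a placeholder for whatever options your line already has):

# Append the parameter to the existing GRUB_CMDLINE_LINUX_DEFAULT line
# in /etc/default/grub, keeping whatever options are already there:
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"

# Regenerate the grub config and reboot so the kernel picks it up:
update-grub
reboot

# After the reboot, verify the parameter is active:
cat /proc/cmdline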

Try it and let me know.

@IgorOhrimenko Thanks, I followed your steps and they finally fixed the issue. Based on your steps, I created an Ansible playbook in case anybody else needs one.

---
- hosts: all
  become: true
  tasks:
  - name: disable swap
    shell: |
      sudo swapoff -a
      sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
  
  - name: Add modules conf for k8s
    blockinfile:
      path: "/etc/modules-load.d/k8s.conf"
      block: |
            overlay
            br_netfilter
      create: yes

  - name: Add modules
    community.general.modprobe:
      name: "{{ item }}"
      state: present
    with_items:
      - overlay
      - br_netfilter

  - name: Set sysctl file and reload
    ansible.posix.sysctl:
      name: "{{ item }}"
      value: '1'
      state: present
      reload: yes
    with_items:
      - net.ipv4.ip_forward
      - net.bridge.bridge-nf-call-iptables
      - net.bridge.bridge-nf-call-ip6tables

  - name: Download containerd package
    get_url:
      url: https://github.com/containerd/containerd/releases/download/v1.6.4/containerd-1.6.4-linux-amd64.tar.gz
      dest: /home/vagrant/containerd-1.6.4-linux-amd64.tar.gz
      mode: '0777'
  
  - name: Extract containerd
    ansible.builtin.unarchive:
      src: /home/vagrant/containerd-1.6.4-linux-amd64.tar.gz
      dest: /usr/local
      remote_src: yes
  
  - name: Create /etc/containerd directory
    file:
      path: /etc/containerd
      state: directory

  - name: Populate containerd config
    shell: containerd config default | tee /etc/containerd/config.toml
  
  - name: Set SystemdCgroup to true in containerd config
    replace:
      path: /etc/containerd/config.toml
      regexp: "SystemdCgroup = false"
      replace: "SystemdCgroup = true"

  - name: Download runc package
    get_url:
      url: https://github.com/opencontainers/runc/releases/download/v1.1.2/runc.amd64
      dest: /home/vagrant/runc.amd64
      mode: '0777'
  
  - name: Install runc
    shell: install -m 755 /home/vagrant/runc.amd64 /usr/local/sbin/runc

  - name: Download cni plugin
    get_url:
      url: https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
      dest: /home/vagrant/cni-plugins-linux-amd64-v1.1.1.tgz
      mode: '0777'

  - name: Create /opt/cni/bin dir
    file:
      path: /opt/cni/bin
      state: directory

  - name: Extract cni plugin
    ansible.builtin.unarchive:
      src: /home/vagrant/cni-plugins-linux-amd64-v1.1.1.tgz
      dest: /opt/cni/bin
      remote_src: yes

  - name: Download containerd service
    get_url:
      url: https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
      dest: /etc/systemd/system/containerd.service
      mode: '0777'
  
  - name: Issue daemon-reload to pick up config changes, restart containerd service
    ansible.builtin.systemd:
      state: restarted
      daemon_reload: yes
      name: containerd
  
  - name: Download flannel
    get_url:
      url: https://github.com/flannel-io/flannel/releases/download/v0.18.0/flannel-v0.18.0-linux-amd64.tar.gz
      dest: /home/vagrant/flannel-v0.18.0-linux-amd64.tar.gz
      mode: '0777'

  - name: Create /opt/bin dir
    file:
      path: /opt/bin
      state: directory

  - name: Extract flannel
    ansible.builtin.unarchive:
      src: /home/vagrant/flannel-v0.18.0-linux-amd64.tar.gz
      dest: /opt/bin
      remote_src: yes
    
  - name: Install packages that allow apt to be used over HTTPS
    apt:
      name: "{{ packages }}"
      state: present
      update_cache: yes
    vars:
      packages:
      - apt-transport-https
      - ca-certificates
      - curl

  - name: Add an apt signing key for Kubernetes
    apt_key:
      url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
      state: present

  - name: Adding apt repository for Kubernetes
    apt_repository:
      repo: deb https://apt.kubernetes.io/ kubernetes-xenial main
      state: present
      filename: kubernetes.list

  - name: Install Kubernetes binaries
    apt: 
      name: "{{ packages }}"
      state: present
      update_cache: yes
    vars:
      packages:
        - kubelet 
        - kubeadm   
        - kubectl

  - name: Initialize the cluster
    shell: kubeadm init --pod-network-cidr=10.244.0.0/16

  - name: Create /home/vagrant/.kube dir
    file:
      path: /home/vagrant/.kube
      state: directory

  - name: Copies admin.conf to user's kube config
    copy:
      src: /etc/kubernetes/admin.conf
      dest: /home/vagrant/.kube/config
      remote_src: yes
      owner: vagrant

  - name:  Install flannel for k8s
    become: false
    command: kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
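
A hypothetical invocation, assuming the playbook above is saved as k8s-setup.yml and the target hosts are listed in inventory.ini (both file names are placeholders). The community.general and ansible.posix collections it uses need to be installed first:

ansible-galaxy collection install community.general ansible.posix
ansible-playbook -i inventory.ini k8s-setup.yml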

I spent a bit more time on this issue and found the line that should be added to /etc/containerd/config.toml referenced here:

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd

@jcpuzic, thanks, it works for me. Debian 11 bullseye, 5.10.0-14-amd64 (5.10.113-1, 2022-04-29), Kubernetes 1.24.1.

Here are the steps for installing a working Kubernetes cluster:
swapoff --all

cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

wget https://github.com/containerd/containerd/releases/download/v1.6.4/containerd-1.6.4-linux-amd64.tar.gz
tar Cxzvf /usr/local containerd-1.6.4-linux-amd64.tar.gz

mkdir /etc/containerd/
containerd config default > /etc/containerd/config.toml
sed -i 's|SystemdCgroup = false|SystemdCgroup = true|' /etc/containerd/config.toml

wget https://github.com/opencontainers/runc/releases/download/v1.1.2/runc.amd64
install -m 755 runc.amd64 /usr/local/sbin/runc

wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
mkdir --parents /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.1.1.tgz

wget https://raw.githubusercontent.com/containerd/containerd/main/containerd.service --output-document=/etc/systemd/system/containerd.service
systemctl daemon-reload
systemctl enable --now containerd

wget https://github.com/flannel-io/flannel/releases/download/v0.18.0/flannel-v0.18.0-linux-amd64.tar.gz
mkdir --parents /opt/bin
tar --directory=/opt/bin --extract --gzip --file=flannel-v0.18.0-linux-amd64.tar.gz flanneld

apt-get update && apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

kubeadm init --pod-network-cidr=10.244.0.0/16
echo 'KUBECONFIG=/etc/kubernetes/admin.conf' >> /etc/environment
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
kubectl get pods --all-namespaces

I spent a bit more time on this issue and found the line that should be added to /etc/containerd/config.toml referenced here:

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd

was not in the correct place. Once I moved the line, my pods stayed up. As a note, I did confirm Debian Bullseye is cgroup v2 enabled by default:

https://www.debian.org/releases/bullseye/amd64/release-notes/ch-whats-new.en.html#cgroupv2
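
A quick way to confirm which hierarchy a host is running (a sketch; cgroup2fs indicates the unified cgroup v2 hierarchy, tmpfs the legacy v1 layout):

# Print the filesystem type mounted at /sys/fs/cgroup:
stat -fc %T /sys/fs/cgroup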

I also found that adding the -v flag to the kubelet helped in finding potential issues or errors.
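
For reference, the placement discussed above: per the linked page, SystemdCgroup = true belongs under the runc runtime's options table in /etc/containerd/config.toml, not at the top level of the file. A quick check, assuming the default containerd 1.x CRI plugin layout:

# The line must sit inside this table in /etc/containerd/config.toml:
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
#     SystemdCgroup = true
# Confirm the placement, then restart containerd:
grep -n -A 3 'runtimes.runc.options' /etc/containerd/config.toml
systemctl restart containerd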