calico: Kubernetes cluster fails bringing up calico node

Our Kubernetes clusters have been failing with the error below since approximately June 22nd, 3 AM (IST). We are using Calico v3.23.1 from https://docs.projectcalico.org/manifests/calico.yaml. From the calico-node pod:

Warning  Failed     10s (x4 over 54s)  kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/proc/1" to rootfs at "/initproc": change mount propagation through procfd: mount /initproc (via /proc/self/fd/6), flags: 0x44000: permission denied: unknown
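The `flags: 0x44000` value in that error can be decoded against the standard Linux mount flag constants from `<sys/mount.h>` (a small sketch; the constant values below are the usual Linux definitions, not taken from this report):

```shell
# Decode the mount flags (0x44000) from the runc error message.
# Constant values are the standard Linux <sys/mount.h> definitions.
FLAGS=$((0x44000))
MS_REC=$((0x4000))        # apply the change recursively
MS_PRIVATE=$((0x40000))   # make the mount private (no propagation)

[ $((FLAGS & MS_REC)) -ne 0 ] && echo "MS_REC"
[ $((FLAGS & MS_PRIVATE)) -ne 0 ] && echo "MS_PRIVATE"
```

In other words, runc is trying to make the `/initproc` mount recursively private and is being denied permission by the node.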

From kubelet logs:

Jun 22 11:59:53 config1-1655881919-master kubelet[26537]: E0622 11:59:53.317459   26537 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"mount-bpffs\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=mount-bpffs pod=calico-node-465kz_kube-system(74560965-8be7-411c-9e00-a90dcd60f3d4)\"" pod="kube-system/calico-node-465kz" podUID=74560965-8be7-411c-9e00-a90dcd60f3d4

Expected Behavior

The calico-node and calico controller pods should come up in the Running state without errors.

Current Behavior

The calico-node pods are in the Init:CrashLoopBackOff state and the calico controller is stuck in ContainerCreating.

Possible Solution

This seems to be caused by the change below, which was merged around the same time: https://github.com/projectcalico/calico/pull/6240

Steps to Reproduce (for bugs)

Set up a Kubernetes cluster using the Calico v3.23.1 manifest. The calico-node pods fail with the error above. Example failing run: https://prow.ppc64le-cloud.org/view/gs/ppc64le-kubernetes/logs/periodic-kubernetes-containerd-conformance-test-ppc64le/1539370251861364736

  • Calico version v3.23.1
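Since the failure is a mount-propagation permission error, one node-level detail worth checking while reproducing is the propagation state of the root mount on the host. A hedged diagnostic sketch (it only reads `/proc/self/mountinfo` and makes no changes):

```shell
# Print the propagation state of the root mount on this host.
# Field 5 of /proc/self/mountinfo is the mount point; "shared:N" in the
# optional fields means mount events propagate to/from peer groups.
state=$(awk '$5 == "/" {print; exit}' /proc/self/mountinfo | grep -o 'shared:[0-9]*' || true)
if [ -n "$state" ]; then
    echo "root mount is $state"
else
    echo "root mount is not shared"
fi
```

This does not pinpoint the root cause by itself, but it quickly shows whether the node's mount setup differs between clusters that do and do not hit the error.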

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 20 (15 by maintainers)

Most upvoted comments

@mazdakn Yes, my issue was definitely caused by that change. I've just checked, and everything is okay with all Ubuntu updates applied, so that was just a coincidence (I had deployed Ubuntu without updates via IaC after you fixed the template, which is why everything worked). By the way, as you may have noticed, it also affected the coredns pods: they did not come up even after removing Calico and recreating the coredns pods.

@olekszhel can you check whether your issue is fixed? The manifests have been reverted, so your issue should be resolved if it was caused by that change.