kubernetes: e2e flake: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write

Looks like we just got a spike of a new run failure message in master: https://storage.googleapis.com/k8s-triage/index.html?pr=1&text=unable to apply cgroup configuration&xjob=1-2

Seen in https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/109178/pull-kubernetes-conformance-kind-ga-only-parallel/1509397620936675328

s: "pod \"oidc-discovery-validator\" failed with status: {Phase:Failed Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [oidc-discovery-validator]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [oidc-discovery-validator]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:172.18.0.2 HostIPs:[{IP:172.18.0.2}] PodIP:10.244.1.130 PodIPs:[{IP:10.244.1.130}] StartTime:2022-03-31 05:35:20 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:oidc-discovery-validator State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:128,Signal:0,Reason:StartError,Message:failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write 36721: write /sys/fs/cgroup/rdma/kubelet/kubepods/besteffort/pod4c5127ae-797f-4b89-9aa9-7f66226768cd/61b3e1f7568f23bb1503c2309e9e254c1ac0103d0de059958f9555ff6548b5c8/cgroup.procs: no such device: unknown,StartedAt:1970-01-01 00:00:00 +0000 UTC,FinishedAt:2022-03-31 05:35:21 +0000 UTC,ContainerID:containerd://61b3e1f7568f23bb1503c2309e9e254c1ac0103d0de059958f9555ff6548b5c8,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:k8s.gcr.io/e2e-test-images/agnhost:2.36 ImageID:k8s.gcr.io/e2e-test-images/agnhost@sha256:f5241226198f5a54d22540acf2b3933ea0f49458f90c51fc75833d0c428687b8 ContainerID:containerd://61b3e1f7568f23bb1503c2309e9e254c1ac0103d0de059958f9555ff6548b5c8 Started:0xc000d223ea}] QOSClass:BestEffort EphemeralContainerStatuses:[]}", } pod "oidc-discovery-validator" failed with status: {Phase:Failed Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [oidc-discovery-validator]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [oidc-discovery-validator]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-03-31 05:35:20 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:172.18.0.2 HostIPs:[{IP:172.18.0.2}] PodIP:10.244.1.130 PodIPs:[{IP:10.244.1.130}] StartTime:2022-03-31 05:35:20 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:oidc-discovery-validator State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:128,Signal:0,Reason:StartError,Message:failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write 36721: write 
/sys/fs/cgroup/rdma/kubelet/kubepods/besteffort/pod4c5127ae-797f-4b89-9aa9-7f66226768cd/61b3e1f7568f23bb1503c2309e9e254c1ac0103d0de059958f9555ff6548b5c8/cgroup.procs: no such device: unknown,StartedAt:1970-01-01 00:00:00 +0000 UTC,FinishedAt:2022-03-31 05:35:21 +0000 UTC,ContainerID:containerd://61b3e1f7568f23bb1503c2309e9e254c1ac0103d0de059958f9555ff6548b5c8,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:k8s.gcr.io/e2e-test-images/agnhost:2.36 ImageID:k8s.gcr.io/e2e-test-images/agnhost@sha256:f5241226198f5a54d22540acf2b3933ea0f49458f90c51fc75833d0c428687b8 ContainerID:containerd://61b3e1f7568f23bb1503c2309e9e254c1ac0103d0de059958f9555ff6548b5c8 Started:0xc000d223ea}] QOSClass:BestEffort EphemeralContainerStatuses:[]}

/milestone v1.24
/sig node

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 52 (52 by maintainers)

Most upvoted comments

One important thing to mention:

This issue is not related to the vendor bump to runc 1.1; it is related to switching to the runc 1.1 binary (brought in together with containerd 1.6) in CI. In other words, this is not related to Kubernetes v1.24, and older versions (and newer versions) would fail the same way.

Proof is e.g. this CI job: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/104907/pull-kubernetes-conformance-kind-ga-only-parallel/1507773223985483776. It uses k/k master at revision c00975370a5bf81328dc56396ee05edc7306e238 (see https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/104907/pull-kubernetes-conformance-kind-ga-only-parallel/1507773223985483776/clone-log.txt), which has runc 1.0.3 in go.mod. Now, it says containerd 1.6.2 (in https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/104907/pull-kubernetes-conformance-kind-ga-only-parallel/1507773223985483776/artifacts/logs/kind-worker/containerd.log), which means the runc binary used is 1.1.x. Finally, if we take a look at kubelet.log (e.g. https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/104907/pull-kubernetes-conformance-kind-ga-only-parallel/1507773223985483776/artifacts/logs/kind-worker/kubelet.log), we can see a bunch of warnings like this:

Mar 26 17:56:08 kind-worker kubelet[259]: time="2022-03-26T17:56:08Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/rdma/kubelet/kubepods/podfb6a5428-0913-4745-8be0-73f8a5b90abf/3db9ab73c75925760aa294020dc8a77754c68b83f8745c983c111ea4ca5377b2: device or resource busy"

(Interestingly, there are no issues with unified … it must be added inside the container later by some change in kind, I guess.)

Once runc 1.1 is used inside a container created using runc 1.0, we have this issue with the inability to remove the rdma and unified cgroups (which is noted in the logs but is not fatal, so it does not result in any failures by itself). This can eventually lead to the issue described here: the inability to put a process into an rdma cgroup (because, I guess, we hit some kind of kernel limit on the number of cgroups).
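
A quick way to see whether rdma cgroups are piling up on a node (a hedged diagnostic sketch, not taken from the original report; the kubelet cgroup path is the one from the error above):

# per-controller cgroup counts as tracked by the kernel
# (columns: subsys_name, hierarchy, num_cgroups, enabled)
grep -E '^(#subsys_name|rdma)' /proc/cgroups
# leftover rdma cgroup directories under the kubelet root that were never removed
find /sys/fs/cgroup/rdma/kubelet -mindepth 1 -type d 2>/dev/null | wc -l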

HTH

@kolyshkin how does one typically disable these two cgroups (/sys/fs/cgroup/rdma and /sys/fs/cgroup/unified)? Are there any examples of doing something similar in an entrypoint?

I looked into what is done for cgroup v1 in the entrypoint, and I see that:

  1. current_cgroup is determined from the cpu controller (https://github.com/kubernetes-sigs/kind/blob/ee7ee0646385d78d2f02c64665990386507c5709/images/base/files/usr/local/bin/entrypoint#L217), assuming all the other controllers have similar paths.
  2. The cgroup_subsystems list is determined from the above current_cgroup path (https://github.com/kubernetes-sigs/kind/blob/ee7ee0646385d78d2f02c64665990386507c5709/images/base/files/usr/local/bin/entrypoint#L219).
  3. Something is done for every subsystem in cgroup_subsystems (see the sketch after this list).
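
A minimal sketch of how those three steps fit together. This is not the actual kind entrypoint code: it reads /proc/self/cgroup directly for brevity, and setup_subsystem is a hypothetical placeholder for whatever step 3 does.

# step 1: take the cpu controller's path as this container's cgroup path
current_cgroup=$(awk -F: '$2 ~ /(^|,)cpu($|,)/ {print $3; exit}' /proc/self/cgroup)

# step 2: keep only controllers whose path matches the one from step 1;
# controllers that sit at "/" (such as rdma) never make it into the list
cgroup_subsystems=$(awk -F: -v p="$current_cgroup" '$3 == p {print $2}' /proc/self/cgroup | tr ',' '\n')

# step 3: per-subsystem setup (hypothetical placeholder)
echo "${cgroup_subsystems}" | while IFS= read -r subsystem; do
  setup_subsystem "${subsystem}"
done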

I suspect that the cgroup_subsystems list includes neither rdma nor unified, because Docker is not aware of these subsystems and therefore does not create special mounts for them. Here's how it looks on Ubuntu 21.10 (used by the kind tests) with the default docker.io installed from the Ubuntu repos:

kir@ubu2110:~$ sudo docker run -it --rm alpine cat /proc/self/cgroup
13:pids:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
12:net_cls,net_prio:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
11:hugetlb:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
10:misc:/
9:freezer:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
8:devices:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
7:cpu,cpuacct:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
6:perf_event:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
5:memory:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
4:blkio:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
3:rdma:/
2:cpuset:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
1:name=systemd:/docker/c28ce22889fe36135d51fec8a1489ff92529faf1bf799832658a22e8abdaf3f4
0::/system.slice/containerd.service

As you can see, no subdirectory is created for the rdma controller (and thus step 2 above does not result in having rdma in cgroup_subsystems). It's more complicated for unified (the one that starts with 0::), but it is not mounted either.

Therefore, no setup is done for these cgroups.

The proper fix to this situation is to make Docker rdma- and unified-aware, which is definitely out of scope for this release (and/or project).

My guess is we can try something simple like:

# Neither hierarchy is set up by Docker for the node container (see above), so
# unmount them and remove the now-empty directories; "|| true" keeps the
# entrypoint going if a hierarchy is already unmounted or absent.
umount /sys/fs/cgroup/rdma || true
rmdir /sys/fs/cgroup/rdma || true
umount /sys/fs/cgroup/unified || true
rmdir /sys/fs/cgroup/unified || true

slightly catching up on this thread

Apparently runc comes from the containerd.io package from the Docker.com repo (https://download.docker.com/linux/ubuntu/), and containerd.io 1.5.11-1 comes with runc 1.0.3.

Yes, containerd.io from download.docker.com currently bundles runc, and for that it defaults to using the version from scripts/setup/runc-version in the containerd repository. Technically we can override this version when building the packages, but (I'll spare the details) it's a bit complicated to do this without a corresponding containerd release.

The “good news” is that I have opened backports for both the containerd v1.6 and v1.5 release branches to bring runc to version v1.1.1 (https://github.com/containerd/containerd/pull/6759, https://github.com/containerd/containerd/pull/6770). Those backports have been merged, but not yet included in a containerd release. Once that happens, we will publish updated packages of containerd.io on download.docker.com.

As an extra note: we currently publish packages of containerd v1.5.x, but once the next (v22.xx) release of Docker arrives, we will be switching to containerd v1.6.x (a draft PR to verify that Docker 20.10 also works with that version can be found here: https://github.com/moby/moby/pull/43433, and a similar PR to verify against containerd v1.5 with runc v1.1.1 in https://github.com/moby/moby/pull/43433).

Forgive my ignorance, but does this go beyond test setup and mean consumers of 1.24 need to ensure they are running runc 1.1.x?

No, I believe we're still compatible with runc 1.0. The only case is: if someone wants to run Kubernetes inside a container (like we do in the KIND tests here), they have to ensure that the runtime outside of that container is not older than the runtime inside the container.
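
As a concrete illustration (a hedged sketch, not from the thread; the binary locations are assumptions, and kind-worker is just the node container name seen in the logs above), comparing the two runtimes could look like this:

# runc used by the outer runtime, i.e. on the host that creates the kind node container
runc --version
# runc bundled inside the kind node container
docker exec kind-worker runc --version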

Now that we’ve narrowed this down to an issue with kind/our test infrastructure setup

Forgive my ignorance, but does this go beyond test setup and mean consumers of 1.24 need to ensure they are running runc 1.1.x? Wondering if we need to include this in release notes or known issues (less as a release blocker and more as a potential action-required for users/admins)

Until the issue is understood, it should remain in the milestone