kind: cgroups misconfiguration

What happened: Failed to exec into a Pod with a QoS class defined when the CPU manager is enabled. After checking the cgroup configuration for the Pod, I see that only c 136:* rwm is allowed.
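
For reference, a sketch of how one might inspect the device rules applied to the container from inside the kind node (the node name and container ID here are illustrative):

# the OCI device rules are under .info.runtimeSpec.linux.resources.devices
docker exec kind-worker crictl ps
docker exec kind-worker crictl inspect <container-id>
# with the systemd cgroup driver, the allowed devices also show up here:
docker exec kind-worker systemctl show cri-containerd-<container-id>.scope -p DeviceAllow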

What you expected to happen: I expected to be able to exec into the Pod, get a shell, and have the cgroup configuration set up correctly.

How to reproduce it (as minimally and precisely as possible):

  1. Create a cluster with the following config file (the create command is shown after the config):
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry:5000"]
    endpoint = ["http://registry:5000"]
nodes:
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |-
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        "feature-gates": "CPUManager=true"
        "cpu-manager-policy": "static"
        "kube-reserved": "cpu=500m"
        "system-reserved": "cpu=500m"
  extraMounts:
  - containerPath: /var/log/audit
    hostPath: /var/log/audit
    readOnly: true
  - containerPath: /dev/vfio/
    hostPath: /dev/vfio/
- role: worker
  kubeadmConfigPatches:
  - |-
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        "feature-gates": "CPUManager=true"
        "cpu-manager-policy": "static"
        "kube-reserved": "cpu=500m"
        "system-reserved": "cpu=500m"
  extraMounts:
  - containerPath: /var/log/audit
    hostPath: /var/log/audit
    readOnly: true
  - containerPath: /dev/vfio/
    hostPath: /dev/vfio/
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: /tmp/kind-cluster-etcd
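Assuming the config above is saved as kind-config.yaml (the filename is only an example), the cluster is created with:
kind create cluster --config kind-config.yaml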
  2. Create a Pod with QoS (a check of the resulting QoS class is shown after the manifest):
# cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:                   
  containers:
  - name: qos-demo-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"
EOF                
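Since requests equal limits for both CPU and memory, this Pod gets the Guaranteed QoS class, which can be confirmed with:
kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'
# prints: Guaranteed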
  3. Try to exec into the Pod:
# kubectl exec -it qos-demo bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "b81e55425e38f6a88c79fe45269a07e12573c9589410dc7f4a220e6d9012bce7": OCI runtime exec failed: exec failed: unable to start container process: open /dev/ptmx: operation not permitted: unknown

Any attempt to exec into other Pods fails from now on for the same reason.

Anything else we need to know?: SELinux is disabled.

This seems to be related to the change where kind uses systemd to manage cgroups as of the 1.24/1.25 images.
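
A quick way to confirm a node is on cgroup v2 with the systemd cgroup driver (a sketch; kind-control-plane assumes the default node name):

# cgroup2fs here indicates cgroup v2
docker exec kind-control-plane stat -fc %T /sys/fs/cgroup
# containerd should have SystemdCgroup = true
docker exec kind-control-plane grep -i SystemdCgroup /etc/containerd/config.toml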

I have not tested whether this problem also occurs without the CPU manager.

Environment:

  • kind version: (use kind version): 0.17.0
  • Kubernetes version: (use kubectl version): v1.25.3
  • Docker version: (use docker info): 20.10.21
  • OS (e.g. from /etc/os-release): RHEL 8.5

Most upvoted comments

Thanks!

I believe we can close this now as fixed by the runc upgrade in v0.18+ images.

sorry this took so long 😅

Can confirm that this appears to have resolved my observed issues too. Thanks for the update!

I got time to look at this, and here is what I found. First I created the pod, then ran kubectl exec -ti <pod_name> <command_that_doesnt_exist> in a loop. This allowed me to identify when the cgroup configuration gets broken. It appears that once https://github.com/kubernetes/kubernetes/blob/64af1adaceba4db8d0efdb91453bce7073973771/pkg/kubelet/cm/cpumanager/cpu_manager.go#L513 is called, all devices become inaccessible to the container cgroup.
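
A rough sketch of that probe loop (the pod name, command, and interval are illustrative):

while true; do
  date
  kubectl exec -ti qos-demo -- /no-such-command 2>&1 | tail -n 1
  sleep 5
done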

In the case of kind I see, in the systemd log, cri-containerd-6cbc6412df51daf51dc9922233b5b9b3e510b08f4df8a2dc9e9f8536b70fd4b9.scope: No devices matched by device filter., whereas I don’t see this on the working setup. Before diving into containerd/runc/systemd, I tried the latest image, since it has all of these components up to date, and I can no longer reproduce the problem.
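
To search for the same message, one can grep the node’s journal (node name assumed):

docker exec kind-worker journalctl | grep 'No devices matched by device filter'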

So finally I just updated runc in the old image, and that seems to work.
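
For reference, a hedged sketch of how one might swap a newer runc into a running v0.17 node (the binary path and node name are assumptions; check the actual path with command -v runc inside the node):

curl -fsSL -o runc https://github.com/opencontainers/runc/releases/download/v1.1.5/runc.amd64
chmod +x runc
docker cp runc kind-worker:/usr/local/sbin/runc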

Note: I’m not confident, but from a quick look I would say https://github.com/opencontainers/runc/commit/3b9582895b868561eb9260ac51b2ac6feb7798ae is the culprit. (This would also explain the systemd log message.)

So the only question left is whether we can update runc for 1.24+? @BenTheElder

Whereas the same configuration on v0.18 does not show this, even after a few minutes.

I’m attempting to minimally reproduce the breakage on v0.17 and to confirm the runc upgrade fix in v0.18, so far without success:

I’m running the following, with this $HOME/kind-test.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
  - |-
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        "feature-gates": "CPUManager=true"
        "cpu-manager-policy": "static"
        "kube-reserved": "cpu=500m"
        "system-reserved": "cpu=500m"
kind create cluster --config=$HOME/kind-test.yaml

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:                   
  containers:
  - name: qos-demo-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"
EOF

kubectl exec -it qos-demo -- bash

Which works fine.

Can you try the release / images in https://github.com/kubernetes-sigs/kind/releases/tag/v0.18.0?

We’re on runc 1.1.5 in the latest KIND release, which appears to contain opencontainers/runc@3b95828
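
One way to confirm the bundled runc version on a running node (node name assumed):

docker exec kind-control-plane runc --version
# runc version 1.1.5 on v0.18 images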

Yes, that works just fine. Thank you. (Note to self: the new release comes with new images.)

I am seeing the same issue:

error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "fc487f320c6f37e3fa43ce201591370cee2e43567bf526ba3d15250955f84390": OCI runtime exec failed: exec failed: unable to start container process: open /dev/ptmx: operation not permitted: unknown

Here is some more info on my setup:

CPUManager is not enabled.

In CI with multiple Kubernetes versions in kind, < 1.24 works fine; 1.24 fails with this error.

It seems to affect all devices. We see errors like this for jobs running inside affected pods:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/site-packages/ansible/executor/process/worker.py", line 148, in run
    sys.stdout = sys.stderr = open(os.devnull, 'w')
PermissionError: [Errno 1] Operation not permitted: '/dev/null'

stat looks normal:

  File: /dev/null
  Size: 0         	Blocks: 0          IO Block: 4096   character special file
Device: 50007ah/5243002d	Inode: 6           Links: 1     Device type: 1,3
Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-12-22 06:26:20.656363572 +0000
Modify: 2022-12-22 06:26:20.656363572 +0000
Change: 2022-12-22 06:26:20.656363572 +0000

It doesn’t happen immediately; it only appears around 20 minutes after the cluster is started.
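
A minimal periodic probe that catches the moment the device cgroup breaks (a sketch; the pod name is a placeholder):

# succeeds while the cgroup is healthy, fails with 'Operation not permitted' once broken
kubectl exec <pod> -- sh -c 'echo probe > /dev/null'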

Hey @stmcginnis @BenTheElder, the minimal configuration is to enable the CPU manager. I can reproduce with (docker info):

Server:
 Containers: 9
  Running: 3
  Paused: 0
  Stopped: 6
 Images: 23
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
 runc version: v1.1.3-0-g6724737
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  selinux
  cgroupns
 Kernel Version: 5.18.17-200.fc36.x86_64
 Operating System: Fedora Linux 36 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.2GiB
 Name: localhost.localdomain
 Docker Root Dir: /home/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false