kind: [master] kube-proxy doesn't start up due to "apply caps: operation not permitted" error

What happened:

The kube-proxy pod doesn’t start; it ends up in CrashLoopBackOff.

What you expected to happen:

The kube-proxy pod should start.

How to reproduce it (as minimally and precisely as possible):

Setup (from a checkout of the kind repository):

make && make install

(cd $GOPATH/src/k8s.io/kubernetes && git checkout v1.20.2)
docker build -t kindest/base:latest ./images/base
kind build node-image --base-image kindest/base:latest --type=bazel

kind create cluster --image kindest/node:latest

Result:

$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS             RESTARTS   AGE
kube-system          coredns-74ff55c5b-4w8n4                      0/1     Pending            0          39m
kube-system          coredns-74ff55c5b-q5vks                      0/1     Pending            0          39m
kube-system          etcd-kind-control-plane                      1/1     Running            0          39m
kube-system          kindnet-fqvxr                                1/1     Running            2          39m
kube-system          kube-apiserver-kind-control-plane            1/1     Running            0          39m
kube-system          kube-controller-manager-kind-control-plane   1/1     Running            0          39m
kube-system          kube-proxy-p72nb                             0/1     CrashLoopBackOff   6          39m
kube-system          kube-scheduler-kind-control-plane            1/1     Running            0          39m
local-path-storage   local-path-provisioner-78776bfc44-9wjrv      0/1     Pending            0          39m

$ kubectl get pods -n kube-system -o json kube-proxy-p72nb | jq .status.containerStatuses
[
  {
    "containerID": "containerd://c9040c90a09b74ea044963a3ac5e57c26c951dacf21d7c77b0eed6ec6ab724bd",
    "image": "k8s.gcr.io/kube-proxy:v1.20.2",
    "imageID": "sha256:16fb33527f2df347f565f645eeb5dc20a371b7c7361c24eb20f9cb5ff3cb67f7",
    "lastState": {
      "terminated": {
        "containerID": "containerd://c9040c90a09b74ea044963a3ac5e57c26c951dacf21d7c77b0eed6ec6ab724bd",
        "exitCode": 128,
        "finishedAt": "2021-02-06T07:05:11Z",
        "message": "failed to create containerd task: OCI runtime create failed: container_linux.go:367: starting container process caused: apply caps: operation not permitted: unknown",
        "reason": "StartError",
        "startedAt": "1970-01-01T00:00:00Z"
      }
    },
    "name": "kube-proxy",
    "ready": false,
    "restartCount": 6,
    "started": false,
    "state": {
      "waiting": {
        "message": "back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-p72nb_kube-system(87e3cf9d-b3dc-4f5e-b3c3-d6af45d2a5a6)",
        "reason": "CrashLoopBackOff"
      }
    }
  }
]

Anything else we need to know?:

Possibly related to https://github.com/containerd/containerd/pull/4717

Environment:

  • kind version: acac774fe522f16ca81eda69027e86ae30475584
  • Kubernetes version: v1.20.2
  • Docker version: moby/moby@3e0025e2fc
  • OS: Ubuntu 20.10, kernel 5.8.0-41-generic

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

I’m not sure if that’s reasonable on the containerd side. We generally only ask for changes that have a legitimate use independent of KIND.

Besides kind, minikube and k3d will also need this fix.

I’m thinking of modifying GetAllCapabilities() to parse the CapEff field in /proc/self/status, so that it returns only the capabilities the current process actually holds: https://github.com/containerd/containerd/blob/v1.5.0-beta.1/oci/spec_opts.go#L784-L800
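A minimal Go sketch of the CapEff idea, assuming the goal is to turn the hex CapEff mask from /proc/self/status into capability names. The function name effectiveCaps and the name table are illustrative only, not containerd's actual code; the bit numbers follow linux/capability.h up to CAP_CHECKPOINT_RESTORE (bit 40), and any higher bits are skipped because the sketch has no name for them.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capNames is indexed by capability bit number (linux/capability.h).
// CAP_PERFMON and CAP_BPF arrived in Linux 5.8, CAP_CHECKPOINT_RESTORE in 5.9.
var capNames = []string{
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
	"CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
	"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
	"CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
	"CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
	"CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
	"CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
	"CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
	"CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
	"CAP_CHECKPOINT_RESTORE",
}

// effectiveCaps (hypothetical helper) parses the "CapEff:" line of a
// /proc/<pid>/status document and returns the names of the set bits.
func effectiveCaps(status string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		field := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		mask, err := strconv.ParseUint(field, 16, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing CapEff %q: %w", field, err)
		}
		var caps []string
		// Only bits with a known name are reported; unknown bits are skipped.
		for bit, name := range capNames {
			if mask&(uint64(1)<<uint(bit)) != 0 {
				caps = append(caps, name)
			}
		}
		return caps, nil
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return nil, fmt.Errorf("no CapEff line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading status:", err)
		return
	}
	caps, err := effectiveCaps(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(strings.Join(caps, ","))
}
```

Run inside a kind node container, a program like this would print the capabilities the outer runtime actually granted, which is the set an inner runtime could request without hitting "apply caps: operation not permitted".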

@aojea this is perhaps related to situations like the one described in https://github.com/moby/moby/issues/42906 (addressed by https://github.com/moby/moby/pull/42933 on “master” in the Moby repo). The runc fix accounted for capabilities that were supported by the kernel but not yet known to runc’s code; however, there were other situations (e.g. a docker-in-docker setup) where the capabilities were restricted in other ways. A similar “fix” for runc is implemented in https://github.com/opencontainers/runc/pull/3240, but it has yet to be discussed.
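A minimal Go sketch of the general idea of tolerating capabilities the runtime cannot grant, assuming that the fix amounts to dropping such capabilities instead of failing. The filterCaps helper and the capability lists in main are hypothetical; this illustrates the idea and is not the code from the Moby or runc pull requests linked above.

```go
package main

import "fmt"

// filterCaps (hypothetical) keeps the requested capabilities that appear in
// the available set and returns the rest separately, so a caller can log or
// drop them instead of failing later with "apply caps: operation not permitted".
func filterCaps(requested, available []string) (kept, dropped []string) {
	have := make(map[string]bool, len(available))
	for _, c := range available {
		have[c] = true
	}
	for _, c := range requested {
		if have[c] {
			kept = append(kept, c)
		} else {
			dropped = append(dropped, c)
		}
	}
	return kept, dropped
}

func main() {
	// Hypothetical scenario: the inner runtime asks for CAP_PERFMON and
	// CAP_BPF (added in Linux 5.8), but the outer runtime did not grant them.
	requested := []string{"CAP_SYS_ADMIN", "CAP_NET_ADMIN", "CAP_PERFMON", "CAP_BPF"}
	available := []string{"CAP_SYS_ADMIN", "CAP_NET_ADMIN"}
	kept, dropped := filterCaps(requested, available)
	fmt.Println("kept:", kept)       // kept: [CAP_SYS_ADMIN CAP_NET_ADMIN]
	fmt.Println("dropped:", dropped) // dropped: [CAP_PERFMON CAP_BPF]
}
```

Returning the dropped set rather than discarding it silently keeps the reduced privileges visible to whoever has to debug the container later.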

Related discussion in the runtime spec: https://github.com/opencontainers/runtime-spec/issues/1071

So, for Docker, this could be resolved once runc v1.0.0-rc93 is included in our containerd packages (this must be tested with containerd 1.4.x). However, it would require raising the minimum “required” version in packaging, as older versions would no longer be compatible.