sysbox: `procfd: operation not permitted` when running a Pod with `sysbox-runc`

I’ve installed Sysbox on an AKS cluster, following the instructions for the Sysbox daemonset. The error I am seeing is:

create failed: time="2023-01-02T12:28:18Z" level=error msg="container_linux.go:425: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:66: setting up rootfs mounts caused: rootfs_linux.go:1156: mounting \"sysfs\" to rootfs \"/var/lib/sysbox/shiftfs/f24948fc-9f27-43bd-8d8f-56947b850b7a\" at \"/sys\" caused: mount through procfd: operation not permitted"

The system info for the node is:

System Info:
  Machine ID:                 20a5246312f9429094874ca4e41dbb97
  System UUID:                91c263d5-db43-0946-aa45-e560c34470ac
  Boot ID:                    72d05dd9-343d-4743-852c-59476bb8da42
  Kernel Version:             5.4.0-1085-azure
  OS Image:                   Ubuntu 18.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.22.4
  Kubelet Version:            v1.22.11
  Kube-Proxy Version:         v1.22.11

Here’s the Pod state:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-01-02T12:22:13Z"
  name: coder-niklasrosenstein-sysbox-test
  namespace: coder
  resourceVersion: "66569083"
  uid: 45132ce5-4b13-4766-9316-5ea47baf5eb5
spec:
  automountServiceAccountToken: true
  containers:
  - command:
    - sh
    - -c
    - "#!/usr/bin/env sh\nset -eux\n# Sleep for a good long while before exiting.\n#
      This is to allow folks to exec into a failed workspace and poke around to\n#
      troubleshoot.\nwaitonexit() {\n\techo \"=== Agent script exited with non-zero
      code. Sleeping 24h to preserve logs...\"\n\tsleep 86400\n}\ntrap waitonexit
      EXIT\nBINARY_DIR=$(mktemp -d -t coder.XXXXXX)\nBINARY_NAME=coder\nBINARY_URL=https://coder-dev.helsing-dev.ai/bin/coder-linux-amd64\ncd
      \"$BINARY_DIR\"\n# Attempt to download the coder agent.\n# This could fail for
      a number of reasons, many of which are likely transient.\n# So just keep trying!\nwhile
      :; do\n\t# Try a number of different download tools, as we don not know what
      we\n\t# will have available.\n\tstatus=\"\"\n\tif command -v curl >/dev/null
      2>&1; then\n\t\tcurl -fsSL --compressed \"${BINARY_URL}\" -o \"${BINARY_NAME}\"
      && break\n\t\tstatus=$?\n\telif command -v wget >/dev/null 2>&1; then\n\t\twget
      -q \"${BINARY_URL}\" -O \"${BINARY_NAME}\" && break\n\t\tstatus=$?\n\telif command
      -v busybox >/dev/null 2>&1; then\n\t\tbusybox wget -q \"${BINARY_URL}\" -O \"${BINARY_NAME}\"
      && break\n\t\tstatus=$?\n\telse\n\t\techo \"error: no download tool found, please
      install curl, wget or busybox wget\"\n\t\texit 127\n\tfi\n\techo \"error: failed
      to download coder agent\"\n\techo \"       command returned: ${status}\"\n\techo
      \"Trying again in 30 seconds...\"\n\tsleep 30\ndone\n\nif ! chmod +x $BINARY_NAME;
      then\n\techo \"Failed to make $BINARY_NAME executable\"\n\texit 1\nfi\n\nexport
      CODER_AGENT_AUTH=\"token\"\nexport CODER_AGENT_URL=\"https://coder-dev.helsing-dev.ai/\"\nexec
      ./$BINARY_NAME agent\n"
    env:
    - name: CODER_AGENT_TOKEN
      value: REDACTED
    image: codercom/enterprise-base:ubuntu
    imagePullPolicy: IfNotPresent
    name: dev
    resources: {}
    securityContext:
      allowPrivilegeEscalation: true
      privileged: false
      readOnlyRootFilesystem: false
      runAsNonRoot: false
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/coder
      mountPropagation: None
      name: home
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jkncs
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  nodeName: aks-default-40604188-vmss000000
  nodeSelector:
    sysbox-runtime: running
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  runtimeClassName: sysbox-runc
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
    runAsNonRoot: false
    runAsUser: 1000
  serviceAccount: default
  serviceAccountName: default
  shareProcessNamespace: false
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: home
    persistentVolumeClaim:
      claimName: coder-niklasrosenstein-sysbox-test-home
  - name: kube-api-access-jkncs
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-01-02T12:22:16Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-01-02T12:22:16Z"
    message: 'containers with unready status: [dev]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-01-02T12:22:16Z"
    message: 'containers with unready status: [dev]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-01-02T12:22:16Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: codercom/enterprise-base:ubuntu
    imageID: ""
    lastState: {}
    name: dev
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: |
          container create failed: time="2023-01-02T12:28:18Z" level=error msg="container_linux.go:425: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:66: setting up rootfs mounts caused: rootfs_linux.go:1156: mounting \"sysfs\" to rootfs \"/var/lib/sysbox/shiftfs/f24948fc-9f27-43bd-8d8f-56947b850b7a\" at \"/sys\" caused: mount through procfd: operation not permitted"
        reason: CreateContainerError
  hostIP: 10.79.129.249
  phase: Pending
  podIP: 10.79.130.33
  podIPs:
  - ip: 10.79.130.33
  qosClass: BestEffort
  startTime: "2023-01-02T12:22:16Z"

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

I found this while searching for a similar error when creating a pod on GKE, and indeed, I had somehow missed adding the annotation to my pod manifest, @ctalledo. Adding the io.kubernetes.cri-o.userns-mode: "auto:size=65536" annotation got it running.

Thanks @jamonation … yes, it’s easy to miss. That annotation will become unnecessary once K8s and containerd formalize support for pods with user namespaces (soon, I believe).

Hi @NiklasRosenstein, thanks for trying Sysbox.

create failed: time="2023-01-02T12:28:18Z" level=error msg="container_linux.go:425: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:66: setting up rootfs mounts caused: rootfs_linux.go:1156: mounting \"sysfs\" to rootfs \"/var/lib/sysbox/shiftfs/f24948fc-9f27-43bd-8d8f-56947b850b7a\" at \"/sys\" caused: mount through procfd: operation not permitted"

That error typically means the pod spec is missing the io.kubernetes.cri-o.userns-mode: "auto:size=65536" annotation:

apiVersion: v1
kind: Pod
metadata:
  name: ubu-bio-systemd-docker
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"  # <-- this one
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: ubu-bio-systemd-docker
    image: registry.nestybox.com/nestybox/ubuntu-bionic-systemd-docker
    command: ["/sbin/init"]
  restartPolicy: Never

Could you double check and let me know?

Thanks!
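
For reference, here is how the suggestion above might look when applied to the pod from the original report. This is only a minimal sketch: it assumes the annotation is the sole missing piece and that nothing else in the spec needs to change. The fields shown are copied from the reported pod, and everything omitted is meant to stay as it was. Since the annotation is presumably read when the pod sandbox is created, the pod would likely need to be deleted and recreated rather than patched in place.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: coder-niklasrosenstein-sysbox-test
  namespace: coder
  annotations:
    # Assumed fix: have CRI-O allocate a user namespace with 65536 IDs for this pod.
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  runtimeClassName: sysbox-runc
  nodeSelector:
    sysbox-runtime: running
  containers:
  - name: dev
    image: codercom/enterprise-base:ubuntu
    # command, env, securityContext and volumeMounts unchanged from the report
```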
