kubernetes: volumeDevices mapping ignored when container is privileged

What happened: When using the raw block PV feature, if your container is privileged, the volumeDevices.devicePath entries specified in the Pod spec are silently ignored by the container runtime, since all host devices are mapped into the container at /dev.

This has been an issue with the raw block feature since its inception, and it affects all volume plugins and most runtimes (Docker and containerd, to name a few).
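
For background (an illustration, not taken from the issue text), this is essentially the runtime behavior responsible: a privileged container receives every host device node under /dev, and per-device mappings are not applied on top of that. You can see the effect directly with Docker:

# A privileged container sees the host's full device tree under /dev,
# so any requested per-device mapping is effectively moot.
docker run --rm --privileged debian:latest ls -l /dev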

What you expected to happen: At least when the container’s devicePath does not conflict with a host path (i.e. is not under /dev), it should be mapped correctly.

I’m on the fence about what should happen if the devicePath is “/dev”. Disallowing “/dev” entirely would be a breaking change (even though it’s pretty non-functional today). Overriding “/dev” could also be a breaking change?

How to reproduce it (as minimally and precisely as possible):

apiVersion: v1
kind: Pod 
metadata:
  name: test
spec:
  containers:
  - image: debian:latest
    imagePullPolicy: IfNotPresent
    name: test
    command: ["/bin/sh"]
    args:
      - "-c"
      - "while true; do sleep 10;done"
    securityContext:
      privileged: true
    volumeDevices:
    - devicePath: /my-disk
      name: disk1
  volumes:
  - name: disk1
    persistentVolumeClaim:
      claimName: disk1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk1
spec:
  volumeMode: Block
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi 
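
To observe the problem after applying the repro (the pod name comes from the manifest above), check whether the device node appears at the requested path; with privileged: true it does not, and the volume instead shows up somewhere under /dev:

kubectl exec test -- ls -l /my-disk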

Anything else we need to know?: The current workaround for this issue is to have an unprivileged init container that copies the device nodes into an emptyDir shared with the privileged container, as sketched below.
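
A minimal sketch of that workaround (pod and container names are hypothetical; the disk1 PVC is the one from the repro above):

apiVersion: v1
kind: Pod
metadata:
  name: workaround
spec:
  initContainers:
  - name: copy-device
    image: debian:latest
    # Unprivileged, so volumeDevices is honored and the node appears at
    # /my-disk. cp -a recreates the device node inside the emptyDir
    # (root in the container has CAP_MKNOD in the default capability set).
    command: ["/bin/sh", "-c", "cp -a /my-disk /shared/my-disk"]
    volumeDevices:
    - devicePath: /my-disk
      name: disk1
    volumeMounts:
    - mountPath: /shared
      name: shared
  containers:
  - name: main
    image: debian:latest
    command: ["/bin/sh", "-c", "while true; do sleep 10; done"]
    securityContext:
      privileged: true    # can now use the copied node at /shared/my-disk
    volumeMounts:
    - mountPath: /shared
      name: shared
  volumes:
  - name: disk1
    persistentVolumeClaim:
      claimName: disk1    # the PVC from the repro above
  - name: shared
    emptyDir: {}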

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 45 (29 by maintainers)

Most upvoted comments

@jingxu97

Suppose we have the following manifest:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - image: ubuntu:20.04
    securityContext:
      privileged: true
    command:
      - /usr/bin/sleep
    args:
      - infinity
    name: test-container
    volumeDevices:
    - devicePath: /disks/ssd1
      name: ssd1
    volumeMounts:
    - mountPath: /disks/nvme0n1
      name: nvme0n1
  volumes:
  - name: ssd1
    persistentVolumeClaim:
      claimName: ssd1
  - name: nvme0n1
    hostPath:
      path: /dev/disk/by-id/google-local-ssd-0
      type: BlockDevice

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ssd1
spec:
  storageClassName: ssd
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 1Gi

When I then run kubectl exec test-pod -- ls -l /disks, I get:

total 0
brw-rw---- 1 root disk 8, 16 Apr  5 23:13 nvme0n1

So the ssd1 device mapping is ignored in privileged mode, and the device is instead available directly at /dev/sdf, which is completely arbitrary: next time it may be /dev/sdg or /dev/sdh, depending on what other PDs are attached to the node.

➜  ~ kubectl exec test-pod -- lsblk
NAME    MAJ:MIN RM    SIZE RO TYPE MOUNTPOINT
loop0     7:0    0      1G  0 loop
sda       8:0    0      1T  0 disk
|-sda1    8:1    0 1019.9G  0 part /etc/resolv.conf
|-sda2    8:2    0     16M  0 part
|-sda3    8:3    0      2G  0 part
|-sda4    8:4    0     16M  0 part
|-sda5    8:5    0      2G  0 part
|-sda6    8:6    0    512B  0 part
|-sda7    8:7    0    512B  0 part
|-sda8    8:8    0     16M  0 part
|-sda9    8:9    0    512B  0 part
|-sda10   8:10   0    512B  0 part
|-sda11   8:11   0      8M  0 part
`-sda12   8:12   0     32M  0 part
sdb       8:16   0    375G  0 disk
sdc       8:32   0    375G  0 disk
sdd       8:48   0    375G  0 disk
sde       8:64   0    375G  0 disk
sdf       8:80   0      1G  0 disk
➜  ~ kubectl exec test-pod -- ls -l /dev/sdf
brw-rw---- 1 root disk 8, 80 Apr  7 01:43 /dev/sdf

However, if I change privileged to false, ssd1 is mapped successfully:

➜  ~ kubectl exec test-pod -- ls -l /disks
total 0
brw-rw---- 1 root disk 8, 16 Apr  5 23:13 nvme0n1
brw-rw---- 1 root disk 8, 80 Apr  7 01:35 ssd1

But in this case I can’t do anything with nvme0n1, because unprivileged pods are not allowed to write to block devices mounted as a hostPath (and naturally, I need both devices writable):

➜  ~ kubectl exec test-pod -- dd bs=10M count=1 if=/dev/zero of=/disks/nvme0n1
dd: failed to open '/disks/nvme0n1': Operation not permitted
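
For completeness, one possible way out of that dilemma (a sketch only, not proposed in the thread): expose the local SSD through a local PersistentVolume with volumeMode: Block, so it can be consumed via volumeDevices in an unprivileged pod just like ssd1. The PV name, storage class, and hostname below are hypothetical:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-ssd-0
spec:
  capacity:
    storage: 375Gi
  volumeMode: Block
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-block    # hypothetical class, matched by the PVC
  local:
    path: /dev/disk/by-id/google-local-ssd-0
  nodeAffinity:                    # required for local PVs
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["my-node"]      # hypothetical node name

An accompanying PVC with volumeMode: Block and storageClassName: local-block would then replace the hostPath volume in the pod spec, letting both devices be requested through volumeDevices.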