podman: `docker-compose`: Passing gpu with `driver: cdi` is not supported

Issue Description

May be related to https://github.com/containers/podman/issues/19330. See also https://github.com/NVIDIA/nvidia-container-toolkit/issues/126.

CC @elezar

CDI support in Docker will be an experimental feature in the Docker 25 release.

It will support the following docker-compose.yml file:

version: '3.8'
services:
  cdi_test: # this is not working
    image: ubuntu:20.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all
                - nvidia.com/gds=all
                - nvidia.com/mofed=all
    command:
      - bash
      - -c
      - |
        nvidia-smi -L
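
For context, on a real Docker 25 daemon the experimental CDI support is expected to be toggled in the daemon configuration; a minimal /etc/docker/daemon.json sketch (assuming the `features.cdi` flag planned for Docker 25) would look roughly like:

{
  "features": {
    "cdi": true
  }
}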

Steps to reproduce the issue

Steps to reproduce:

Prerequisites:

  • Have the NVIDIA Container Toolkit (e.g. v1.13.5) installed
  • Have a v25 Docker daemon configured in experimental mode
  • Have generated the CDI specification for NVIDIA GPUs by running `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml --device-name-strategy=uuid`
  • Make sure you are using the podman-docker emulation daemon (see the verification sketch after this list)
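
A quick way to sanity-check these prerequisites is to list the generated CDI device names and make sure podman's Docker-compatible API socket is up; a rough sketch, assuming the rootless socket path reported in the podman info output below:

# list the CDI devices described by /etc/cdi/nvidia.yaml
nvidia-ctk cdi list

# serve podman's Docker-compatible API on the rootless socket
podman system service --time=0 unix:///run/user/1000/podman/podman.sock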

Observe the following docker-compose.yml:

version: '3.8'
services:
  nvidia_test: # this is not working
    image: ubuntu:20.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, utility, compute]
    command:
      - bash
      - -c
      - |
        nvidia-smi -L
  runtime_test: # this is working
    image: ubuntu:20.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command:
      - bash
      - -c
      - |
        nvidia-smi -L
  cdi_test: # this is not working
    image: ubuntu:20.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all
                - nvidia.com/gds=all
                - nvidia.com/mofed=all
    command:
      - bash
      - -c
      - |
        nvidia-smi -L

Launch via `docker-compose up`.
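
In this setup docker-compose talks to podman's socket rather than a real Docker daemon; a sketch of how that is typically wired up, assuming the rootless socket path shown in the podman info output below:

export DOCKER_HOST=unix:///run/user/1000/podman/podman.sock
docker-compose up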

Describe the results you received

Only the runtime_test container works.

Describe the results you expected

All three containers should print the same `nvidia-smi -L` output.

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.7-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: f633919178f6c8ee4fb41b848a056ec33f8d707d'
  cpuUtilization:
    idlePercent: 98.59
    systemPercent: 0.38
    userPercent: 1.02
  cpus: 72
  databaseBackend: boltdb
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  hostname: argon
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.4.4-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 54521659392
  memTotal: 135022321664
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.8.5-1
    path: /usr/bin/crun
    version: |-
      crun version 1.8.5
      commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 42947567616
  swapTotal: 42947567616
  uptime: 21h 35m 45.00s (Approximately 0.88 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/main/.config/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 0
    stopped: 3
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/main/.local/share/containers/storage
  graphRootAllocated: 857601998848
  graphRootUsed: 271414001664
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/main/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 1685139594
  BuiltTime: Sat May 27 00:19:54 2023
  GitCommit: 9eef30051c83f62816a1772a743e5f1271b196d7-dirty
  GoVersion: go1.20.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

No response

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 23 (4 by maintainers)

Most upvoted comments

For me, none of these docker-compose.yml approaches works; however, I can get NVIDIA support in a container via a plain podman command line.

Is this expected to work when using podman-compose? None of the three containers in the above docker-compose.yml seem to work for me; however, I can get NVIDIA support in a container via a plain podman command line:

$ podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 2070 (UUID: GPU-336c5a8a-839f-69a8-08cf-e3fdcd01e019)
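
Untested sketch of a possible compose-level workaround: since the plain podman command above works, and assuming podman-compose forwards devices: entries straight to podman run --device (not verified here), requesting the CDI device name directly might behave the same way:

services:
  cdi_workaround:
    image: ubuntu:20.04
    devices:
      - nvidia.com/gpu=all
    command: nvidia-smi -L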