podman: oci-nvidia-hook not working as expected with podman command

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

Description

Steps to reproduce the issue:

Set up the hook as per the OpenShift guide: https://blog.openshift.com/use-gpus-with-device-plugin-in-openshift-3-9/

I have also tried the 1.0.0 Hook Schema Nvidia example documented in this repo for the oci-nvidia-hook.json file.

Run the test container suggested in the Openshift guide with podman:

sudo podman run -it --rm docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1

Describe the results you received: None of the Nvidia or CUDA tools are mounted and the test fails:

Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!

Describe the results you expected: For the vector-add test to pass, as it does when using the Fedora docker command.

Additional information you deem important (e.g. issue happens only occasionally): A workaround is to use the nvidia-container-runtime with the podman --runtime option, but I would rather use the native runc with the Nvidia hook if possible.

Output of podman version:

Version:       0.6.1-dev
Go Version:    go1.10.2
OS/Arch:       linux/amd64

Output of podman info:

host:
  MemFree: 181026816
  MemTotal: 8314044416
  SwapFree: 8455450624
  SwapTotal: 8455712768
  arch: amd64
  cpus: 8
  hostname: kfworkstation
  kernel: 4.16.12-200.fc27.x86_64
  os: linux
  uptime: 4h 29m 19.44s (Approximately 0.17 days)
insecure registries:
  registries: []
registries:
  registries:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  ContainerStore:
    number: 1
  GraphDriverName: overlay
  GraphOptions:
  - overlay.override_kernel_check=true
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
  ImageStore:
    number: 13
  RunRoot: /var/run/containers/storage

Additional environment details (AWS, VirtualBox, physical, etc.): Running on physical hardware with Quadro 1000M.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 20 (11 by maintainers)

Most upvoted comments

The reason for the hanging seemed to be the omission of the version information:

{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": ["nvidia-container-runtime-hook", "prestart"]
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}

The command now runs, but the hook does not seem to be triggered, as none of the Nvidia tools are mounted.
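
Since a missing "version" field was enough to cause the hang, it may help to sanity-check the hook file before running podman. The following is a minimal sketch, not part of podman or the Nvidia tooling: the function name check_hook_config is hypothetical, and the set of required keys is an assumption taken from the 1.0.0 example in this comment. It only inspects the JSON; it cannot tell whether podman actually finds or triggers the hook.

```python
import json

# Top-level keys assumed to be required, based on the 1.0.0 example
# quoted in this issue (an assumption, not an authoritative schema).
REQUIRED_KEYS = ("version", "hook", "when", "stages")

# The hook configuration from the comment above, used as sample input.
EXAMPLE_CONFIG = """
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": ["nvidia-container-runtime-hook", "prestart"]
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}
"""


def check_hook_config(text):
    """Return a list of problems found in a hook JSON document (hypothetical helper)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    # Report every assumed-required key that is absent.
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # Only the 1.0.0 form shown in this issue is considered here.
    if "version" in config and config["version"] != "1.0.0":
        problems.append(f"unexpected version: {config['version']!r}")

    # The hook object needs a path to the executable.
    hook = config.get("hook")
    if isinstance(hook, dict) and "path" not in hook:
        problems.append("missing key: hook.path")

    # The Nvidia hook in this issue is meant to run at the prestart stage.
    stages = config.get("stages")
    if isinstance(stages, list) and "prestart" not in stages:
        problems.append("stages does not include 'prestart'")

    return problems


if __name__ == "__main__":
    for problem in check_hook_config(EXAMPLE_CONFIG) or ["no problems found"]:
        print(problem)
```

To check a real file, pass its contents to the function, for example check_hook_config(open(path).read()), where path is wherever your oci-nvidia-hook.json is installed.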