nvidia-container-toolkit: After updating to the latest Debian package, Docker containers no longer work
I upgraded nvidia-container-toolkit on my Debian 11.6 system and suddenly my NVIDIA-enabled Docker containers stopped working. I start them using docker-compose. Here's one of the docker-compose.yml files:
```yaml
version: "3"
services:
  plex:
    container_name: plex
    restart: unless-stopped
    entrypoint:
      - /init
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
      - TZ=CET
      - PLEX_CLAIM=claim-FXncUm-C8zdJzxxdBEEz
      - PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
      - TERM=xterm
      - LANG=C.UTF-8
      - LC_ALL=C.UTF-8
      - CHANGE_CONFIG_DIR_OWNERSHIP=true
      - HOME=/config
    expose:
      - 1900/udp
      - 3005/tcp
      - 32400/tcp
      - 32410/udp
      - 32412/udp
      - 32413/udp
      - 32414/udp
      - 32469/tcp
      - 8324/tcp
    hostname: plextower
    image: plexinc/pms-docker:plexpass
    # image: plexinc/pms-docker:latest
    ipc: private
    logging:
      driver: json-file
      options: {}
    dns: 10.101.100.1
    networks:
      macvlan-plexdmz:
        ipv4_address: 10.101.100.200
        aliases:
          - plextower
    volumes:
      - /mnt/cache/appdata/pms-docker:/config
      - /mnt/cache/appdata/pms-docker-transcode:/tmp
      - /mnt/user/media:/data
networks:
  macvlan-plexdmz:
    external: true
```
Here's the output I get after executing `docker-compose up -d`:

```
Starting plex ... error

ERROR: for plex  Cannot start service plex: failed to create shim task: OCI runtime create failed: failed to create NVIDIA Container Runtime: failed to construct OCI spec modifier: failed to construct discoverer: failed to create Xorg discoverer: failed to locate libcuda.so: pattern libcuda.so.*.*.* not found: unknown

ERROR: for plex  Cannot start service plex: failed to create shim task: OCI runtime create failed: failed to create NVIDIA Container Runtime: failed to construct OCI spec modifier: failed to construct discoverer: failed to create Xorg discoverer: failed to locate libcuda.so: pattern libcuda.so.*.*.* not found: unknown
ERROR: Encountered errors while bringing up the project.
```
I have no clue whatsoever what's wrong (although I guess the error message makes sense if you know more about this stuff than I do), and Google can't help.

Thanks for any help!

/k
About this issue
- State: open
- Created a year ago
- Comments: 30 (14 by maintainers)
Commits related to this issue
- Nvidia Container Runtime Fix <https://github.com/NVIDIA/nvidia-container-toolkit/issues/59> — committed to NVIDIA/cloud-native-stack by angudadevops a year ago
- https://github.com/NVIDIA/nvidia-container-toolkit/issues/59#issuecomment-1603915868 — committed to NVIDIA/cloud-native-stack by angudadevops a year ago
Same issue here after installing `nvidia-container-toolkit=1.13.0-1` (note that I am working on a Jetson device). I temporarily solved it by downgrading to `1.12.1-1`.
We have just published `v1.13.1` of our NVIDIA Container Toolkit packages. These should include the fix for the issues you were experiencing. Let me know if you're still seeing problems.
For now, I will release `1.13.1` to fix the crash on Debian systems. What would be useful is the link chain for how `libcuda.so.1` is resolved to the actual library (most likely `/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.530.30.02`). I will also spin up a Debian system to dig a bit further on my side, but this will be after the immediate crash is fixed.
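For anyone wanting to supply that link chain, here is a sketch of how to trace it with `readlink -f`. It builds a fake symlink chain in a temp directory so the commands are safe to run anywhere; on the real system you would point `readlink -f` at whatever path `ldconfig -p` reports for `libcuda.so.1` (the file names below are examples, not confirmed for this machine):

```shell
# Build a stand-in for the driver's usual symlink chain:
#   libcuda.so -> libcuda.so.1 -> libcuda.so.530.30.02
tmp=$(mktemp -d)
touch "$tmp/libcuda.so.530.30.02"
ln -s libcuda.so.530.30.02 "$tmp/libcuda.so.1"
ln -s libcuda.so.1 "$tmp/libcuda.so"

# readlink -f follows every link down to the final real file
readlink -f "$tmp/libcuda.so.1"
```

On the asker's system the end of the chain is presumably the versioned file under `/usr/lib/x86_64-linux-gnu/nvidia/current/`, per the comment above.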
@angudadevops the error message comes from us splitting the `nvidia-container-toolkit` package as part of the `1.11.0` release, but after the `1.11.0~rc1` that you have installed. The simplest solution is to first uninstall the `nvidia-container-toolkit=1.11.0~rc.1-1` package before installing the `1.13.1` version (note that `1.13.2` was released earlier this week).

Another note: the `nvidia-container-runtime=3.13.0-1` package is no longer required, and has not been for a number of NVIDIA Container Toolkit releases.

My recommendation is thus:

1. Uninstall the `nvidia-container-toolkit=1.11.0~rc.1-1` package
2. Uninstall the `nvidia-container-runtime*` packages
3. Install the NVIDIA Container Toolkit:
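In shell terms, the steps above might look like the following. This is a sketch: the package names are the ones quoted in this thread, and the final install assumes the NVIDIA apt repository is already configured per the official install docs:

```shell
# 1. Uninstall the pre-split RC toolkit package
sudo apt-get remove nvidia-container-toolkit

# 2. Uninstall the obsolete runtime wrapper packages
#    (list what matches first, then remove what dpkg reports)
dpkg -l 'nvidia-container-runtime*'
sudo apt-get remove nvidia-container-runtime

# 3. Install the current NVIDIA Container Toolkit
sudo apt-get update && sudo apt-get install nvidia-container-toolkit
```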
@obarisk we have not yet published the packages to the CUDA Downloads repos for Debian due to some internal tooling that needs to change. You can use the steps described in our docs to install the packages from our GitHub Pages repositories, though.
@obarisk ok. Thanks for the confirmation. The issue you're seeing is because you're using the Ubuntu packages. The only functional difference between the Ubuntu and Debian packages is the config file, which refers to `/sbin/ldconfig.real` instead of `/sbin/ldconfig`. Installing the debian11 packages should address this.

@obarisk which package did you install?
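A quick way to tell which flavour of package you have installed (a sketch; run on the affected host):

```shell
# Ubuntu-flavoured packages configure ldconfig.real; Debian ones plain ldconfig
grep ldconfig /etc/nvidia-container-runtime/config.toml

# Installed version and origin of the toolkit package
dpkg -s nvidia-container-toolkit | grep -E '^(Version|Maintainer)'
```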
The issue is that the `ldconfig` entry in `/etc/nvidia-container-runtime/config.toml` seems incorrect for your distribution. Please replace `/sbin/ldconfig.real` with `/sbin/ldconfig` and try again.

We have work in progress to generate distribution-specific configs instead of relying on hard-coded values.
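A sketch of that replacement, demonstrated on a throwaway file so it is safe to try anywhere; on the real system you would run the same `sed` expression against `/etc/nvidia-container-runtime/config.toml` with `sudo`:

```shell
# Stand-in for the relevant line of config.toml (the real file has more keys)
cfg=$(mktemp)
printf 'ldconfig = "/sbin/ldconfig.real"\n' > "$cfg"

# Swap the Ubuntu-specific ldconfig.real path for the plain Debian one
sed -i 's|/sbin/ldconfig\.real|/sbin/ldconfig|' "$cfg"

cat "$cfg"   # ldconfig = "/sbin/ldconfig"
```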
Great. Thanks for the confirmation. I will get the release out later this week or early next week.
@elezar worked great for me on Debian 11/Proxmox – thanks!
The MR has a short sha of `2136266d`. To extract the packages built in this pipeline run:

This will create an `nvct-packages` folder in the current folder:

The `nvidia-container-toolkit-base_1.13.1-1_amd64.deb` is the package that contains the `nvidia-container-runtime` binary with the fix for this issue. Note that this is an ubuntu18.04 package. It is compatible with all newer Debian-based systems, but may require a modification to `/etc/nvidia-container-runtime/config.toml`.

On a Debian system the `ldconfig` entry in that file should point at `/sbin/ldconfig`, and not at `/sbin/ldconfig.real`.

I have been able to reproduce the behaviour on a test system and will continue working on improving things for Debian-based systems.
If you want to build the packages yourself, you should be able to run:

which will build all packages (as above) in the `dist/debian10/amd64` folder. Note that the debian10 packages are compatible with debian11.

Hi @klausagnoletti. Thanks for the update. Could you let me know where `libcuda.so.*` is located on your system? It would also be useful to know where the NVIDIA Xorg libraries are located (`libglxserver_nvidia.so.*`).

Regardless, I will push out a patch release that handles this error more gracefully.
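In case it helps others answer the same questions, a sketch of how to gather that information (the search root is a guess; driver libraries usually live somewhere under `/usr` on Debian):

```shell
# Locate the CUDA driver library anywhere under /usr
find /usr -name 'libcuda.so*' 2>/dev/null

# Locate the NVIDIA Xorg GLX module
find /usr -name 'libglxserver_nvidia.so*' 2>/dev/null
```

On a machine without the NVIDIA driver installed both commands simply print nothing.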