nvidia-container-toolkit: nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1

Hello,

I tried the various combinations of conda and pip packages that people suggest for getting TensorFlow running on the RTX 30 series. I thought it was working after exercising the GPU with the Keras tutorial code, but I moved to a different type of model and something apparently broke.

Now I’m trying the Docker route:

docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:22.11-tf2-py3
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

There seem to be a lot of missing libraries.

3. Information to attach (optional if deemed irrelevant)

  • [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info

I1202 15:15:34.407243 26518 nvc.c:376] initializing library context (version=1.11.0, build=)
I1202 15:15:34.407353 26518 nvc.c:350] using root /
I1202 15:15:34.407365 26518 nvc.c:351] using ldcache /etc/ld.so.cache
I1202 15:15:34.407377 26518 nvc.c:352] using unprivileged user 1000:1000
I1202 15:15:34.407426 26518 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1202 15:15:34.408137 26518 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W1202 15:15:34.411623 26519 nvc.c:273] failed to set inheritable capabilities
W1202 15:15:34.411736 26519 nvc.c:274] skipping kernel modules load due to failure
I1202 15:15:34.412602 26520 rpc.c:71] starting driver rpc service
I1202 15:15:34.433974 26521 rpc.c:71] starting nvcgo rpc service
I1202 15:15:34.438005 26518 nvc_info.c:766] requesting driver information with ''
I1202 15:15:34.445181 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.520.56.06
I1202 15:15:34.445313 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.520.56.06
I1202 15:15:34.445952 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.520.56.06
I1202 15:15:34.446254 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.520.56.06
I1202 15:15:34.446554 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.520.56.06
I1202 15:15:34.446877 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.520.56.06
I1202 15:15:34.447241 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.520.56.06
I1202 15:15:34.447301 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.520.56.06
I1202 15:15:34.447405 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.520.56.06
I1202 15:15:34.447490 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.520.56.06
I1202 15:15:34.447550 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.520.56.06
I1202 15:15:34.447813 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.520.56.06
I1202 15:15:34.448099 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.520.56.06
I1202 15:15:34.448197 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.520.56.06
I1202 15:15:34.448693 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.520.56.06
I1202 15:15:34.448755 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.520.56.06
I1202 15:15:34.449075 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.520.56.06
I1202 15:15:34.449417 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.520.56.06
I1202 15:15:34.450211 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.520.56.06
I1202 15:15:34.450273 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.520.56.06
I1202 15:15:34.450625 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.520.56.06
I1202 15:15:34.450896 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.520.56.06
I1202 15:15:34.451174 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.520.56.06
I1202 15:15:34.451236 26518 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.520.56.06
I1202 15:15:34.451580 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.520.56.06
I1202 15:15:34.451929 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.520.56.06
I1202 15:15:34.452169 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.520.56.06
I1202 15:15:34.452413 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.520.56.06
I1202 15:15:34.452680 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.520.56.06
I1202 15:15:34.452975 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.520.56.06
I1202 15:15:34.453288 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.520.56.06
I1202 15:15:34.453571 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.520.56.06
I1202 15:15:34.453833 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.520.56.06
I1202 15:15:34.454141 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.520.56.06
I1202 15:15:34.454359 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.520.56.06
I1202 15:15:34.455059 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.520.56.06
I1202 15:15:34.455764 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-allocator.so.520.56.06
I1202 15:15:34.456075 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.520.56.06
I1202 15:15:34.456395 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libcuda.so.520.56.06
I1202 15:15:34.456750 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.520.56.06
I1202 15:15:34.457050 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.520.56.06
I1202 15:15:34.457314 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.520.56.06
I1202 15:15:34.457580 26518 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.520.56.06
W1202 15:15:34.457645 26518 nvc_info.c:399] missing library libnvidia-nscq.so
W1202 15:15:34.457659 26518 nvc_info.c:399] missing library libnvidia-fatbinaryloader.so
W1202 15:15:34.457678 26518 nvc_info.c:399] missing library libnvidia-pkcs11.so
W1202 15:15:34.457694 26518 nvc_info.c:399] missing library libvdpau_nvidia.so
W1202 15:15:34.457709 26518 nvc_info.c:399] missing library libnvidia-ifr.so
W1202 15:15:34.457722 26518 nvc_info.c:399] missing library libnvidia-cbl.so
W1202 15:15:34.457740 26518 nvc_info.c:403] missing compat32 library libnvidia-cfg.so
W1202 15:15:34.457753 26518 nvc_info.c:403] missing compat32 library libnvidia-nscq.so
W1202 15:15:34.457768 26518 nvc_info.c:403] missing compat32 library libcudadebugger.so
W1202 15:15:34.457780 26518 nvc_info.c:403] missing compat32 library libnvidia-fatbinaryloader.so
W1202 15:15:34.457792 26518 nvc_info.c:403] missing compat32 library libnvidia-pkcs11.so
W1202 15:15:34.457808 26518 nvc_info.c:403] missing compat32 library libnvidia-ngx.so
W1202 15:15:34.457828 26518 nvc_info.c:403] missing compat32 library libvdpau_nvidia.so
W1202 15:15:34.457843 26518 nvc_info.c:403] missing compat32 library libnvidia-ifr.so
W1202 15:15:34.457860 26518 nvc_info.c:403] missing compat32 library libnvidia-rtcore.so
W1202 15:15:34.457880 26518 nvc_info.c:403] missing compat32 library libnvoptix.so
W1202 15:15:34.457894 26518 nvc_info.c:403] missing compat32 library libnvidia-cbl.so
I1202 15:15:34.460121 26518 nvc_info.c:299] selecting /usr/bin/nvidia-smi
I1202 15:15:34.460197 26518 nvc_info.c:299] selecting /usr/bin/nvidia-debugdump
I1202 15:15:34.460243 26518 nvc_info.c:299] selecting /usr/bin/nvidia-persistenced
I1202 15:15:34.460336 26518 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-control
I1202 15:15:34.460409 26518 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-server
W1202 15:15:34.460616 26518 nvc_info.c:425] missing binary nv-fabricmanager
I1202 15:15:34.460810 26518 nvc_info.c:343] listing firmware path /usr/lib/firmware/nvidia/520.56.06/gsp.bin
I1202 15:15:34.460876 26518 nvc_info.c:529] listing device /dev/nvidiactl
I1202 15:15:34.460891 26518 nvc_info.c:529] listing device /dev/nvidia-uvm
I1202 15:15:34.460904 26518 nvc_info.c:529] listing device /dev/nvidia-uvm-tools
I1202 15:15:34.460915 26518 nvc_info.c:529] listing device /dev/nvidia-modeset
I1202 15:15:34.460980 26518 nvc_info.c:343] listing ipc path /run/nvidia-persistenced/socket
W1202 15:15:34.461036 26518 nvc_info.c:349] missing ipc path /var/run/nvidia-fabricmanager/socket
W1202 15:15:34.461083 26518 nvc_info.c:349] missing ipc path /tmp/nvidia-mps
I1202 15:15:34.461100 26518 nvc_info.c:822] requesting device information with ''
I1202 15:15:34.468056 26518 nvc_info.c:713] listing device /dev/nvidia0 (GPU-ba9fdcdb-8a2b-d2b6-f69c-5f2ac08dde8b at 00000000:01:00.0)
NVRM version:   520.56.06
CUDA version:   11.8

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3090 Ti
Brand:          GeForce
GPU UUID:       GPU-ba9fdcdb-8a2b-d2b6-f69c-5f2ac08dde8b
Bus Location:   00000000:01:00.0
Architecture:   8.6
I1202 15:15:34.468151 26518 nvc.c:434] shutting down library context
I1202 15:15:34.468317 26521 rpc.c:95] terminating nvcgo rpc service
I1202 15:15:34.469397 26518 rpc.c:132] nvcgo rpc service terminated successfully
I1202 15:15:34.474156 26520 rpc.c:95] terminating driver rpc service
I1202 15:15:34.474599 26518 rpc.c:132] driver rpc service terminated successfully

  • [x] Kernel version from uname -a

5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • [x] Driver information from nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Fri Dec 2 09:17:13 2022
Driver Version : 520.56.06
CUDA Version : 11.8

Attached GPUs : 1
GPU 00000000:01:00.0
    Product Name : NVIDIA GeForce RTX 3090 Ti
    Product Brand : GeForce
    Product Architecture : Ampere
    Display Mode : Enabled
    Display Active : Enabled
    Persistence Mode : Enabled
    MIG Mode
        Current : N/A
        Pending : N/A
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-ba9fdcdb-8a2b-d2b6-f69c-5f2ac08dde8b
    Minor Number : 0
    VBIOS Version : 94.02.A0.00.2D
    MultiGPU Board : No
    Board ID : 0x100
    GPU Part Number : N/A
    Module ID : 0
    Inforom Version
        Image Version : G002.0000.00.03
        OEM Object : 2.0
        ECC Object : 6.16
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : N/A
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x01
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x220310DE
        Bus Id : 00000000:01:00.0
        Sub System Id : 0x88701043
        GPU Link Info
            PCIe Generation
                Max : 4
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 0
        Replay Number Rollovers : 0
        Tx Throughput : 1000 KB/s
        Rx Throughput : 0 KB/s
    Fan Speed : 0 %
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 24564 MiB
        Reserved : 310 MiB
        Used : 510 MiB
        Free : 23742 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 13 MiB
        Free : 243 MiB
    Compute Mode : Default
    Utilization
        Gpu : 6 %
        Memory : 5 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : Disabled
        Pending : Disabled
    ECC Errors
        Volatile
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
        Aggregate
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 192 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 36 C
        GPU Shutdown Temp : 97 C
        GPU Slowdown Temp : 94 C
        GPU Max Operating Temp : 92 C
        GPU Target Temperature : 83 C
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 32.45 W
        Power Limit : 480.00 W
        Default Power Limit : 480.00 W
        Enforced Power Limit : 480.00 W
        Min Power Limit : 100.00 W
        Max Power Limit : 516.00 W
    Clocks
        Graphics : 210 MHz
        SM : 210 MHz
        Memory : 405 MHz
        Video : 555 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Default Applications Clocks
        Graphics : N/A
        Memory : N/A
    Max Clocks
        Graphics : 2115 MHz
        SM : 2115 MHz
        Memory : 10501 MHz
        Video : 1950 MHz
    Max Customer Boost Clocks
        Graphics : N/A
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 740.000 mV
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 2283
            Type : G
            Name : /usr/lib/xorg/Xorg
            Used GPU Memory : 259 MiB
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 2441
            Type : G
            Name : /usr/bin/gnome-shell
            Used GPU Memory : 52 MiB
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3320
            Type : G
            Name : /opt/docker-desktop/Docker Desktop --type=gpu-process --enable-crashpad --enable-crash-reporter=46721d59-e3cc-4241-8f96-57bab71f8674,no_channel --user-data-dir=/home/kanaka/.config/Docker Desktop --gpu-preferences=WAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAAAAAA4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAIAAAAAAAAAA== --shared-files --field-trial-handle=0,i,777493636119283380,17735576311253417080,131072 --disable-features=SpareRendererForSitePerProcess
            Used GPU Memory : 27 MiB
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 4402
            Type : C+G
            Name : /opt/google/chrome/chrome --type=gpu-process --enable-crashpad --crashpad-handler-pid=4367 --enable-crash-reporter=, --change-stack-guard-on-fork=enable --gpu-preferences=WAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAEAAAA4AAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAIAAAAAAAAAA== --shared-files --field-trial-handle=0,i,1352372760819385498,10632265477078674372,131072
            Used GPU Memory : 166 MiB

  • [x] Docker version from docker version

Client: Docker Engine - Community
 Cloud integration: v1.0.29
 Version: 20.10.21
 API version: 1.41
 Go version: go1.18.7
 Git commit: baeda1f
 Built: Tue Oct 25 18:01:58 2022
 OS/Arch: linux/amd64
 Context: desktop-linux
 Experimental: true

Server: Docker Desktop 4.15.0 (93002)
 Engine:
  Version: 20.10.21
  API version: 1.41 (minimum version 1.12)
  Go version: go1.18.7
  Git commit: 3056208
  Built: Tue Oct 25 18:00:19 2022
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.10
  GitCommit: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661
 runc:
  Version: 1.1.4
  GitCommit: v1.1.4-0-g5fd4c4d
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

  • [x] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name  Version  Architecture  Description
+++-==========================-============================-============-========================================================================
un  libgldispatch0-nvidia  <none>  <none>  (no description available)
ii  libnvidia-cfg1-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-cfg1-520
ii  libnvidia-cfg1-520:amd64  520.56.06-0lambda0.22.04.3  amd64  NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any  <none>  <none>  (no description available)
un  libnvidia-common  <none>  <none>  (no description available)
ii  libnvidia-common-515  520.56.06-0lambda0.22.04.3  all  Transitional package for libnvidia-common-520
ii  libnvidia-common-520  520.56.06-0lambda0.22.04.3  all  Shared files used by the NVIDIA libraries
un  libnvidia-compute  <none>  <none>  (no description available)
ii  libnvidia-compute-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-compute-520
ii  libnvidia-compute-515:i386  520.56.06-0lambda0.22.04.3  i386  Transitional package for libnvidia-compute-520
ii  libnvidia-compute-520:amd64  520.56.06-0lambda0.22.04.3  amd64  NVIDIA libcompute package
ii  libnvidia-compute-520:i386  520.56.06-0lambda0.22.04.3  i386  NVIDIA libcompute package
ii  libnvidia-container-tools  1.11.0+dfsg-0lambda0.22.04.1  amd64  Package for configuring containers with NVIDIA hardware (CLI tool)
ii  libnvidia-container1:amd64  1.11.0+dfsg-0lambda0.22.04.1  amd64  Package for configuring containers with NVIDIA hardware (shared library)
un  libnvidia-decode  <none>  <none>  (no description available)
ii  libnvidia-decode-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-decode-520
ii  libnvidia-decode-515:i386  520.56.06-0lambda0.22.04.3  i386  Transitional package for libnvidia-decode-520
ii  libnvidia-decode-520:amd64  520.56.06-0lambda0.22.04.3  amd64  NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-520:i386  520.56.06-0lambda0.22.04.3  i386  NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64  1:1.1.9-1.1  amd64  Wayland EGL External Platform library -- shared library
un  libnvidia-encode  <none>  <none>  (no description available)
ii  libnvidia-encode-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-encode-520
ii  libnvidia-encode-515:i386  520.56.06-0lambda0.22.04.3  i386  Transitional package for libnvidia-encode-520
ii  libnvidia-encode-520:amd64  520.56.06-0lambda0.22.04.3  amd64  NVENC Video Encoding runtime library
ii  libnvidia-encode-520:i386  520.56.06-0lambda0.22.04.3  i386  NVENC Video Encoding runtime library
un  libnvidia-encode1  <none>  <none>  (no description available)
un  libnvidia-extra  <none>  <none>  (no description available)
ii  libnvidia-extra-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-extra-520
ii  libnvidia-extra-520:amd64  520.56.06-0lambda0.22.04.3  amd64  Extra libraries for the NVIDIA driver
ii  libnvidia-extra-520:i386  520.56.06-0lambda0.22.04.3  i386  Extra libraries for the NVIDIA driver
un  libnvidia-fbc1  <none>  <none>  (no description available)
ii  libnvidia-fbc1-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-fbc1-520
ii  libnvidia-fbc1-515:i386  520.56.06-0lambda0.22.04.3  i386  Transitional package for libnvidia-fbc1-520
ii  libnvidia-fbc1-520:amd64  520.56.06-0lambda0.22.04.3  amd64  NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-520:i386  520.56.06-0lambda0.22.04.3  i386  NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl  <none>  <none>  (no description available)
un  libnvidia-gl-390  <none>  <none>  (no description available)
un  libnvidia-gl-410  <none>  <none>  (no description available)
un  libnvidia-gl-470  <none>  <none>  (no description available)
un  libnvidia-gl-495  <none>  <none>  (no description available)
ii  libnvidia-gl-515:amd64  520.56.06-0lambda0.22.04.3  amd64  Transitional package for libnvidia-gl-520
ii  libnvidia-gl-515:i386  520.56.06-0lambda0.22.04.3  i386  Transitional package for libnvidia-gl-520
ii  libnvidia-gl-520:amd64  520.56.06-0lambda0.22.04.3  amd64  NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-520:i386  520.56.06-0lambda0.22.04.3  i386  NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-legacy-390xx-egl-wayland1  <none>  <none>  (no description available)
un  libnvidia-ml1  <none>  <none>  (no description available)
un  nvidia-common  <none>  <none>  (no description available)
un  nvidia-compute-utils  <none>  <none>  (no description available)
ii  nvidia-compute-utils-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for nvidia-compute-utils-520
ii  nvidia-compute-utils-520  520.56.06-0lambda0.22.04.3  amd64  NVIDIA compute utilities
un  nvidia-contaienr-runtime  <none>  <none>  (no description available)
un  nvidia-container-runtime  <none>  <none>  (no description available)
un  nvidia-container-runtime-hook  <none>  <none>  (no description available)
ii  nvidia-container-toolkit  1.11.0-0lambda0.22.04.1  amd64  OCI hook for configuring containers for NVIDIA hardware
ii  nvidia-container-toolkit-base  1.11.0-0lambda0.22.04.1  amd64  OCI hook for configuring containers for NVIDIA hardware
ii  nvidia-dkms-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for nvidia-dkms-520
ii  nvidia-dkms-520  520.56.06-0lambda0.22.04.3  amd64  NVIDIA DKMS package
un  nvidia-dkms-kernel  <none>  <none>  (no description available)
un  nvidia-driver  <none>  <none>  (no description available)
ii  nvidia-driver-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for nvidia-driver-520
ii  nvidia-driver-520  520.56.06-0lambda0.22.04.3  amd64  NVIDIA driver metapackage
un  nvidia-driver-binary  <none>  <none>  (no description available)
un  nvidia-egl-wayland-common  <none>  <none>  (no description available)
un  nvidia-kernel-common  <none>  <none>  (no description available)
ii  nvidia-kernel-common-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for nvidia-kernel-common-520
ii  nvidia-kernel-common-520  520.56.06-0lambda0.22.04.3  amd64  Shared files used with the kernel module
un  nvidia-kernel-source  <none>  <none>  (no description available)
ii  nvidia-kernel-source-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for nvidia-kernel-source-520
ii  nvidia-kernel-source-520  520.56.06-0lambda0.22.04.3  amd64  NVIDIA kernel source package
un  nvidia-libopencl1-dev  <none>  <none>  (no description available)
un  nvidia-opencl-icd  <none>  <none>  (no description available)
un  nvidia-persistenced  <none>  <none>  (no description available)
ii  nvidia-prime  0.8.17.1  all  Tools to enable NVIDIA's Prime
ii  nvidia-settings  510.47.03-0ubuntu1  amd64  Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary  <none>  <none>  (no description available)
un  nvidia-smi  <none>  <none>  (no description available)
un  nvidia-utils  <none>  <none>  (no description available)
ii  nvidia-utils-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for nvidia-utils-520
ii  nvidia-utils-520  520.56.06-0lambda0.22.04.3  amd64  NVIDIA driver support binaries
ii  xserver-xorg-video-nvidia-515  520.56.06-0lambda0.22.04.3  amd64  Transitional package for xserver-xorg-video-nvidia-520
ii  xserver-xorg-video-nvidia-520  520.56.06-0lambda0.22.04.3  amd64  NVIDIA binary Xorg driver

  • [x] NVIDIA container library version from nvidia-container-cli -V

cli-version: 1.11.0
lib-version: 1.11.0
build date: 2022-10-25T22:10+00:00
build revision:
build compiler: x86_64-linux-gnu-gcc-11 11.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -g -O2 -ffile-prefix-map=/build/libnvidia-container-956QFy/libnvidia-container-1.11.0+dfsg=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro

Most upvoted comments

I had the same issue. For me, a reinstall of Docker fixed it. I run the following as a bash script:

sudo apt-get update

sudo apt install apt-transport-https ca-certificates curl software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"

apt-cache policy docker-ce

sudo apt install docker-ce

Hi guys,

I hit the same issue on Ubuntu 22.04 LTS. I followed the instructions to reinstall Docker as below (note that I had also installed docker-desktop initially):

sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl restart docker

After that I was able to run the image from the comment above, docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark, and I was finally able to run RAPIDS: docker run --gpus all --pull always --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8888:8888 -p 8787:8787 -p 8786:8786 rapidsai/notebooks:23.12a-cuda11.2-py3.10

At the moment docker-desktop is uninstalled. I will try to install it again and run the tests.

@JosephKuchar try reinstalling Docker. I had a similar problem; the issue was a missing runtime (see docker info). The solution for me was to reinstall Docker: https://github.com/NVIDIA/nvidia-docker/issues/1648#issuecomment-1785033393

The toolkit explicitly looks for libnvidia-ml.so.1, which should be symlinked to libnvidia-ml.so.<DRIVER_VERSION> after running ldconfig on your host. Since nvidia-smi works (and also uses libnvidia-ml.so.1), I would not expect this to be the case.

How is Docker installed? Could it be that it is installed as a snap and cannot load the system libraries because of this?
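A quick host-side check for both hypotheses (a minimal sketch; the library path assumes a Debian/Ubuntu multiarch layout, so adjust for your distro):

# Is libnvidia-ml.so.1 present in the ld cache that nvidia-container-cli consults?
ldconfig -p | grep libnvidia-ml

# Does the symlink chain exist on disk?
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*

# Is docker installed as a snap? A confined snap may not see host libraries.
snap list docker 2>/dev/null
command -v docker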

I actually managed to fix this. At some point in time we had uncommented the option root = "/run/nvidia/driver" in /etc/nvidia-container-runtime/config.toml (we must have seen directions on this somewhere). My best guess is that we had updated something on the system that made this no longer a viable option, and after a reboot everything stopped working. I commented out that option and everything popped back up.
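For anyone hitting the same thing, the fix boils down to commenting that line back out and restarting Docker; a minimal sketch, assuming the stock config path:

# Comment out the root override in the toolkit config, then restart the daemon
sudo sed -i 's|^root = "/run/nvidia/driver"|#root = "/run/nvidia/driver"|' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker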

To find it, I created a wrapper around nvidia-container-cli:

#!/bin/bash

# Log the arguments the container runtime hook passes to the CLI,
# then forward everything to the real binary.
echo "$@" > /var/tmp/debuginfo
/usr/bin/nvidia-container-cli.real "$@"

That showed me the options being passed on a working system and on a non-working one.
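(If you want to reproduce the trick: the wrapper only works once the real binary has been moved aside. The .real suffix below is just my convention, not anything the toolkit requires:)

# Move the real CLI aside and install the logging wrapper in its place
sudo mv /usr/bin/nvidia-container-cli /usr/bin/nvidia-container-cli.real
sudo tee /usr/bin/nvidia-container-cli >/dev/null <<'EOF'
#!/bin/bash
echo "$@" > /var/tmp/debuginfo
/usr/bin/nvidia-container-cli.real "$@"
EOF
sudo chmod +x /usr/bin/nvidia-container-cli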

Not working:

--root=/run/nvidia/driver --load-kmods --debug=/var/log/nvidia-container-toolkit.log configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516 --pid=3895576 /var/lib/docker/overlay2/47f7deb4479aa6b8c26f3b6e3ad4a2cd9bd86304736bf9aed68ed4127fbc0d00/merged

Working:

--load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516 --pid=2830327 /var/lib/docker/overlay2/59206c16f5a12eadbe2e42287a7ff6aa3559b0666048d7578b29df90e3755d50/merged

[Screenshot from 2022-12-05 22-35-34]

Not sure it helps: I had originally installed the driver from CUDA 11.8, but then when I did the nvidia-docker2 install the driver broke, so I reverted back to the system (auto-installed) driver.

UPDATE

Reading through the docs at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html, this command works fine:


sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.8.0-base-ubuntu22.04' locally
11.8.0-base-ubuntu22.04: Pulling from nvidia/cuda
301a8b74f71f: Already exists 
35985d37d899: Already exists 
5b7513e7876e: Already exists 
bbf319bc026c: Already exists 
da5c9c5d5ac3: Already exists 
Digest: sha256:83493b3f150cc23f91fb0d2509e491204e33f062355d401662389a80a9091b82
Status: Downloaded newer image for nvidia/cuda:11.8.0-base-ubuntu22.04
Mon Dec  5 23:05:46 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P8    25W / 370W |    995MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+


OK, so it's basically a problem only when not using sudo:


docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi 
Unable to find image 'nvidia/cuda:11.8.0-base-ubuntu22.04' locally
11.8.0-base-ubuntu22.04: Pulling from nvidia/cuda
301a8b74f71f: Already exists 
35985d37d899: Already exists 
5b7513e7876e: Already exists 
bbf319bc026c: Already exists 
da5c9c5d5ac3: Already exists 
Digest: sha256:83493b3f150cc23f91fb0d2509e491204e33f062355d401662389a80a9091b82
Status: Downloaded newer image for nvidia/cuda:11.8.0-base-ubuntu22.04
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
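If you see this split between sudo and non-sudo, the two invocations are probably talking to different daemons; a quick way to check (a sketch using the standard docker context subcommands):

# Which context (and therefore which engine) does each invocation use?
docker context show        # often "desktop-linux" when Docker Desktop is installed
sudo docker context show   # typically "default", i.e. the system docker-ce engine

# The nvidia runtime must be registered in the daemon you actually talk to
docker info | grep -i runtime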

UPDATE - FIXED. I don't know if this helps, but on my installation I had cudnn-local-repo-ubuntu2204-8.6.0.163_1.0-1_amd64.deb together with CUDA 11.8, which is incorrect. I was using cog, and it didn't surface the error; it just assumed everything was working correctly. Updating to the latest cuDNN (cudnn-local-repo-ubuntu2204-8.7.0.84_1.0-1_amd64.deb) resolved my original issue.

Same problem on Ubuntu 22.04:

Linux msi 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Docker Desktop.

Can you unpack this?

The toolkit explicitly looks for libnvidia-ml.so.1, which should be symlinked to libnvidia-ml.so.<DRIVER_VERSION> after running ldconfig on your host. Since nvidia-smi works (and also uses libnvidia-ml.so.1), I would not expect this to be the case.

How is Docker installed? Could it be that it is installed as a snap and cannot load the system libraries because of this?

I installed it with:

sudo apt-get install -y nvidia-docker2

It completed successfully: nvidia-docker2 is already the newest version (2.11.0-1).
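Installing the package alone doesn't guarantee the runtime is wired into the daemon you're using; a quick check (standard commands, paths from a stock nvidia-docker2 install):

# nvidia-docker2 ships a daemon.json that registers the "nvidia" runtime
cat /etc/docker/daemon.json

# After a daemon restart, "nvidia" should appear in the runtime list
sudo systemctl restart docker
docker info | grep -i runtimes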

Mon Dec  5 18:59:03 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   57C    P8    29W / 370W |   1010MiB / 24576MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1515      G   /usr/lib/xorg/Xorg                548MiB |
|    0   N/A  N/A      1649      G   /usr/bin/gnome-shell              234MiB |
|    0   N/A  N/A     19695      G   ...RendererForSitePerProcess       32MiB |
|    0   N/A  N/A     19769    C+G   ...192290595412440874,131072      191MiB |
+-----------------------------------------------------------------------------+




nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I1205 08:00:00.132727 24945 nvc.c:376] initializing library context (version=1.11.0, build=c8f267be0bac1c654d59ad4ea5df907141149977)
I1205 08:00:00.132797 24945 nvc.c:350] using root /
I1205 08:00:00.132806 24945 nvc.c:351] using ldcache /etc/ld.so.cache
I1205 08:00:00.132819 24945 nvc.c:352] using unprivileged user 29999:29999
I1205 08:00:00.132844 24945 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1205 08:00:00.133009 24945 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W1205 08:00:00.134346 24946 nvc.c:273] failed to set inheritable capabilities
W1205 08:00:00.134424 24946 nvc.c:274] skipping kernel modules load due to failure
I1205 08:00:00.134891 24947 rpc.c:71] starting driver rpc service
I1205 08:00:00.142782 24948 rpc.c:71] starting nvcgo rpc service
I1205 08:00:00.143811 24945 nvc_info.c:766] requesting driver information with ''
I1205 08:00:00.145644 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.525.60.11
I1205 08:00:00.145731 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.525.60.11
I1205 08:00:00.145778 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.525.60.11
I1205 08:00:00.145821 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.60.11
I1205 08:00:00.145877 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.525.60.11
I1205 08:00:00.145930 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.60.11
I1205 08:00:00.145970 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.525.60.11
I1205 08:00:00.146007 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.60.11
I1205 08:00:00.146066 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.525.60.11
I1205 08:00:00.146105 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.525.60.11
I1205 08:00:00.146144 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.525.60.11
I1205 08:00:00.146183 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.525.60.11
I1205 08:00:00.146236 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.525.60.11
I1205 08:00:00.146288 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.525.60.11
I1205 08:00:00.146325 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.60.11
I1205 08:00:00.146366 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.60.11
I1205 08:00:00.146418 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.60.11
I1205 08:00:00.146475 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.525.60.11
I1205 08:00:00.146752 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.60.11
I1205 08:00:00.146788 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.525.60.11
I1205 08:00:00.146943 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.525.60.11
I1205 08:00:00.146977 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.525.60.11
I1205 08:00:00.147011 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.525.60.11
I1205 08:00:00.147046 24945 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.525.60.11
I1205 08:00:00.147106 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.525.60.11
I1205 08:00:00.147140 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.525.60.11
I1205 08:00:00.147186 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.525.60.11
I1205 08:00:00.147236 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.525.60.11
I1205 08:00:00.147271 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.525.60.11
I1205 08:00:00.147319 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.525.60.11
I1205 08:00:00.147350 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.525.60.11
I1205 08:00:00.147385 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.525.60.11
I1205 08:00:00.147417 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.525.60.11
I1205 08:00:00.147465 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.525.60.11
I1205 08:00:00.147515 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.525.60.11
I1205 08:00:00.147547 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.525.60.11
I1205 08:00:00.147582 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.525.60.11
I1205 08:00:00.147649 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libcuda.so.525.60.11
I1205 08:00:00.147707 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.525.60.11
I1205 08:00:00.147741 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.525.60.11
I1205 08:00:00.147775 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.525.60.11
I1205 08:00:00.147811 24945 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.525.60.11
W1205 08:00:00.147830 24945 nvc_info.c:399] missing library libnvidia-nscq.so
W1205 08:00:00.147836 24945 nvc_info.c:399] missing library libnvidia-fatbinaryloader.so
W1205 08:00:00.147842 24945 nvc_info.c:399] missing library libnvidia-pkcs11.so
W1205 08:00:00.147847 24945 nvc_info.c:399] missing library libvdpau_nvidia.so
W1205 08:00:00.147854 24945 nvc_info.c:399] missing library libnvidia-ifr.so
W1205 08:00:00.147859 24945 nvc_info.c:399] missing library libnvidia-cbl.so
W1205 08:00:00.147867 24945 nvc_info.c:403] missing compat32 library libnvidia-cfg.so
W1205 08:00:00.147873 24945 nvc_info.c:403] missing compat32 library libnvidia-nscq.so
W1205 08:00:00.147878 24945 nvc_info.c:403] missing compat32 library libcudadebugger.so
W1205 08:00:00.147887 24945 nvc_info.c:403] missing compat32 library libnvidia-fatbinaryloader.so
W1205 08:00:00.147893 24945 nvc_info.c:403] missing compat32 library libnvidia-allocator.so
W1205 08:00:00.147899 24945 nvc_info.c:403] missing compat32 library libnvidia-pkcs11.so
W1205 08:00:00.147904 24945 nvc_info.c:403] missing compat32 library libnvidia-ngx.so
W1205 08:00:00.147910 24945 nvc_info.c:403] missing compat32 library libvdpau_nvidia.so
W1205 08:00:00.147916 24945 nvc_info.c:403] missing compat32 library libnvidia-ifr.so
W1205 08:00:00.147921 24945 nvc_info.c:403] missing compat32 library libnvidia-rtcore.so
W1205 08:00:00.147926 24945 nvc_info.c:403] missing compat32 library libnvoptix.so
W1205 08:00:00.147932 24945 nvc_info.c:403] missing compat32 library libnvidia-cbl.so
I1205 08:00:00.148532 24945 nvc_info.c:299] selecting /usr/bin/nvidia-smi
I1205 08:00:00.148551 24945 nvc_info.c:299] selecting /usr/bin/nvidia-debugdump
I1205 08:00:00.148569 24945 nvc_info.c:299] selecting /usr/bin/nvidia-persistenced
I1205 08:00:00.148598 24945 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-control
I1205 08:00:00.148615 24945 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-server
W1205 08:00:00.148707 24945 nvc_info.c:425] missing binary nv-fabricmanager
W1205 08:00:00.148735 24945 nvc_info.c:349] missing firmware path /lib/firmware/nvidia/525.60.11/gsp.bin
I1205 08:00:00.148762 24945 nvc_info.c:529] listing device /dev/nvidiactl
I1205 08:00:00.148767 24945 nvc_info.c:529] listing device /dev/nvidia-uvm
I1205 08:00:00.148775 24945 nvc_info.c:529] listing device /dev/nvidia-uvm-tools
I1205 08:00:00.148781 24945 nvc_info.c:529] listing device /dev/nvidia-modeset
I1205 08:00:00.148809 24945 nvc_info.c:343] listing ipc path /run/nvidia-persistenced/socket
W1205 08:00:00.148831 24945 nvc_info.c:349] missing ipc path /var/run/nvidia-fabricmanager/socket
W1205 08:00:00.148847 24945 nvc_info.c:349] missing ipc path /tmp/nvidia-mps
I1205 08:00:00.148851 24945 nvc_info.c:822] requesting device information with ''
I1205 08:00:00.155221 24945 nvc_info.c:713] listing device /dev/nvidia0 (GPU-94c5d11e-e574-eefc-2db6-08e204f9e1a4 at 00000000:01:00.0)
NVRM version:   525.60.11
CUDA version:   12.0

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3090
Brand:          GeForce
GPU UUID:       GPU-94c5d11e-e574-eefc-2db6-08e204f9e1a4
Bus Location:   00000000:01:00.0
Architecture:   8.6
I1205 08:00:00.155235 24945 nvc.c:434] shutting down library context
I1205 08:00:00.155296 24948 rpc.c:95] terminating nvcgo rpc service
I1205 08:00:00.155542 24945 rpc.c:135] nvcgo rpc service terminated successfully
I1205 08:00:00.156623 24947 rpc.c:95] terminating driver rpc service
I1205 08:00:00.156671 24945 rpc.c:135] driver rpc service terminated successfully

The Docker reinstall steps from the comment above worked for me. Thank you so much!

Looks like this just doesn't work with Docker Desktop.

When you run the script that @bkocis shared, you're installing docker-ce, most likely next to Docker Desktop. So the sudo version of docker runs the CE version, and the regular one will use your Docker Desktop version.

At least, this is what happens for me 😃
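One way to confirm (and work around) the split is via docker contexts; a sketch below, with context names as they appear on a default Docker Desktop install:

# List the contexts the client knows about
docker context ls

# Point the non-sudo client at the system (docker-ce) engine
docker context use default

# Socket access for non-root use of that engine (log out and back in afterwards)
sudo usermod -aG docker "$USER"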

$ docker run --privileged --gpus all nvidia/cuda:12.2.2-runtime-ubuntu22.04 nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
ERRO[0000] error waiting for container:                 
$ sudo docker run --privileged --gpus all nvidia/cuda:12.2.2-runtime-ubuntu22.04 nvidia-smi

==========
== CUDA ==
==========

CUDA Version 12.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Wed Oct 25 18:34:56 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
|  0%   51C    P0    72W / 450W |   1493MiB / 24564MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Before installing docker-ce, you’d get this error:

docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.

I have this issue unless I run as root. I'm using docker-desktop.

❯ stat /usr/lib/libnvidia-ml.*
  File: /usr/lib/libnvidia-ml.so -> libnvidia-ml.so.1
  Size: 17              Blocks: 8          IO Block: 4096   symbolic link
Device: 0,26    Inode: 1753370     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-01 13:46:31.971672402 -0600
Modify: 2023-08-22 11:37:15.000000000 -0600
Change: 2023-08-26 22:24:35.978217529 -0600
 Birth: 2023-08-26 22:24:35.978217529 -0600
  File: /usr/lib/libnvidia-ml.so.1 -> libnvidia-ml.so.535.104.05
  Size: 26              Blocks: 8          IO Block: 4096   symbolic link
Device: 0,26    Inode: 1753371     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-01 13:46:31.971672402 -0600
Modify: 2023-08-22 11:37:15.000000000 -0600
Change: 2023-08-26 22:24:35.978217529 -0600
 Birth: 2023-08-26 22:24:35.978217529 -0600
  File: /usr/lib/libnvidia-ml.so.535.104.05
  Size: 1815872         Blocks: 3552       IO Block: 4096   regular file
Device: 0,26    Inode: 1753372     Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-01 14:25:25.495551694 -0600
Modify: 2023-08-22 11:37:15.000000000 -0600
Change: 2023-09-01 14:24:06.427728737 -0600
 Birth: 2023-08-26 22:24:35.978217529 -0600
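The symlink chain itself looks correct here, so the next thing worth checking (a guess on my part, not a confirmed diagnosis) is whether those paths actually land in the /etc/ld.so.cache that nvidia-container-cli reads:

# Rebuild the cache and confirm the NVML entries show up
sudo ldconfig
ldconfig -p | grep libnvidia-ml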

It also seems to be reproducible with the PKGBUILD I created here: https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/issues/17#note_1530784413

Here is my config.toml:

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = false
#user = "root:video"
ldconfig = "/sbin/ldconfig"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"

# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
    "docker-runc",
    "runc",
]

mode = "auto"

    [nvidia-container-runtime.modes.csv]

    mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
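With this config it can also help to exercise the CLI directly, outside of Docker, to separate a libnvidia-container failure from a Docker/Desktop-layer one; a minimal sketch mirroring the flags from the hook logs earlier in the thread:

# Run the same discovery step the hook performs; a failure here rules docker out
sudo nvidia-container-cli --load-kmods info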

All the instructions were helpful, but I had to run docker, docker build, and docker run with root privileges to make it work. Even after repeated tries, I was unable to run with user-level permissions.

I have the same error, [nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1](https://github.com/NVIDIA/nvidia-container-toolkit/issues/154), when running docker without sudo.

Is there a possible way to get things working without sudo?
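Not specific to this issue, but the usual way to run GPU containers without sudo is to give your user access to the engine that has the nvidia runtime registered (a sketch; requires re-login and assumes docker-ce rather than Docker Desktop):

# Grant socket access to your user, then log out and back in (or use newgrp)
sudo usermod -aG docker "$USER"

# Make sure the client targets the system engine rather than Docker Desktop
docker context use default
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi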
