nvidia-container-toolkit: nvidia-container-cli reports incorrect CUDA driver version on WSL2
1. Issue or feature description
nvidia-container-cli on WSL2 is reporting CUDA 11.0 (and thus refusing to run containers with cuda>=11.1) even though CUDA toolkit 11.1 is installed in Linux. Windows 10 is build 20251.fe_release.201030-1438. Everything is installed as per the install guide, and CUDA containers do actually work (for example, docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark successfully returns a benchmark).
Machine is a Dell XPS 15 9500 with an i9-10885H CPU, 64 GB RAM and an NVIDIA GeForce GTX 1650 Ti.
2. Steps to reproduce the issue
- Install Windows 10 on the insider program with a version at or later than 20251.fe_release.201030-1438
- Install the Windows CUDA drivers from here (this is 460.20 for me)
- Install Ubuntu 20.04, the CUDA toolkit 11.1 and the container runtime as per the nvidia docs
- Run nvidia-smi on the host - it should give a CUDA version of 11.2.
- Check that docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark correctly outputs benchmarks
- In Linux, run nvidia-container-cli info. It incorrectly outputs CUDA version 11.0.
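To compare the two reported versions side by side, a small sketch like the following may help. The function names are hypothetical, and the patterns assume that nvidia-container-cli info prints a line of the form "CUDA version: 11.0" and that the nvidia-smi banner contains "CUDA Version: 11.2"; adjust them if your output differs.

```shell
# Extract the CUDA version from text in the style of `nvidia-container-cli info`.
# Assumed line format: "CUDA version:   11.0"
cuda_from_ncc() {
    sed -n 's/^CUDA version:[[:space:]]*\([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Extract the CUDA version from the nvidia-smi banner.
# Assumed banner fragment: "CUDA Version: 11.2"
cuda_from_smi() {
    sed -n 's/.*CUDA Version:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on a live system (not run here):
#   nvidia-container-cli info | cuda_from_ncc
#   nvidia-smi | cuda_from_smi
```

If the two values differ, as they do in this report (11.0 versus 11.2), the mismatch is in what nvidia-container-cli detects rather than in the driver itself.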
This command will also fail:
$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
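The error above comes from a version requirement (cuda>=11.1) being checked against the CUDA version that nvidia-container-cli detects. As a rough illustration only, and not the actual libnvidia-container implementation, the comparison can be sketched like this; the function names are hypothetical and the sketch assumes a sort that supports -V (GNU coreutils).

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort); assumed to be available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical stand-in for the "cuda>=X.Y" condition in the error message.
check_cuda_requirement() {
    reported="$1"
    required="$2"
    if version_ge "$reported" "$required"; then
        echo "satisfied"
    else
        echo "unsatisfied condition: cuda>=$required"
    fi
}

check_cuda_requirement 11.0 11.1   # version reported by nvidia-container-cli
check_cuda_requirement 11.2 11.1   # version reported by nvidia-smi
```

With the misreported 11.0 the condition fails, while the 11.2 shown by nvidia-smi would satisfy it, which matches the behaviour described in this report.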
3. Information to attach (optional if deemed irrelevant)
- Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
  ncc.txt
- Kernel version from uname -a
  Linux aphid 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Any relevant kernel output lines from dmesg
- Driver information from nvidia-smi -a
  nvidia-smi.txt
- Docker version from docker version
  19.03.13
- NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
  packages.txt
- NVIDIA container library version from nvidia-container-cli -V
  ncc-version.txt
- NVIDIA container library logs (see troubleshooting)
- Docker command, image and tag used
$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash 2>&1 docker-run.txt
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 4
- Comments: 15 (5 by maintainers)
Hey @danfairs. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.
In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.

For reference: here is the merge request extending WSL support.
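As a sketch of how the NVIDIA_DISABLE_REQUIRE workaround mentioned above might be applied, the helper below only prints the command (a dry run) rather than executing it. The helper name is hypothetical, and passing the variable with docker's -e flag is an assumption about how it is set for the container; the image tag is the one from this report.

```shell
# Dry-run helper: prints the docker command instead of running it.
# NVIDIA_DISABLE_REQUIRE is the variable named in this thread;
# setting it via `-e` is an assumption of this sketch.
print_workaround_cmd() {
    image="$1"
    echo "docker run --gpus all -e NVIDIA_DISABLE_REQUIRE=true --rm -it $image /bin/bash"
}

print_workaround_cmd nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
```

This only skips the version check at container start; it does not change which CUDA version the driver actually supports.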
@psychofisch as a workaround, please start the container with NVIDIA_DISABLE_REQUIRE=true.

Sorry, but I’m not at all convinced NVIDIA_DISABLE_REQUIRE should be used. The container will start, true, but ML algos will fail to train the model later on (if they are properly directed to use the GPU, without automatic failover to the CPU). CUDA versions on the host and in the container must be in sync in my experience, just like glibc versions. IOW, CUDA Minor Versions Compatibility (as described in the docs here) is a bit of wishful thinking… The most precise error message resulting from the use of NVIDIA_DISABLE_REQUIRE is given by Catboost:

@danfairs I solved my problems by upgrading my Win10 to version 20257.1. Follow the official WSL2 guidelines.