nvidia-container-toolkit: nvidia-container-cli reports incorrect CUDA driver version on WSL2

1. Issue or feature description

nvidia-container-cli on WSL2 reports CUDA 11.0 (and thus refuses to run containers that require cuda>=11.1), even though the CUDA 11.1 toolkit is installed in the Linux distribution. Windows 10 is build 20251.fe_release.201030-1438. Everything is installed as per the install guide, and CUDA containers do actually work: for example, docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark successfully returns a benchmark.

Machine is a Dell XPS 15 9500 with an i9-10885H CPU, 64 GB RAM and an NVIDIA GeForce GTX 1650 Ti.

2. Steps to reproduce the issue

  1. Install Windows 10 via the Insider Program, on a build at or later than 20251.fe_release.201030-1438
  2. Install the Windows CUDA drivers from here (this is 460.20 for me)
  3. Install Ubuntu 20.04, the CUDA toolkit 11.1 and the container runtime as per the nvidia docs
  4. Run nvidia-smi on the host - it should give a CUDA version of 11.2
  5. Check that docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark correctly outputs a benchmark
  6. In Linux, run nvidia-container-cli info. It incorrectly reports CUDA version 11.0 (see the command summary below)
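
For reference, steps 4-6 as commands, annotated with what I see on my machine (this is just a summary of the above, not extra output):

$ nvidia-smi                   # host header shows CUDA Version: 11.2
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark   # works
$ nvidia-container-cli info    # reports CUDA version 11.0 (incorrect)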

This command will also fail:

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
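
As far as I understand, the cuda>=11.1 condition in that error comes from the NVIDIA_REQUIRE_CUDA environment variable baked into the nvidia/cuda image, which the toolkit's prestart hook checks against the CUDA version the driver reports. A quick way to see the requirement for a given tag once the image has been pulled (a sketch; the exact value varies between tags):

$ docker inspect nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 --format '{{range .Config.Env}}{{println .}}{{end}}' | grep NVIDIA_REQUIRE_CUDA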

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information from nvidia-container-cli -k -d /dev/tty info: attached as ncc.txt

  • Kernel version from uname -a: Linux aphid 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Any relevant kernel output lines from dmesg

  • Driver information from nvidia-smi -a: attached as nvidia-smi.txt

  • Docker version from docker version: 19.03.13

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*': attached as packages.txt

  • NVIDIA container library version from nvidia-container-cli -V: attached as ncc-version.txt

  • NVIDIA container library logs (see troubleshooting)

  • Docker command, image and tag used

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash 2>&1 (full output attached as docker-run.txt)
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 4
  • Comments: 15 (5 by maintainers)

Most upvoted comments

Hey @danfairs. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.

In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi

For reference: here is the merge request extending WSL support.
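
Applied to the failing command from the original report, that workaround would look something like this (untested here, just combining the two):

$ docker run --gpus all --rm -it --env NVIDIA_DISABLE_REQUIRE=1 nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash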

@psychofisch as a workaround please start the container with NVIDIA_DISABLE_REQUIRE=true:

docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace -e NVIDIA_DISABLE_REQUIRE=true dalle-mini:latest

Sorry, but I’m not at all convinced NVIDIA_DISABLE_REQUIRE should be used. The container will start, true, but ML algorithms will fail later when training the model (if they are properly directed to use the GPU, without automatic fallback to the CPU). In my experience, the CUDA versions on the host and in the container must be in sync, just like glibc versions. In other words, CUDA Minor Version Compatibility (as described in the docs here) is a bit of wishful thinking…

The most precise error message resulting from the use of NVIDIA_DISABLE_REQUIRE comes from CatBoost:

CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 803: system has unsupported display driver / cuda driver combination
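
One way to check whether this bites on a given setup is to exercise an actual CUDA workload under NVIDIA_DISABLE_REQUIRE rather than just checking that the container starts, e.g. by reusing the nbody sample from the original report (a sketch, not output from this thread):

$ docker run --rm --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark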

@danfairs I solved my problem by upgrading my Win10 to version 20257.1, following the official WSL2 guidelines.