nvidia-container-toolkit: nvidia-container-cli reports incorrect CUDA driver version on WSL2

1. Issue or feature description

nvidia-container-cli on WSL2 reports CUDA 11.0 (and thus refuses to run containers that require cuda>=11.1), even though the CUDA 11.1 toolkit is installed in the Linux distribution. Windows 10 is build 20251.fe_release.201030-1438. Everything is installed as per the install guide, and CUDA containers do actually work: for example, docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark successfully returns a benchmark.

Machine is a Dell XPS 15 9500 with an i9-10885H CPU, 64 GB RAM and an NVIDIA GeForce GTX 1650 Ti.

2. Steps to reproduce the issue

  1. Install Windows 10 via the Insider Program, on a build at or later than 20251.fe_release.201030-1438
  2. Install the Windows CUDA drivers from here (this is 460.20 for me)
  3. Install Ubuntu 20.04, the CUDA toolkit 11.1 and the container runtime as per the nvidia docs
  4. Run nvidia-smi on the host - it should give a CUDA version of 11.2
  5. Check that docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark correctly outputs a benchmark
  6. In Linux, run nvidia-container-cli info. It incorrectly reports CUDA version 11.0 (see the command summary below)
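
For reference, steps 4-6 as commands, annotated with what I see on my machine (this is just a summary of the above, not extra output):

$ nvidia-smi                   # host header shows CUDA Version: 11.2
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark   # works
$ nvidia-container-cli info    # reports CUDA version 11.0 (incorrect)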

This command will also fail:

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
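
As far as I understand, the cuda>=11.1 condition in that error comes from the NVIDIA_REQUIRE_CUDA environment variable baked into the nvidia/cuda image, which the toolkit's prestart hook checks against the CUDA version the driver reports. A quick way to see the requirement for a given tag once the image has been pulled (a sketch; the exact value varies between tags):

$ docker inspect nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 --format '{{range .Config.Env}}{{println .}}{{end}}' | grep NVIDIA_REQUIRE_CUDA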

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information from nvidia-container-cli -k -d /dev/tty info: attached as ncc.txt

  • Kernel version from uname -a: Linux aphid 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Any relevant kernel output lines from dmesg

  • Driver information from nvidia-smi -a: attached as nvidia-smi.txt

  • Docker version from docker version: 19.03.13

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*': attached as packages.txt

  • NVIDIA container library version from nvidia-container-cli -V: attached as ncc-version.txt

  • NVIDIA container library logs (see troubleshooting)

  • Docker command, image and tag used

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash 2>&1 (full output attached as docker-run.txt)
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 4
  • Comments: 15 (5 by maintainers)

Most upvoted comments

Hey @danfairs. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.

In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi

For reference: here is the merge request extending WSL support.
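
Applied to the failing command from the original report, that workaround would look something like this (untested here, just combining the two):

$ docker run --gpus all --rm -it --env NVIDIA_DISABLE_REQUIRE=1 nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash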

@psychofisch as a workaround please start the container with NVIDIA_DISABLE_REQUIRE=true:

docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace -e NVIDIA_DISABLE_REQUIRE=true dalle-mini:latest

Sorry, but I’m not at all convinced NVIDIA_DISABLE_REQUIRE should be used. The container will start, true, but ML algorithms will fail later when training the model (if they are properly directed to use the GPU, without automatic fallback to the CPU). In my experience, the CUDA versions on the host and in the container must be in sync, just like glibc versions. In other words, CUDA Minor Version Compatibility (as described in the docs here) is a bit of wishful thinking…

The most precise error message resulting from the use of NVIDIA_DISABLE_REQUIRE comes from CatBoost:

CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 803: system has unsupported display driver / cuda driver combination
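
One way to check whether this bites on a given setup is to exercise an actual CUDA workload under NVIDIA_DISABLE_REQUIRE rather than just checking that the container starts, e.g. by reusing the nbody sample from the original report (a sketch, not output from this thread):

$ docker run --rm --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark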

@danfairs I solved my problem by upgrading my Win10 to version 20257.1, following the official WSL2 guidelines.