tensorflow: GPU not detected on WSL2

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

2.16.1

Custom code

No

OS platform and distribution

WSL2 Ubuntu 22.04

Mobile device

No response

Python version

3.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Hi,

I am trying to install and run Keras 3 on WSL2, but the device-listing output indicates that some dynamic library is still missing.

I followed the doc to complete the installation.

I am using the latest NVIDIA driver; its version can be found in the nvidia-smi output below.

I have installed CUDA 11.8 and cuDNN 8.6 on my device, which are exactly the versions listed in the doc.

Aside from installing cuDNN using the Local Installer for Ubuntu 22.04 x86_64 (Deb) from this page, I have also tried manually copying the cuDNN header and library files to cuda-11.8/include and cuda-11.8/lib, and LD_LIBRARY_PATH is updated according to this article and this one.

Anything I missed in my steps?

Regards Sichao Hu
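Not part of the original report, but one way to narrow down "Cannot dlopen some GPU libraries" is to check what the loader can actually see on LD_LIBRARY_PATH. The library base names below are assumptions about what TensorFlow typically tries to dlopen; exact SONAME versions depend on the TF build:

```python
import glob
import os

# Libraries a CUDA-enabled TensorFlow build commonly dlopens
# (an assumption; the required SONAME versions vary per TF release).
WANTED = ["libcudart.so", "libcublas.so", "libcudnn.so", "libcufft.so"]

def scan_ld_library_path(env=os.environ):
    """Return, for each wanted library, the matching files found in any
    directory listed in LD_LIBRARY_PATH."""
    found = {name: [] for name in WANTED}
    for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not d:
            continue
        for name in WANTED:
            found[name].extend(glob.glob(os.path.join(d, name + "*")))
    return found

if __name__ == "__main__":
    for name, hits in scan_ld_library_path().items():
        print(f"{name}: {hits if hits else 'NOT FOUND'}")
```

If a library shows up as NOT FOUND here but is installed on disk, the directory holding it is simply not on LD_LIBRARY_PATH for the shell that launches Python.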

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

TF output:
2024-03-09 16:22:45.531685: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-09 16:22:45.787261: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 16:22:46.518153: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-09 16:22:47.403086: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-09 16:22:47.474378: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]


This is the output of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   44C    P8             18W /  285W |    1125MiB /  16376MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+


This is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

About this issue

  • Original URL
  • State: closed
  • Created 4 months ago
  • Reactions: 6
  • Comments: 17

Most upvoted comments

I installed CUDA 12.3.2 and cuDNN. Then, when I installed TensorFlow via pip install tensorflow[and-cuda], it automatically installed TensorFlow 2.16.1 and could not find my GPU. When I ran pip install tensorflow==2.15, it automatically removed 2.16.1 and installed 2.15; then I could find my GPU.

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Bash output:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-03-13 02:41:48.797545: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-13 02:41:48.797600: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-13 02:41:48.798179: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-13 02:41:48.801510: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-13 02:41:49.293727: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-13 02:41:49.828549: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:08:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 02:41:49.934333: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:08:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 02:41:49.934403: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:08:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Almost final and automated fix below

@Zhaopudark thank you for sharing. What path should I set for these environment variables?

  • Make sure $Env:CUDA_PATH is given correctly
  • Make sure $Env:LD_LIBRARY_PATH is given correctly
  • Make sure $Env:TF_CUDA_PATHS is given correctly

That depends on how you installed CUDA and cuDNN. For me, since I use Miniconda and a conda installation, my settings are:

# pwsh
# After `conda activate my_env`, the env variable `$Env:CONDA_PREFIX`
# will be `/home/your_name/miniconda3/envs/my_env`, so the settings are:
$Env:CUDA_PATH = $Env:CONDA_PREFIX
$Env:LD_LIBRARY_PATH = "$Env:CONDA_PREFIX/lib"
$Env:TF_CUDA_PATHS = "$Env:CONDA_PREFIX"

Actually, whether you used a package-manager installation or a conda installation, $Env:CUDA_PATH should be a root path under which you can see CUDA's installed directories and files, such as ./nvvm and ./extras/Debugger/lib64/libcudacore.a. And $Env:TF_CUDA_PATHS should be equal to $Env:CUDA_PATH.
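Not from the thread itself, but under the description above (CUDA_PATH pointing at the toolkit root), a quick sanity check could look like this sketch. The marker names are taken from the comment's examples and may differ between installations:

```python
import os

def looks_like_cuda_root(path):
    """Heuristic: a CUDA root should contain the toolkit's installed
    directories, e.g. nvvm/ and extras/ (marker names are assumptions
    based on the comment; real layouts may vary)."""
    markers = ("nvvm", "extras")
    return all(os.path.isdir(os.path.join(path, m)) for m in markers)

if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH", "")
    if not cuda_path:
        print("CUDA_PATH is not set")
    else:
        print(cuda_path, "looks like a CUDA root:", looks_like_cuda_root(cuda_path))
```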

Does your setup detect the GPU with TensorFlow 2.16.1? Mine works with 2.15 but not with this one.

Thanks @zjm008 and @Zhaopudark for providing workarounds. I certainly agree that these are ways to mitigate this problem.

However, while following the pip-installation guide, I can see that

pip install tensorflow[and-cuda]

does download many CUDA libraries, including cuDNN. Up to TF 2.15, these were added to the path variables as soon as you activated a conda environment, which was a very nice functionality.

I believe the developers might have missed this while moving to TF 2.16.

Do let me know if you two agree?
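A hedged sketch of one possible workaround for the point above (not something the thread confirms): the NVIDIA pip wheels pulled in by tensorflow[and-cuda] typically unpack under site-packages/nvidia/&lt;package&gt;/lib, so you could collect those directories and prepend them to LD_LIBRARY_PATH yourself. That layout is an assumption about the wheels, and the helper name is hypothetical:

```python
import glob
import os
import site

def pip_cuda_lib_dirs(roots=None):
    """Collect lib/ directories of NVIDIA pip wheels.

    Assumes the wheels unpack as site-packages/nvidia/<package>/lib;
    `roots` defaults to the interpreter's site-packages directories.
    """
    if roots is None:
        roots = site.getsitepackages()
    dirs = []
    for sp in roots:
        dirs.extend(sorted(glob.glob(os.path.join(sp, "nvidia", "*", "lib"))))
    return dirs

if __name__ == "__main__":
    # Print a value you could prepend to LD_LIBRARY_PATH before launching TF.
    print(os.pathsep.join(pip_cuda_lib_dirs()))
```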

@chaudharyachint08 This weekend I shared my steps in this post. Maybe you can check it and adapt it to your own environment.

@Zhaopudark Thank you for your comment.

Regarding the cuDNN and CUDA versions, I searched around and finally found a compatibility matrix here.

I will update my result once I have installed newer cuda and cudnn.
