tensorflow: GPU not detected on WSL2

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

2.16.1

Custom code

No

OS platform and distribution

WSL2 Ubuntu 22.04

Mobile device

No response

Python version

3.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Hi,

I am trying to install and run Keras 3 on WSL2, but the device-listing output indicates that some dynamic library is still missing.

I followed the doc to complete the installation.

I am using the latest NVIDIA driver; its version can be found in the nvidia-smi output below.

I have installed CUDA 11.8 and cuDNN 8.6 on my device, which are exactly the versions listed in the doc.

Aside from installing cuDNN using the Local Installer for Ubuntu 22.04 x86_64 (Deb) from this page, I have also tried manually copying the cuDNN header and library files to cuda-11.8/include and cuda-11.8/lib, and LD_LIBRARY_PATH is updated according to this article and this one.

Anything I missed in my steps?

Regards Sichao Hu
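Not part of the original report, but one way to narrow down "Cannot dlopen some GPU libraries" is to check what the loader can actually see on LD_LIBRARY_PATH. The library base names below are assumptions about what TensorFlow typically tries to dlopen; exact SONAME versions depend on the TF build:

```python
import glob
import os

# Libraries a CUDA-enabled TensorFlow build commonly dlopens
# (an assumption; the required SONAME versions vary per TF release).
WANTED = ["libcudart.so", "libcublas.so", "libcudnn.so", "libcufft.so"]

def scan_ld_library_path(env=os.environ):
    """Return, for each wanted library, the matching files found in any
    directory listed in LD_LIBRARY_PATH."""
    found = {name: [] for name in WANTED}
    for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not d:
            continue
        for name in WANTED:
            found[name].extend(glob.glob(os.path.join(d, name + "*")))
    return found

if __name__ == "__main__":
    for name, hits in scan_ld_library_path().items():
        print(f"{name}: {hits if hits else 'NOT FOUND'}")
```

If a library shows up as NOT FOUND here but is installed on disk, the directory holding it is simply not on LD_LIBRARY_PATH for the shell that launches Python.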

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

TF output:
2024-03-09 16:22:45.531685: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-09 16:22:45.787261: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 16:22:46.518153: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-09 16:22:47.403086: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-09 16:22:47.474378: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]


This is the output of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   44C    P8             18W /  285W |    1125MiB /  16376MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+


This is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

About this issue

  • Original URL
  • State: closed
  • Created 4 months ago
  • Reactions: 6
  • Comments: 17

Most upvoted comments

I installed CUDA 12.3.2 and cuDNN. Then, when I installed TensorFlow via pip install tensorflow[and-cuda], it automatically installed TensorFlow 2.16.1 and could not find my GPU. When I ran pip install tensorflow==2.15, it automatically removed 2.16.1 and installed 2.15; then I could find my GPU.

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Bash output:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-03-13 02:41:48.797545: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-13 02:41:48.797600: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-13 02:41:48.798179: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-13 02:41:48.801510: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-13 02:41:49.293727: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-13 02:41:49.828549: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:08:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 02:41:49.934333: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:08:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-13 02:41:49.934403: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:08:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Almost final and automated fix below

@Zhaopudark thank you for sharing. What path should I set for these environment variables?

  • Make sure $Env:CUDA_PATH is given correctly
  • Make sure $Env:LD_LIBRARY_PATH is given correctly
  • Make sure $Env:TF_CUDA_PATHS is given correctly

That depends on how you installed CUDA and cuDNN. For me, since I use Miniconda and a conda installation, my settings are:

# pwsh
# After `conda activate my_env`, the env variable `$Env:CONDA_PREFIX`
# will be `/home/your_name/miniconda3/envs/my_env`, so the settings are:
$Env:CUDA_PATH = $Env:CONDA_PREFIX
$Env:LD_LIBRARY_PATH = "$Env:CONDA_PREFIX/lib"
$Env:TF_CUDA_PATHS = "$Env:CONDA_PREFIX"

Actually, whether you used a package-manager installation or a conda installation, $Env:CUDA_PATH should be a root path under which you can see CUDA's installed directories and files, such as ./nvvm and ./extras/Debugger/lib64/libcudacore.a. And $Env:TF_CUDA_PATHS should be equal to $Env:CUDA_PATH.
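Not from the thread itself, but under the description above (CUDA_PATH pointing at the toolkit root), a quick sanity check could look like this sketch. The marker names are taken from the comment's examples and may differ between installations:

```python
import os

def looks_like_cuda_root(path):
    """Heuristic: a CUDA root should contain the toolkit's installed
    directories, e.g. nvvm/ and extras/ (marker names are assumptions
    based on the comment; real layouts may vary)."""
    markers = ("nvvm", "extras")
    return all(os.path.isdir(os.path.join(path, m)) for m in markers)

if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH", "")
    if not cuda_path:
        print("CUDA_PATH is not set")
    else:
        print(cuda_path, "looks like a CUDA root:", looks_like_cuda_root(cuda_path))
```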

Does your setup detect the GPU with TensorFlow 2.16.1? Mine works with 2.15 but not with this one.

Thanks @zjm008 and @Zhaopudark for providing workarounds. I certainly agree that these are ways to mitigate this problem.

However, while following the pip-installation guide, I can see that

pip install tensorflow[and-cuda]

does download many CUDA libraries, including cuDNN. Up to TF 2.15, these were added to the path variables as soon as you activated a conda environment, which was a very nice functionality.

I believe the developers might have missed this while moving to TF 2.16.

Do let me know if you two agree?
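A hedged sketch of one possible workaround for the point above (not something the thread confirms): the NVIDIA pip wheels pulled in by tensorflow[and-cuda] typically unpack under site-packages/nvidia/&lt;package&gt;/lib, so you could collect those directories and prepend them to LD_LIBRARY_PATH yourself. That layout is an assumption about the wheels, and the helper name is hypothetical:

```python
import glob
import os
import site

def pip_cuda_lib_dirs(roots=None):
    """Collect lib/ directories of NVIDIA pip wheels.

    Assumes the wheels unpack as site-packages/nvidia/<package>/lib;
    `roots` defaults to the interpreter's site-packages directories.
    """
    if roots is None:
        roots = site.getsitepackages()
    dirs = []
    for sp in roots:
        dirs.extend(sorted(glob.glob(os.path.join(sp, "nvidia", "*", "lib"))))
    return dirs

if __name__ == "__main__":
    # Print a value you could prepend to LD_LIBRARY_PATH before launching TF.
    print(os.pathsep.join(pip_cuda_lib_dirs()))
```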

@chaudharyachint08 This weekend I shared my steps in this post. Maybe you can check it and adapt it to your own environment.

@Zhaopudark Thank you for your comment.

Regarding the cuDNN and CUDA versions, I searched around and finally found a compatibility matrix here.

I will update my result once I have installed newer cuda and cudnn.
