tensorflow: GPU not detected on WSL2
Issue type
Build/Install
Have you reproduced the bug with TensorFlow Nightly?
No
Source
binary
TensorFlow version
2.16.1
Custom code
No
OS platform and distribution
WSL2 Ubuntu 22.04
Mobile device
No response
Python version
3.11
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Hi,
I am trying to install and run Keras 3 on WSL2, but the output of the device listing indicates that some dynamic libraries are still missing.
I followed the doc to finish the installation.
I am using the latest NVIDIA driver; its version can be seen in the nvidia-smi output below.
I have installed CUDA 11.8 and cuDNN 8.6 on my device, which are the exact versions listed in the doc.
Aside from installing cuDNN using the Local Installer for Ubuntu 22.04 x86_64 (Deb) on this page, I have also tried manually copying the cuDNN header and library files into cuda-11.8/include and cuda-11.8/lib, and LD_LIBRARY_PATH has been updated according to this article and this one.
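For reference, the LD_LIBRARY_PATH update described above usually looks like the sketch below, assuming the default /usr/local/cuda-11.8 install prefix (an assumption; adjust to your actual layout):

```shell
#!/bin/sh
# Hedged sketch: expose a manually installed CUDA 11.8 toolkit to the
# dynamic loader and the shell. /usr/local/cuda-11.8 is the default
# prefix used by NVIDIA's .deb installers, not a guarantee.
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=${CUDA_HOME}/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Putting these lines in ~/.bashrc makes them survive new shells; setting them only in one terminal is a common source of "it works here but not there" confusion.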
Is there anything I missed in my steps?
Regards,
Sichao Hu
Standalone code to reproduce the issue
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Relevant log output
TF output:
2024-03-09 16:22:45.531685: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-09 16:22:45.787261: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 16:22:46.518153: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-09 16:22:47.403086: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-09 16:22:47.474378: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
This is the output of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01 Driver Version: 551.76 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... On | 00000000:01:00.0 On | N/A |
| 0% 44C P8 18W / 285W | 1125MiB / 16376MiB | 5% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
This is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
About this issue
- Original URL
- State: closed
- Created 4 months ago
- Reactions: 6
- Comments: 17
I installed CUDA 12.3.2 and cuDNN. When I install TensorFlow via `pip install tensorflow[and-cuda]`, it auto-installs TensorFlow 2.16.1 and cannot find my GPU. When I run `pip install tensorflow==2.15`, it auto-removes 2.16.1, installs 2.15, and then I can find my GPU.
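The downgrade workaround described above can be sketched as the following commands (not a fix for 2.16.1 itself, just a way to pin the last version whose GPU setup worked out of the box for this reporter):

```shell
#!/bin/sh
# Hedged sketch of the downgrade workaround: remove the 2.16.1 install
# and pin TensorFlow 2.15 with the CUDA extras, then re-check GPU visibility.
pip uninstall -y tensorflow
pip install "tensorflow[and-cuda]==2.15.*"
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```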
Almost final and automated fix below
Where I found the resolution
Exact solution
```shell
#!/bin/sh
export NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
What else helped me
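To see what that one-liner actually computes, here is the same glob-and-join step run against a throwaway directory layout (the package names `cublas` and `cudnn` below are just illustrative; the real directories come from whatever pip CUDA wheels are installed):

```shell
#!/bin/sh
# Sketch: glob every <NVIDIA_DIR>/<pkg>/lib/ directory and join the
# results with ':' — exactly the shape LD_LIBRARY_PATH expects.
NVIDIA_DIR=$(mktemp -d)
mkdir -p "$NVIDIA_DIR/cublas/lib" "$NVIDIA_DIR/cudnn/lib"
# Unquoted glob expands to space-separated paths; sed turns the
# whitespace runs into ':' separators.
LIBS=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')
echo "$LIBS"
```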
Does your setup detect GPU with tensorflow 2.16.1? mine works with 2.15 but not this one.
Thanks @zjm008 and @Zhaopudark for providing workarounds. I certainly agree that these are ways to mitigate the problem.
However, while following the pip-installation guide I can see that the `[and-cuda]` extras do download many CUDA libraries, including cuDNN, and these were set in the path variables as soon as you activated a conda environment, which was a very nice functionality up to TF 2.15.
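That TF ≤ 2.15 convenience can be replicated by hand with a conda activation hook; the sketch below is not official TF guidance, and the hook file name is arbitrary. Run it inside an activated conda environment:

```shell
#!/bin/sh
# Hedged sketch: install an activate.d hook so the pip CUDA wheel lib
# directories are prepended to LD_LIBRARY_PATH on every `conda activate`.
mkdir -p "${CONDA_PREFIX:?activate a conda env first}/etc/conda/activate.d"
cat > "$CONDA_PREFIX/etc/conda/activate.d/cuda_paths.sh" <<'EOF'
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
```

Deactivating and reactivating the environment picks up the hook; conda runs every script in activate.d on activation.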
I believe the developers might have missed this while moving to TF 2.16.
Do let me know if you two agree.
@chaudharyachint08 This weekend I shared my steps in this post. Maybe you can check it and adapt it to your own environment.
@Zhaopudark Thank you for your comment.
Regarding the cudnn and cuda version, I searched around and finally found a matrix here.
I will update my result once I have installed newer cuda and cudnn.
That depends on your specific installation method for CUDA and cuDNN. Since I use Miniconda and the conda installation, my settings are as follows:
Actually, with either the package-manager installation or the conda installation, `$Env:CUDA_PATH` should be the root path under which you can see CUDA's installed directories, such as `./nvvm`, `./extras/Debugger/lib64/libcudacore.a`, and so on. And `$Env:TF_CUDA_PATHS` should be equal to `$Env:CUDA_PATH`.
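The `$Env:` syntax above is PowerShell; on WSL2/Linux the same rule reads as below (the prefix is an assumption taken from a default CUDA 11.8 install, not from this thread):

```shell
#!/bin/sh
# Hedged sketch: CUDA_PATH points at the toolkit install root (the
# directory containing ./nvvm, ./bin, etc.), and TF_CUDA_PATHS mirrors it.
export CUDA_PATH=/usr/local/cuda-11.8
export TF_CUDA_PATHS=$CUDA_PATH
```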