tensorflow: TF 2.16.1 Fails to work with GPUs

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

TF 2.16.1

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04.4 LTS

Mobile device

No response

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.4

GPU model and memory

No response

Current behavior?

I created a Python venv in which I installed TF 2.16.1 following your instructions: pip install tensorflow. When I run python, import tensorflow as tf, and issue tf.config.list_physical_devices('GPU'), I get an empty list [].

I created another python venv, installed TF 2.16.1, only this time with the instructions:

python3 -m pip install tensorflow[and-cuda]

When I run that version, import tensorflow as tf, and issue

tf.config.list_physical_devices(‘GPU’)

I also get an empty list.

BTW, I have no problems running TF 2.15.1 with GPUs on my box. Julia also works just fine with GPUs, and so does PyTorch.

Standalone code to reproduce the issue

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-09 19:15:45.018171: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 19:15:50.412646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.__version__
'2.16.1'

>>> tf.config.list_physical_devices('GPU')
2024-03-09 19:16:28.923792: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-09 19:16:29.078379: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>>

Relevant log output

No response

About this issue

  • Original URL
  • State: open
  • Created 4 months ago
  • Reactions: 30
  • Comments: 95 (2 by maintainers)

Most upvoted comments

It’s just that TensorFlow can’t see the CUDA libraries.

Install tensorflow[and-cuda] and add this to your .bashrc or conda activation script. Adjust the python version in it according to your setup.

NVIDIA_PACKAGE_DIR="$CONDA_PREFIX/lib/python3.12/site-packages/nvidia"

for dir in $NVIDIA_PACKAGE_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

You won’t need to install CUDA or cuDNN on the system; the CUDA libraries that are installed with pip install tensorflow[and-cuda] are enough.
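For reference, a minimal way to verify the workaround took effect once the environment is active (a sketch; assumes the nvidia wheels were pulled in by pip):

# Confirm the nvidia lib dirs are on the loader path, then check GPU visibility
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep nvidia
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"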

On Mon, Mar 11, 2024, 7:04 a.m. Juan E. Vargas @.***> wrote:

Hi Krzysztof

I visited the site

https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------

where I found an entry listed as “Local Installer for Ubuntu22.04 x86_64 (Deb)” which I downloaded. Unfortunately what I got is a package named “cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb”, which is not the same as the name you suggest in your message, which is “libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb”.

I assume what you meant is to get the libcudnn8_8.9.7.29*amd64.deb and the cuda12.2_amd64.deb separately and install both.

I have CUDA 12.4. I will not go back to trying to make TF 2.16.1 work with older versions of CUDA (12.2 or 12.3), because sooner or later the TF team will have to produce a version with the updated version of CUDA. IMHO, rather than us wasting time going back in versions, the TF team should invest time going forward to update TF to the current CUDA version.

Thank you, Juan

On Mon, Mar 11, 2024 at 5:30 AM Krzysztof Radzikowski < @.***> wrote:

got it work 😃 first

https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------

then download Local Installer for Ubuntu22.04 x86_64 (Deb), unpack and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb …

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


I am closing this (unresolved) issue because I am told by the Keras/TF team that the issue is related to TF.

Almost final and automated fix below

I think the expected solution would be a new release that fixes this issue, so that setting LD_LIBRARY_PATH is not needed, as in 2.15.1. It would be a downgrade for users to need such workarounds; it should just work with: pip install tensorflow[and-cuda]

Well, I wasted 8 hours of my Sunday on this, setting up another PC from scratch, before reverting to the old version. Now looking to move off TensorFlow.

Hi @JuanVargas ,

For the GPU package you need to ensure the installation of the CUDA driver, which can be verified with the nvidia-smi command. Then you need to install the TF CUDA package with pip install tensorflow[and-cuda], which automatically installs the required CUDA/cuDNN libraries.

I have checked in Colab and was able to detect the GPU. Please refer to the attached gist.

I think the expected solution would be a new release that fixes this issue, so that setting LD_LIBRARY_PATH is not needed, as in 2.15.1. It would be a downgrade for users to need such workarounds; it should just work with: pip install tensorflow[and-cuda]

@niko247 undoubtedly true. It is crystal clear that TF 2.16.1 does not work with the simple pip install tensorflow[and-cuda] command to actually utilize CUDA locally, and no relevant guidelines have been provided yet to resolve this.

It seems practically impossible for someone owning a PC with a CUDA-enabled GPU to perform deep learning experiments with TensorFlow version 2.16.1 and utilize the GPU locally without manually performing some extra steps not included (until today) in the official TensorFlow documentation of the standard installation procedure for Linux users with GPUs, at least as a temporary fix! That’s why I submitted the pull request in good faith and for the sake of all users, as TensorFlow is "An Open Source Machine Learning Framework for Everyone".

Hope that the next patch version of TensorFlow will fix the bug as soon as possible!

Following on from the post by chaudharyachint08, I did the following to automate it in a venv.

I edited bin/activate in the folder of my venv and added these two lines at the end of the file:

export NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then, while editing the same file, I added these two unset lines inside the deactivate function (before the closing curly bracket }):

 unset NVIDIA_DIR
 unset LD_LIBRARY_PATH

I had tested it by entering the two lines in the terminal, and my GPU was detected, so this just automates it when the venv is activated.
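One caveat: unconditionally unsetting LD_LIBRARY_PATH on deactivate also discards any value the variable had before activation. A safer sketch (the variable name _OLD_LD_LIBRARY_PATH is hypothetical) saves and restores it:

# In bin/activate, before the two export lines above:
_OLD_LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}"

# In the deactivate function, instead of the two unset lines:
export LD_LIBRARY_PATH="${_OLD_LD_LIBRARY_PATH}"
unset NVIDIA_DIR _OLD_LD_LIBRARY_PATH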

I have the same problem with Ubuntu 22.04.4 with the following environment:

  • tensorflow==2.16.1
  • Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
  • cuDNN 8.6.0.163
  • gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Hello everyone. I had the same problem here and managed to solve it! Thanks to @njzjz , setting TF_CPP_MAX_VLOG_LEVEL=3 shows more information:

soheil@soheil:~$ TF_CPP_MAX_VLOG_LEVEL=3 python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"
2024-03-18 00:32:05.073521: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcudart.so.12
2024-03-18 00:32:05.073737: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:855] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2024-03-18 00:32:05.073751: I external/local_tsl/tsl/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2024-03-18 00:32:05.073755: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:895] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2024-03-18 00:32:05.073758: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:925] GCS additional header DISABLED. No environment variable set.
2024-03-18 00:32:05.073762: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:306] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2024-03-18 00:32:05.073765: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:306] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2024-03-18 00:32:05.075498: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcudart.so.12
2024-03-18 00:32:05.103619: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:855] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2024-03-18 00:32:05.103641: I external/local_tsl/tsl/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2024-03-18 00:32:05.103645: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:895] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2024-03-18 00:32:05.103648: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:925] GCS additional header DISABLED. No environment variable set.
2024-03-18 00:32:05.103652: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:306] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2024-03-18 00:32:05.103655: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:306] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2024-03-18 00:32:05.103933: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-18 00:32:05.438768: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:855] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2024-03-18 00:32:05.438784: I external/local_tsl/tsl/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2024-03-18 00:32:05.438808: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:895] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2024-03-18 00:32:05.438812: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:925] GCS additional header DISABLED. No environment variable set.
2024-03-18 00:32:05.438839: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:306] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2024-03-18 00:32:05.438843: I external/local_tsl/tsl/platform/cloud/gcs_file_system.cc:306] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2024-03-18 00:32:05.609776: I external/local_tsl/tsl/platform/default/dso_loader.cc:70] Could not load dynamic library 'libnvinfer.so.8.6.1'; dlerror: libnvinfer.so.8.6.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lib:/usr/lib:/home/soheil/anaconda3/lib/python3.10/site-packages/nvidia/cudnn/lib/::/usr/local/cuda-12.4/lib64:/usr/local/cuda-12.4/lib64
2024-03-18 00:32:05.609875: I external/local_tsl/tsl/platform/default/dso_loader.cc:70] Could not load dynamic library 'libnvinfer_plugin.so.8.6.1'; dlerror: libnvinfer_plugin.so.8.6.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lib:/usr/lib:/home/soheil/anaconda3/lib/python3.10/site-packages/nvidia/cudnn/lib/::/usr/local/cuda-12.4/lib64:/usr/local/cuda-12.4/lib64
2024-03-18 00:32:05.609882: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-18 00:32:05.937145: I external/local_xla/xla/parse_flags_from_env.cc:207] For env var TF_XLA_FLAGS found arguments:
2024-03-18 00:32:05.937182: I external/local_xla/xla/parse_flags_from_env.cc:209]   argv[0] = <argv[0]>
2024-03-18 00:32:05.937188: I external/local_xla/xla/parse_flags_from_env.cc:207] For env var TF_JITRT_FLAGS found arguments:
2024-03-18 00:32:05.937192: I external/local_xla/xla/parse_flags_from_env.cc:209]   argv[0] = <argv[0]>
2024-03-18 00:32:05.937215: I tensorflow/compiler/jit/xla_cpu_device.cc:46] Not creating XLA devices, tf_xla_enable_xla_devices not set and XLA device creation not requested
2024-03-18 00:32:05.937219: I tensorflow/compiler/jit/xla_gpu_device.cc:49] Not creating XLA devices, tf_xla_enable_xla_devices not set and XLA devices creation not required
2024-03-18 00:32:05.937662: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcuda.so.1
2024-03-18 00:32:05.983926: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:192] found DLL info with name: /lib/x86_64-linux-gnu/libcuda.so.1
2024-03-18 00:32:05.983958: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:197] found DLL info with resolved path: /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.14
2024-03-18 00:32:05.983978: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:97] version string "550.54.14" made value 550.54.14
2024-03-18 00:32:05.983999: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:968] trying to read NUMA node for device ordinal: 0
2024-03-18 00:32:05.984031: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-18 00:32:05.984184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2219] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 5.77GiB deviceMemoryBandwidth: 244.97GiB/s
2024-03-18 00:32:05.984205: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcudart.so.12
2024-03-18 00:32:06.002289: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcublas.so.12
2024-03-18 00:32:06.002375: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcublasLt.so.12
2024-03-18 00:32:06.004118: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcufft.so.11
2024-03-18 00:32:06.006168: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcusolver.so.11
2024-03-18 00:32:06.006254: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcusparse.so.12
2024-03-18 00:32:06.006336: I external/local_tsl/tsl/platform/default/dso_loader.cc:70] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lib:/usr/lib:/home/soheil/anaconda3/lib/python3.10/site-packages/nvidia/cudnn/lib/::/usr/local/cuda-12.4/lib64:/usr/local/cuda-12.4/lib64
2024-03-18 00:32:06.006343: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

What I did was search for where cuDNN is placed:

soheil@soheil:~$ sudo find / -name "*libcudnn.so*"
/home/soheil/.local/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.8
/usr/lib/x86_64-linux-gnu/libcudnn.so.9
/usr/lib/x86_64-linux-gnu/libcudnn.so
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.0.0
find: ‘/run/user/1000/doc’: Permission denied
find: ‘/run/user/1000/gvfs’: Permission denied

adding /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH solved my problem 😃

soheil@soheil:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu
soheil@soheil:~$ python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"
2024-03-18 00:35:24.292809: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-18 00:35:24.797787: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-18 00:35:25.127729: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-18 00:35:25.148460: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-18 00:35:25.148676: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Hope this helps

Not really a solution, but a clarification and summary of what has happened so far. As per the investigations of @njzjz, we see that libcublas et al. are loaded, whereas libcudnn is not. He also investigated inside a Docker container and found that none of the libraries are loaded there. He further updated the library path and found that the libraries were then loaded.

I have tried this on my end and came to the same conclusion. I think what’s happening is that I have installed libcublas, libcudart, etc. from the NVIDIA Fedora repo (which does not include libcudnn), which is why these libraries load but libcudnn does not when I run after pip install tensorflow[and-cuda], even though all these libs exist in my venv lib dir. It seems that the libraries in /usr/local/cuda/lib64 are searched, but not the ones in the venv lib dir.
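A quick way to confirm the pip-installed libraries really are present in the environment, even though TF does not find them (a sketch; assumes the nvidia wheels were pulled in by tensorflow[and-cuda]):

# List the per-library lib/ dirs that pip placed under the nvidia namespace package
NVIDIA_DIR=$(dirname $(python -c 'import nvidia; print(nvidia.__file__)'))
ls -d "$NVIDIA_DIR"/*/lib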

@njzjz’s and @sgkouzias’s solutions further support this. Clearly the current workaround is to follow their path altering advice.

I hope that this is useful to somebody. Hopefully there is a fix soon.

Hi Soheil,

I tried your suggestion of adding LD_LIBRARY_PATH and then installing TF 2.16.1. I am happy to report that your suggestion appears to work. I say “appears” only because I tested whether TF can detect the GPUs via the command

  print(tf.config.list_physical_devices('GPU'))

which returns a non-empty list, so I assume this will work with commands too.
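A slightly stronger check than listing devices is to place a small computation on the GPU explicitly; a minimal sketch, assuming the venv is active:

python3 - <<'EOF'
import tensorflow as tf

# With soft device placement off (the eager-mode default), this raises
# instead of silently falling back to CPU when no GPU is registered.
with tf.device('/GPU:0'):
    x = tf.random.normal((1024, 1024))
    y = tf.matmul(x, x)
print(y.device)
EOF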

I hope the TF team fixes the issue soon.

Thank you !

Juan E. Vargas

@mihaimaruseac The least that the TensorFlow team can do is test and acknowledge the problem, or say a fix is not planned, or anything. Just ignoring the problem without even testing is not a supportive act at all.

I am of the opinion that just doing RC0 and then the final release is not good testing. I hope the 2.16 situation was just a one-off, to save time (it took the same amount of time for 2.16, with just one RC and the final, as for older releases with 3 RCs, multiple vulnerability fixes, etc.). I am no longer on the TensorFlow team, just helping here and there on the GitHub issues and PRs.

Thanks @sh-shahrokhi. I thought it was path related. I modified it slightly to make it Python-version independent if you put it in your conda environment activation ([environment]/etc/activate.d/env_vars.sh).

NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

This is not a resolution, as this post-install step should not be necessary.

Thanks for the workaround, but the path is wrong (conda is missing). Also, the file name is arbitrary: ${CONDA_PREFIX}/etc/conda/activate.d/nvidia-lib-dirs.sh. In addition, the first line can be simplified to

NVIDIA_DIR=$(dirname $(python -c 'import nvidia;print(nvidia.__file__)'))

and if you don’t have spaces in your environment path, the rest can be simplified to:

export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
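Putting those two simplifications together, the whole activation hook (a sketch; nvidia-lib-dirs.sh is an arbitrary name, per the comment above) becomes:

# ${CONDA_PREFIX}/etc/conda/activate.d/nvidia-lib-dirs.sh
NVIDIA_DIR=$(dirname $(python -c 'import nvidia;print(nvidia.__file__)'))
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}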

got it work 😃 first https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------

then download Local Installer for Ubuntu22.04 x86_64 (Deb)

unpack and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb

sudo dpkg -i libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb   
                                                           
Selecting previously unselected package libcudnn8.
(Reading database ... 47318 files and directories currently installed.)
Preparing to unpack libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb ...
Unpacking libcudnn8 (8.9.7.29-1+cuda12.2) ...
Setting up libcudnn8 (8.9.7.29-1+cuda12.2) ...

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  

                             
2024-03-11 10:27:47.879686: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 10:27:47.909157: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 10:27:48.316717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 10:27:48.664469: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688059: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688111: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

got it work 😃 first https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89-------------------------------- then download Local Installer for Ubuntu22.04 x86_64 (Deb) unpack and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb …

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

works like a charm, thank you. 😃

@Sivabooshan congrats! However note that:

  1. @JuanVargas, who raised the issue under discussion, has a particular setup including CUDA version 12.4, which is not compatible with TensorFlow version 2.16.1; that is why he might need to install TensorFlow in a virtual environment, so as to avoid downgrading CUDA and potential global pollution (if you install packages globally, they clutter your main Python installation and can potentially interfere with system processes; virtual environments protect your system-wide environment from this),
  2. It turns out that when you pip install tensorflow[and-cuda], all required NVIDIA libraries are installed as well. You just need to configure the environment variables manually as appropriate in order to utilize them and run TensorFlow with GPU support,
  3. As of today, the officially documented TensorFlow standard installation procedure for Linux users with GPUs does not include the additional steps required to perform deep learning experiments with TensorFlow version 2.16.1 and utilize the GPU locally. That’s why I submitted the pull request in good faith and for the sake of all users, as TensorFlow is “An Open Source Machine Learning Framework for Everyone”.

Hope that the next patch version of TensorFlow will fix the bug as soon as possible!

There should also be instructions for venv users.

On Mon, Apr 8, 2024, 11:48 a.m. Sotiris Gkouzias @.***> wrote:

As I understand the issue, it is clear from the discussion that users with Linux OS and CUDA-enabled GPUs, in order to utilize their GPUs, should manually perform some additional actions (namely: adjust the LD_LIBRARY_PATH environment variable to include the directory where cuDNN is located, and locate a compatible version of ptxas in the site-packages directory of the Python installation, under the CUDA toolkit installation path virtual_environment/lib/python3.XX/site-packages/nvidia/cuda_nvcc/bin, and add this specific path to the environment variables). @SuryanarayanaY https://github.com/SuryanarayanaY should that be officially communicated as part of the procedure to pip install tensorflow[and-cuda] for users with GPUs and Linux OS? Should it be fixed in the next versions of TensorFlow? I rest my case.

Agree. We used to have instructions for setting the CUDA path in the LD_LIBRARY_PATH environment variable in earlier versions. I think we either need to add these to the documentation or at least add a note in the pip install guide that the user has to set up the path for their own environment.

Maybe the same instructions will not work for all environments, and hence they might have been discarded. Anyway, adding a note on this in the installation guide is a must and can avoid confusion.

If anyone here is willing to contribute the required notes, please feel free to help. The changes can be proposed at this doc source https://github.com/tensorflow/docs/blob/master/site/en/install/pip.md, which is reflected in this page https://www.tensorflow.org/install/pip.

I created a respective pull request and it is pending review.


Yes, please!

On Fri, Apr 5, 2024 at 12:35 PM Sotiris Gkouzias @.***> wrote:

As I understand the issue, it is clear from the discussion that users with Linux OS and CUDA-enabled GPUs, in order to utilize their GPUs, should manually perform some additional actions (namely: adjust the LD_LIBRARY_PATH environment variable to include the directory where cuDNN is located, and locate a compatible version of ptxas in the site-packages directory of the Python installation, under the CUDA toolkit installation path virtual_environment/lib/python3.10/site-packages/nvidia/cuda_nvcc/bin, and add this specific path to the environment variables). @SuryanarayanaY https://github.com/SuryanarayanaY should that be officially communicated as part of the procedure to pip install tensorflow[and-cuda] for users with GPUs and Linux OS? Should it be fixed in the next versions of TensorFlow? I rest my case.


In Colab, GPU VMs, and the Docker image, you have CUDA installed as a system lib, so TensorFlow looks into /usr/lib and finds it. But the standard thing that worked before 2.16, and the expected behavior of pip install tensorflow[and-cuda], is that TensorFlow should also look into the CUDA libraries that were installed via pip. If it looked, it could find nvcc, cufft, cublas, … in there. The problem is that it just doesn’t consider them. PyTorch and TensorFlow 2.15 do. You sure can install CUDA as a system lib and 2.16 works, but that is unnecessary, and impossible for users without admin rights. The env-var fix posted above just adds those pip-installed CUDA libraries to the library path so TensorFlow finds and uses them.

The fix is simple: you just need to modify the logic that searches for CUDA libraries. But that requires modifying C++ files and recompiling TensorFlow to test it, which exceeds the resources of most users.

On Fri, Apr 5, 2024, 5:49 a.m. Surya @.***> wrote:

As per this comment https://github.com/tensorflow/tensorflow/issues/62234#issuecomment-1997135428, for the TF 2.16 version a CUDA driver version >= 545 is needed. Please note that installation of the CUDA driver is manual and left under user scope.

So, are we expecting a fix for the bug @SuryanarayanaY https://github.com/SuryanarayanaY ? Is it reported?

I think you are using CUDA driver version 535. Can you upgrade it to a higher one as mentioned above?


@SuryanarayanaY Steps to reproduce:

conda create -n tensorflow-test-new python=3.12 -y
conda activate tensorflow-test-new
pip install tensorflow[and-cuda]==2.16.1
python
import tensorflow as tf;print(tf.__version__);tf.config.list_physical_devices('GPU')

gives this message:

2024-04-05 13:54:50.483020: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-05 13:54:50.982460: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2.16.1
2024-04-05 13:54:51.320296: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-05 13:54:51.320653: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

I use an RTX 3080 Ti, but I think it's an issue for all GPUs. nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |

But installing tensorflow==2.15.1 with Python 3.10 works without additional steps.

Hello everyone. I had the same problem here and managed to solve it! Thanks to @njzjz , setting TF_CPP_MAX_VLOG_LEVEL=3 shows more information … adding /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH solved my problem 😃 Hope this helps

It does help! I had the same problem (Ubuntu 22.04, tensorflow 2.16.1): “Cannot dlopen some GPU libraries…”, even though the NVIDIA drivers were properly installed. TensorFlow could not find the libcudnn.so.8 shared library. Thanks to SoheilKhatibi for providing the solution that worked for me. In my case, I found the shared library in my Python virtual environment (named “venv1”), under /venv1/lib/python3.10/site-packages/nvidia/cudnn/lib/. So adding a line in .bashrc (an “export LD_LIBRARY_PATH=…” pointing to the proper path) solved the problem. If you do this, don’t forget to reload .bashrc using: source ~/.bashrc

My next problem was that, even though my TensorFlow Python code worked fine when run from the terminal, the same script executed within PyCharm still threw the “Cannot dlopen some GPU libraries” error. To solve this, go to PyCharm’s Run menu and select Edit Configurations. In the left panel select the script for which you want to solve the GPU error, and open the Environment variables field (right panel). Add the user environment variable LD_LIBRARY_PATH and give it the value corresponding to the path you’ve put in .bashrc. Save and run, and the error is gone in PyCharm too!

Wanted to drop in and thank everyone for the sleuthing done up to this point. I had no idea about the environment variable up there that lets you debug the loading of the libraries so easily. That was neat.

Anyway, to amalgamate all of the suggestions in the thread, you can build an env_vars.sh script within envs/<your environment>/etc/conda/activate.d/ in an anaconda-like install that looks like the following:

# Locate the pip-installed nvidia packages (two dirnames: .../nvidia/cudnn/__init__.py -> .../nvidia)
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))

export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib/:${LD_LIBRARY_PATH}
# Add every bundled lib/ dir to the loader path
for dir in $(ls -1d $NVIDIA_DIR/*/); do
    if [ -d "${dir}lib" ]; then
        export LD_LIBRARY_PATH="${dir}lib:$LD_LIBRARY_PATH"
        # cuda_nvcc also ships ptxas, which needs to be on PATH for XLA
        if [[ $(basename $dir) == 'cuda_nvcc' ]] ; then
            export PATH="${dir}bin:$PATH"
        fi
    fi
done
export XLA_FLAGS=--xla_gpu_cuda_data_dir=${CONDA_PREFIX}/lib

You do indeed need the two dirname commands for TF 2.16. Similarly, you need to put the cuda_nvcc bin directory on the PATH so ptxas can be found. Doing it this way also generalizes to your specific conda environment. I’ve confirmed that you can run through the intro TF tutorial (https://www.tensorflow.org/tutorials/quickstart/beginner) with this workaround.
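After re-activating the environment, a quick sanity check (a sketch) that ptxas now resolves from the environment rather than from a system CUDA install:

# Expect a path under your conda env's site-packages, not /usr/local/cuda
which ptxas
ptxas --version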

Having said all that, the TF Linux pip install page badly needs an update. The CUDA Toolkit and cuDNN version numbers are out of date. Since tensorflow[and-cuda] is now the recommended way to install everything for the GPU, the recommendation to install the other libraries independently is now useless as far as I can tell.

I had the same issue. Setting TF_CPP_MAX_VLOG_LEVEL=3 shows more information.

TF_CPP_MAX_VLOG_LEVEL=3 python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"
2024-03-13 17:53:09.566158: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcudart.so.12
2024-03-13 17:53:09.588990: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcublas.so.12
2024-03-13 17:53:09.589043: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcublasLt.so.12
2024-03-13 17:53:09.590693: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcufft.so.11
2024-03-13 17:53:09.593849: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcusolver.so.11
2024-03-13 17:53:09.593933: I external/local_tsl/tsl/platform/default/dso_loader.cc:59] Successfully opened dynamic library libcusparse.so.12
2024-03-13 17:53:09.594012: I external/local_tsl/tsl/platform/default/dso_loader.cc:70] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
Skipping registering GPU devices...
[]

It seems that TF 2.16 stopped loading cuDNN from the Python site-packages directory (and other directories?), while the other CUDA libraries are correctly loaded. However, it works after setting LD_LIBRARY_PATH manually:

export LD_LIBRARY_PATH=~/anaconda3/lib/python3.10/site-packages/nvidia/cudnn/lib/:$LD_LIBRARY_PATH
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
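The same export without hardcoding the anaconda path (a sketch, reusing the import nvidia.cudnn trick from the activation scripts above):

export LD_LIBRARY_PATH="$(dirname $(python -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)'))/lib:$LD_LIBRARY_PATH"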

By following the workaround, my GPU was indeed included in the list of physical devices. However, I attempted to run a short deep learning model training script, and it failed with the following error message:

ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas 12.3.103 has a bug that we think can affect XLA. Please use a different version.' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

Success!! Ultimately, I located a valid ptxas in ...lib/python3.10/site-packages/nvidia/cuda_nvcc/bin and manually added that path to my environment variables, only within the conda virtual environment I created for TensorFlow version 2.16.1. It works like a charm!
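A minimal sketch of that PATH addition, assuming a conda environment whose prefix is $CONDA_PREFIX (the python3.10 segment should match the environment's Python version):

export PATH="$CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cuda_nvcc/bin:$PATH"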

Hi all,

If you have installed the tensorflow[and-cuda] package, it will install all the required & compatible CUDA/cuDNN related libraries. I am of the opinion that installing the tensorflow[and-cuda] package will take care of the path setting also. If not, please let us know. The only manual task is that you need to install a compatible CUDA driver for the CUDA library. Please find the compatible CUDA/cuDNN versions required for the TF 2.16 version below.

Version            Python version  Compiler      Build tools  cuDNN  CUDA
tensorflow-2.16.1  3.9-3.12        Clang 17.0.6  Bazel 6.5.0  8.9    12.3
tensorflow-2.15.0  3.9-3.11        Clang 16.0.0  Bazel 6.1.0  8.9    12.2

As Mihai already commented here, TF will be tested against only one RC before release, and afterwards it proceeds on forward compatibility.

It does not work, unfortunately. It worked with TF<=2.15.1. You would need to test it on a non-Docker version, since that one actually installs the CUDA libraries in the container with apt, not pip. Colab is also not a good place to test, because if you do pip list, you can see that the CUDA libraries are not installed from PyPI but come from some other method. For this, please test a local Linux or WSL installation that has only the NVIDIA drivers and nothing else. Run pip install tensorflow[and-cuda]==2.16.1 in a conda / venv / … environment, and try to list the GPUs. It will not work. Then test with 2.15.1 in another environment and see that it works.
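(One reproduction pitfall worth noting: in zsh the square brackets are glob characters, so the extra has to be quoted.)

pip install 'tensorflow[and-cuda]==2.16.1'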

@SuryanarayanaY @mihaimaruseac The least that the TensorFlow team can do is test and acknowledge the problem, or say a fix is not planned, or anything. Just ignoring the problem without even testing is not a supportive act at all.

I am having the same issue.

env: Ubuntu 22.04 + Python 3.10.13 + CUDA 12.4 + tensorflow 2.16.1

(dl-opt) zxiong@ws2:/mnt/work/dl-opt$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

(dl-opt) zxiong@ws2:/mnt/work/dl-opt$ nvidia-smi
Fri Mar 15 10:53:01 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   35C    P8             32W /  320W |       1MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

(dl-opt) zxiong@ws2:/mnt/work/dl-opt$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-03-15 10:39:18.693721: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-15 10:39:18.717258: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-15 10:39:19.866184: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-15 10:39:21.671783: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

The thing is, JAX builds against multiple CUDA versions, whereas TF has always been pinned to just one version.

I have given up on TensorRT. I guess I won’t be using it either.

This actually doesn’t change the fact that the new tensorflow version should be tested by google team before release, or the bugs should be fixed. It seems they only care about having a working docker image, not anything else.

Agreed. Installing TF has always been hit or miss and it seems that in the many years since I last used TF that hasn’t changed one bit.

Thanks @sh-shahrokhi. I thought it was path related. I modified it slightly to make it Python-version independent if you put it in your conda environment activation script ([environment]/etc/activate.d/env_vars.sh).

# find the pip-installed nvidia/* packages and prepend each of their lib dirs to the loader path
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done
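To check that the hook took effect, one can re-activate the environment (activation scripts only run on activate) and list the devices again; the environment name below is a placeholder:

conda deactivate && conda activate my-tf-env
echo $LD_LIBRARY_PATH  # should now contain the nvidia/*/lib directories
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"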

This is not a resolution as this post install step should not be necessary.

W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

I can’t seem to do similar tricks to resolve the TensorRT issues when installed similarly into the conda environment. Any ideas?

I don't actually use TensorRT, but I would check whether the required .so file is visible to TensorFlow. The name of the required file may need to be looked up in the TensorFlow source code.
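For reference, the TF-TRT warning refers to the TensorRT runtime library (libnvinfer). A rough sketch of that check, assuming TensorRT is unpacked somewhere on disk (the export path below is a placeholder):

sudo find / -name "libnvinfer*" 2>/dev/null  # locate the TensorRT runtime, if present
export LD_LIBRARY_PATH="/path/to/tensorrt/lib:$LD_LIBRARY_PATH"
python -c "import tensorflow as tf"  # the TF-TRT warning should disappear if it is found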

Hi Krzysztof

I visited the site https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------

where I found an entry listed as " Local Installer for UBuntu22.04 x86_64(Deb)" which I downloaded. Unfortunately what I got is a package named “cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb” which is not the same as the name you suggest in your message, which is " libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb"

I assume what you meant is to get the libcudnn8_8.9.7.29*amd64.deb and the cuda12.2_amd64.deb separately and install both.

I have CUDA 12.4. I will not go back to trying to make TF 2.16.1 work with older versions of CUDA (12.2 or 12.3), because sooner or later the TF team will have to produce a version that works with the current CUDA. IMHO, rather than us wasting time going back to older versions, the TF team should invest that time in bringing TF forward to the current CUDA version.

Thank you, Juan

On Mon, Mar 11, 2024 at 5:30 AM Krzysztof Radzikowski < @.***> wrote:

Got it to work 😃 First,

https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------

then download Local Installer for Ubuntu22.04 x86_64 (Deb) https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.7/local_installers/12.x/cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb/

unpack and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb

sudo dpkg -i libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb
Selecting previously unselected package libcudnn8.
(Reading database … 47318 files and directories currently installed.)
Preparing to unpack libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb …
Unpacking libcudnn8 (8.9.7.29-1+cuda12.2) …
Setting up libcudnn8 (8.9.7.29-1+cuda12.2) …

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-03-11 10:27:47.879686: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-03-11 10:27:47.909157: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 10:27:48.316717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 10:27:48.664469: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688059: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688111: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Hello! I outlined this behavior in a duplicate ticket ( #65842 ). Torch also now installs its CUDA dependencies using the NVIDIA-managed pip packages. However, Torch doesn’t appear to require the LD_LIBRARY_PATH to be set for the linker, like TF still does. I assume this is because they’re manually sourcing libs from the venv. Is this functionality on the roadmap for TF?

Thank you!

I think the expected solution would be a new release that fixes this issue, so that setting LD_LIBRARY_PATH is not needed, as was the case in 2.15.1. It would be a downgrade for users to need such workarounds; it should just work with: pip install tensorflow[and-cuda]

Thanks! Downgrade to 2.15.1 works well for me too (with TF prob <=0.23).

conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

pip install tensorflow-probability==0.23.0

There should also be instructions for venv users.

I totally agree. Will try to update the pull request accordingly later on.

Updated the respective pull request (pending review) yesterday. The fix was successfully tested today by @weavermech as well.

Added instructions needed to resolve the ptxas issue.

I thank everyone here for leading me to a simple 'fix' for this issue: copy/paste. Sorry, but you guys are leaps and bounds above my understanding of Linux, TensorFlow, and WSL2; I just wanted to tinker with a little ML using my RTX 3090. Since Windows apparently isn't supported, here I am wondering what I've gotten myself into. If you are like me and can follow the most basic of instructions, here's what I did after reading this thread. I used this command:

TF_CPP_MAX_VLOG_LEVEL=3 python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"

The output told me which libraries weren't found and, more importantly, the one it did find. Because I use WSL2, Windows Explorer was all I needed to find the "unfindable" library files, which were all there in their own nice little folders, alongside the one that was found. I simply copy/pasted all the files of the ones that weren't found into the same location as the one that was found. Worked for me, but one thing's become abundantly clear: mileage may vary. Good luck. I can't believe I spent two days trying to figure out all this command-line stuff and bash, zsh, echo this and that... how is this even a thing?
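For what it's worth, a rough command-line sketch of that same copy/paste approach (the cuda_runtime and cudnn directory names are assumptions based on the pip package layout; adjust to whatever the verbose log actually reports on your machine):

SITE=$(python -c "import site; print(site.getsitepackages()[0])")
# suppose the verbose log found libs under nvidia/cuda_runtime/lib but not under nvidia/cudnn/lib:
cp "$SITE"/nvidia/cudnn/lib/libcudnn*.so* "$SITE"/nvidia/cuda_runtime/lib/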

As I understand the issue, it is clear from the discussion that users with Linux and CUDA-enabled GPUs must perform some additional manual actions in order to utilize their GPUs, namely: adjust the LD_LIBRARY_PATH environment variable to include the directory where cuDNN is located, and locate a compatible version of ptxas in the site-packages directory of the Python installation, under the CUDA toolkit installation path virtual_environment/lib/python3.XX/site-packages/nvidia/cuda_nvcc/bin, and add that specific path to the environment variables (a command sketch follows this exchange). @SuryanarayanaY should that be officially communicated as part of the procedure to pip install tensorflow[and-cuda] for users with GPUs and Linux? Should it be fixed in the next versions of TensorFlow? I rest my case.

Agreed. We used to have instructions for adding the CUDA path to the LD_LIBRARY_PATH environment variable in earlier versions. I think we either need to add these to the documentation or at least add a note in the pip install guide that the user has to set up the path for their own environment.

The same instructions may not work for all environments, which may be why they were discarded. In any case, adding a note about this to the installation guide is a must and would avoid confusion.

If anyone here is willing to contribute the required notes, please feel free to help. The changes can be proposed here at this doc source, which will be reflected in this page.
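Putting the two manual actions described in this exchange into concrete commands, a sketch (the venv path and Python version are assumptions; adjust to your setup):

export LD_LIBRARY_PATH="$VIRTUAL_ENV/lib/python3.11/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH"
export PATH="$VIRTUAL_ENV/lib/python3.11/site-packages/nvidia/cuda_nvcc/bin:$PATH"
which ptxas  # should now resolve inside the venv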

Wanted to stop by with an update. I tried a fresh install this morning via miniforge (Anaconda) with Python 3.11.8 and followed the instructions on the TF website with a simple pip install tensorflow[and-cuda]. Same thing: it would not recognize the GPU. Adding the env_vars.sh fix I noted above makes the GPU recognizable.

Output from nvidia-smi:

Fri Apr  5 09:27:21 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 4000 with Max...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8              7W /   90W |       4MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2306      G   /usr/bin/gnome-shell                            2MiB |
+-----------------------------------------------------------------------------------------+

@SuryanarayanaY Steps to reproduce:

conda create -n tensorflow-test-new python=3.12 -y
conda activate tensorflow-test-new
pip install tensorflow[and-cuda]==2.16.1
python
import tensorflow as tf;print(tf.__version__);tf.config.list_physical_devices('GPU')

gives this message:

2024-04-05 13:54:50.483020: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-05 13:54:50.982460: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2.16.1
2024-04-05 13:54:51.320296: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-05 13:54:51.320653: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

I use an RTX 3080 Ti, but I think it's an issue for all GPUs. nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3       |
+-----------------------------------------------------------------------------------------+

But installing tensorflow==2.15.1 with python 3.10 works without additional steps

So, @SuryanarayanaY even with NVIDIA GPU driver version 545 still TF 2.16 could not find GPU devices. The bug should be reported and fixed.

Thanks for all hints. I have written all steps in my blog.

https://mobinshaterian.medium.com/use-gpu-in-tensorflow-on-ubuntu-22-04-f033e59cf5cb

With every year it becomes more and more complicated and difficult to set up TensorFlow with NVIDIA GPU support on Windows. The following worked for me, which is a good example of a very bad developer experience.

  1. Install the CUDA Toolkit 12.4 on the host machine.

  2. Set up Ubuntu 22.04.3 LTS from the Microsoft Store and update it.

wsl --setdefault Ubuntu-22.04
wsl
sudo apt update
sudo apt upgrade
sudo apt install python3-pip
  3. Install tensorflow:
python3 -m pip install tensorflow[and-cuda]==2.16.1

Downloading tensorflow-2.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Collecting nvidia-cuda-nvcc-cu12==12.3.107
Collecting nvidia-cusparse-cu12==12.2.0.103
Collecting nvidia-cuda-runtime-cu12==12.3.101
Collecting nvidia-curand-cu12==10.3.4.107
Collecting nvidia-cudnn-cu12==8.9.7.29
Collecting nvidia-cufft-cu12==11.0.12.1
Collecting nvidia-cuda-cupti-cu12==12.3.101
Collecting nvidia-cublas-cu12==12.3.4.1
Collecting nvidia-nccl-cu12==2.19.3
  4. Find the CUDA libraries. The following doesn't necessarily work; I had to find them by tracking where tensorflow installed them:
sudo find / -name "*libcudnn.so*"
sudo find / -name "*libcublas.so*"
sudo find / -name "*libcufft.so*"
sudo find / -name "*libcusolver.so*"
sudo find / -name "*libcusparse.so*"
  5. Use the CUDA libraries:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cublas/lib;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cufft/lib;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cusolver/lib;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cusparse/lib;
  6. Run the TensorFlow check:
TF_CPP_MAX_VLOG_LEVEL=3 python3 -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"
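These exports only last for the current shell; one way to persist them (assuming the same dist-packages paths as in step 5) is to append them to ~/.bashrc:

cat >> ~/.bashrc <<'EOF'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cublas/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cufft/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cusolver/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/nvidia/cusparse/lib
EOF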

Yes, try downgrading TensorFlow with pip install tensorflow==2.15.0.post1. At least this works fine on my Ubuntu 20.04.

Thanks for the update.

On Sat, Mar 16, 2024, 12:51 James Paul Turner @.***> wrote:

Not really a solution, but a clarification and summary of what is happening so far. As per the investigations of @njzjz, we see that libcublas et al. are loaded, whereas libcudnn is not. He also investigated inside a Docker container and found that none of the libraries are loaded there. He further updated the library path and found that the libraries were then loaded.

I have tried on my end and came to the same conclusion. I think what's happening is that I have installed libcublas, libcudart, etcetera, from the NVIDIA Fedora repo (which does not include libcudnn), which is why those libraries load but not libcudnn after pip install tensorflow[and-cuda], even though all of these libs exist in my venv lib dir. It seems that the libraries in /usr/local/cuda/lib64 are searched, but not the ones in the venv lib dir.

@njzjz's and @sgkouzias's solutions further support this. Clearly the current workaround is to follow their path-altering advice.

I hope that this is useful to somebody. Hopefully there is a fix soon.
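A diagnostic sketch for seeing exactly which directories the dynamic loader searches (LD_DEBUG is a glibc feature, so this should work on most Linux distributions):

LD_DEBUG=libs python -c "import tensorflow" 2>&1 | grep "search path" | head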

Seriously… I followed the instructions exactly. Yes, I used tensorflow[and-cuda], under WSL and conda. I'm using VS Code and it creates the conda environment in ./.conda. I'm sticking with the previous pip release, 2.15.1, and porting to plain PyTorch and transformers.

On Fri, 15 Mar 2024, 15:09 Sotiris Gkouzias, @.***> wrote:

Hi @SuryanarayanaY

It is pretty obvious that installing the tensorflow[and-cuda] package did not actually take care of the path setting (I already mentioned the issue, like everybody else did as well)! As for compatibility, I am still trying to figure out how to deal with the error:

ptxas returned an error during compilation of ptx to sass: ‘INTERNAL: ptxas 12.3.103 has a bug that we think can affect XLA. Please use a different version.’ If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

I believe that tensorflow[and-cuda] does not work as expected. For reproduction, I built a simple Docker image:

FROM python:3.11
RUN pip install tensorflow[and-cuda]

Build and run it:

docker build . -t test-tf216-cuda
docker run --gpus all test-tf216-cuda python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

The output shows the libraries cannot be found, as the same as the above.

2024-03-15 04:45:04.566105: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-15 04:45:05.342407: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-15 04:45:05.895918: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-15 04:45:05.896280: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-15 04:45:05.896898: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

Executing docker run --gpus all -e TF_CPP_MAX_VLOG_LEVEL=3 -e LD_DEBUG=libs test-tf216-cuda python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" gives more information; I think the search path is entirely wrong. See the attached run.log.
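One way to probe the search-path hypothesis in that same image is to pass the pip-installed NVIDIA lib directories explicitly (a sketch; /usr/local/lib/python3.11/site-packages matches the python:3.11 base image, and the set of lib dirs may need extending):

docker run --gpus all \
  -e LD_LIBRARY_PATH="/usr/local/lib/python3.11/site-packages/nvidia/cudnn/lib:/usr/local/lib/python3.11/site-packages/nvidia/cublas/lib" \
  test-tf216-cuda \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"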

By following the workaround, my GPU was indeed included in the list of physical devices. However, I then attempted to run a short deep learning model training script, which failed with the following error message:

ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas 12.3.103 has a bug that we think can affect XLA. Please use a different version.' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

Actually, you should have version 12.3.107, since that is what the [and-cuda] extra pins. Or did you install it differently? Can you check which version you have with pip show nvidia-cuda-nvcc-cu12 or conda list nvidia-nvcc? You should be able to update.
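For example (12.3.107 is the version pinned by the [and-cuda] extra, per the install log earlier in this thread):

pip show nvidia-cuda-nvcc-cu12               # check the installed ptxas wheel
pip install nvidia-cuda-nvcc-cu12==12.3.107  # pin to the version the extra expects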

Same issue for me. Is there any official solution for it? I only started using WSL2 today, and I don't want to find and add the PATH manually. I am giving up on TF.

Thanks @mihaimaruseac! I had it pinned, also tried with the SHA digest but then realized I had unpinned tensorflow-text in my requirements and that was upgrading TF as well 😛

Again, you are on the tensorflow repo. You closed the wrong issue.

You were asked to close the issue on keras, because it is not a keras issue. This is tensorflow, where proper detection of the cuda libs is definitely still an issue. Please reopen.

I closed the issue because I was asked to do so by a TF/Keras team member, for the reasons I stated in the closing comment. Just like you (Leigh), I am disappointed, and I decided not to waste more time. There are strong reasons for some of us (users) to prefer using our own hardware. I hope the TF/Keras people get that message. Thanks, Juan

In general, we used to test RC versions before release. For example, we used to have RC0, RC1 and RC2 for TF 2.9. This gave people and downstream teams enough time to test and report issues.

It seems that 2.16.1 only had an RC0 (for 2.16.0).

The release process is (was?) like this:

  • cut the release branch (e.g., r2.17)
  • immediately trigger the release pipeline. This would create a few PRs to update version numbers and release notes, but after this step RC0 should be as close as possible to the version on the master branch at the time the release branch was cut. There should not be any code changes to the release branch at this point (except maybe to cherry-pick fixes from master for hard bugs caused by cutting the branch at a wrong commit)
  • have at least a week of testing for downstream teams to test RC0
  • get fixes to discovered bugs landed on master, cherrypick them to release branch, after they are already tested on nightly releases
  • trigger RC1 pipeline. Again, no other code changes should occur now, except to fix bugs discovered during building
  • wait a week for downstream teams to test. If there are bugs, repeat the steps above for another RC, otherwise repeat the steps above for the final version.

Overall, this process would take number_of_RCs + 1 weeks with a possibility of a few more weeks of delay.

However, for the 2.16 release, although the branch was cut on Feb 8th, there has been only one RC. Most likely the issues can be solved by a patch release.

I’m not sure if this is the root cause, but I resolved my own issue which also surfaced as a “Cannot dlopen some GPU libraries.” error when trying to run python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

To resolve my issue, I followed the tested build versions here: https://www.tensorflow.org/install/source#gpu

and I needed to downgrade my existing installations from cuDNN 9 to 8.9 and from CUDA 12.4 to 12.3

When you're on an NVIDIA download page like this one for the CUDA Toolkit, don't just download the latest version. You can find previous versions by hitting “Archive of Previous CUDA Releases”.

@JuanVargas can you try uninstalling your existing CUDA installation and moving to a tested build configuration for TF 2.16 by downgrading to CUDA 12.3?

I followed this post to uninstall my existing cuda installation: https://askubuntu.com/questions/530043/removing-nvidia-cuda-toolkit-and-installing-new-one
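For reference, a sketch of the purge commands from that post (the package name patterns can vary with how CUDA was installed, so double-check what apt plans to remove before confirming):

sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*nvjpeg*"
sudo apt-get autoremove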

@DiegoMont can you try upgrading your cuDNN to 8.9 and CUDA to 12.3?