tensorflow: TensorFlow 2.7 does not detect CUDA installed through conda

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.7.0
Python version: 3.8
Bazel version (if compiling from source): N/A
GCC/Compiler version (if compiling from source): N/A
CUDA/cuDNN version: 11.2/8.1
GPU model and memory: RTX 2080 Ti

Describe the current behavior

After installing cuda/cudnn through conda (conda install cudatoolkit=11.2 cudnn=8.1), TensorFlow 2.7 reports that it cannot find the cuda libraries.

2021-11-08 14:49:16.412959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 14:49:16.413006: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-08 14:49:22.640508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640698: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640776: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640853: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640941: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641022: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641099: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641120: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Installing TensorFlow 2.6 (or earlier) in the same environment, with the same cuda/cudnn installation, doesn’t show any problem: it detects the libraries and GPU support works as expected.

The problem can be worked around by manually adding the conda lib directory to LD_LIBRARY_PATH (export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib). However, obviously this is not ideal, as it needs to be repeated/adjusted for every new conda environment. It would be better if TensorFlow just detected the conda installed libraries, as it did in TensorFlow < 2.7.
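To check which of the libraries named in the log the dynamic loader can resolve in the current process, a small diagnostic along these lines may help. This is a minimal sketch and not part of TensorFlow: `can_load`, `report`, and `CUDA_LIBS` are hypothetical names, and the library list is copied from the warnings above, so it would need adjusting for other TensorFlow/CUDA versions.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" warnings
# above; other TensorFlow/CUDA versions may look for different names.
CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def can_load(name):
    """Return True if the dynamic loader can open `name` in this process."""
    try:
        ctypes.CDLL(name)  # note: this really loads the library if found
        return True
    except OSError:
        return False


def report(names=CUDA_LIBS):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}
```

Running `report()` once in a plain shell and once with the `LD_LIBRARY_PATH` workaround applied should show whether the loader search path is what makes the difference on a given machine.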

Describe the expected behavior

TensorFlow should detect cuda/cudnn libraries installed through conda, as it did in TensorFlow<2.7.

Contributing

Do you want to contribute a PR? (yes/no): no
Briefly describe your candidate solution(if contributing):

Standalone code to reproduce the issue

conda create -n tmp python=3.8
conda activate tmp
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install "tensorflow==2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays []
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
pip install "tensorflow<2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

About this issue

State: open
Created 3 years ago
Reactions: 27
Comments: 30 (10 by maintainers)

Most upvoted comments

I don’t want to be dismissive here, but there is a lack of understanding of the problem specifically introduced by TF 2.7:

A conda environment does install native libraries, and it does ensure that the OS dynamic loader will find them for programs that need them.
Until TF 2.7 this was how it worked, as it does for the gazillion other native apps (including CUDA ones).
TF 2.7, not conda, specifically broke that by ignoring the OS loading mechanism for an unknown/undocumented reason.

This problem is not just a techie point; it has deep implications for businesses that build real products. This way of working is the only reliable one for teams that work on more than one TF project and require multiple TF/CUDA/Python combinations on the same workstation (without root access). By the way, the CUDA stack from the official nvidia channel, including nvcc/ptxas, works perfectly in conda and is recommended by Nvidia itself.

For my suffering peers, if you don’t have access to root, you can use this small poorly-documented feature in your environment.yml:

name: base-tf-cuda-env
channels:
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.8
# Install cuda libs + ptxas compiler from nvidia channel
# This will accelerate the compilation of kernels for your specific card
  - cudatoolkit=11
  - cudnn=8
  - cupti=11
  - cuda-nvcc
...
  - pip
  - pip:
     - tensorflow==2.7.*
variables:
  # In case you want to see your own logs and tame the TF logorrhea
  TF_CPP_MIN_LOG_LEVEL: 3
  # Adjust to point to your local env path:
  LD_LIBRARY_PATH: /home/me/.conda/envs/thisenvname/lib

Upon conda activate, the env variables will be set for you, and unset on deactivation. Better than nothing, but might interfere with some other configuration…

+29

holongate on Dec 13, 2021

For anyone looking for a one-liner solution, you can do

conda env config vars set LD_LIBRARY_PATH=$CONDA_PREFIX/lib

(with the environment you want to modify activated). This has a similar effect to @jesusdpa1’s solution here https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-984025384: it’ll set LD_LIBRARY_PATH when the environment is activated and unset it when it’s deactivated.

You still need to repeat that for every new conda environment though. It would be better if TensorFlow just detected the conda installed libraries, as it did in TensorFlow<=2.6.

+21

drasmuss on Jan 28, 2022

This seems to solve the issue:

conda activate ENVNAME

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib

Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

#!/bin/sh

unset LD_LIBRARY_PATH

Source

https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux
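The manual steps above could also be scripted. The following is a minimal sketch under stated assumptions: `write_hooks`, `ACTIVATE`, `DEACTIVATE`, and the `OLD_LD_LIBRARY_PATH` variable are hypothetical names, and unlike the scripts in the comment, this variant restores the previous `LD_LIBRARY_PATH` on deactivation instead of unsetting it.

```python
from pathlib import Path

# Variant of the activate script: remembers the previous value first.
ACTIVATE = """#!/bin/sh
# Remember the previous value so deactivation can restore it.
export OLD_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CONDA_PREFIX/lib"
"""

# Variant of the deactivate script: restores instead of unsetting.
DEACTIVATE = """#!/bin/sh
# Restore the previous value (an empty string if it was unset before).
export LD_LIBRARY_PATH="$OLD_LD_LIBRARY_PATH"
unset OLD_LD_LIBRARY_PATH
"""


def write_hooks(prefix):
    """Create the activate.d/deactivate.d env_vars.sh scripts under `prefix`."""
    written = []
    for subdir, body in (("activate.d", ACTIVATE), ("deactivate.d", DEACTIVATE)):
        target = Path(prefix) / "etc" / "conda" / subdir / "env_vars.sh"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
        written.append(target)
    return written
```

With the target environment activated, this would be called as `write_hooks(os.environ["CONDA_PREFIX"])`. It still has to be repeated for every new environment.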

+13

jesusdpa1 on Dec 1, 2021

I’m not installing tensorflow from conda, just cuda/cudnn. Tensorflow is being installed from pip like normal. And you can see in the reproduction steps I posted above that we’re starting from a new virtual environment (repeated below for convenience).

conda create -n tmp python=3.8
conda activate tmp
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install "tensorflow==2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays []
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
pip install "tensorflow<2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Also note that nothing has changed on the conda side of things; we’re still using the exact same environment with the same cuda/cudnn libraries, but it works in TF 2.6 and fails in TF 2.7. So I don’t think the issue is on the conda side, something has changed in TensorFlow that has made this stop working.

+12

drasmuss on Nov 9, 2021

Upon conda activate, the env variables will be set for you, and unset on deactivation. Better than nothing, but might interfere with some other configuration…

@holongate 's env is a good workaround and solves the problem for me.

I’m quite astonished by how little thought was given on the issue - which is clearly a problem with TF 2.7 itself, and not with conda - and by how much time you waste on commenting that conda installs are not supported by Google.

filippocastelli on Jan 26, 2022

@drasmuss @tbekolay

Btw… as someone who’s a bit involved in the conda-forge side, I can confidently say that the tensorflow version we ship (currently only up to 2.8.1) is a lot more performant than the one you get from PyPI and even more performant than the one you’d get from specialized containers (e.g. nvidia ngc). Give it a go.

If you need cuda, use this:

CONDA_OVERRIDE_CUDA="11.2" conda create -n cftf tensorflow==2.8.1=*cuda112* -c conda-forge

And that will get you everything you need.

ngam on Jun 21, 2022

The official documentation suggests manually doing export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ every time you want to use TensorFlow, which obviously isn’t really a feasible solution.

This is discussed above, but I’ll reiterate the main points here for anyone coming across this thread:

1. Currently the best solution is to use the community-maintained TensorFlow installation from conda-forge (e.g. conda install -c conda-forge tensorflow). Generally speaking that should just work.
2. If 1. isn’t possible/working for some reason (e.g. because you need to use a very recent release of TensorFlow that isn’t yet available on conda-forge), the easiest solution is here https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1024604306.
3. However, sometimes 2. can cause problems with other system packages, since you’re modifying the global LD_LIBRARY_PATH (note that this is also a problem with the approach recommended in the official documentation). If you run into issues like that, you can try this approach https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1160849524, with the caveats mentioned there that this might break in future updates.

I’ll reiterate again, that all of these solutions are a downgrade in the user experience from TensorFlow < 2.7, when TensorFlow just correctly detected the conda-installed CUDA libraries without any fiddling required from the user.

drasmuss on Nov 22, 2022

Upon conda activate, the env variables will be set for you, and unset on deactivation. Better than nothing, but might interfere with some other configuration…

@holongate 's env is a good workaround and solves the problem for me.

I’m quite astonished by how little thought was given on the issue - which is clearly a problem with TF 2.7 itself, and not with conda - and by how much time you waste on commenting that conda installs are not supported by Google.

Right. Quite sad to see there is an army of TF guardians of the orthodoxy to censor my remarks about lack of interest for this kind of issue but none to engage in a conversation. And writing anything about the P-devil competition is almost instantly ~~torched~~ deleted

holongate on Feb 26, 2022

Conda installs are not officially supported by Google

mihaimaruseac on Nov 16, 2021

As mentioned, CUDA is being installed through conda, so /usr/local/cuda- is not the correct path (the correct path is given in the original post: $CONDA_PREFIX/lib). However, hard coding that into .bashrc isn’t a solution, because $CONDA_PREFIX changes depending on which conda environment you have active.
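Because the path depends on the active environment, it can be derived from `CONDA_PREFIX` at launch time and applied to a single child process only. The following is a minimal sketch; `env_with_conda_lib` is a hypothetical helper, not something conda or TensorFlow provides.

```python
import os


def env_with_conda_lib(environ=None):
    """Return a copy of `environ` with $CONDA_PREFIX/lib appended to LD_LIBRARY_PATH."""
    env = dict(os.environ if environ is None else environ)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return env  # no active conda environment; leave unchanged
    lib_dir = prefix.rstrip("/") + "/lib"
    # Drop empty entries so no stray ":" ends up in the result.
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.append(lib_dir)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env
```

The result would be passed to a child process, for example `subprocess.run([sys.executable, "train.py"], env=env_with_conda_lib())`, where `train.py` stands in for whatever script imports TensorFlow. Changing `os.environ` inside an already running Python process generally does not help, because the loader reads `LD_LIBRARY_PATH` when the process starts.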

drasmuss on Nov 12, 2021

I’m continuing to have issues related to this change. Using conda’s lib folder as the LD_LIBRARY_PATH affects too much of the system to be a good recommended solution. For me, it renders my terminal useless because I can’t use less.

The comment that “Conda installs are not officially supported by Google” might have been true at one point, but the official Tensorflow installation instructions now tell you to blindly export LD_LIBRARY_PATH. The TensorFlow team should revert the change that was made in TensorFlow 2.7 and use the ld.so system correctly.

However, since the chances of that seem slim, here’s a workaround that is, in my opinion, better than the LD_LIBRARY_PATH because it only loads the cuda/cudnn libraries and not everything in the $CONDA_PREFIX/lib directory. There may be some downsides to this method, but so far it is working for me.

Building on https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1024604306:

conda env config vars set LD_PRELOAD=$CONDA_PREFIX/lib/libcudart.so:$CONDA_PREFIX/lib/libcublas.so:$CONDA_PREFIX/lib/libcublasLt.so:$CONDA_PREFIX/lib/libcufft.so:$CONDA_PREFIX/lib/libcurand.so:$CONDA_PREFIX/lib/libcusolver.so:$CONDA_PREFIX/lib/libcusparse.so:$CONDA_PREFIX/lib/libcudnn.so

This will need to be updated when the list of libraries used by TensorFlow changes.
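Since the list has to be maintained by hand, the value could instead be generated from whatever is present in the environment. This is a minimal sketch; `build_ld_preload` and `PRELOAD_LIBS` are hypothetical names, and the library names are the ones from the command above.

```python
import os

# Unversioned library names taken from the LD_PRELOAD command above.
PRELOAD_LIBS = [
    "libcudart.so",
    "libcublas.so",
    "libcublasLt.so",
    "libcufft.so",
    "libcurand.so",
    "libcusolver.so",
    "libcusparse.so",
    "libcudnn.so",
]


def build_ld_preload(prefix, names=PRELOAD_LIBS):
    """Return (value, missing): colon-joined existing paths, plus names not found."""
    lib_dir = os.path.join(prefix, "lib")
    found, missing = [], []
    for name in names:
        path = os.path.join(lib_dir, name)
        if os.path.exists(path):
            found.append(path)
        else:
            missing.append(name)
    return ":".join(found), missing
```

The first element of the result is what would be passed to `conda env config vars set LD_PRELOAD=...`; the second lists libraries that were not found under `$CONDA_PREFIX/lib`, which may indicate an incomplete install.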

tbekolay on Jun 20, 2022

Open the terminal and type

nano ~/.bashrc

At the end of the file, add the following two lines:

export PATH=$PATH:/usr/local/cuda-11.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2/lib64

Ensure there are no spaces on either side of the ‘=’ sign.

If it still does not work, try the same for version 11.0:

export PATH=$PATH:/usr/local/cuda-11.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/lib64

pradyyadav on Nov 12, 2021

Yes, the behaviour is the same in 2.11 and 2.12.0rc1 (I wouldn’t expect it to change between rc1 and the full 2.12 release).

Note that in 2.12 the error message has changed, so it displays

2023-03-13 14:41:41.580759: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-13 14:41:41.602435: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.

instead of the old “Could not load dynamic library…” errors, but it’s the same issue.
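Both message styles can be picked out of a captured log with a short script. This is a minimal sketch that matches only the two message texts quoted in this thread; `summarize_log` is a hypothetical helper, and other TensorFlow versions may word the messages differently.

```python
import re

# Older message: names the specific library that failed to load.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")
# Message quoted for 2.12: no library names, just a general notice.
_NO_DRIVERS = "Could not find cuda drivers on your machine"


def summarize_log(log_text):
    """Return (missing_libs, drivers_message_seen) for a TensorFlow startup log."""
    missing = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in missing:  # keep first-seen order, drop repeats
            missing.append(name)
    return missing, _NO_DRIVERS in log_text
```

A non-empty list or a `True` flag points to the symptom discussed in this thread, although the same messages would also appear on a machine with no CUDA libraries installed at all.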

drasmuss on Mar 13, 2023

Hi @SuryanarayanaY,

See https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1323825349 for a summary of the discussion in this thread. The short answer is that no, that solution doesn’t address the issue.

Longer answer: The solution you describe from the docs is basically a worse version of idea 2 from that summary above. Worse in that it’s more complicated, and it won’t unset LD_LIBRARY_PATH when the environment is deactivated. But as mentioned above, idea 2 is not really a viable solution because LD_LIBRARY_PATH is a global environment variable, and modifying it has negative side effects on lots of other system packages besides TensorFlow.

And, to reiterate again, all of these “solutions” are downgrades from the behaviour prior to TensorFlow 2.7, where TensorFlow just correctly detected the CUDA libraries without requiring any manual intervention from users.

drasmuss on Mar 1, 2023

I installed Tensorflow 2.7 on Windows with CUDA 11.2 and cuDNN 8.1 (no conda involved). I received the same Could not load dynamic library errors. I switched CUDA to 11.0 and it worked. I am guessing that the pip packages for Tensorflow 2.7 were accidentally built against CUDA 11.0 instead of 11.2.

ddaspit on Nov 29, 2021