tensorflow: TensorFlow 2.7 does not detect CUDA installed through conda
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.7.0
- Python version: 3.8
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: 11.2/8.1
- GPU model and memory: GTX 2080Ti
Describe the current behavior
After installing cuda/cudnn through conda (conda install cudatoolkit=11.2 cudnn=8.1), TensorFlow 2.7 reports that it cannot find the cuda libraries.
2021-11-08 14:49:16.412959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 14:49:16.413006: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-08 14:49:22.640508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640698: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640776: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640853: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640941: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641022: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641099: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641120: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Installing TensorFlow 2.6 (or earlier) in the same environment, with the same cuda/cudnn installation, shows no such problem: it detects the libraries and GPU support works as expected.
The problem can be worked around by manually adding the conda lib directory to LD_LIBRARY_PATH (export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib). However, obviously this is not ideal, as it needs to be repeated/adjusted for every new conda environment. It would be better if TensorFlow just detected the conda installed libraries, as it did in TensorFlow < 2.7.
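To confirm that the libraries named in the warnings really are present in the conda environment (so that only the search path is at fault), a check along the following lines may help. This is a minimal sketch: the function name `check_cuda_libs` is made up for illustration, and the library list is copied from the log above, so it would need adjusting for other TensorFlow/CUDA combinations.

```shell
# Hypothetical helper (not part of conda or TensorFlow): report which of the
# libraries named in the TF 2.7 warnings exist in a given directory.
check_cuda_libs() {
    libdir="$1"
    missing=0
    for lib in libcudart.so.11.0 libcublas.so.11 libcublasLt.so.11 \
               libcufft.so.10 libcurand.so.10 libcusolver.so.11 \
               libcusparse.so.11 libcudnn.so.8; do
        if [ -e "$libdir/$lib" ]; then
            echo "found   $lib"
        else
            echo "missing $lib"
            missing=$((missing + 1))
        fi
    done
    # Succeed (exit status 0) only if every library was found.
    [ "$missing" -eq 0 ]
}

# Example (with the conda environment activated):
#   check_cuda_libs "$CONDA_PREFIX/lib"
```

If every library is reported as found but TensorFlow still prints the warnings, the files are installed and the loader simply is not searching that directory.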
Describe the expected behavior
TensorFlow should detect cuda/cudnn libraries installed through conda, as it did in TensorFlow<2.7.
- Do you want to contribute a PR? (yes/no): no
- Briefly describe your candidate solution(if contributing):
Standalone code to reproduce the issue
conda create -n tmp python=3.8
conda activate tmp
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install "tensorflow==2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # displays []
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
pip install "tensorflow<2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
About this issue
- State: open
- Created 3 years ago
- Reactions: 27
- Comments: 30 (10 by maintainers)
I don’t want to be dismissive here, but there is a lack of understanding of the problem specifically introduced by TF 2.7:
This problem is not just a techie point; it has deep implications for businesses that build real products. This way of working is the only reliable one for teams that work on more than one TF project and require multiple TF/CUDA/Python combinations on the same workstation (without root access). By the way, the CUDA stack from the official nvidia channel, including nvcc/ptxas, works perfectly in conda and is recommended by Nvidia itself.
For my suffering peers, if you don’t have access to root, you can use this small poorly-documented feature in your environment.yml:
Upon conda activate, the env variables will be set for you, and unset on deactivation. Better than nothing, but might interfere with some other configuration…
For anyone looking for a one-liner solution, you can do
(with the environment you want to modify activated). This has a similar effect as @jesusdpa1’s solution here https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-984025384, it’ll set LD_LIBRARY_PATH when the environment is activated and unset it when it’s deactivated.
You still need to repeat that for every new conda environment though. It would be better if TensorFlow just detected the conda installed libraries, as it did in TensorFlow<=2.6.
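The command itself did not survive in the comment as captured here. One way to get the behaviour it describes is conda's `conda env config vars set` subcommand; whether that is exactly what the commenter posted is an assumption. A sketch, wrapped in a hypothetical function `set_env_ld_path` so that nothing happens until it is called:

```shell
# Hypothetical wrapper: store LD_LIBRARY_PATH as a per-environment variable.
set_env_ld_path() {
    # CONDA_PREFIX is set by `conda activate`, so an environment must be active.
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "activate the target conda environment first" >&2
        return 1
    fi
    # The shell expands both variables now, so the stored value is a fixed
    # path for this environment. The ${VAR:+...} form avoids a leading ':'
    # (an empty entry) when LD_LIBRARY_PATH is currently unset or empty.
    conda env config vars set \
        "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CONDA_PREFIX}/lib"
}

# Example:
#   conda activate tmp
#   set_env_ld_path
```

The environment has to be deactivated and activated again before the stored variable takes effect; `conda env config vars list` shows what is currently stored.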
This seems to solve the issue:
conda activate ENVNAME
Edit ./etc/conda/activate.d/env_vars.sh as follows:
Edit ./etc/conda/deactivate.d/env_vars.sh as follows:
Source
https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux
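The contents of the two scripts are missing from the comment as captured here. Based on the pattern in the conda documentation linked above, they could look roughly like the following. This is a sketch under assumptions: the helper name `write_ld_path_hooks` is invented for illustration, and the variable name `OLD_LD_LIBRARY_PATH` is just a convention for remembering the previous value.

```shell
# Hypothetical helper: write activate/deactivate hooks into a conda env prefix.
write_ld_path_hooks() {
    prefix="$1"
    mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"

    # Sourced by `conda activate`: remember the old value, then append the
    # environment's lib directory. The quoted 'EOF' keeps the variables
    # unexpanded in the file, so they are evaluated at activation time.
    cat > "$prefix/etc/conda/activate.d/env_vars.sh" <<'EOF'
export OLD_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CONDA_PREFIX}/lib"
EOF

    # Sourced by `conda deactivate`: restore the saved value.
    cat > "$prefix/etc/conda/deactivate.d/env_vars.sh" <<'EOF'
export LD_LIBRARY_PATH="${OLD_LD_LIBRARY_PATH}"
unset OLD_LD_LIBRARY_PATH
EOF
}

# Example (with the environment activated):
#   write_ld_path_hooks "$CONDA_PREFIX"
```

As with the other LD_LIBRARY_PATH workarounds in this thread, this has to be repeated for each new environment.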
I’m not installing tensorflow from conda, just cuda/cudnn. Tensorflow is being installed from pip like normal. And you can see in the reproduction steps I posted above that we’re starting from a new virtual environment (repeated below for convenience).
Also note that nothing has changed on the conda side of things; we’re still using the exact same environment with the same cuda/cudnn libraries, but it works in TF 2.6 and fails in TF 2.7. So I don’t think the issue is on the conda side, something has changed in TensorFlow that has made this stop working.
@holongate's env is a good workaround and solves the problem for me.
I’m quite astonished by how little thought was given to the issue (which is clearly a problem with TF 2.7 itself, and not with conda) and by how much time is wasted on commenting that conda installs are not supported by Google.
@drasmuss @tbekolay
Btw… as someone who’s a bit involved in the conda-forge side, I can confidently say that the tensorflow version we ship (currently only up to 2.8.1) is a lot more performant than the one you get from PyPI and even more performant than the one you’d get from specialized containers (e.g. nvidia ngc). Give it a go.
If you need cuda, use this:
And it will get you everything you need.
The official documentation suggests manually doing export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ every time you want to use TensorFlow, which obviously isn’t really a feasible solution.
This is discussed above, but I’ll reiterate the main points here for anyone coming across this thread:
1. Install TensorFlow from conda-forge (e.g. conda install -c conda-forge tensorflow). Generally speaking that should just work.
2. If you install TensorFlow some other way (i.e. not from conda-forge), the easiest solution is here https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1024604306.
3. That solution can cause problems for other software because it modifies LD_LIBRARY_PATH (note that this is also a problem with the approach recommended in the official documentation). If you run into issues like that, you can try this approach https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1160849524, with the caveats mentioned there that this might break in future updates.
I’ll reiterate again that all of these solutions are a downgrade in the user experience from TensorFlow < 2.7, when TensorFlow just correctly detected the conda-installed CUDA libraries without any fiddling required from the user.
Right. Quite sad to see there is an army of TF guardians of the orthodoxy to censor my remarks about lack of interest for this kind of issue but none to engage in a conversation. And writing anything about the P-devil competition is almost instantly torched (deleted).
Conda installs are not officially supported by Google
As mentioned, CUDA is being installed through conda, so /usr/local/cuda- is not the correct path (the correct path is given in the original post: $CONDA_PREFIX/lib). However, hard coding that into .bashrc isn’t a solution, because $CONDA_PREFIX changes depending on which conda environment you have active.
I’m continuing to have issues related to this change. Using conda’s lib folder as the LD_LIBRARY_PATH affects too much of the system to be a good recommended solution. For me, it renders my terminal useless because I can’t use less.
The comment that “Conda installs are not officially supported by Google” might have been true at one point, but the official Tensorflow installation instructions now tell you to blindly export LD_LIBRARY_PATH. The TensorFlow team should revert the change that was made in TensorFlow 2.7 and use the ld.so system correctly.
However, since the chances of that seem slim, here’s a workaround that is, in my opinion, better than the LD_LIBRARY_PATH approach because it only loads the cuda/cudnn libraries and not everything in the $CONDA_PREFIX/lib directory. There may be some downsides to this method, but so far it is working for me.
Building on https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1024604306:
This will need to be updated when the list of libraries used by TensorFlow changes.
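The script from that comment is not included in the text as captured here. The idea it describes (expose only the CUDA/cuDNN libraries rather than all of $CONDA_PREFIX/lib) could be sketched roughly as below. Everything in it is an assumption made for illustration, not necessarily what the commenter did: the function name `link_cuda_libs`, the target directory, and the list of library name stems (derived from the warnings in the original report).

```shell
# Hypothetical helper: symlink only the CUDA/cuDNN shared libraries from a
# conda lib directory into a separate directory. Pass absolute paths, since
# the symlinks point at "$libdir/..." as given.
link_cuda_libs() {
    libdir="$1"     # e.g. "$CONDA_PREFIX/lib"
    target="$2"     # e.g. "$CONDA_PREFIX/lib/cuda-only" (made-up name)
    mkdir -p "$target"
    for stem in libcudart libcublas libcufft libcurand \
                libcusolver libcusparse libcudnn; do
        for path in "$libdir/$stem"*.so*; do
            # An unmatched glob is left as a literal string; skip it.
            [ -e "$path" ] || continue
            ln -sf "$path" "$target/$(basename "$path")"
        done
    done
}

# Example (with the environment activated):
#   link_cuda_libs "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib/cuda-only"
#   export LD_LIBRARY_PATH="$CONDA_PREFIX/lib/cuda-only${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Only the symlink directory then needs to go on LD_LIBRARY_PATH, so unrelated programs do not pick up conda's copies of other system libraries. The stem list is a guess at what TensorFlow loads and would need revisiting for other releases.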
Open the terminal and type
nano ~/.bashrc
At the end of the file, add the following two lines:
export PATH=$PATH:/usr/local/cuda-11.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2/lib64
Ensure there are no spaces on either side of the ‘=’ sign.
If it still does not work, try adding the same lines for version 11.0:
export PATH=$PATH:/usr/local/cuda-11.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/lib64
Yes, the behaviour is the same in 2.11 and 2.12.0rc1 (I wouldn’t expect it to change between rc1 and the full 2.12 release).
Note that in 2.12 the error message has changed, so it displays
instead of the old “Could not load dynamic library…” errors, but it’s the same issue.
Hi @SuryanarayanaY,
See https://github.com/tensorflow/tensorflow/issues/52988#issuecomment-1323825349 for a summary of the discussion in this thread. The short answer is that no, that solution doesn’t address the issue.
Longer answer: The solution you describe from the docs is basically a worse version of idea 2 from that summary above. Worse in that it’s more complicated, and it won’t unset LD_LIBRARY_PATH when the environment is deactivated. But as mentioned above, idea 2 is not really a viable solution because LD_LIBRARY_PATH is a global environment variable, and modifying it has negative side effects on lots of other system packages besides TensorFlow.
And, to reiterate again, all of these “solutions” are downgrades from the behaviour prior to TensorFlow 2.7, where TensorFlow just correctly detected the CUDA libraries without requiring any manual intervention from users.
I installed Tensorflow 2.7 on Windows with CUDA 11.2 and cuDNN 8.1 (no conda involved). I received the same “Could not load dynamic library” errors. I switched CUDA to 11.0 and it worked. I am guessing that the pip packages for Tensorflow 2.7 were accidentally built against CUDA 11.0 instead of 11.2.