tensorflow: CUDA 11.1 error on tf-nightly - libcusolver.so.10 not found
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- TensorFlow installed from (source or binary): pip install tf-nightly
- TensorFlow version: 2.4.0-dev20201011
- Python version: 3.8.3 (default, May 14 2020, 23:52:17)
- Installed using virtualenv? pip? conda?: pip
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: 11.1 / 8.0.4.30
- GPU model and memory: RTX 2080 8GB Driver 455.23.05
Describe the problem
I’m trying to install TensorFlow on a Linux machine with CUDA 11.1. I’m using tf-nightly, which supposedly supports CUDA 11. It can find all libraries except libcusolver.so.10:
Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
Any idea if CUDA 11.1 should work with nightly, or is this still not supported?
I also tried manually installing libcusolver.so.10 from CUDA 10.0, reloading the ldconfig cache, etc., but it still didn’t work; same error.
Thanks in advance!
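The ldconfig cache mentioned above can be inspected directly. The following is a minimal Python sketch, not part of the original report: it runs `ldconfig -p` and returns the cached sonames that start with a given prefix. The helper name `ldconfig_entries` is hypothetical, and it assumes `ldconfig` is on PATH (on some systems it lives in /sbin).

```python
import subprocess


def ldconfig_entries(prefix, text=None):
    """Return (soname, path) pairs from the ldconfig cache whose soname starts with prefix.

    If text is None, the cache listing is obtained by running `ldconfig -p`
    (assumed to be on PATH); otherwise text is parsed as that listing.
    """
    if text is None:
        text = subprocess.run(
            ["ldconfig", "-p"], capture_output=True, text=True, check=True
        ).stdout
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skips the "N libs found in cache" header and unrelated libraries
        # Typical entry: "libfoo.so.1 (libc6,x86-64) => /path/to/libfoo.so.1"
        left, sep, path = line.partition(" => ")
        if not sep:
            continue
        hits.append((left.split(" (")[0], path))
    return hits
```

For example, `ldconfig_entries("libcusolver.so")` lists every cached libcusolver soname; an empty result for `libcusolver.so.10` means the loader cannot find that name through the cache (LD_LIBRARY_PATH is a separate lookup).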
Provide the exact sequence of commands / steps that you executed before running into the problem
Sample code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
Any other info / logs
Output:
2020-10-11 21:31:34.848630: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-10-11 21:31:34.849049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-10-11 21:31:34.870960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-11 21:31:34.871193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.8GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.23GiB/s
2020-10-11 21:31:34.871206: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-10-11 21:31:34.872541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-10-11 21:31:34.873082: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-10-11 21:31:34.873266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-10-11 21:31:34.873367: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-10-11 21:31:34.873738: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2020-10-11 21:31:34.873855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-10-11 21:31:34.873863: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-10-11 21:31:34.874073: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-11 21:31:34.874568: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-10-11 21:31:34.874580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-11 21:31:34.874584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
$ dir /usr/local/cuda/lib64/libcusolver*
lrwxrwxrwx 1 root root 19 Oct 11 19:31 lib64/libcusolverMg.so -> libcusolverMg.so.11
lrwxrwxrwx 1 root root 26 Oct 11 19:31 lib64/libcusolverMg.so.11 -> libcusolverMg.so.11.0.0.74
-rw-r--r-- 1 root root 383M Sep 16 13:57 lib64/libcusolverMg.so.11.0.0.74
lrwxrwxrwx 1 root root 17 Oct 11 19:31 lib64/libcusolver.so -> libcusolver.so.11
lrwxrwxrwx 1 root root 24 Oct 11 19:31 lib64/libcusolver.so.11 -> libcusolver.so.11.0.0.74
-rw-r--r-- 1 root root 664M Sep 16 13:57 lib64/libcusolver.so.11.0.0.74
-rw-r--r-- 1 root root 187M Sep 16 13:57 lib64/libcusolver_static.a
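The log asks for `libcusolver.so.10`, while the toolkit listing only provides `libcusolver.so.11`. To reproduce the lookup without importing TensorFlow, here is a minimal sketch using Python's `ctypes.CDLL`, which calls `dlopen()` and so goes through roughly the same search (LD_LIBRARY_PATH, the ldconfig cache, default directories), minus any RPATH/RUNPATH that TensorFlow's own shared objects carry. The soname list is copied from the log; the helper names are hypothetical.

```python
import ctypes

# Sonames copied from the dso_loader lines in the log (TF 2.4 nightly).
REQUESTED_SONAMES = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def try_dlopen(soname):
    """Return None if dlopen() succeeds, otherwise the loader's error message."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return str(exc)
    return None


def report(sonames=REQUESTED_SONAMES):
    """Print OK/MISSING for each soname and return {soname: error_or_None}."""
    results = {}
    for name in sonames:
        err = try_dlopen(name)
        results[name] = err
        status = "OK     " if err is None else "MISSING"
        print(status, name, "" if err is None else f"({err})")
    return results


if __name__ == "__main__":
    report()
```

Run it with the same Python interpreter and environment variables as the failing TensorFlow process; on the machine described in this issue, `libcusolver.so.10` would be expected to show up as MISSING.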
Sun Oct 11 21:39:27 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 On | 00000000:01:00.0 On | N/A |
| 0% 39C P0 44W / 225W | 517MiB / 7979MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1920 G /usr/lib/xorg/Xorg 292MiB |
| 0 N/A N/A 2668 G /usr/bin/gnome-shell 115MiB |
| 0 N/A N/A 3067 G ...gAAAAAAAAA --shared-files 7MiB |
| 0 N/A N/A 3773 G ...AAAAAAAAA= --shared-files 74MiB |
+-----------------------------------------------------------------------------+
Commits related to this issue
- Fix GPU detection on tensorflow Adds a symlink "libcusolver.so.10". See https://github.com/tensorflow/tensorflow/issues/43947#issuecomment-715295153 for further details. — committed to platiagro/kubeflow by fberanizo 3 years ago
- Try to upgrade cudnn to get libcusolver https://github.com/tensorflow/tensorflow/issues/43947#issuecomment-964749140 — committed to Kaggle/docker-rstats by Philmod 3 years ago
I had the same problem. It works on CUDA 11.1 (tested on a few modest Keras models) with this symlink:
sudo ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.10
I will reply if the above quick fix breaks something.
- OS Platform and Distribution: Ubuntu 20.04.1 (fresh install today)
- TensorFlow installed from (source or binary): pip install tf-nightly
- TensorFlow version: 2.4.0-dev20201023
- Python version: 3.8.5
- Driver Version: 455.32.00
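The symlink workaround in this comment can be expressed as a small Python helper, e.g. for provisioning scripts such as the Docker fixes referenced in the commits. This is a sketch under stated assumptions: `make_compat_symlink` is a hypothetical name, the default link name `libcusolver.so.10` comes from the log, and it only mirrors `ln -s`; it does not make the 11.x library ABI-compatible with what TensorFlow expects from 10.x.

```python
from pathlib import Path


def make_compat_symlink(real_lib, link_dir, link_name="libcusolver.so.10"):
    """Create link_dir/link_name -> real_lib, like `ln -s real_lib link_dir/link_name`.

    Raises FileNotFoundError if real_lib does not exist.
    An existing file or link at the destination is left untouched.
    """
    real = Path(real_lib).absolute()
    if not real.exists():
        raise FileNotFoundError(f"{real} not found; check the CUDA install path")
    link = Path(link_dir) / link_name
    if link.is_symlink() or link.exists():
        return link  # do not overwrite something that is already there
    link.symlink_to(real)
    return link
```

Usage matching the command in the comment (needs write access to the CUDA directory, i.e. root):
`make_compat_symlink("/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11", "/usr/local/cuda-11.1/targets/x86_64-linux/lib")`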
@Darqam
Awesome, thank you for sharing! To improve on it,
python -c "import tensorflow.python as x; print(x.__path__[0])"
will give you the target directory; just make sure to use the correct Python executable (e.g., activate your venv first). The modified one-liner that should work as is:
For some reason the symlink did not work with the provided target; however, placing it in the TensorFlow site-packages directory worked properly.
(Obviously, replace python3.8 with the proper value, as well as the username.)
Edit: For those finding their way here, I recommend seeing the extra details provided by hoefling below (https://github.com/tensorflow/tensorflow/issues/43947#issuecomment-727655093)
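The `python -c "import tensorflow.python as x; print(x.__path__[0])"` trick imports TensorFlow just to find a directory. A possible alternative, sketched here as an assumption rather than what the commenters used, is `importlib.util.find_spec`, which locates a top-level package without executing it. `tf_python_dir` assumes the standard pip layout where `tensorflow/python` sits inside the package directory; both helper names are hypothetical.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an installed top-level package without importing it, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    return list(spec.submodule_search_locations)[0]


def tf_python_dir():
    """Directory expected to match tensorflow.python.__path__[0] (standard pip layout assumed)."""
    root = package_dir("tensorflow")
    return None if root is None else os.path.join(root, "python")


if __name__ == "__main__":
    print(tf_python_dir())
```

As the comment stresses, run this with the same interpreter (the same venv or conda env) that runs TensorFlow; otherwise it reports a different site-packages, or `None`.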
For those using (mini)conda, the symlinks should be
The library seems to work without problems:
This works on my machine with RTX3090. Thanks!
Hi,
I adjusted the path to match my Python installation and it worked perfectly:
sudo ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /usr/local/lib/python3.8/dist-packages/tensorflow/python/libcusolver.so.10
I have the same error, but the fix doesn’t work.
- OS Platform and Distribution: Ubuntu 20.04.1
- TensorFlow installed from (source or binary): pip install tf-nightly
- TensorFlow version: 2.4.0-dev20201023
- Python version: 3.8.5
- Driver Version: 455.38.00
- CUDA Version: 11.1 Update 1
- GPU: RTX 3070
The main problem is that I don’t have
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11
Thanks @JiazhengChai, that worked for me too. Here’s the specific command to install the newer CUDA and cuDNN libraries that avoided this issue for me:
sudo apt-get install --no-install-recommends cuda-11-3 libcudnn8=8.2.1.32-1+cuda11.3 libcudnn8-dev=8.2.1.32-1+cuda11.3
To anyone still struggling with this: the symlink should be placed in a folder that is included in
LD_LIBRARY_PATH
(it doesn’t really matter which one you put it in). The source is probably /usr/local/cuda-11.1/lib64/libcusolver.so.11, so it should work for everyone as long as it is
ln -s /usr/local/cuda-11.1/lib64/libcusolver.so.11 [some path included in LD_LIBRARY_PATH]/libcusolver.so.10
The safest way is probably putting it in /usr/local/lib, which many distributions (e.g., Ubuntu, via /etc/ld.so.conf.d/libc.conf) already configure as a library directory, although it is not part of LD_LIBRARY_PATH itself. Alternatively, you can put the symlink in whatever directory you want and then append that directory to LD_LIBRARY_PATH with
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/my_folder/lib
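Checking whether a symlink actually landed in one of the LD_LIBRARY_PATH directories can be scripted. This minimal sketch (the function name `find_in_ld_library_path` is hypothetical) only covers LD_LIBRARY_PATH; it does not consult the ldconfig cache or the loader's built-in directories, and it treats a dangling symlink as missing because `os.path.exists` follows links.

```python
import os


def find_in_ld_library_path(soname, ld_path=None):
    """Return existing 'dir/soname' paths for each dir listed in LD_LIBRARY_PATH.

    Only LD_LIBRARY_PATH is checked; the ldconfig cache and the loader's
    built-in default directories are not consulted here.
    """
    if ld_path is None:
        ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_path.split(":"):  # LD_LIBRARY_PATH is colon-separated
        if not directory:
            continue  # empty entries are ignored in this sketch
        candidate = os.path.join(directory, soname)
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits
```

For example, `find_in_ld_library_path("libcusolver.so.10")` returns an empty list if none of the exported directories contain a working `libcusolver.so.10`.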
If you have doubts (or don’t see TF output for the searched library paths), just run
echo $LD_LIBRARY_PATH
to see which additional paths are searched.
Especially for users who aren’t administrators
My situation: libcusolver.so.11 is not found, but I do have libcusolver.so.10. The tricky problem is that I am not the administrator of this system, so I can’t create a symlink for /usr/local/cuda/libcusolver.so.
Finally, I used
ln -s /usr/local/cuda/lib64/libcusolver.so.10 ~/anaconda3/envs/<my env>/lib/libcusolver.so.11
to fix that.
Nice, thx! For those running Fedora 32, the following will do the trick:
sudo ln -s /usr/local/cuda-11.1/lib64/libcusolver.so.11 /usr/local/cuda-11.1/lib64/libcusolver.so.10
@cataluna84 Thanks. That was exactly the problem: libcusolver.so.11 is not present in the CUDA 11.0 folder. This also suggests that the installation instructions on the official TF page should be updated to install CUDA 11.3 instead, which contains libcusolver.so.11.
It is also doubtful whether using
sudo ln -s
to make a symlink between libcusolver.so.10 and libcusolver.so.11 works correctly. In my case it gave the impression that it worked normally, but when I tried to run an LSTM and a CNN, the error "Fail to find the dnn implementation." occurred.
In short, it would be nice if the TensorFlow team could update the installation instructions on the official page to save users’ time.
Same issue with tf2.4 and cuda11-2. Thanks for the tips 😉
@guiambros Do you see the same error with CUDA 11.0? I think you need to downgrade your version of CUDA, as the nightly only supports 11.0. Thanks!
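One way to see which CUDA version a given TensorFlow build expects, before installing a toolkit, is sketched below. It assumes `tf.sysconfig.get_build_info()` is available (TF 2.3 and later) and that its `cuda_version` entry is a dotted string such as "11.0", as on Linux builds; the format may differ on other platforms. Requiring the same major.minor version is a conservative rule chosen for this sketch, in line with the advice here to match the toolkit the nightly was built for. All helper names are hypothetical.

```python
def major_minor(version):
    """'11.1.105' -> (11, 1); '11' -> (11, 0)."""
    parts = [int(p) for p in str(version).split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def tf_built_cuda_version():
    """CUDA version TensorFlow was built against, or None if unknown.

    Assumes tf.sysconfig.get_build_info() exists (TF 2.3+); CPU-only
    builds have no 'cuda_version' entry, so None is returned for them.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_build_info().get("cuda_version")


def toolkit_matches(tf_cuda, installed_cuda):
    """Conservative check: True only if both versions share major.minor."""
    if tf_cuda is None or installed_cuda is None:
        return False
    return major_minor(tf_cuda) == major_minor(installed_cuda)
```

For instance, `toolkit_matches(tf_built_cuda_version(), "11.1")` would return False for a nightly built against CUDA 11.0, which is the mismatch discussed in this thread.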
Folks, please refer to the CUDA Toolkit Major Component Versions table for the specific CUDA version you are using. I have played around with CUDA 11.0, and it had issues importing libcusolver.so.11, since only libcusolver.so.10 is available there.
I upgraded to CUDA 11.3 with cuDNN v8.2.0 plus the NVIDIA-465.31 driver, and everything worked like a charm with TensorFlow 2.5!! 😍 🧡
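The point about component versions can be checked locally by listing which libcusolver major versions a CUDA lib directory actually ships. This is a minimal sketch with a hypothetical helper name; it matches only `libcusolver.so.<N>...`, so `libcusolverMg.*` and `libcusolver_static.a` are ignored. For the /usr/local/cuda/lib64 listing earlier in this issue, the matches would be `libcusolver.so.11` and `libcusolver.so.11.0.0.74`, i.e. major version 11, while TensorFlow asked for 10.

```python
import glob
import os
import re


def cusolver_majors(lib_dir):
    """Return the sorted libcusolver major versions found in lib_dir, e.g. [11].

    Only files named libcusolver.so.<N>... are considered; libcusolverMg.*
    and libcusolver_static.a do not match the glob.
    """
    majors = set()
    for path in glob.glob(os.path.join(lib_dir, "libcusolver.so.*")):
        match = re.match(r"libcusolver\.so\.(\d+)", os.path.basename(path))
        if match:
            majors.add(int(match.group(1)))
    return sorted(majors)
```

Comparing `cusolver_majors("/usr/local/cuda/lib64")` with the soname in the TensorFlow warning shows whether the installed toolkit can satisfy it without a symlink.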
Confirmation of my result:
ls /usr/local/cuda-11.3/lib64
This works for me:
sudo ln -s /opt/cuda/lib64/libcusolver.so.11 /opt/cuda/lib64/libcusolver.so.10
Environment:
Thanks @gowthamkpr. I removed CUDA 11.1 and installed CUDA 11.0 and now it’s working fine with nightly. I’ll keep on 11.0 for now. Thank you!