tensorflow: CUDA 11.1 error on tf-nightly - libcusolver.so.10 not found

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • TensorFlow installed from (source or binary): pip install tf-nightly
  • TensorFlow version: 2.4.0-dev20201011
  • Python version: 3.8.3 (default, May 14 2020, 23:52:17)
  • Installed using virtualenv? pip? conda?: pip
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 11.1 / 8.0.4.30
  • GPU model and memory: RTX 2080 8GB Driver 455.23.05

Describe the problem

I’m trying to install TensorFlow on a Linux machine with CUDA 11.1. I’m using tf-nightly, which supposedly supports CUDA 11. It can find all libraries except libcusolver.so.10:

Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory

Any idea if CUDA 11.1 should work with nightly, or is this still not supported?

I also tried to manually install libcusolver.so.10 from CUDA 10.0, reload the ldconfig cache, etc., but it still didn’t work; same error.

Thanks in advance!

Provide the exact sequence of commands / steps that you executed before running into the problem

Sample code:

import tensorflow as tf
import numpy as np
from tensorflow import keras
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

Any other info / logs

Output:

2020-10-11 21:31:34.848630: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-10-11 21:31:34.849049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-10-11 21:31:34.870960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-11 21:31:34.871193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.8GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.23GiB/s
2020-10-11 21:31:34.871206: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-10-11 21:31:34.872541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-10-11 21:31:34.873082: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-10-11 21:31:34.873266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-10-11 21:31:34.873367: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-10-11 21:31:34.873738: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2020-10-11 21:31:34.873855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-10-11 21:31:34.873863: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-10-11 21:31:34.874073: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-11 21:31:34.874568: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-10-11 21:31:34.874580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-11 21:31:34.874584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
$ dir /usr/local/cuda/lib64/libcusolver*
lrwxrwxrwx 1 root root   19 Oct 11 19:31 lib64/libcusolverMg.so -> libcusolverMg.so.11
lrwxrwxrwx 1 root root   26 Oct 11 19:31 lib64/libcusolverMg.so.11 -> libcusolverMg.so.11.0.0.74
-rw-r--r-- 1 root root 383M Sep 16 13:57 lib64/libcusolverMg.so.11.0.0.74
lrwxrwxrwx 1 root root   17 Oct 11 19:31 lib64/libcusolver.so -> libcusolver.so.11
lrwxrwxrwx 1 root root   24 Oct 11 19:31 lib64/libcusolver.so.11 -> libcusolver.so.11.0.0.74
-rw-r--r-- 1 root root 664M Sep 16 13:57 lib64/libcusolver.so.11.0.0.74
-rw-r--r-- 1 root root 187M Sep 16 13:57 lib64/libcusolver_static.a

$ nvidia-smi
Sun Oct 11 21:39:27 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P0    44W / 225W |    517MiB /  7979MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1920      G   /usr/lib/xorg/Xorg                292MiB |
|    0   N/A  N/A      2668      G   /usr/bin/gnome-shell              115MiB |
|    0   N/A  N/A      3067      G   ...gAAAAAAAAA --shared-files        7MiB |
|    0   N/A  N/A      3773      G   ...AAAAAAAAA= --shared-files       74MiB |
+-----------------------------------------------------------------------------+
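
The log above shows TensorFlow asking the loader for libcusolver.so.10, while the CUDA 11.1 directory listing only contains libcusolver.so.11. As a rough diagnostic, the following Python sketch asks the dynamic loader directly which of the two names it can resolve. The helper name loadable is made up for illustration, and the check only tells you whether the library can be opened, not whether TensorFlow will work correctly with it.

```python
import ctypes


def loadable(names):
    """Return {name: bool} telling whether the dynamic loader can open each name.

    Hypothetical helper for illustration; it is not part of TensorFlow or CUDA.
    """
    result = {}
    for name in names:
        try:
            # CDLL raises OSError when the shared object cannot be found or loaded.
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    # The two sonames discussed in this issue.
    for name, ok in loadable(["libcusolver.so.10", "libcusolver.so.11"]).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Run it with the same Python executable and environment variables that you use for TensorFlow, since the result depends on the loader's search path.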

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 33

Most upvoted comments

@guiambros did you find a way to make it work with CUDA 11.1?

I had the same problem. It works (tested on a few modest Keras models) on CUDA 11.1 with this symlink:

sudo ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.10

I will reply if the above quick fix breaks something.

  • OS Platform and Distribution: Ubuntu 20.04.1 (fresh installed today)
  • TensorFlow installed from (source or binary): pip install tf-nightly
  • TensorFlow version: 2.4.0-dev20201023
  • Python version: 3.8.5
  • Driver Version: 455.32.00

@Darqam

For some reason the symlink did not work with the provided target, however placing it in the tensorflow site-packages worked properly

sudo ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /home/YourUsernameHere/.local/lib/python3.8/site-packages/tensorflow/python/libcusolver.so.10

(Obviously replacing python3.8 with proper value as well as username)

Awesome, thank you for sharing! To build on that, python -c "import tensorflow.python as x; print(x.__path__[0])" will print the target directory; just make sure to use the correct Python executable (e.g. activate your venv first). A modified one-liner that should work as is:

ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 $(python -c "import tensorflow.python as x; print(x.__path__[0])")/libcusolver.so.10
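
The same idea as the one-liner above can be written in Python, which makes it easy to refuse to overwrite an existing file. This is only a sketch: link_alias is a hypothetical helper, and the CUDA path in the usage comment is the one quoted in this thread and may differ on your machine.

```python
import os
from pathlib import Path


def link_alias(source, target_dir, alias):
    """Create target_dir/alias as a symlink pointing at source and return the link path.

    Hypothetical helper for illustration; it never overwrites an existing entry.
    """
    source = Path(source)
    link = Path(target_dir) / alias
    if not source.exists():
        raise FileNotFoundError(f"source library not found: {source}")
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"refusing to overwrite: {link}")
    os.symlink(source, link)
    return link


# Usage sketch (assumes TensorFlow is importable and the CUDA path below exists):
#
#   import tensorflow.python as tfp
#   link_alias(
#       "/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11",
#       tfp.__path__[0],
#       "libcusolver.so.10",
#   )
```

Passing the target directory explicitly keeps the helper independent of TensorFlow, so it can also be pointed at a conda environment's lib directory or any other folder on the search path.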

For some reason the symlink did not work with the provided target, however placing it in the tensorflow site-packages worked properly

sudo ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /home/YourUsernameHere/.local/lib/python3.8/site-packages/tensorflow/python/libcusolver.so.10

(Obviously replacing python3.8 with proper value as well as username)

Edit: For those finding their way here, I recommend seeing the extra details provided by hoefling below (https://github.com/tensorflow/tensorflow/issues/43947#issuecomment-727655093)

For those using (mini)conda, the symlink should be

ln -s /usr/local/cuda/lib64/libcusolver.so.11 ~/miniconda3/envs/<env-name>/lib/libcusolver.so.10

The library seems to work without problem:

import tensorflow as tf
A = tf.random.normal((5, 5))
b = tf.random.normal((5,1))
tf.linalg.solve(A,b)

This works on my machine with RTX3090. Thanks!

Hi,
I pointed the link straight at my Python package directory and it worked perfectly:

sudo ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /usr/local/lib/python3.8/dist-packages/tensorflow/python/libcusolver.so.10

I have the same error but the fix doesn’t work.

  • OS Platform and Distribution: Ubuntu 20.04.1
  • TensorFlow installed from (source or binary): pip install tf-nightly
  • TensorFlow version: 2.4.0-dev20201023
  • Python version: 3.8.5
  • Driver Version: 455.38.00
  • CUDA Version: 11.1 Update 1
  • GPU: RTX 3070

The main problem is that I don’t have /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11

Thanks @JiazhengChai, that worked for me too. Here’s the specific command to install the newer CUDA and cuDNN libraries that avoided this issue for me:

sudo apt-get install --no-install-recommends cuda-11-3 libcudnn8=8.2.1.32-1+cuda11.3 libcudnn8-dev=8.2.1.32-1+cuda11.3

To anyone still struggling with this: the symlink should be placed in a folder that is included in LD_LIBRARY_PATH; it doesn’t really matter which one. The source is probably /usr/local/cuda-11.1/lib64/libcusolver.so.11.

So it should work for everyone as long as the command has this form:

ln -s /usr/local/cuda-11.1/lib64/libcusolver.so.11 [some path included in LD_LIBRARY_PATH]/libcusolver.so.10

The safest way is probably to put it in /usr/local/lib, since this folder should be on the library search path by default.

Alternatively, you can put the symlink in whatever directory you want and then append that directory to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/my_folder/lib

If you have doubts (or don’t see TF output for searched library paths) just run echo $LD_LIBRARY_PATH to see which paths are searched
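
To check this without reading the variable by eye, here is a minimal Python sketch that walks the directories in LD_LIBRARY_PATH and reports where a given file name exists. The function name find_in_search_path is made up for illustration. Note that the dynamic loader also consults other locations (for example the ld.so cache), so an empty result here does not prove the library cannot be loaded.

```python
import os


def find_in_search_path(filename, search_path=None):
    """Return the full paths where filename exists in a colon-separated search path.

    Hypothetical helper for illustration. If search_path is None, the current
    value of LD_LIBRARY_PATH is used. Dangling symlinks are not reported,
    because os.path.exists follows links.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            # Skip empty entries (e.g. from a leading or trailing colon).
            continue
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for name in ("libcusolver.so.10", "libcusolver.so.11"):
        print(name, "->", find_in_search_path(name) or "not on LD_LIBRARY_PATH")
```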

Especially for users who aren’t administrators

My situation is the reverse: libcusolver.so.11 is not found, but I do have libcusolver.so.10. The tricky part is that I am not the administrator of this system, so I can’t create a soft link for /usr/local/cuda/libcusolver.so.

In the end, I fixed it with:

ln -s /usr/local/cuda/lib64/libcusolver.so.10 ~/anaconda3/envs/<my env>/lib/libcusolver.so.11

Nice, thanks! For those running Fedora 32, the following will do the trick:

sudo ln -s /usr/local/cuda-11.1/lib64/libcusolver.so.11 /usr/local/cuda-11.1/lib64/libcusolver.so.10

Folks, please refer to the CUDA Toolkit Major Component Versions of the specific CUDA version you are using. I have played around with CUDA 11.0, and it had issues importing libcusolver.so.11 since only libcusolver.so.10 is available there.

Upgraded to CUDA 11.3 with cuDNN v8.2.0 plus the NVIDIA-465.31 driver, and everything worked like a charm with TensorFlow 2.5 !! 😍 🧡

Confirmation of my result: ls /usr/local/cuda-11.3/lib64

libaccinj64.so                libcurand.so                libnppidei.so            libnpps.so.11
libaccinj64.so.11.3           libcurand.so.10             libnppidei.so.11         libnpps.so.11.3.3.44
libaccinj64.so.11.3.58        libcurand.so.10.2.4.58      libnppidei.so.11.3.3.44  libnpps_static.a
libcublasLt.so                libcurand_static.a          libnppidei_static.a      libnvblas.so
libcublasLt.so.11             libcusolverMg.so            libnppif.so              libnvblas.so.11
libcublasLt.so.11.4.2.10064   libcusolverMg.so.11         libnppif.so.11           libnvblas.so.11.4.2.10064
libcublasLt_static.a          libcusolverMg.so.11.1.1.58  libnppif.so.11.3.3.44    libnvjpeg.so
libcublas.so                  libcusolver.so              libnppif_static.a        libnvjpeg.so.11
libcublas.so.11               libcusolver.so.11           libnppig.so              libnvjpeg.so.11.4.1.58
libcublas.so.11.4.2.10064     libcusolver.so.11.1.1.58    libnppig.so.11           libnvjpeg_static.a
libcublas_static.a            libcusolver_static.a        libnppig.so.11.3.3.44    libnvptxcompiler_static.a
libcudadevrt.a                libcusparse.so              libnppig_static.a        libnvrtc-builtins.so
libcudart.so                  libcusparse.so.11           libnppim.so              libnvrtc-builtins.so.11.3
libcudart.so.11.0             libcusparse.so.11.5.0.58    libnppim.so.11           libnvrtc-builtins.so.11.3.58
libcudart.so.11.3.58          libcusparse_static.a        libnppim.so.11.3.3.44    libnvrtc.so
libcudart_static.a            liblapack_static.a          libnppim_static.a        libnvrtc.so.11.2
libcufft.so                   libmetis_static.a           libnppist.so             libnvrtc.so.11.3.58
libcufft.so.10                libnppc.so                  libnppist.so.11          libnvToolsExt.so
libcufft.so.10.4.2.58         libnppc.so.11               libnppist.so.11.3.3.44   libnvToolsExt.so.1
libcufft_static.a             libnppc.so.11.3.3.44        libnppist_static.a       libnvToolsExt.so.1.0.0
libcufft_static_nocallback.a  libnppc_static.a            libnppisu.so             libOpenCL.so
libcufftw.so                  libnppial.so                libnppisu.so.11          libOpenCL.so.1
libcufftw.so.10               libnppial.so.11             libnppisu.so.11.3.3.44   libOpenCL.so.1.0
libcufftw.so.10.4.2.58        libnppial.so.11.3.3.44      libnppisu_static.a       libOpenCL.so.1.0.0
libcufftw_static.a            libnppial_static.a          libnppitc.so             nvrtc-prev
libcuinj64.so                 libnppicc.so                libnppitc.so.11          stubs
libcuinj64.so.11.3            libnppicc.so.11             libnppitc.so.11.3.3.44
libcuinj64.so.11.3.58         libnppicc.so.11.3.3.44      libnppitc_static.a
libculibos.a                  libnppicc_static.a          libnpps.so
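
Since the cuSOLVER major version differs between CUDA releases, it can help to list which major versions a given lib directory actually provides before creating any symlink. The sketch below does that from a list of file names; soname_majors is a hypothetical helper, and it assumes the stem.so.N naming seen in the listing above.

```python
import re


def soname_majors(filenames, stem):
    """Return the sorted major versions N for which 'stem.so.N' appears in filenames.

    Hypothetical helper for illustration. Longer names such as 'stem.so.11.1.1.58'
    and the unversioned 'stem.so' are ignored on purpose.
    """
    pattern = re.compile(rf"^{re.escape(stem)}\.so\.(\d+)$")
    majors = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            majors.add(int(match.group(1)))
    return sorted(majors)


# A few names copied from the CUDA 11.3 listing above.
sample = [
    "libcusolver.so",
    "libcusolver.so.11",
    "libcusolver.so.11.1.1.58",
    "libcusolverMg.so.11",
    "libcurand.so.10",
]
print(soname_majors(sample, "libcusolver"))  # → [11]
```

On a real machine the file names could come from os.listdir("/usr/local/cuda-11.3/lib64"), adjusting the path to your installation.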

@cataluna84 Thanks. That was exactly the problem: libcusolver.so.11 is not present in the CUDA 11.0 folder. This also indicates that the installation instructions on the official TF page should be updated to install CUDA 11.3 instead, which contains libcusolver.so.11.

It is also doubtful whether using sudo ln -s to create a symlink between libcusolver.so.10 and libcusolver.so.11 works correctly. At least in my case, it appeared to work normally, but when I tried to run an LSTM and a CNN, the error "Fail to find the dnn implementation." occurred.

In short, it would be nice if the TensorFlow team could update the installation instructions on the official page to save users’ time.

Same issue with tf2.4 and cuda11-2. Thanks for the tips 😉

@guiambros Do you see the same error with CUDA 11.0? I think you need to downgrade your version of CUDA, as the nightly only supports 11.0. Thanks!

This works for me:

sudo ln -s /opt/cuda/lib64/libcusolver.so.11 /opt/cuda/lib64/libcusolver.so.10

Environment:

  • GPU: Rtx 3080
  • OS: Archlinux (x86_64)
  • Kernel: linux510-nvidia
  • Nvidia driver: 455.45.01-10.0
  • CUDA: 11.2.0-1
  • CUDNN: 8.0.5.39-1
  • Tensorflow: tf-nightly 2.5.0-dev20210106
  • Python: 3.8.5
  • Conda: 4.9.2

Thanks @gowthamkpr. I removed CUDA 11.1 and installed CUDA 11.0 and now it’s working fine with nightly. I’ll keep on 11.0 for now. Thank you!