tensorflow: Incompatibility between versions of TF and CUDA dynamic libraries

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Enterprise 64 bit, version 10.0.17763
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip install tf-nightly-gpu
  • TensorFlow version: tf-nightly-gpu 2.4.0.dev20201019
  • Python version: 3.8.6
  • Installed using virtualenv? pip? conda?: Pip (in conda env)
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CUDA 11.1/cuDNN 8.4.0.30
  • GPU model and memory: NVIDIA Quadro P5000 (16 GB)

Describe the problem

I installed MS Visual Studio Community 2019, CUDA (express installation), and cuDNN, as per the respective instructions. I created a new conda env with Python 3.8.6 and activated it. I installed tf-nightly-gpu using pip. I launched Python, imported tensorflow, and listed the GPU devices: all DLLs were found except one (cusolver64). TF looks for version 10 of that library, while CUDA 11.1 installs version 11.

Provide the exact sequence of commands / steps that you executed before running into the problem

`>>> import tensorflow as tf
2020-10-19 17:17:23.185914: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
>>> tf.config.list_physical_devices('GPU')
2020-10-19 17:17:44.311308: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-10-19 17:17:44.320150: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2020-10-19 17:17:44.361274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: Quadro P5000 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 269.00GiB/s
2020-10-19 17:17:44.373653: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-10-19 17:17:44.392056: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-10-19 17:17:44.397830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-10-19 17:17:44.407985: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-10-19 17:17:44.417194: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-10-19 17:17:44.424211: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2020-10-19 17:17:44.433236: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-10-19 17:17:44.440903: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-10-19 17:17:44.445862: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]`
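
For anyone who wants to see which of these DLLs are visible without starting TensorFlow, here is a minimal sketch. It is not from the issue: the helper name `locate_on_path` is made up, and it assumes that the DLLs are resolved through the directories listed in `PATH`, which is only part of the Windows search order. The file names are the ones that appear in the log above.

```python
import os
from pathlib import Path

# DLL names copied from the log above; other TF/CUDA builds may expect other names.
EXPECTED_DLLS = [
    "cudart64_110.dll",
    "cublas64_11.dll",
    "cublasLt64_11.dll",
    "cufft64_10.dll",
    "curand64_10.dll",
    "cusolver64_10.dll",
    "cusparse64_11.dll",
    "cudnn64_8.dll",
]


def locate_on_path(names, path=None):
    """Map each file name to the first PATH directory containing it, or None.

    Hypothetical helper: it only scans PATH and does not load anything.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    dirs = [Path(d) for d in path.split(os.pathsep) if d]
    found = {}
    for name in names:
        found[name] = next((str(d) for d in dirs if (d / name).is_file()), None)
    return found


if __name__ == "__main__":
    for name, directory in locate_on_path(EXPECTED_DLLS).items():
        print(f"{name}: {directory or 'NOT FOUND on PATH'}")
```

On a setup like the one reported here, `cusolver64_10.dll` would be expected to show up as not found while `cusolver64_11.dll` sits in the CUDA 11.1 `bin` folder.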

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 35 (4 by maintainers)

Most upvoted comments

While I am thankful to @gdrolet for providing a “solution”, I do not like it one bit, because I simply do not understand how it solves the issue. It is also not really a “solution” but an incredibly dirty workaround at best.

I am using a setup similar to the issue starter's, except that I have an RTX 3090, which should be incompatible with CUDA 10 and lower. So I installed CUDA 11.1 and the latest cuDNN version (all on Windows), then switched to Python 3.8 and the latest tf-nightly-gpu version (from today, November 10th).

I encountered the same issue as the OP and considered the different methods described in similar issues (renaming the *_11.dll file to the “10” version, downgrading although that should not work, etc.), but then I simply decided to install CUDA 10.2 on top of the newer version.

This, however, WILL change CUDA_PATH to point to the 10.2 installation instead of 11.1, i.e. to the installation that does not have cuDNN installed. In essence, TensorFlow should have problems (CUDA 10.2 is incompatible with the RTX 3000 series GPUs and cuDNN is missing), but it still works: I tested it with a few simple GAN training loops and everything looks in order. I also did some testing (deleting the DLLs that need to be loaded, sometimes from CUDA 11, sometimes from version 10), because I know nothing about how TF loads the necessary DLLs and because I thought CUDA_PATH mattered. Interestingly, it does not. My TF installation still loads the DLLs from CUDA 11.1 (CUDA_PATH_V11_1) and then automatically goes into the 10.2 folder (CUDA_PATH_V10_2) to load the solver DLL.
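
To see which toolkit folder holds which cusolver DLL, something like the following could be used. This is only a sketch, not how TF itself looks things up: the function name `cusolver_by_toolkit` is invented, and it assumes each `CUDA_PATH*` environment variable points at a toolkit root with a `bin` subfolder.

```python
import os
from pathlib import Path


def cusolver_by_toolkit(env=None, pattern="cusolver64_*.dll"):
    """Map each CUDA_PATH* variable to the matching DLL names in its bin folder.

    Hypothetical helper; assumes the <toolkit root>/bin layout.
    """
    if env is None:
        env = os.environ
    result = {}
    for var, root in sorted(env.items()):
        if var.upper().startswith("CUDA_PATH"):
            bin_dir = Path(root) / "bin"
            # A missing bin folder simply yields an empty list.
            result[var] = sorted(p.name for p in bin_dir.glob(pattern))
    return result


if __name__ == "__main__":
    for var, dlls in cusolver_by_toolkit().items():
        print(f"{var} = {os.environ[var]}")
        print(f"    cusolver DLLs: {dlls or 'none'}")
```

With both toolkits installed as described, the 11.1 entry should list `cusolver64_11.dll` and the 10.2 entry `cusolver64_10.dll`, which would match the loading behaviour observed above.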

For my own sanity’s sake, I set the (normal) CUDA_PATH to version 11.1 because I am afraid other tools I use might use the wrong version.

Also, @Saduf2019, do you know when this will actually be solved for real, meaning that TF looks for the correct cusolver64_11.dll instead of the 10 version? This solution is not a clean one and it might cause problems down the line. When you see blog posts such as that one struggling with TF on RTX 3000, you start to worry that your models and computations are not 100% correct or might produce gibberish.

@pkanwar23 that is not fully compatible with the RTX 30 series.

I have Windows 10 x64, CUDA 11.2, cuDNN 8, and TensorFlow 2.4. I had the same problem and solved it by copying the file cusolver64_10.dll into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin
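
The copy step could be scripted roughly like this. It is a sketch of the workaround only, not an endorsed fix: `copy_dll` is a made-up helper, the source location of cusolver64_10.dll is an assumption (the comment above does not say where the file came from), and writing into Program Files normally needs an elevated prompt.

```python
import shutil
from pathlib import Path


def copy_dll(source_file, target_dir):
    """Copy source_file into target_dir unless a file of that name already exists.

    Returns (target_path, copied) where copied is False if nothing was written.
    """
    source = Path(source_file)
    target = Path(target_dir) / source.name
    if target.exists():
        return target, False
    shutil.copy2(source, target)
    return target, True


# Example call (paths are assumptions; adjust to the local installation):
# copy_dll(
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cusolver64_10.dll",
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin",
# )
```

Refusing to overwrite an existing file keeps the script from silently replacing a DLL that a later toolkit or TF version might ship under the same name.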