tensorflow: Why is CUDA 10.1 not supported & strange error message?
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): N/A
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): pip install tensorflow (2.0 version)
- TensorFlow version (use command below): Tensorflow v2.0.0
- Python version: 3.7
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: 10.1
- GPU model and memory: Nvidia Quadro M1000M
You can also obtain the TensorFlow version with:
- TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
- TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the current behavior
When I try to use TensorFlow 2.0 with CUDA 10.1 I run into errors that I did not see with older versions of Keras/TensorFlow.
Code:
tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )
Error:
Tensorflow v2.0.0
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
<ipython-input-4-ee93c3bb3dbc> in <module>
4 print(f'Tensorflow v{tf.__version__}')
5
----> 6 tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )
C:\ProgramData\Miniconda3\envs\tf2\lib\site-packages\tensorflow_core\python\framework\test_util.py in is_gpu_available(cuda_only, min_cuda_compute_capability)
1430
1431 try:
-> 1432 for local_device in device_lib.list_local_devices():
1433 if local_device.device_type == "GPU":
1434 if (min_cuda_compute_capability is None or
C:\ProgramData\Miniconda3\envs\tf2\lib\site-packages\tensorflow_core\python\client\device_lib.py in list_local_devices(session_config)
39 return [
40 _convert(s)
---> 41 for s in pywrap_tensorflow.list_devices(session_config=session_config)
42 ]
C:\ProgramData\Miniconda3\envs\tf2\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py in list_devices(session_config)
2247 return ListDevicesWithSessionConfig(session_config.SerializeToString())
2248 else:
-> 2249 return ListDevices()
2250
2251
InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
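A quick way to confirm the cause, as a minimal diagnostic sketch: it assumes the TF 2.0.0 GPU binaries were built against CUDA 10.0 and therefore look for the 10.0-named runtime DLLs on Windows, which a CUDA 10.1-only install does not provide.
import ctypes
# TF 2.0 expects the CUDA 10.0 runtime (cudart64_100.dll); CUDA 10.1 ships cudart64_101.dll instead.
for name in ("cudart64_100.dll", "cudart64_101.dll", "cudnn64_7.dll"):
    try:
        ctypes.WinDLL(name)  # raises OSError if the DLL cannot be located/loaded
        print(name, "found")
    except OSError:
        print(name, "NOT found")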
Describe the expected behavior
I would expect a minor CUDA release (10.0 to 10.1), which has been available since February 2019, to work correctly.
Furthermore, the error message says nothing about CUDA 10.1 being unsupported. If that is the cause, an error stating it explicitly is what I would expect to see.
Code to reproduce the issue
Install the latest TensorFlow with pip install tensorflow into a Conda environment running Python 3.7, on a machine with CUDA 10.1 installed. Then run the following command:
tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )
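Or, as a self-contained script matching the notebook cell shown in the traceback above:
import tensorflow as tf
print(f'Tensorflow v{tf.__version__}')
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)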
Other info / logs N/A
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (7 by maintainers)
Can one of the tensorflow project maintainers please finally answer a simple question which seems to pop up repeatedly on this very issue tracker:
Question: When will prebuilt tensorflow packages for CUDA 10.1 be available?
Alternative version of the question: When will building against CUDA 10.1 be supported?
Valid answers, in order of preference:
Invalid answers:
Thanks in advance.
Yes, I'm currently trying to install CUDA 10.0; it's somewhat annoying that other frameworks already support the latest CUDA 10.1. Is upgrading to CUDA 10.1 on the backlog, and if so, when do you expect it to be implemented?
CUDA 10.1 has been out for a long while, so I would hope the newest TensorFlow could also support it.
I understand your concern. But as I mentioned before, this is a subjective decision, and whatever position we choose will leave a lot of people unhappy. Since the majority of our GitHub issues show that a lot of people have problems installing CUDA/NVIDIA drivers, we choose to update it as little as possible.
I am closing this issue, as the build on windows is now working, and the discussion is diverging.
To me that sounds like it would be very easy to upgrade the dependencies to CUDA 10.1. So coming back to my question, why is TF2 targeting an old version of CUDA?
@gunan Thanks for the response.
You say “We do not build the main packages with latest cuda version support as soon as they are out”, but CUDA 10.1 was out on February 27th, 2019.
In the meantime there have been two binary-compatible releases (10.1 update 1 and 10.1 update 2, released in August) which brought important bugfixes and changes: how the CUDA DLLs are named, which compilers and systems are supported, performance improvements, proper support for RTX cards, and, on Windows, the nvJPEG library, which wasn't available until 10.1 update 2 and which is quite a boost for any batch image processing for ML.
Although my experience with installing or updating NVIDIA drivers under Linux has always been problem-free, I understand that you may personally find it inconvenient or even intimidating to update drivers. But a big part of the CUDA runtime support lives in the driver itself, which is why the driver has to change in sync with the toolkit; if you want new features, better performance and support for new hardware, then updating the drivers is the only way forward.
Some people use CUDA for other things, not just for TF, and some of the features only exist or work properly in the latest version, so it is not really easy to "just downgrade to 10.0".
As for building from source, I did try that a few versions ago on Windows, and halfway through Bazel barfed a bunch of indecipherable errors. Not having a clue how the whole build system works or how to even begin to untangle it, I gave up, hoping that the main TF package would one day catch up. Sadly, that day seems to remain forever in the future.
Finally, I hope I am not coming across as confrontational or ungrateful, but if the choice is between the TF devs updating their drivers to make a new CUDA 10.1-based release, and everyone else having to learn the intimate details of TF's Bazel build system to roll their own, then to me the former wins; the latter is quite unrealistic to expect, even if it is your official response.
Thank you for your time.
Building against CUDA 10.1 is already supported. I am running continuous builds already and they are running just fine. Occasionally there have been issues, but we have fixed them.
We do not build the main packages with the latest CUDA version as soon as it is out, because we want our prebuilt packages to "run on as many machines as possible, while still being performant". It is a subjective and difficult line, and it is the kind of choice where, whatever we pick, someone will be unhappy.
The reason new CUDA versions do not work on most machines is that every new CUDA version requires a driver version that is "too new" for the majority of Linux distributions. My current Linux distro still has not blessed drivers that work with CUDA 10.1, so I had to uninstall the drivers that came with the system and jump through hoops to get the newer ones. Because we realized how difficult that is, we made the choice not to upgrade.
The above is the justification for not having CUDA 10.1 support in the main TF releases yet. I am sorry if it is not a satisfactory answer. We are evaluating whether we can upgrade to 10.1 for 2.1, but we are still just exploring.
Finally, as always, TF has detailed build-from-source instructions for all operating systems. I know that it is difficult, but if you have a use case that cannot do without CUDA 10.1, or CUDA 9, or 9.2, our official answer is always going to be "you will need to build those packages yourself", however unpopular that is.
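Roughly, the documented flow for rolling your own GPU package looks like this; it is only a sketch, the exact configure answers and flags depend on your machine, and the build guide is the authority (on Windows the same Bazel target applies, driven from an MSYS2 shell):
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
python ./configure.py    # answer "y" to CUDA support and point it at your CUDA 10.1 / cuDNN install
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl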
[UPDATE] - Installing the NIGHTLY version has solved this for me.
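For reference, that was roughly the following; the GPU nightly package was called tf-nightly-gpu at the time, if memory serves, so check PyPI for the current name:
pip install tf-nightly-gpu
python -c "import tensorflow as tf; print(tf.version.VERSION); print(tf.test.is_gpu_available())"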
@ahtik - I tried installing on Windows 10 per your link with pip install tensorflow-gpu==2.1.0-rc1. However, I received the following:
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu-estimator<2.2.0,>=2.1.0rc0 (from tensorflow-gpu==2.1.0-rc1)
Do you have any suggestion to overcome this issue? I can pip install tensorflow-estimator==2.1.0rc0 but not tensorflow-gpu-estimator==2.1.0rc0, and in fact I cannot find tensorflow-gpu-estimator anywhere online. However, after installing tensorflow-estimator==2.1.0rc0, the error from pip install tensorflow-gpu==2.1.0-rc1 persists.
Huggingface is looking strong. They use PyTorch on CUDA 10.1 - we are thinking of switching over too.