tensorflow: Why is CUDA 10.1 not supported & strange error message?
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): N/A
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): pip install tensorflow (2.0 version)
- TensorFlow version (use command below): Tensorflow v2.0.0
- Python version: 3.7
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: 10.1
- GPU model and memory: Nvidia Quadro M1000M
You can also obtain the TensorFlow version with:
- TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
- TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the current behavior
When I try to use TensorFlow 2.0 with CUDA 10.1 I run into errors that I did not see with older versions of Keras/TensorFlow.
Code:
tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )
Error:
Tensorflow v2.0.0
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
<ipython-input-4-ee93c3bb3dbc> in <module>
4 print(f'Tensorflow v{tf.__version__}')
5
----> 6 tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )
C:\ProgramData\Miniconda3\envs\tf2\lib\site-packages\tensorflow_core\python\framework\test_util.py in is_gpu_available(cuda_only, min_cuda_compute_capability)
1430
1431 try:
-> 1432 for local_device in device_lib.list_local_devices():
1433 if local_device.device_type == "GPU":
1434 if (min_cuda_compute_capability is None or
C:\ProgramData\Miniconda3\envs\tf2\lib\site-packages\tensorflow_core\python\client\device_lib.py in list_local_devices(session_config)
39 return [
40 _convert(s)
---> 41 for s in pywrap_tensorflow.list_devices(session_config=session_config)
42 ]
C:\ProgramData\Miniconda3\envs\tf2\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py in list_devices(session_config)
2247 return ListDevicesWithSessionConfig(session_config.SerializeToString())
2248 else:
-> 2249 return ListDevices()
2250
2251
InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
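A quick way to confirm the cause, as a minimal diagnostic sketch: it assumes the TF 2.0.0 GPU binaries were built against CUDA 10.0 and therefore look for the 10.0-named runtime DLLs on Windows, which a CUDA 10.1-only install does not provide.
import ctypes
# TF 2.0 expects the CUDA 10.0 runtime (cudart64_100.dll); CUDA 10.1 ships cudart64_101.dll instead.
for name in ("cudart64_100.dll", "cudart64_101.dll", "cudnn64_7.dll"):
    try:
        ctypes.WinDLL(name)  # raises OSError if the DLL cannot be located/loaded
        print(name, "found")
    except OSError:
        print(name, "NOT found")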
Describe the expected behavior
I would expect a minor CUDA release (10.0 to 10.1), which has been available since February 2019, to work correctly.
Furthermore, the error message says nothing about CUDA 10.1 being unsupported. If that is the cause, an error stating it explicitly is what I would expect to see.
Code to reproduce the issue
Install the latest TensorFlow with pip install tensorflow into a Conda environment running Python 3.7, on a machine with CUDA 10.1 installed. Then run the following command:
tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )
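Or, as a self-contained script matching the notebook cell shown in the traceback above:
import tensorflow as tf
print(f'Tensorflow v{tf.__version__}')
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)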
Other info / logs N/A
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (7 by maintainers)
Can one of the tensorflow project maintainers please finally answer a simple question which seems to pop up repeatedly on this very issue tracker:
Question: When will prebuilt tensorflow packages for CUDA 10.1 be available?
Alternative version of the question: When will building against CUDA 10.1 be supported?
Valid answers, in order of preference:
Invalid answers:
Thanks in advance.
Yes, I'm currently trying to install CUDA 10.0; it's somewhat annoying that other frameworks already support the latest CUDA 10.1. Is upgrading to CUDA 10.1 on the backlog, and if so, when do you expect it to be implemented?
CUDA 10.1 has been out for a long while, so I would hope the newest TensorFlow could also support it.
I understand your concern. But as I mentioned before, this is a subjective decision, and whatever position we choose will leave a lot of people unhappy. Since the majority of our GitHub issues show that a lot of people have problems installing CUDA/NVIDIA drivers, we choose to update it as little as possible.
I am closing this issue, as the build on windows is now working, and the discussion is diverging.
To me that sounds like it would be very easy to upgrade the dependencies to CUDA 10.1. So coming back to my question, why is TF2 targeting an old version of CUDA?
@gunan Thanks for the response.
You say “We do not build the main packages with latest cuda version support as soon as they are out”, but CUDA 10.1 was out on February 27th, 2019.
In the meantime there have been two binary-compatible releases (10.1 update 1 and 10.1 update 2, released in August) which brought important bugfixes and changes: how the CUDA DLLs are named, which compilers and systems are supported, performance improvements, proper support for RTX cards, and, on Windows, the nvJPEG library, which wasn't available until 10.1 update 2 and which is quite a boost for any batch image processing for ML.
Although my experience with installing or updating NVIDIA drivers under Linux has always been problem-free, I understand that you may personally find it inconvenient or even intimidating to update drivers. But a big part of the CUDA runtime support lives in the driver itself, which is why the driver has to change in sync with the toolkit; if you want new features, better performance and support for new hardware, then updating the drivers is the only way forward.
Some people use CUDA for other things, not just for TF, and some of the features only exist or work properly in the latest version, so it is not really easy to "just downgrade to 10.0".
As for building from source, I did try that a few versions ago on Windows, and halfway through Bazel barfed a bunch of indecipherable errors. Not having a clue how the whole build system works or how to even begin to untangle it, I gave up, hoping that the main TF package would one day catch up. Sadly, that day seems to remain forever in the future.
Finally, I hope I am not coming across as confrontational or ungrateful, but if the choice is between the TF devs updating their drivers to make a new CUDA 10.1-based release, and everyone else having to learn the intimate details of TF's Bazel build system to roll their own, then to me the former wins; the latter is quite unrealistic to expect, even if it is your official response.
Thank you for your time.
Building against CUDA 10.1 is already supported. I am running continuous builds already and they are running just fine. Occasionally there have been issues, but we have fixed them.
We do not build the main packages with the latest CUDA version as soon as it is out, because we want our prebuilt packages to "run on as many machines as possible, while still being performant". It is a subjective and difficult line, and it is the kind of choice where, whatever we pick, someone will be unhappy.
The reason new CUDA versions do not work on most machines is that every new CUDA version requires a driver version that is "too new" for the majority of Linux distributions. My current Linux distro still has not blessed drivers that work with CUDA 10.1, so I had to uninstall the drivers that came with the system and jump through hoops to get the newer ones. Because we realized how difficult that is, we made the choice not to upgrade.
The above is the justification for not having CUDA 10.1 support in the main TF releases yet. I am sorry if it is not a satisfactory answer. We are evaluating whether we can upgrade to 10.1 for 2.1, but we are still just exploring.
Finally, as always, TF has detailed build-from-source instructions for all operating systems. I know that it is difficult, but if you have a use case that cannot do without CUDA 10.1, or CUDA 9, or 9.2, our official answer is always going to be "you will need to build those packages yourself", however unpopular that is.
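Roughly, the documented flow for rolling your own GPU package looks like this; it is only a sketch, the exact configure answers and flags depend on your machine, and the build guide is the authority (on Windows the same Bazel target applies, driven from an MSYS2 shell):
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
python ./configure.py    # answer "y" to CUDA support and point it at your CUDA 10.1 / cuDNN install
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl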
[UPDATE] - Installing the NIGHTLY version has solved this for me.
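For reference, that was roughly the following; the GPU nightly package was called tf-nightly-gpu at the time, if memory serves, so check PyPI for the current name:
pip install tf-nightly-gpu
python -c "import tensorflow as tf; print(tf.version.VERSION); print(tf.test.is_gpu_available())"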
@ahtik - I tried installing on Windows 10 per your link with pip install tensorflow-gpu==2.1.0-rc1. However, I received the following:
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu-estimator<2.2.0,>=2.1.0rc0 (from tensorflow-gpu==2.1.0-rc1)
Do you have any suggestion to overcome this issue? I can pip install tensorflow-estimator==2.1.0rc0 but not tensorflow-gpu-estimator==2.1.0rc0, and in fact I cannot find tensorflow-gpu-estimator anywhere online. However, after installing tensorflow-estimator==2.1.0rc0, the error from pip install tensorflow-gpu==2.1.0-rc1 persists.
Huggingface is looking strong. They use PyTorch on CUDA 10.1 - we are thinking of switching over too.