tensorflow: Crash: Could not create cuDNN handle when convnets are used
Tensorflow (GPU) was imported successfully, but when running a session that involves a convolutional neural network (CNN), Python crashes with the following message:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
The problem persists on any combination of CUDA toolkit 7.5/8.0 and Tensorflow installed from pip/source. Test sessions that do not use CNNs are run successfully.
What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?
The issue is similar to https://github.com/tensorflow/tensorflow/issues/6586, where I first commented. But since I experience the problem on a Mac, it was suggested that I open a separate issue.
Environment info
Operating System: macOS Sierra 10.12.2
Xcode version: 8.2 (8C38) (when I later tried CUDA 7.5, I installed Command Line Tools 7.3.1, because CUDA 7.5 lacked support for the more recent compilers)
Python: 3.5.2 (Anaconda)
Installed version of CUDA: tried both 8.0 (initially) and 7.5 (reported here, toolkit only – the driver is still 8.0)
Installed version of cuDNN: 5.1 (different installations according to CUDA versions)
Output of ls -l /path/to/cuda/lib/libcud*:
lrwxr-xr-x 1 root wheel 33 5 Jan 20:33 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x@ 1 root wheel 8280 13 Apr 2016 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x@ 1 root wheel 45 13 Apr 2016 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudadevrt.a
lrwxr-xr-x@ 1 root wheel 50 13 Apr 2016 /usr/local/cuda/lib/libcudart.7.5.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.7.5.dylib
lrwxr-xr-x@ 1 root wheel 46 13 Apr 2016 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.dylib
lrwxr-xr-x@ 1 root wheel 49 13 Apr 2016 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart_static.a
lrwxr-xr-x 1 root wheel 16 5 Jan 17:14 /usr/local/cuda/lib/libcudnn.5 -> libcudnn.5.dylib
-rwxr-xr-x@ 1 ymfa staff 58975112 10 Jun 2016 /usr/local/cuda/lib/libcudnn.5.dylib
lrwxr-xr-x@ 1 ymfa staff 16 10 Jun 2016 /usr/local/cuda/lib/libcudnn.dylib -> libcudnn.5.dylib
lrwxr-xr-x 1 root wheel 16 5 Jan 17:14 /usr/local/cuda/lib/libcudnn5.dylib -> libcudnn.5.dylib
-rw-r--r--@ 1 ymfa staff 56392320 10 Jun 2016 /usr/local/cuda/lib/libcudnn_static.a
I tried installing both from pip and from source. I first installed from the binary pip package:
- A link to the pip package you installed: tensorflow-gpu
- The output of python -c "import tensorflow; print(tensorflow.__version__)": 0.12.head
Later I installed from source (the pip package was uninstalled):
- The commit hash (git rev-parse HEAD): d67c09d98a576e1fbf2f3609ddb842e53890f31c
- The output of bazel version:
Build label: 0.4.3-homebrew
Build target: bazel-out/local-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 22 15:20:15 2016 (1482420015)
Build timestamp: 1482420015
Build timestamp as int: 1482420015
If possible, provide a minimal reproducible example
I made a minimal example by simplifying the network and reducing the training data to only twenty images and two classes for classification. issue.zip contains the Python code and the data. I used two convolutional layers because I found that a network with only one convolutional layer runs without problems.
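For reference, here is a sketch of the kind of two-convolutional-layer graph described above (the real repro is in issue.zip; the image size and filter counts below are illustrative only):

```python
import numpy as np
import tensorflow as tf

# Stand-ins for the twenty training images (28x28 grayscale); sizes are made up
images = np.random.rand(20, 28, 28, 1).astype(np.float32)

x = tf.placeholder(tf.float32, [None, 28, 28, 1])

# First convolutional layer (runs fine on its own)
w1 = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))
h1 = tf.nn.relu(tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding='SAME'))

# Second convolutional layer; adding this one triggers the crash on the GPU
w2 = tf.Variable(tf.truncated_normal([5, 5, 16, 32], stddev=0.1))
h2 = tf.nn.relu(tf.nn.conv2d(h1, w2, strides=[1, 1, 1, 1], padding='SAME'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(h2, feed_dict={x: images})
```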
Complete log using CUDA 7.5 and Tensorflow compiled from source
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.7.5.dylib locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 740.18MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Complete log using CUDA 8.0 and Tensorflow installed from pip
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 590.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:392] error retrieving driver version: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 30
- Comments: 147 (9 by maintainers)
Here is a bit more info on how I temporarily resolved it. I believe these issues are all related to GPU memory allocation and have nothing to do with the errors being reported. There were other errors before these indicating a memory-allocation problem of some sort, but the program continued to progress, eventually giving the cuDNN errors that everyone is getting. The reason I believe it works sometimes is that if you use the GPU for other things besides TensorFlow, such as your primary display, the available memory fluctuates. Sometimes you can allocate what you need and other times you can't.
From the API https://www.tensorflow.org/versions/r0.12/how_tos/using_gpu/ “By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.”
I think this default allocation is broken in some way that causes this erratic behavior and certain situations to work and others to fail.
I have resolved this issue by changing the default behavior of TF to allocate a minimum amount of memory and grow as needed, as detailed in the webpage:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, …)
I have also tried the alternative way and was able to get it to work and fail by experimentally choosing a fraction that worked. In my case it ended up being about 0.7.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, …)
Still no word from anyone on the TF team confirming this but it is worth a shot to see if others can confirm similar behavior.
I was able to get a program to work by limiting the GPU usage. In my case, with a 3 GB GTX 1060 on Ubuntu 16.04, it works if I set the GPU option per_process_gpu_memory_fraction to 0.7. Anything higher and I get these errors:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
It could be a case of bad error reporting by tensorflow; the reported errors seem completely unrelated to the real cause. Maybe that is a clue to getting this resolved in a better manner?
Just for those who are driven mad by this:
I occasionally got a CUBLAS error as well. So I did this:
cd /usr/local/cuda/samples/7_CUDALibraries/simpleCUBLAS
make
./simpleCUBLAS
and discovered that I could not initialise CUBLAS
So next I did this (based on advice):
sudo rm -rf ~/.nv
And it worked. Cheers… that's 4 days wasted. Hope this saves someone else.
If it helps anyone, it seems there are sometimes zombie processes left over which prevent TF from starting again properly and gave me this error. Killing them works around the issue.
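(As a sketch of how to spot such leftover processes, assuming nvidia-smi is on the PATH; the query flags used here are standard nvidia-smi options:)

```python
import subprocess

# List the processes currently holding GPU memory, so stragglers can be killed
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv"])
print(out.decode())
```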
Same issue too. I’m on Windows 10, GTX1070, CUDA 8.0, cuDNN 5.1.
I faced this issue after accidentally upgrading tensorflow-gpu from version 1.6.0 to 1.18.0. This caused instability due to incompatible versions of CUDA and cuDNN. The solution was rolling back to tensorflow-gpu 1.6.0.
This was the solution to my problems:
https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible
Whenever you start facing this kind of issue, before you upgrade your NVIDIA dependencies, ALWAYS try first to solve the problem by uninstalling your tensorflow version and installing one compatible with your CUDA dependencies.
Step 1: Check your tensorflow package versions. If you have a GPU, I recommend uninstalling the CPU version of tensorflow in order to avoid conflicts.
pip list | grep tensorflow
Step 2: Uninstalling tensorflow-gpu.
pip uninstall tensorflow
Step 3: Check your CUDA and cuDNN versions. You may need to adjust these paths.
– CUDA
cat /usr/local/cuda/version.txt
In case this fails, find your CUDA version file using:
sudo find / -name version.txt
– cuDNN
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
In case this fails, find the cuDNN header file using:
sudo find / -name cudnn.h
Step 4: Check if your tensorflow-gpu, cuda and cudnn versions match this table.
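As an additional sanity check (a sketch using the TF 1.x tf.test helpers), you can ask TensorFlow itself whether it was built with CUDA and whether it can actually create a GPU device:

```python
import tensorflow as tf

print(tf.__version__)                # installed TensorFlow version
print(tf.test.is_built_with_cuda())  # True if this build was compiled against CUDA
print(tf.test.is_gpu_available())    # True if a GPU device can actually be created
```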
In my case, I needed tensorflow-gpu 1.6.0 in order to match the other requirements.
So I installed this version using:
pip install tensorflow-gpu==1.6.0
These are the specifications that worked!
OS: Ubuntu 16.04
CUDA Version: 9.0, V9.0.176
cuDNN Version: 7.0
Tensorflow-gpu Version: 1.6.0
Python Version: 3.5.0
Good luck!
I've met the same issue. However, I found that after I installed CUDA 9.0, my driver was no longer the latest version. So, try updating your NVIDIA driver to the latest version and restart your PC. It works for me!
I have the same problem with a GTX 960M, cuDNN 5.1.5 and CUDA 8.0.44.
I am also getting the CUDNN_STATUS_NOT_INITIALIZED error. Here is the full error log:
I am on Windows 10, CUDA 8.0, cuDNN 5.1. Can anything be done to avoid these? I was able to run some other tensorflow tests earlier and they worked fine (including the conv op), but now it doesn't work on this new test…
@serans1 What zombie processes are you referring to?
Please let me know if there is a workaround for this. Thank you!
EDIT: This might have been a newbie mistake, but I will mention it here in case someone else runs into the same issue: my problem was that I already had an instance of a Jupyter Python notebook running (whose cells had all been run already, and hence were loaded in memory), plus another process taking up GPU memory (a minimized video game). Therefore, when I checked the memory usage on my GPU, it was already at around 4+ GB (50+%). I closed the Jupyter notebook and the other application and re-ran my tensorflow test. Now everything ran smoothly 😃 Also, while running, I noticed that at peak it uses up to 90% of my GPU memory, so it makes sense why it couldn't initialize cuDNN when less than 50% was available in my initial situation.
Sorry again for my mistake! I’m just at the beginning of playing around with this 😃
Running this fixed the issue:
sudo rm -rf ~/.nv
It worked for me when adding these lines of code to the beginning of the script, @Codersadis.
Add the following code to the very beginning of the .py file, which solves my problem:
from __future__ import print_function, division
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))
Adding this at the beginning of the file worked for me:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
I have exactly the same issue, but I can run my code with root access (with sudo). Currently I'm working on Ubuntu 16.04 with a GTX 960. My CUDA version is 8.0 and I'm using tensorflow 1.0.1.
I had the same issue with gtx1060, win8.1, cuda8.0.60, cudnn5.0. Upgraded to the latest stable tensorflow-gpu nightly build (currently http://ci.tensorflow.org/job/nightly-win/133/) and cudnn5.1. Problem solved.
I was getting the following error with tensorflow 2.0 in my conda environment.
So I added the following code to my CNN:
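(The snippet itself was not preserved in this thread; the standard TF 2.0 equivalent of the allow_growth fix, as a sketch, is:)

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of mapping almost all of it up front
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```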
My output is now:
As everyone suggested, it is due to tensorflow using all of the GPU memory. My CNN trains without error now.
I have a similar issue: CUDNN_STATUS_ALLOC_FAILED. I broke my head over it for 3-4 hours. Finally fixed it. This indeed works, as mentioned above by many:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
But the key is to write it immediately below "import tensorflow as tf", which I wasn't doing. I had written it after all the imports.
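In other words, a sketch of the ordering that matters here (the trailing import is a placeholder for whatever else the script loads):

```python
import tensorflow as tf

# Create the session, and thus the growable GPU allocator, before anything else
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

# ...only now do the remaining imports and model building
import numpy as np  # placeholder for the rest of the script's imports
```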
In my case, this happened because other tensorflow instances were holding the GPU (other scripts running).
Could I propose a better error message? Say, "Error: other tensorflow instances running, while only a single one is supported."
If you are using the latest tensorflow and keras, try this, from here; it worked for me:
GTX 1070, CUDA 9.0, cuDNN 7.1 for CUDA 9.0, TensorFlow 1.10.1. Running a simple tensorflow hello-world works without problems. No way to know why this happens…
In my case (Windows 10), this problem was caused by using the wrong version of cuDNN. Although I followed TensorFlow’s official instructions closely, I accidentally had downloaded version 7.0.5 for CUDA 9.1, while TF calls explicitly for CUDA 9.0.
As soon as I corrected the cuDNN mistake, my convnets started working 💯 👍 🥇 😃
This problem is generally related to the CUDA version and to GPU memory. If it is the former, the easiest way is to change your CUDA version through Anaconda; if the latter, you can find solutions in the other answers. If the GPU-memory changes above don't work, consider changing the CUDA version: the simplest way is to ignore whatever CUDA version the system has installed and change the CUDA version directly in the project environment in Anaconda. Personally tested and effective.
Got the same problem with Win10/Anaconda3/tf-1.3/keras-2.1.3. Adding the following code to the very beginning of the .py file solved my problem.
I had the same problem in Ubuntu 16.04 and cuda-8.0 (with GTX1080Ti). I’d just like to inform any of you with the same problem that the solution given by @SimonWalsh1000 worked for me perfectly (i.e., the CUBLAS initialisation problem was solved by
sudo rm -rf ~/.nv/
). So, many thanks @SimonWalsh1000, it did cost me some hours…
Confirming @strickon's suggestion works for me.
I am running https://github.com/awjuliani/DeepRL-Agents/blob/master/Double-Dueling-DQN.ipynb and was getting the failures mentioned in this thread on the first call to sess.run within the update block (the line:
Q1 = sess.run(mainQN.predict,feed_dict={mainQN.scalarInput:np.vstack(trainBatch[:,3])})
). Adding the allow_growth flag (as per below) got me past this bump; the code is currently running in the background, and we'll see how far it goes.
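(The exact lines were not preserved here; presumably it was the standard allow_growth configuration, along these lines:)

```python
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand
sess = tf.Session(config=config)
```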
Stack:
I’d be fine with dumping more stats on request.
I’m encountering the same problem. The graph will run fine when forced to the cpu, but crashed on the gpu.
Environment
OS: macOS 10.12.2
GPU: GeForce GT 750M
TF: 0.12.1 (pip install)
Python: 3.6.0
CUDA: 8.0
cuDNN: 5.1
(output of ls -l /path/to/cuda/lib/libcud*):
Example
The minimal example provided by @ymfa both fails and succeeds on my setup. The following are three outputs that have been produced:
fail(1)
fail(2)
pass
I can confirm that @ymfa's minimal example fails on macOS with an NVIDIA 750, but the same example works on Linux with a Titan X:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:353] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I had this issue with CUDA 10.1 + cuDNN 7.5 and TF 1.11 compiled from source with CUDA. The script I was trying to use needed these lines inserted somewhere:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
and then later:
sess = tf.Session(graph=detection_graph, config=config)
With this done, I get a lot of "GPU out of memory" errors, but detection proceeds very quickly, as I suppose it should when we're using the GPU. Thanks for sharing!
Great, when I decrease the gpu_memory_fraction from 0.8 to 0.7, it starts working!
I got this error on Windows 10 with CUDA 9.0 and a GTX 1060 (Python 3.5, tensorflow-gpu 1.5.0). I found an easy way to solve it: update the NVIDIA display driver to the newest version and reboot the PC. Then it worked!
In my case the same issue was resolved by updating the NVIDIA gpu driver.
Having the same problem with a GTX 650, Ubuntu 16.04, CUDA 8.0.61, TF 1.0.0. It was working just now, although giving some low-memory warnings. Now it doesn't run at all, giving me the same Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) error.
Have the same problem: Windows 10, cuDNN 5.1, CUDA 8, GTX 1060. The program works on the CPU version of tensorflow but gives these same errors with the GPU version.
Have the same problem with CentOS and a Titan X.
This also resolved the issue for me.
GeForce GTX 1050, CUDA 10.0
Note: this is the only thing I can find that works in TF 2.0 for now. Thanks!
I was facing the same problem when using the community-supported version of tensorflow inside a conda environment (i.e. using conda install tensorflow-gpu).
It turns out this version is not actually good in all situations (even though I've been using it on other machines). The best version to use is the pip-installable version from https://www.tensorflow.org/install/pip inside a conda environment. When I did this, everything worked.
I faced this same problem. In my case I was running a Jupyter notebook while training my network. Closing the Jupyter notebook fixed my problem.
(I think it might have something to do with too-high demands on my GPU.)
Hope this helped!
In my case, I forgot to close the Jupyter notebook when I started running another piece of code in VS Code. Closing the Jupyter notebook fixed the problem.
For me, the problem was using the wrong cuDNN lib: I used cuDNN for CUDA 9.1 when I had CUDA 9.0. So I reinstalled cuDNN for CUDA 9.0 and everything worked.
Using: cudnn-9.0-windows10-x64-v7 and tensorflow-gpu==1.7.0
tutorials\image\imagenet>python classify_image.py fails with the error: could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Adding the three lines of code from @ggranum above solves the problem.
Hi guys,
I have just got the same problem:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
and solved it by:
1. Updating the NVIDIA GeForce 920M driver
2. Setting the TF session properly, as follows:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
3. Restarting the PC
After that I got a more precise error message: "cuDNN 7.1 found, but cuDNN 7.0 expected. Upgrade."
And solved that by: instead of upgrading the rest (TF, CUDA, …) to match cuDNN, I downgraded cuDNN from 7.1 to 7.0.4 to match the rest, and it worked well.
Same error on Python 3.5, Ubuntu 16.04, TF 1.5. Updating the GPU driver to version 390.42 solved this issue for me.
For me, putting config.gpu_options.allow_growth = True in the tensorflow session config fixed the problem. CUDA 8, TF 1.4, cuDNN 6.
I agree with @strickon: it seems to be a memory allocation issue. I had a notebook with a tensorflow program running, and I tried to run Python + tensorflow in another Windows terminal and got the error. Then I restarted my notebook (releasing the GPU memory), tried to run the script in the Windows terminal again, and it worked! I think tensorflow should provide a better error message advising the user with a more detailed explanation.
Hi, I got the same issue. However, I found that the reason is that I used tensorflow twice at the same time.
For example, I usually use the Jupyter notebook for simple scripts and PyCharm for projects. If I don't shut down the Jupyter notebook, I meet this error in PyCharm.
Hope this helps.
Windows 10 64-bit, NVIDIA Titan X, driver 385.41, CUDA 8.0.60, cuDNN 6.0, Python 3.5.2, Tensorflow 1.3
Same issue with Windows 10, GTX 770, CUDA 8.0, cuDNN 5.1, TF-GPU 1.1.0. I am not sure where to get the device driver version, but Windows Device Manager reports 21.21.13.7651 for the display driver.
@ggranum’s fix worked for me:
I have the same issue running my own scripts now. I think it is the same reason as @lockywolf described:
I had this error quite often but irregularly. Then I followed @RawthiL's lead and added a session to my script. However, I executed the script successfully, restarted the kernel, and got the same error message again. Is there any solution to open the session, claim the GPU, and close it after the calculation is done?
Cheers!
Edit: Besides @RawthiL's solution, I followed the Keras TF introduction where they say:
Having the same issue with a GTX 1080 Ti, Windows 10, CUDA 8.0.61, TF 1.0.1, cuDNN 5.1.
@EncodeTS I just added a minimal reproducible example to my first post. Could you check if it reproduces the problem on your machine? On my machine, one convolutional layer works but not two convolutional layers, which led me to think that the problem might be caused by some resource limitations.