tensorflow: Crash: Could not create cuDNN handle when convnets are used

Tensorflow (GPU) was imported successfully, but when running a session that involves a convolutional neural network (CNN), Python crashes with the following message:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

The problem persists with any combination of CUDA toolkit 7.5/8.0 and TensorFlow installed from pip/source. Test sessions that do not use CNNs run successfully.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

The issue is similar to https://github.com/tensorflow/tensorflow/issues/6586, where I first commented. But since I experience the problem on a Mac, it was suggested that I open a separate issue.

Environment info

Operating System: macOS Sierra 10.12.2
Xcode version 8.2 (8C38) (When I later tried CUDA 7.5, I installed Command Line Tools version 7.3.1 because CUDA 7.5 lacked support for the more recent compilers.)
Python 3.5.2 (Anaconda)

Installed version of CUDA: tried both 8.0 (initially) and 7.5 (reported here, toolkit only – the driver is still 8.0)
Installed version of cuDNN: 5.1 (different installations according to the CUDA versions)
Output of ls -l /path/to/cuda/lib/libcud*:

lrwxr-xr-x  1 root   wheel        33  5 Jan 20:33 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x@ 1 root   wheel      8280 13 Apr  2016 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x@ 1 root   wheel        45 13 Apr  2016 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudadevrt.a
lrwxr-xr-x@ 1 root   wheel        50 13 Apr  2016 /usr/local/cuda/lib/libcudart.7.5.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.7.5.dylib
lrwxr-xr-x@ 1 root   wheel        46 13 Apr  2016 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.dylib
lrwxr-xr-x@ 1 root   wheel        49 13 Apr  2016 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart_static.a
lrwxr-xr-x  1 root   wheel        16  5 Jan 17:14 /usr/local/cuda/lib/libcudnn.5 -> libcudnn.5.dylib
-rwxr-xr-x@ 1 ymfa   staff  58975112 10 Jun  2016 /usr/local/cuda/lib/libcudnn.5.dylib
lrwxr-xr-x@ 1 ymfa   staff        16 10 Jun  2016 /usr/local/cuda/lib/libcudnn.dylib -> libcudnn.5.dylib
lrwxr-xr-x  1 root   wheel        16  5 Jan 17:14 /usr/local/cuda/lib/libcudnn5.dylib -> libcudnn.5.dylib
-rw-r--r--@ 1 ymfa   staff  56392320 10 Jun  2016 /usr/local/cuda/lib/libcudnn_static.a

I tried installing both from pip and from source. I first installed from the binary pip package:

  1. A link to the pip package you installed: tensorflow-gpu
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)". 0.12.head

Later I installed from source (the pip package was uninstalled):

  1. The commit hash (git rev-parse HEAD) d67c09d98a576e1fbf2f3609ddb842e53890f31c

  2. The output of bazel version

    Build label: 0.4.3-homebrew
    Build target: bazel-out/local-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
    Build time: Thu Dec 22 15:20:15 2016 (1482420015)
    Build timestamp: 1482420015
    Build timestamp as int: 1482420015

If possible, provide a minimal reproducible example

I made a minimal example by simplifying the network and reducing the training data to only twenty images and two classes for classification. issue.zip contains the Python code and the data. I wrote two convolutional layers because I found the network with only one convolutional layer runs without problem.
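
The contents of issue.zip are not reproduced here. The following is only a rough sketch of the kind of two-convolutional-layer graph that triggers the crash; the shapes, variable names and random data are my own illustration, not the actual code from the archive.

import numpy as np
import tensorflow as tf

# Two stacked conv layers on a small grayscale input (illustrative shapes only).
x = tf.placeholder(tf.float32, [None, 32, 32, 1])
w1 = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
conv1 = tf.nn.relu(tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding='SAME'))
w2 = tf.Variable(tf.truncated_normal([5, 5, 8, 16], stddev=0.1))
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, w2, strides=[1, 1, 1, 1], padding='SAME'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Twenty random "images" stand in for the twenty training images mentioned above.
    out = sess.run(conv2, feed_dict={x: np.random.rand(20, 32, 32, 1).astype(np.float32)})
    print(out.shape)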

Complete log using CUDA 7.5 and Tensorflow compiled from source

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.7.5.dylib locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 740.18MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

Complete log using CUDA 8.0 and Tensorflow installed from pip

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 590.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:392] error retrieving driver version: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 30
  • Comments: 147 (9 by maintainers)

Most upvoted comments

Here is a bit more info on how I temporarily resolved it. I believe these issues are all related to GPU memory allocation and have nothing to do with the errors being reported. There were other errors before this indicating some sort of memory allocation problem but the program continued to progress, eventually giving the cudnn errors that everyone is getting. The reason I believe it works sometimes is that if you use the gpu for other things besides tensorflow such as your primary display, the available memory fluctuates. Sometimes you can allocate what you need and other times it can’t.

From the API https://www.tensorflow.org/versions/r0.12/how_tos/using_gpu/ “By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.”

I think this default allocation is broken in some way that causes this erratic behavior and certain situations to work and others to fail.

I have resolved this issue by changing the default behavior of TF to allocate a minimum amount of memory and grow as needed, as detailed in the webpage:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, …)

I have also tried the alternative approach, experimentally choosing a memory fraction, and was able to make it both work and fail. In my case it ended up being about .7.

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, …)

Still no word from anyone on the TF team confirming this but it is worth a shot to see if others can confirm similar behavior.
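
For reference, here is a consolidated, runnable sketch of the two options described above. The trailing "…" in the snippets stands for whatever other Session arguments are passed; this version simply opens a plain session.

import tensorflow as tf

config = tf.ConfigProto()

# Option 1: start with a small allocation and grow it as needed.
config.gpu_options.allow_growth = True

# Option 2 (alternative): cap the fraction of total GPU memory TF may claim.
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

with tf.Session(config=config) as sess:
    # Replace with the real graph and training code.
    print(sess.run(tf.constant("GPU memory options applied")))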

I was able to get a program to work by limiting the gpu usage. In my case with a 3gb gtx 1060 on ubuntu 16.04, if I set gpu option per_process_gpu_memory_fraction to .7 it works. Anything higher, I get these errors

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

It could be a case of bad error reporting by tensorflow. Seems completely unrelated. Maybe it is a clue to getting this resolved in a better manner?

Just for those who are driven mad by this:

I occasionally got a CUBLAS error as well. So I did this:

cd /usr/local/cuda/samples/7_CUDALibraries/simpleCUBLAS
make
./simpleCUBLAS

and discovered that I could not initialise CUBLAS

So next I did this (based on advice)

sudo rm -f ~/.nv

And it worked. Cheers… that's 4 days wasted. Hope this saves someone else.

If it helps anyone, it seems there are sometimes zombie processes left over which prevent TF from starting again properly and gave me this error. Killing them works around the issue.

Same issue too. I’m on Windows 10, GTX1070, CUDA 8.0, cuDNN 5.1.

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I faced this issue after accidentally upgrading tensorflow-gpu from version 1.6.0 to 1.18.0. This caused instability due to mismatched CUDA and cuDNN versions. The solution was rolling back to tensorflow-gpu 1.6.0.

This was the solution to my problems:

https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible

Whenever you start facing this kind of issue, before you upgrade your NVIDIA dependencies, ALWAYS try to solve the problem by uninstalling your tensorflow versions and installing a version compatible with your CUDA dependencies first.

Step 1: Check your tensorflow packages versions. If you have GPU, I recommend uninstalling the cpu-version of tensorflow in order to avoid conflicts.

pip list | grep tensorflow

Step 2: Uninstalling tensorflow-gpu.

pip uninstall tensorflow

Step 3: Check your CUDA and cuDNN versions. You may need to adjust these paths.

– CUDA: cat /usr/local/cuda/version.txt
In case this fails, find your CUDA version text file using: sudo find / -name version.txt

– cuDNN: cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
In case this fails, find cudnn.h using: sudo find / -name cudnn.h

Step 4: Check that your tensorflow-gpu, CUDA and cuDNN versions match the compatibility table (see the StackOverflow link above).
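
As a quick cross-check from Python (my own addition, not one of the original steps), the installed TF build can report its version, whether it was compiled with CUDA support, and whether it currently sees a GPU:

import tensorflow as tf

# Compare these against the compatibility table referenced above.
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU device found:", tf.test.gpu_device_name() or "none")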

In my case, I needed tensorflow-gpu 1.6.0 in order to match the other requirements.

So I installed this version using: pip install tensorflow-gpu==1.6.0

These are the specifications that worked:

OS: Ubuntu 16.04
CUDA Version: 9.0, V9.0.176
cuDNN Version: 7.0
Tensorflow-gpu Version: 1.6.0
Python Version: 3.5.0

Good luck!

I’ve met the same issue. However, I found that after I installed CUDA 9.0 my driver was no longer the latest version. So, try updating your NVIDIA driver to the latest version and restarting your PC. It works for me!

I have the same problem with GTX 960m, cudnn5.1.5 and cuda-8.0.44.

I am also getting the CUDNN_STATUS_NOT_INITIALIZED error. Here is the full error log:

2017-04-26 00:08:57.526234: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
2017-04-26 00:09:01.111706: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-04-26 00:09:01.111805: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-04-26 00:09:01.114040: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-04-26 00:09:01.114232: F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I am on Windows 10, CUDA 8.0, cuDNN 5.1. Can anything be done to avoid these? I was able to run some other tensorflow tests earlier and they worked fine (including the conv op), but now it doesn’t work on this new test…

@serans1 What zombie processes are you referring to?

Please let me know if there is a workaround for this. Thank you!

EDIT This might have been a newbie mistake, but I will just mention it here in case someone else runs into the same issue: my problem was that I already had a Jupyter Python Notebook instance running (whose cells had all been run already, hence loaded into memory), and also another process that was taking up GPU memory (a minimized video game). Therefore, when I checked the memory usage on my GPU, it was already at around 4+GB (50+%). I closed the Jupyter Notebook and the other application, and re-ran my tensorflow test. Now everything ran smoothly 😃 Also, while running I noticed that at peak it uses up to 90% of my GPU memory, so it makes sense that it couldn’t initialize cuDNN when it had less than 50% available in my initial situation.

Sorry again for my mistake! I’m just at the beginning of playing around with this 😃

Running this fixed the issue:

sudo rm -rf ~/.nv

It worked for me when adding these lines of code to the beginning of the script, @Codersadis:

add the following code to the very beginning of the .py file, which solves my problem.

from __future__ import print_function, division
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))

Adding this at the beginning of the file worked for me:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

I have exactly the same issue. But I can run my code with root access (with sudo). Currently I’m working on Ubuntu 16.04 with a GTX 960. My CUDA version is 8.0 and I’m using tensorflow 1.0.1.

I had the same issue with gtx1060, win8.1, cuda8.0.60, cudnn5.0. Upgraded to the latest stable tensorflow-gpu nightly build (currently http://ci.tensorflow.org/job/nightly-win/133/) and cudnn5.1. Problem solved.

I was getting the following error with tensorflow 2.0 in my conda environment.

2019-12-03 23:49:06.381259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-03 23:49:07.220066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-03 23:49:07.236411: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-03 23:49:07.247476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-03 23:49:07.256881: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-03 23:49:07.269536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-03 23:49:07.281954: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-03 23:49:07.295302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-03 23:49:08.589865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-03 23:49:08.599121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-03 23:49:08.610543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-03 23:49:08.616005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4627 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-12-03 23:49:58.521484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-03 23:49:59.604517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-03 23:50:04.209110: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-03 23:50:04.216670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:333] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-12-03 23:50:04.226172: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-03 23:50:04.234741: E tensorflow/stream_executor/cuda/cuda_dnn.cc:333] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-12-03 23:50:04.244958: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node sequential/conv2d/Conv2D}}]]

So I added the following code to my CNN:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

My output is now

2019-12-04 00:10:07.708573: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-12-04 00:10:11.643304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-04 00:10:12.753615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-04 00:10:12.769498: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-04 00:10:12.783900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-04 00:10:54.941468: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-04 00:10:55.372516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-04 00:10:55.383385: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-04 00:10:55.406053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-04 00:10:56.741665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 00:10:56.747255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-04 00:10:56.752302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-04 00:10:56.756861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4627 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-12-04 00:11:08.281356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-04 00:11:08.934804: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-04 00:11:11.870237: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

As everyone suggested, it is due to tensorflow using all of the memory of the GPU/GPUs. My CNN trains without error now.

I have a similar issue: CUDNN_STATUS_ALLOC_FAILED. I broke my head over it for 3-4 hours. Finally fixed. This indeed works, as mentioned above by many:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

But the key is to write it immediately below “import tensorflow as tf” which I wasn’t doing. I had written it after all the imports.

In my case, this happened because other tensorflow instances were holding the GPU. (Other scripts running.)

Could I propose a better error message? Say, “Error: other tensorflow instances running, while only a single one is supported.”

If you are using the latest tensorflow and keras, try this from here; it worked for me:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
2018-09-03 22:50:26.576765: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-03 22:50:26.576831: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 390.77.0
[1]    8515 segmentation fault (core dumped)  python3 training.py

GTX1070, CUDA 9.0, cuDNN 7.1 for CUDA 9.0, TensorFlow 1.10.1. Running a simple tensorflow hello-world works without problem. No idea why this happens…

In my case (Windows 10), this problem was caused by using the wrong version of cuDNN. Although I followed TensorFlow’s official instructions closely, I accidentally had downloaded version 7.0.5 for CUDA 9.1, while TF calls explicitly for CUDA 9.0.

As soon as I corrected the cuDNN mistake, my convnets started working 💯 👍 🥇 😃

This problem is generally related to the CUDA version and GPU memory. If it is the former, the easiest way is to change your CUDA version through Anaconda; if the latter, you can find ways to solve it in the other answers. If the GPU memory changes described above do not help, consider changing the CUDA version: the simplest way is to ignore whatever CUDA version the system has installed and just change the CUDA version inside the project's Anaconda environment. Tested, and it works.

Got the same problem with Win10/Anaconda3/tf-1.3/keras-2.1.3. Adding the following code to the very beginning of the .py file solves my problem.

from __future__ import print_function, division
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session  
config = tf.ConfigProto()  
config.gpu_options.allow_growth = True  
set_session(tf.Session(config=config)) 

I had the same problem in Ubuntu 16.04 and cuda-8.0 (with GTX1080Ti). I’d just like to inform any of you with the same problem that the solution given by @SimonWalsh1000 worked for me perfectly (i.e., the CUBLAS initialisation problem was solved by sudo rm -rf ~/.nv/). So, many thanks @SimonWalsh1000, it did cost me some hours…

Confirming @strickon 's suggestion works for me.

Am running https://github.com/awjuliani/DeepRL-Agents/blob/master/Double-Dueling-DQN.ipynb and was getting the failures mentioned in this thread on the first call to sess.run within the update block (the line: Q1 = sess.run(mainQN.predict, feed_dict={mainQN.scalarInput: np.vstack(trainBatch[:,3])})).

Adding the allow_growth flag (as per below) got me past this bump - the code is currently running in the background, we’ll see how far it goes.

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

Stack:

  • MacBook Pro, running Sierra 10.12.4, with NVIDIA GeForce GT 750M 2048 MB. Typically only have 1.7GB free.
  • TensorFlow 1.1 Using Anaconda install instructions.
  • Python 3.6, not virtual (Anaconda)
  • CUDA 8 / cuDNN 5

I’d be fine with dumping more stats on request.

I’m encountering the same problem. The graph runs fine when forced to the CPU, but crashes on the GPU.

Environment

OS: macOS 10.12.2
GPU: GeForce GT 750M
TF: 0.12.1 (pip install)
Python: 3.6.0
CUDA: 8.0
cuDNN: 5.1

(output of ls -l /path/to/cuda/lib/libcud*):

lrwxr-xr-x  1 root  wheel     33 Dec 14 14:25 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x  1 root  wheel  13504 Dec  2 16:48 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x  1 root  wheel     45 Nov  3 11:40 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudadevrt.a
lrwxr-xr-x  1 root  wheel     50 Nov  3 11:40 /usr/local/cuda/lib/libcudart.8.0.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.8.0.dylib
lrwxr-xr-x  1 root  wheel     46 Nov  3 11:40 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.dylib
lrwxr-xr-x  1 root  wheel     49 Nov  3 11:40 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart_static.a
lrwxr-xr-x  1 root  wheel     47 Dec 14 10:21 /usr/local/cuda/lib/libcudnn.5.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.5.dylib
lrwxr-xr-x  1 root  wheel     45 Dec 14 10:21 /usr/local/cuda/lib/libcudnn.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.dylib
lrwxr-xr-x  1 root  wheel     48 Dec 14 10:21 /usr/local/cuda/lib/libcudnn_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn_static.a

Example

The minimal example provided by @ymfa both fails and succeeds on my setup. The following are three outputs that have been produced.

fail(1)

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Abort trap: 6

fail(2)

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.53GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
E tensorflow/stream_executor/cuda/cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "issue.py", line 52, in <module>
    sess.run(training_operation, feed_dict={x: X, y: Y})
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

Caused by op 'MatMul', defined at:
  File "issue.py", line 43, in <module>
    logits = SimpleNet(x)
  File "issue.py", line 34, in SimpleNet
    logits = tf.matmul(fc1, fc1_W) + fc1_b
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

pass

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.71GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
Training complete!

I can confirm that @ymfa's minimal example fails on macOS with an NVIDIA 750, while the same example works on Linux with a Titan X.

E tensorflow/stream_executor/cuda/cuda_dnn.cc:353] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I had this issue with CUDA 10.1 + cuDNN 7.5 and TF 1.11 compiled from source with CUDA. The script I was trying to use needed these lines inserted somewhere:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

and then later:

sess = tf.Session(graph=detection_graph, config=config)

With this done, I still see a lot of “GPU out of memory” errors, but detection goes on very quickly, as I suppose it should when using the GPU. Thanks for sharing!

I was able to get a program to work by limiting the gpu usage. In my case with a 3gb gtx 1060 on ubuntu 16.04, if I set gpu option per_process_gpu_memory_fraction to .7 it works. Anything higher, I get these errors

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

It could be a case of bad error reporting by tensorflow. Seems completely unrelated. Maybe it is a clue to getting this resolved in a better manner?

Great, when I decrease the gpu_memory_fraction from 0.8 to 0.7, it starts working!

I got this error on Windows 10 with CUDA 9.0 and a GTX 1060, Python 3.5, tensorflow-gpu 1.5.0. I found an easy way to solve it: update the NVIDIA display driver to the newest version, reboot the PC, and then it worked!

In my case the same issue was resolved by updating the NVIDIA gpu driver.

Having the same problem with a GTX 650, Ubuntu 16.04, CUDA version 8.0.61, TF version 1.0.0. It was working just now, though giving some low-memory warnings. Now it doesn’t run at all, giving me the same Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) error.

Have the same problem: Windows 10, cuDNN 5.1, CUDA 8, GTX 1060. The program works on the CPU version of TensorFlow but gives these same errors with the GPU version.

Have the same problem with CentOS, Titan X.

This also resolved the issue for me.

GeForce GTX 1050, CUDA 10.0

Note: this is the only thing I can find that works in TF 2.0 for now. Thanks!

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

I was facing the same problem when using the community supported version of tensorflow inside a conda environment (i.e. using conda install tensorflow-gpu).

Turns out this version is not actually good in all situations (even though I’ve been using it on other machines). The best version to use is the pip installable version https://www.tensorflow.org/install/pip inside a conda environment. When I did this everything worked.

I faced this same problem. In my case I was running a Jupyter notebook while training my network. Closing the Jupyter notebook fixed my problem.

(I think it might have something to do with too-high demands on my GPU.)

Hope this helped!

In my case, I forgot to close the Jupyter notebook when I started to run another piece of code in VS Code. Closing the Jupyter notebook fixed the problem.

For me the problem was using the wrong cuDNN lib: I used cuDNN for CUDA 9.1 when I had CUDA 9.0. So I reinstalled cuDNN for CUDA 9.0 and everything worked.

Using: cudnn-9.0-windows10-x64-v7 and tensorflow-gpu==1.7.0

tutorials\image\imagenet>python classify_image.py fails with error: could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Adding the three lines of code from @ggranum above solves the problem.

Hi Guys,

I have just got the same problem:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

and solved it by:

1. Updating the NVIDIA GeForce 920M driver
2. Setting up the TF session properly, as follows:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

3. Restarting the PC

After that I got a more precise error message: “cuDNN7.1 found, but cuDNN7.0 expected. Upgrade”

And solved it by: instead of upgrading the rest (TF, CUDA, …) to match cuDNN, I downgraded cuDNN to match the rest (from 7.1 to 7.0.4), and it worked well.

Same error on Python 3.5, Ubuntu 16.04, TF 1.5. Updating the GPU driver to version 390.42 solved this issue for me.

For me, putting config.gpu_options.allow_growth = True in the tensorflow session config fixed the problem. CUDA 8, TF 1.4, cuDNN 6.

I agree with @strickon: it seems to be a memory allocation issue. I had a notebook with a tensorflow program running, and when I tried to run a python + tensorflow script in another Windows terminal I got the error. Then I restarted my notebook (releasing GPU memory), tried to run the script in the Windows terminal again, and it worked! I think tensorflow should provide a better error message to advise the user with a more detailed explanation.

Hi, I got the same issue. However, I found the reason is that I was using tensorflow twice at the same time.

For example, I usually used the Jupyter notebook for simple scripts and PyCharm for the project. If I didn’t shut down the Jupyter notebook, I would get this error in PyCharm.

Wish this could help.


Windows 10 64-bit, NVIDIA Titan X, Driver 385.41, CUDA 8.0.60, cuDNN 6.0, Python 3.5.2, TensorFlow 1.3

Same issue with Windows 10, GTX770, CUDA 8.0, CUDNN 5.1, TF-GPU 1.1.0, not sure where to get the device driver version but Windows Device Manager reports 21.21.13.7651 for the display driver.

2017-08-11 15:51:41.974028: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-08-11 15:51:41.974536: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-08-11 15:51:41.974923: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-11 15:51:41.975194: F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

@ggranum’s fix worked for me:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

I have the same issue running my own scripts now. I think it is for the same reason @lockywolf described:

In my case, this happened because other tensorflow instances were holding the GPU. (Other scripts running.)

I had this error quite often but irregularly. Then I followed @RawthiL's lead and added a session to my script. However, after executing the script successfully and restarting the kernel, I got the same error message again. Is there any solution to open the session, claim the GPU, and close it after the calculation is done?

cheers!

Edit: Besides @RawthiL's solution, I followed the Keras TF introduction where they say:

We should start by creating a TensorFlow session and registering it with Keras. This means that Keras will use the session we registered to initialize all variables that it creates internally.

import tensorflow as tf
sess = tf.Session()

from keras import backend as K
K.set_session(sess)
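
A sketch of my own, combining the Keras registration above with the allow_growth workaround discussed earlier in this thread, and closing the session explicitly once the calculation is done. Note that in TF 1.x the GPU memory is typically only fully returned to the system when the Python process exits.

import tensorflow as tf
from keras import backend as K

# Claim GPU memory on demand instead of all at once.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
K.set_session(sess)  # Keras will create its variables in this session

# ... build and train the Keras model here ...

K.clear_session()  # drop Keras graph/session references
sess.close()       # close the session explicitly when done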

Having the same issue with a GTX 1080 Ti, Windows 10, CUDA version 8.0.61, TF version 1.0.1, cuDNN 5.1.

@EncodeTS I just added a minimal reproducible example to my first post. Could you check if it reproduces the problem on your machine? On my machine, one convolutional layer works but not two convolutional layers, which led me to think that the problem might be caused by some resource limitations.