tensorflow: [Windows] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED


What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

Only one, but it was not solved.

Environment info

Operating System: Windows 10 (Anaconda 4.3.8)
conda --version: conda 4.3.8
Installed version of CUDA and cuDNN:
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sat_Sep__3_19:05:48_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

If installed from binary pip package, provide:

A link to the pip package you installed: pip install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl
The output from python -c "import tensorflow; print(tensorflow.__version__)":

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally

If possible, provide a minimal reproducible example (We usually don’t have time to read hundreds of lines of your code)

When I tried the single-GPU computing example with TensorFlow, I got the following error:

Placeholder_1: (Placeholder): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Placeholder_1: (Placeholder)/job:localhost/replica:0/task:0/gpu:0
MatMul_10: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_10: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_11: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_11: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_12: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_12: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_13: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_13: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_14: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_14: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_15: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_15: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_16: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_16: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_17: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_17: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_18: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_18: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_19: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_19: (MatMul)/job:localhost/replica:0/task:0/gpu:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/gpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_2: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_2: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_3: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_3: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_4: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_4: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_5: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_5: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_6: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_6: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_7: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_7: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_8: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_8: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_9: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_9: (MatMul)/job:localhost/replica:0/task:0/gpu:0
AddN: (AddN): /job:localhost/replica:0/task:0/cpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] AddN: (AddN)/job:localhost/replica:0/task:0/cpu:0
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

What other attempted solutions have you tried?

Then I tried a sample matrix multiplication:

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: Quadro M2000M
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0

and then got the following error:

a = tf.random_normal((100,100))
b = tf.random_normal((100,500))
c = tf.matmul(a,b)
sess.run(c)

random_normal_1/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/mul: (Mul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/mul: (Mul)/job:localhost/replica:0/task:0/gpu:0
random_normal_1: (Add): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1: (Add)/job:localhost/replica:0/task:0/gpu:0
random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/gpu:0
random_normal/mul: (Mul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/mul: (Mul)/job:localhost/replica:0/task:0/gpu:0
random_normal: (Add): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal: (Add)/job:localhost/replica:0/task:0/gpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/stddev: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/stddev: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/mean: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/mean: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/shape: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/shape: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/stddev: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/stddev: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/mean: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/mean: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/shape: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/shape: (Const)/job:localhost/replica:0/task:0/gpu:0
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

Like @kingtaurus and @menggangmark, I then copied cudnn64_5.dll (cuda\bin\cudnn64_5.dll) from the cuDNN zip archive into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin, cuda\include\cudnn.h into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include, and cuda\lib\x64\cudnn.lib into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\.

I also copied cupti64_80.dll from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64 into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin, and cupti.lib from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64 into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64.

Here, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0 is my CUDA Toolkit install path. I had already added C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\ to my PATH.
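The loader log only shows which DLLs TensorFlow managed to open, so a quick way to confirm that the copied files are actually reachable is to search each PATH directory for them. This is a minimal sketch, assuming the CUDA 8.0 / cuDNN 5 file names that appear in this issue; `find_on_path` and `EXPECTED_DLLS` are hypothetical names for illustration. It only checks PATH, not the other locations Windows searches when loading a DLL.

```python
import os

# Hypothetical list: DLL names taken from the loader log and file copies in
# this issue (CUDA 8.0 / cuDNN 5). Adjust for other CUDA/cuDNN versions.
EXPECTED_DLLS = (
    "cublas64_80.dll",
    "cudnn64_5.dll",
    "cufft64_80.dll",
    "curand64_80.dll",
    "cupti64_80.dll",
    "nvcuda.dll",
)


def find_on_path(names, path=None):
    """Map each file name to the first PATH directory that contains it, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = {name: None for name in names}
    for directory in path.split(os.pathsep):
        if not directory:  # skip empty entries such as a trailing ';'
            continue
        for name in names:
            if found[name] is None and os.path.isfile(os.path.join(directory, name)):
                found[name] = directory
    return found


if __name__ == "__main__":
    for name, directory in find_on_path(EXPECTED_DLLS).items():
        print(f"{name:20} {directory or 'NOT FOUND'}")
```

Because the first match wins, this also reveals an older copy of a DLL that shadows the one in the CUDA bin directory.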

About this issue

State: closed
Created 7 years ago
Comments: 38 (6 by maintainers)

Most upvoted comments

@mingrutar - Correction:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

+117

shingte on Jun 4, 2017

If you are using Keras, just put this code before loading the model:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                    # (nothing gets printed in Jupyter, only if you run it standalone)
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras

+17

karthikeyan19 on Sep 18, 2018

@mingrutar - Correction:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

Didn't work for TensorFlow 2.0.

Found a different solution:


import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
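The snippet above raises an IndexError on a machine where TensorFlow sees no GPU, and it only configures the first GPU. A slightly more defensive variant, as a sketch assuming TensorFlow 2.x (`enable_memory_growth` is a hypothetical helper name), loops over every visible GPU. If `tf.config.list_physical_devices` is missing on an early 2.x release, `tf.config.experimental.list_physical_devices` is the older spelling.

```python
def enable_memory_growth():
    """Turn on memory growth for every visible GPU and return the GPUs touched.

    Must run before any GPU has been initialized (i.e. before creating tensors
    or models); otherwise TensorFlow raises a RuntimeError.
    """
    import tensorflow as tf  # imported inside the helper; assumes TensorFlow 2.x

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate GPU memory on demand instead of reserving most of it up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus
```

Call it once at the top of the script, before building or loading any model; with no GPU present it simply returns an empty list.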

dmitryponv on Sep 5, 2020

I’ve had this too sometimes, and in my case it was related to GPU memory being used by another process, for instance a game, NVIDIA RTX Voice, or just a PyCharm debugging session that hadn’t stopped cleanly and freed its memory.

You can see the memory status of your NVIDIA GPU by running nvidia-smi in a command prompt.

Restarting your PC, closing the other program, or restarting PyCharm will help.
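To see which processes are holding GPU memory from a script, nvidia-smi's query mode can be parsed. This is a sketch assuming nvidia-smi is on PATH; `parse_compute_apps` and `gpu_compute_apps` are hypothetical helper names. The query lists compute processes only; graphics processes such as games show up with type G in the plain `nvidia-smi` table instead, and some drivers report per-process memory as a non-numeric value.

```python
import csv
import io
import subprocess


def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,process_name,used_memory
    --format=csv,noheader,nounits` output into (pid, name, used_MiB) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        pid, name, used = (field.strip() for field in row)
        # Non-numeric values (e.g. "[N/A]") are kept as None.
        rows.append((int(pid), name, int(used) if used.isdigit() else None))
    return rows


def gpu_compute_apps():
    """Run nvidia-smi and return the compute processes currently using the GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```

A stale python.exe entry left over from a killed TensorFlow run would be a likely candidate to close before retrying.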

Lambik on May 12, 2020

If you’re using PyCharm, don’t enable tf.enable_eager_execution() in the Python Console while testing your other .py file at the same time.

nyngwang on Jun 10, 2018

Hello, I just solved the problem by copying my code from the E: drive and running it from C:. There seem to be problems finding the path. I hope it helps.

juandarango on Dec 1, 2017

@mingrutar - use this instead: tf.Session(config=tf.ConfigProto(allow_growth=True))

shingte on Jun 4, 2017

If you are using Keras, just put this code before loading the model:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                    # (nothing gets printed in Jupyter, only if you run it standalone)
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras

If you are using TensorFlow 2, change it to:

from tensorflow.compat.v1.keras.backend import set_session
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                    # (nothing gets printed in Jupyter, only if you run it standalone)
sess = tf.compat.v1.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras

jackz314 on Jun 3, 2020

I was running a game in the background. Closing that cleared up the error.

PseudoDesign on Jan 31, 2018

I had a similar issue; what worked for me was freeing up hard disk space, which was apparently getting full.

tonmoyborah on Jan 25, 2018

The tests for tf.matmul() pass on Windows, so I don’t think this is a Windows-specific issue: perhaps the following lines are a clue to the root cause?

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1002] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1002] failed to allocate 1.80G (1932735232 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
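The two sizes in these lines can be checked by hand: the first request is exactly 2 GiB (so "G" here means 2^30 bytes) and the second is about 90% of it. That pattern reads like TensorFlow retrying with a smaller block after the first allocation failed; this is an interpretation of the numbers, not something stated in the thread.

```python
GIB = 1024 ** 3  # "G" in these messages is binary: 2147483648 bytes == 2.00G

first_request = 2147483648   # "failed to allocate 2.00G"
second_request = 1932735232  # "failed to allocate 1.80G"

print(f"{first_request / GIB:.2f} GiB")                  # 2.00 GiB
print(f"{second_request / GIB:.2f} GiB")                 # 1.80 GiB
print(f"ratio: {second_request / first_request:.3f}")    # ratio: 0.900
```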

mrry on Feb 22, 2017

Hello, I just solved the problem by copying my code from the E: drive and running it from C:. There seem to be problems finding the path. I hope it helps.

I solved my problem by running the program as an administrator.

tekinalpturk on Apr 29, 2020

Following @gussmith23's suggestion, a simple reboot did the trick for me.

edlabbe on Nov 7, 2019

At first, none of the above worked for me – then I simply restarted my computer, and stopped getting the error (at least, I haven’t gotten it yet). Earlier, I had force-killed a bunch of TF processes – I’m wondering if GPU memory was not freed? TF always prints

totalMemory: 4.00GiB freeMemory: 3.31GiB

at the start of each run, but could this have been wrong, or maybe it’s referring to something else? I really don’t know. Anyway, glad it’s working for now.

gussmith23 on Apr 18, 2018

I deleted everything and installed them again. Then, I used the code above, and it worked.

maarab-sfu on Dec 29, 2017

I ran into a similar problem. Task Manager shows 60+% free main memory, and the GPU has 6.4 GiB of free memory at startup, but I still got CUBLAS_STATUS_ALLOC_FAILED. Is there a way to free the GPU memory?

mingrutar on May 24, 2017