tensorflow: [Windows] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

For bugs or installation issues, please provide the following information. The more information you provide, the more easily we will be able to offer help and advice.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

Only one, but it was not solved.

Environment info

Operating System: Windows 10 (Anaconda 4.3.8)
conda --version: conda 4.3.8
Installed version of CUDA and cuDNN (nvcc --version):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sat_Sep__3_19:05:48_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

If installed from binary pip package, provide:

  1. A link to the pip package you installed: pip install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)":
     I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
     I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
     I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
     I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
     I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally

If possible, provide a minimal reproducible example (We usually don’t have time to read hundreds of lines of your code)

When I tried the single-GPU computing example with TensorFlow, I got the following error:

Placeholder_1: (Placeholder): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Placeholder_1: (Placeholder)/job:localhost/replica:0/task:0/gpu:0
MatMul_10: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_10: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_11: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_11: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_12: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_12: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_13: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_13: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_14: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_14: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_15: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_15: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_16: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_16: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_17: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_17: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_18: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_18: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_19: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_19: (MatMul)/job:localhost/replica:0/task:0/gpu:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/gpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_2: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_2: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_3: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_3: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_4: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_4: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_5: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_5: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_6: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_6: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_7: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_7: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_8: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_8: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_9: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_9: (MatMul)/job:localhost/replica:0/task:0/gpu:0
AddN: (AddN): /job:localhost/replica:0/task:0/cpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] AddN: (AddN)/job:localhost/replica:0/task:0/cpu:0
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

What other attempted solutions have you tried?

Then I tried a sample matrix multiplication:

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: Quadro M2000M
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0

Then I ran the multiplication and got the following error:

a = tf.random_normal((100,100))
b = tf.random_normal((100,500))
c = tf.matmul(a,b)
sess.run(c)

random_normal_1/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/mul: (Mul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/mul: (Mul)/job:localhost/replica:0/task:0/gpu:0
random_normal_1: (Add): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1: (Add)/job:localhost/replica:0/task:0/gpu:0
random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/gpu:0
random_normal/mul: (Mul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/mul: (Mul)/job:localhost/replica:0/task:0/gpu:0
random_normal: (Add): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal: (Add)/job:localhost/replica:0/task:0/gpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/stddev: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/stddev: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/mean: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/mean: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/shape: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/shape: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/stddev: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/stddev: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/mean: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/mean: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/shape: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/shape: (Const)/job:localhost/replica:0/task:0/gpu:0
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

Like @kingtaurus and @menggangmark, I then copied these files from the cuDNN zip archive into the CUDA toolkit:

  • cuda\bin\cudnn64_5.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
  • cuda\include\cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
  • cuda\lib\x64\cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64

I also copied the CUPTI files from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64:

  • cupti64_80.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
  • cupti.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64

Here C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0 is my CUDA toolkit install path. I had already added C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\ to my PATH.
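To confirm that copied DLLs are actually reachable, a small stdlib-only check can scan the directories on PATH. This is a sketch: the DLL names are taken from the log and file list above (CUDA 8.0, cuDNN 5), and find_on_path and report are hypothetical helper names. Windows also searches the application directory and the system folders before PATH, so a DLL missing from PATH may still load from one of those locations.

```python
import os

# DLL names as they appear in the TensorFlow 0.12 log and the copy steps above
# (CUDA 8.0 / cuDNN 5); adjust them for other CUDA or cuDNN versions.
REQUIRED_DLLS = [
    "cublas64_80.dll",
    "cudnn64_5.dll",
    "cufft64_80.dll",
    "curand64_80.dll",
    "cupti64_80.dll",
]


def find_on_path(filename, path_value=None):
    """Return every PATH directory that contains `filename`, in PATH order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # PATH entries on Windows are sometimes wrapped in quotes.
        directory = directory.strip().strip('"')
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


def report(dlls=REQUIRED_DLLS, path_value=None):
    """Map each DLL name to the list of PATH directories where it was found."""
    return {name: find_on_path(name, path_value) for name in dlls}


if __name__ == "__main__":
    # str.format instead of f-strings so this also runs on Python 3.5.
    for name, dirs in report().items():
        print("{}: {}".format(name, dirs[0] if dirs else "NOT FOUND on PATH"))
```

If a DLL appears in more than one directory, the first PATH hit is the one Windows would pick among the PATH entries, which helps spot a stale copy shadowing the new one.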

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 38 (6 by maintainers)

Most upvoted comments

@mingrutar - Correction:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

If you are using Keras, just put this code before loading the model:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                    # (nothing gets printed in Jupyter, only if you run it standalone)
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras

Replying to the allow_growth correction above: it didn't work for TensorFlow 2.0.

I found a different solution:


import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
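A slightly more defensive version of the snippet above applies memory growth to every visible GPU and avoids the IndexError that physical_devices[0] raises on a machine with no GPU. This is a sketch assuming TensorFlow 2.1 or newer, where tf.config.list_physical_devices exists; enable_memory_growth is a hypothetical helper name, and the tf module is passed in as an argument so the function can be exercised without a GPU.

```python
def enable_memory_growth(tf):
    """Turn on memory growth for every visible GPU; return the devices updated.

    Assumes TensorFlow 2.1+. Memory growth has to be configured before the GPUs
    are initialized, so call this right after importing TensorFlow.
    """
    updated = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            updated.append(gpu)
        except RuntimeError as err:
            # TensorFlow raises RuntimeError if the GPU runtime is already initialized.
            print("Could not set memory growth for {}: {}".format(gpu, err))
    return updated
```

With a real installation, call it as enable_memory_growth(tf) immediately after import tensorflow as tf, before building any model or creating any tensor.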

I’ve had this too sometimes, and in my case it was related to GPU memory being used by another process: for instance a game, NVIDIA RTX Voice, or just a PyCharm debugging session that hadn’t stopped cleanly and freed its memory.

You can see the memory status of your NVIDIA GPU by running nvidia-smi in a command prompt.
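To get the same numbers from a script, nvidia-smi's query mode can print them as CSV. A minimal sketch, assuming nvidia-smi is on PATH (on some Windows driver versions you may need to call it by its full path); parse_gpu_memory and query_gpu_memory are hypothetical helpers, and the values are in MiB because of the nounits format option.

```python
import subprocess

# Real nvidia-smi query flags; the output is one CSV line per GPU.
GPU_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV printed by GPU_QUERY into one dict per GPU (values in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, total, used, free = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
            "free_mib": int(free),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory figures."""
    output = subprocess.check_output(GPU_QUERY, universal_newlines=True)
    return parse_gpu_memory(output)
```

Calling query_gpu_memory() before and after closing another program shows whether that program was holding GPU memory.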

Restarting your PC, closing the other program, or restarting PyCharm will help.
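To find which program to close, nvidia-smi can also list the compute processes holding GPU memory. A sketch with hypothetical helper names; note that this query covers compute processes only, so graphics applications such as games may appear only in the plain nvidia-smi process table, and on Windows the per-process memory column can read [Not Supported], which the parser maps to None.

```python
import subprocess

APPS_QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse APPS_QUERY output into dicts; used_mib is None when not reported."""
    apps = []
    for line in csv_text.strip().splitlines():
        # Split off the first and last fields so commas in a path do not break parsing.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        apps.append({
            "pid": int(pid.strip()),
            "name": name.strip(),
            "used_mib": int(used) if used.isdigit() else None,
        })
    return apps


def list_compute_apps():
    """Run nvidia-smi and return the parsed list of compute processes."""
    output = subprocess.check_output(APPS_QUERY, universal_newlines=True)
    return parse_compute_apps(output)
```

A leftover Python process from an interrupted debugging session would show up in this list; ending it (for example from Task Manager, using the PID) should release its memory.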

If you’re using PyCharm, don’t call tf.enable_eager_execution() in the Python Console while testing another .py file at the same time.

Hello, I just solved the problem by copying my code from the E: drive and running it from C:. There seems to be a problem finding the path. I hope it helps.

@mingrutar - use this instead: tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))

If you are using TensorFlow 2, change the Keras snippet above to:

import tensorflow as tf
from tensorflow.compat.v1.keras.backend import set_session
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                    # (nothing gets printed in Jupyter, only if you run it standalone)
sess = tf.compat.v1.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras
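If the same script has to run on both TensorFlow 1.x and 2.x, one option is to use the v1 compatibility namespace only when it exists. A sketch under that assumption: v1_api and make_growth_config are hypothetical helpers, and tf.compat.v1 is assumed to be present on TF 2.x and on recent 1.x releases, with plain tf used otherwise.

```python
def v1_api(tf):
    """Return tf.compat.v1 when it exists, otherwise the tf module itself."""
    compat = getattr(tf, "compat", None)
    return getattr(compat, "v1", tf) if compat is not None else tf


def make_growth_config(tf, log_device_placement=False):
    """Build a ConfigProto with allow_growth enabled, for TF 1.x or 2.x."""
    api = v1_api(tf)
    config = api.ConfigProto()
    config.gpu_options.allow_growth = True  # grow GPU memory on demand
    config.log_device_placement = log_device_placement
    return config
```

With a real installation this would be used as api = v1_api(tf) followed by sess = api.Session(config=make_growth_config(tf)), and the session can then be handed to set_session exactly as in the snippet above.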

I was running a game in the background. Closing that cleared up the error.

I had a similar issue. What worked for me was freeing up hard disk space, since the disk was apparently getting full.
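To rule this out before a run, shutil.disk_usage reports the free space on the drive that holds a given path. This is only a quick check (free_disk_gib is a hypothetical helper); the link between disk space and the cuBLAS error is this commenter's observation, not something the thread confirms.

```python
import shutil


def free_disk_gib(path="."):
    """Return the free space, in GiB, on the drive containing `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


if __name__ == "__main__":
    # On Windows, pass a drive root such as "C:\\" to check a specific drive.
    print("Free disk space: {:.1f} GiB".format(free_disk_gib()))
```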

The tests for tf.matmul() pass on Windows, so I don’t think this is a Windows-specific issue: perhaps the following lines are a clue to the root cause?

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1002] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1002] failed to allocate 1.80G (1932735232 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
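To check whether a long log contains these out-of-memory lines alongside the cuBLAS failure, a small regex scan is enough. A sketch: the patterns match only the exact message formats quoted in this thread, and scan_tf_log is a hypothetical helper.

```python
import re

# Patterns for the two error messages quoted in this thread.
OOM_RE = re.compile(
    r"failed to allocate (\S+) \((\d+) bytes\) from device: CUDA_ERROR_OUT_OF_MEMORY")
CUBLAS_RE = re.compile(r"failed to create cublas handle: (CUBLAS_STATUS_\w+)")


def scan_tf_log(text):
    """Collect failed GPU allocation sizes (bytes) and cuBLAS handle errors."""
    failed_bytes = [int(match.group(2)) for match in OOM_RE.finditer(text)]
    cublas_errors = CUBLAS_RE.findall(text)
    return {"failed_alloc_bytes": failed_bytes, "cublas_errors": cublas_errors}
```

For example, redirect the script's stderr to a file and pass its contents to scan_tf_log. The first failed request above, 2147483648 bytes, is exactly 2 GiB.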

I solved my problem by running the program as an administrator.
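To confirm whether a given Python process is actually elevated, the following sketch calls the Windows shell32 function IsUserAnAdmin through ctypes (Microsoft documents it as a thin wrapper and recommends CheckTokenMembership instead) and falls back to checking for root on POSIX systems. is_admin is a hypothetical helper name.

```python
import ctypes
import os


def is_admin():
    """Return True if this process is elevated (Windows) or running as root (POSIX)."""
    if os.name == "nt":
        # ctypes.windll only exists on Windows, so it is touched only in this branch.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    return os.geteuid() == 0


if __name__ == "__main__":
    print("Running elevated:", is_admin())
```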

Following as @gussmith23 suggested, a simple reboot did the trick for me.

At first, none of the above worked for me – then I simply restarted my computer, and stopped getting the error (at least, I haven’t gotten it yet). Earlier, I had force-killed a bunch of TF processes – I’m wondering if GPU memory was not freed? TF always prints

totalMemory: 4.00GiB freeMemory: 3.31GiB

at the start of each run, but could this have been wrong, or maybe it’s referring to something else? I really don’t know. Anyway, glad it’s working for now.

I deleted everything and installed them again. Then, I used the code above, and it worked.

I ran into a similar problem. Task Manager shows 60+% free main memory, and the GPU has 6.4 GiB of free memory at startup, yet I still get CUBLAS_STATUS_ALLOC_FAILED. Is there a way to free the GPU memory?