tensorflow: [Windows] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
NOTE: Only file GitHub issues for bugs and feature requests. All other topics will be closed.
For general support from the community, see StackOverflow. To make bugs and feature requests easier to find and organize, we close issues that are deemed out of scope for GitHub Issues and point people to StackOverflow.
For bugs or installation issues, please provide the following information. The more information you provide, the more easily we will be able to offer help and advice.
What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?
Only one, but it was not solved.
Environment info
Operating System:
Windows 10 (anaconda 4.3.8)
conda --version
conda 4.3.8
Installed version of CUDA and cuDNN:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sat_Sep__3_19:05:48_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
If installed from binary pip package, provide:
- A link to the pip package you installed:
pip install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl
- The output from
python -c "import tensorflow; print(tensorflow.__version__)"
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
If possible, provide a minimal reproducible example (We usually don’t have time to read hundreds of lines of your code)
When I tried the single-GPU computing example with TensorFlow, I got the following error:
Placeholder_1: (Placeholder): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Placeholder_1: (Placeholder)/job:localhost/replica:0/task:0/gpu:0
MatMul_10: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_10: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_11: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_11: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_12: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_12: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_13: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_13: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_14: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_14: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_15: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_15: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_16: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_16: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_17: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_17: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_18: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_18: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_19: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_19: (MatMul)/job:localhost/replica:0/task:0/gpu:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/gpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_2: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_2: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_3: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_3: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_4: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_4: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_5: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_5: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_6: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_6: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_7: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_7: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_8: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_8: (MatMul)/job:localhost/replica:0/task:0/gpu:0
MatMul_9: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_9: (MatMul)/job:localhost/replica:0/task:0/gpu:0
AddN: (AddN): /job:localhost/replica:0/task:0/cpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] AddN: (AddN)/job:localhost/replica:0/task:0/cpu:0
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
What other attempted solutions have you tried?
Then I tried a sample matrix multiplication:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: Quadro M2000M
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0
and the following error:
a = tf.random_normal((100,100))
b = tf.random_normal((100,500))
c = tf.matmul(a,b)
sess.run(c)
random_normal_1/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/mul: (Mul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/mul: (Mul)/job:localhost/replica:0/task:0/gpu:0
random_normal_1: (Add): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1: (Add)/job:localhost/replica:0/task:0/gpu:0
random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/gpu:0
random_normal/mul: (Mul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/mul: (Mul)/job:localhost/replica:0/task:0/gpu:0
random_normal: (Add): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal: (Add)/job:localhost/replica:0/task:0/gpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/stddev: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/stddev: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/mean: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/mean: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal_1/shape: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal_1/shape: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/stddev: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/stddev: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/mean: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/mean: (Const)/job:localhost/replica:0/task:0/gpu:0
random_normal/shape: (Const): /job:localhost/replica:0/task:0/gpu:0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] random_normal/shape: (Const)/job:localhost/replica:0/task:0/gpu:0
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Like @kingtaurus and @menggangmark, I then copied the following files from the cuDNN zip archive into my CUDA toolkit directory:
cuda\bin\cudnn64_5.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
cuda\include\cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
cuda\lib\x64\cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
I also copied cupti64_80.dll (from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64) to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin, and cupti.lib (from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64) to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64.
Here C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0 is the install path of my CUDA toolkit. I had already added C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\ to my PATH.
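To double-check that the copied DLLs are actually visible through PATH, a small script along these lines may help. It is only a sketch: the helper names find_on_path and report_missing are made up for illustration, and the DLL list is taken from the file names mentioned in this issue.

```python
import os
from pathlib import Path

# DLL names taken from the log output and the copy steps in this issue.
REQUIRED_DLLS = ["cublas64_80.dll", "cudnn64_5.dll", "cupti64_80.dll"]


def find_on_path(filename, path_value=None):
    """Return every directory in a PATH-style string that contains filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # Skip empty entries produced by stray separators.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


def report_missing(filenames=REQUIRED_DLLS, path_value=None):
    """Return the file names that were not found in any PATH directory."""
    return [name for name in filenames if not find_on_path(name, path_value)]
```

Calling report_missing() with no arguments checks the current PATH; an empty list means every listed DLL was found in at least one PATH directory.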
About this issue
- State: closed
- Created 7 years ago
- Comments: 38 (6 by maintainers)
@mingrutar - Correction:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
If you are using Keras, just put this code before loading the model.
Didn't work for TensorFlow 2.0.
Found a different solution
I’ve had this too sometimes, and in my case it was related to GPU memory being used by another process. For instance, a game, NVIDIA RTX Voice, or just a PyCharm debugging session that hadn’t stopped cleanly to free the memory.
You can see the memory status of your NVIDIA GPU with the command
nvidia-smi
in a command prompt. Closing the other program, restarting PyCharm, or restarting your PC should help.
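If you want to run the same check from Python before starting TensorFlow, something like the following might work. It is a rough sketch: parse_gpu_memory and query_gpu_memory are illustrative names, and it assumes nvidia-smi is on PATH and supports the --query-gpu CSV output.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows in MiB (one row per GPU) into (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...], or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

If the used figure is already close to the total before your script starts, some other process is probably holding the memory.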
If you’re using PyCharm, don’t enable
tf.enable_eager_execution()
in the Python Console while testing your other .py file at the same time.
Hello, I just solved the problem by copying my code from the E: drive and running it from C:. There seems to be a problem finding the path. I hope it helps.
@mingrutar - use this instead: tf.Session(config=tf.ConfigProto(allow_growth=True))
If you are using TensorFlow 2 then change it to
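For reference, TensorFlow 2 exposes the memory-growth setting through tf.config rather than through a session config. The sketch below is one way to write it, not necessarily the snippet the commenter had in mind; the wrapper name enable_memory_growth is made up for illustration.

```python
def enable_memory_growth():
    """Ask TensorFlow 2 to allocate GPU memory on demand instead of up front.

    This has to run before any op touches the GPU. Returns the number of
    GPUs that were configured.
    """
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call enable_memory_growth() once at the top of the script, before building or loading any model.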
I was running a game in the background. Closing that cleared up the error.
I had a similar issue; what worked for me was freeing up hard disk space, which was apparently running out.
The tests for
tf.matmul()
pass on Windows, so I don’t think this is a Windows-specific issue: perhaps the following lines are a clue to the root cause?
I solved my problem by running the program as an administrator.
As @gussmith23 suggested, a simple reboot did the trick for me.
At first, none of the above worked for me – then I simply restarted my computer, and stopped getting the error (at least, I haven’t gotten it yet). Earlier, I had force-killed a bunch of TF processes – I’m wondering if GPU memory was not freed? TF always prints
at the start of each run, but could this have been wrong, or maybe it’s referring to something else? I really don’t know. Anyway, glad it’s working for now.
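One way to check for leftover processes that still hold GPU memory is to ask nvidia-smi for its list of compute processes. This is only a sketch: parse_compute_apps and list_compute_apps are illustrative names, it assumes nvidia-smi is on PATH, and on Windows the per-process memory column may be reported as unavailable.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split the pid off the front and the memory off the back so that
        # a comma inside the process name does not break the parse.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        rows.append(
            {"pid": int(pid), "name": name.strip(), "used_memory": used.strip()}
        )
    return rows


def list_compute_apps():
    """Return the processes nvidia-smi reports on the GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)
```

Any python.exe entries that remain after your scripts have exited are candidates for the processes that were not cleaned up.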
I deleted everything and installed them again. Then, I used the code above, and it worked.
I ran into a similar problem. The task manager shows 60+% free main memory, and the GPU has 6.4 GiB of free memory at startup, yet I got CUBLAS_STATUS_ALLOC_FAILED. Is there a way to free the GPU memory?