tensorflow: Eager Execution error: Blas GEMM launch failed
System information
- Have I written custom code: no
- OS Platform and Distribution: Linux Ubuntu 16.04
- TensorFlow installed from (source or binary): pip3 install tensorflow-gpu
- TensorFlow version (use command below): v1.12.0-0-ga6d8ffae09 1.12.0
- Python version: 3.5.2
- CUDA/cuDNN version: CUDA 9.0, cudnn 7.4.2
- GPU model and memory: GeForce RTX 2080 Ti
Describe the current behavior: crashes with the error “Blas GEMM launch failed”.
Describe the expected behavior: correctly prints the matmul result.
Code to reproduce the issue: I was trying out eager execution with the following simple code:
import tensorflow as tf
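# Enable eager execution before building any ops (TF 1.x API; eager is the default in TF 2.x).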
tf.enable_eager_execution()
print(tf.matmul([[1., 2.],[3., 4.]], [[1., 2.],[3., 4.]]))
Other eager-mode examples at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/eager/python/examples fail with the same error.
However, non-eager (graph-mode) code runs correctly.
Other info / logs:
2019-01-31 17:00:20.744826: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-01-31 17:00:21.150735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:17:00.0
totalMemory: 10.73GiB freeMemory: 9.36GiB
2019-01-31 17:00:21.399702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:65:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-01-31 17:00:21.399746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
2019-01-31 17:00:21.906842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-31 17:00:21.906877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1
2019-01-31 17:00:21.906882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y
2019-01-31 17:00:21.906886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N
2019-01-31 17:00:21.907143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9026 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:17:00.0, compute capability: 7.5)
2019-01-31 17:00:21.907488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10167 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:65:00.0, compute capability: 7.5)
2019-01-31 17:00:22.144957: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "test.py", line 5, in <module>
print(tf.matmul([[1., 2.],[3., 4.]], [[1., 2.],[3., 4.]]))
File "/home/weixu/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 2057, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/home/weixu/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4586, in mat_mul
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(2, 2), b.shape=(2, 2), m=2, n=2, k=2 [Op:MatMul] name: MatMul/
About this issue
- State: closed
- Created 5 years ago
- Comments: 44 (7 by maintainers)
Can you please kill all running notebooks that are using your GPU, then restart the kernel and run the code again?
I’ve just copied cublas64_10.dll to cublas64_100.dll, and it worked 😃
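For Windows users, a minimal sketch of that copy step. The install path below is an assumption (a default CUDA 10.0 location); point it at wherever cublas64_10.dll actually lives on your system, and note that writing into Program Files may require running as administrator:

import os
import shutil

# Hypothetical CUDA install location; adjust to your system.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"

# TensorFlow looks for cublas64_100.dll, so provide it as a copy of cublas64_10.dll.
shutil.copyfile(os.path.join(cuda_bin, "cublas64_10.dll"),
                os.path.join(cuda_bin, "cublas64_100.dll"))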
The following is what I found. Also, if you have an RTX 20-series card and CUDA 10, you must put this in your code (see the sketch below):
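The comment’s actual snippet isn’t shown above; a common form of this fix is enabling GPU memory growth before the first op runs. A minimal sketch using the TF 1.x eager API from this report (on TF 2.x you would call tf.config.experimental.set_memory_growth instead, as in the later sketch):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)

print(tf.matmul([[1., 2.], [3., 4.]], [[1., 2.], [3., 4.]]))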
@Edremelech Thank you! After renaming cublas64_10.dll to cublas64_100.dll, my program runs as well.
Issue solved with tf-nightly-gpu and CUDA 10
If running an RTX-series card, first check that TF2 is using CUDA 10, then call set_memory_growth, and done! (A sketch follows below.)
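A minimal sketch of that check-then-enable sequence, assuming TF 2.3 or newer (tf.sysconfig.get_build_info does not exist in earlier 2.x releases):

import tensorflow as tf

# Report which CUDA version this TensorFlow build was compiled against (TF 2.3+).
print(tf.sysconfig.get_build_info().get("cuda_version"))

# Enable memory growth on every visible GPU; must run before any op touches the GPU.
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)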
Thank you for your reply. Where did you find cublas64_10.dll and cublas64_100.dll? I can’t find them.
My problem is solved. TensorFlow failed to load cublas64_100.dll because it was named cublas64_10.dll. I am simply shocked to encounter such errors. Anyway, thanks everybody, and I hope my stupid messages will help another newbie who can’t believe a DLL name change can fix this 😃
I am almost a hundred percent sure the GPU is not truly running out of memory; it doesn’t even run for one epoch.
The same code, same dataset, same everything, runs fine on a 1080 Ti, the GPUs of Colab, a 1080, and a 1070.
My feeling is that something is wrong with CUDA 9.0 on the 2080 Ti, or with eager mode on the 2080 Ti under CUDA 9.0. (CUDA 10.0 was the first release to support Turing GPUs such as the 2080 Ti, which would explain why the same setup works on Pascal cards.)