keras-yolo3: failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED

The error occurred while I was training on VOC data. My setup is 2 × RTX 2080 (8 GB each), tensorflow-gpu 1.12, and Keras 2.2.4.

Epoch 1/50
2019-01-28 00:16:00.441512: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    _main(annotation_path=anno)
  File "train.py", line 65, in _main
    callbacks=[logging, checkpoint])
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=346112, n=32, k=64
  [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
  [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]

About this issue

State: closed
Created 5 years ago
Comments: 36

Most upvoted comments

It worked after I updated TensorFlow from 1.13.1 to 1.14.

My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.

But I have TensorFlow 1.15, CUDA 10.0, and an RTX 3080, and I still get the same issue.

+18

mfshiu on Dec 2, 2020

Hi @mfshiu, NVIDIA maintains its own version of TensorFlow 1.15, which supports the latest GPU cards, here: https://github.com/NVIDIA/tensorflow#install

So you need to remove the official TensorFlow that was installed through pip or conda and install NVIDIA's version, as its README.md says:

Install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

Install the current NVIDIA TensorFlow release:

$ pip install --user nvidia-tensorflow[horovod]

Once it is installed, just use it like regular TensorFlow:

import tensorflow as tf

+15

allenyllee on Dec 4, 2020
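
One way to confirm that a reinstalled build can actually launch cuBLAS is to run a single float32 matrix multiply, the kind of op that goes through the SGEMM path named in the error above when it runs on a GPU. The following is a minimal sketch and not something posted in the thread: the function name gemm_smoke_test and its return strings are made up for illustration, and it assumes TensorFlow 1.13 or later, where tf.compat.v1 is available.

```python
def gemm_smoke_test(n=256):
    """Run one n x n float32 matmul and report whether it completed.

    Hypothetical helper for illustration; not part of keras-yolo3 or TensorFlow.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow-not-installed"

    v1 = tf.compat.v1  # assumption: TF 1.13+ (tf.compat.v1 is missing in older builds)
    with tf.Graph().as_default():
        a = v1.random_uniform([n, n], dtype=tf.float32)
        product = tf.matmul(a, a)  # float32 matmul; uses cuBLAS when placed on a GPU
        with v1.Session() as sess:
            try:
                sess.run(product)
            except tf.errors.InternalError as err:
                # "Blas SGEMM launch failed" surfaces as an InternalError
                return "failed: " + str(err)
    return "ok"


if __name__ == "__main__":
    print(gemm_smoke_test())
```

If this prints "ok" on the GPU machine but training still fails, the problem is probably not the basic cuBLAS launch itself.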

The problem was fixed after running:

!pip install nvidia-pyindex
!pip install nvidia-tensorflow

+11

serdarildercaglar on Jul 16, 2022

I had the same problem with an RTX 3090 + TF 1.15. I resolved my problem by using the official nvidia+tf1 ngc docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

GuillaumeMougeot on Apr 8, 2021

It worked after I updated TensorFlow from 1.13.1 to 1.14.

My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.

yuanzhedong on Dec 4, 2019

Resolved this issue for myself: Be sure you’re running Python 3.8 and Pip 20 or later.

drscotthawley on Feb 3, 2021
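
The Python and pip requirement above can be checked before attempting the install. This is a small sketch under stated assumptions: the helper name environment_ok is hypothetical, and it reads "or later" as applying to both the Python and the pip version, which the comment does not spell out.

```python
import sys


def environment_ok(python_version, pip_version):
    """Return True if Python is at least 3.8 and pip is at least 20.

    python_version: a tuple such as sys.version_info
    pip_version: a version string such as "20.2.4"
    Hypothetical helper; the thresholds come from the comment above.
    """
    pip_major = int(pip_version.split(".")[0])
    return tuple(python_version[:2]) >= (3, 8) and pip_major >= 20


if __name__ == "__main__":
    try:
        import pip
        current_pip = pip.__version__
    except ImportError:
        current_pip = "0"  # pip not importable; treat as too old
    print(environment_ok(sys.version_info, current_pip))
```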

I had the same problem with an RTX 3090 + TF 1.15. I resolved my problem by using the official nvidia+tf1 ngc docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

It works very well for me. In my case, with an RTX 3090 + TF 1.15, the nvidia+tf1 NGC Docker container version ‘21.05-tf1-py3’ works very well! Thanks a lot.

seongyeop-jeong-poey on Jun 2, 2021

I also hit the same error. My GPU is an RTX 2080 Ti with tensorflow-gpu 1.8.0 and CUDA 9.0, but on a GTX 1080 Ti with tensorflow-gpu 1.4.0 and CUDA 8.0 the program runs normally. Can someone give some advice? Thanks.

S0soo on Apr 9, 2019
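
Several reports in this thread fit one pattern: the GPU is newer than the CUDA version the installed TensorFlow was built against. The sketch below encodes that as a lookup. The tables hold the CUDA versions of the official tensorflow-gpu pip wheels and the first CUDA release with native support for each GPU generation. The function name likely_mismatch is hypothetical, and the check is only a necessary condition: the thread also contains an RTX 2080 + TF 1.13.1 + CUDA 10.0 report that failed even though it passes this check.

```python
# First CUDA toolkit release with native support for each GPU's architecture.
MIN_CUDA_FOR_GPU = {
    "GTX 1080 Ti": (8, 0),   # Pascal, compute capability 6.1
    "RTX 2080": (10, 0),     # Turing, compute capability 7.5
    "RTX 2080 Ti": (10, 0),  # Turing, compute capability 7.5
    "RTX 3080": (11, 1),     # Ampere, compute capability 8.6
    "RTX 3090": (11, 1),     # Ampere, compute capability 8.6
    "RTX A6000": (11, 1),    # Ampere, compute capability 8.6
}

# CUDA version the official tensorflow-gpu pip wheels were built against.
CUDA_FOR_OFFICIAL_TF = {
    "1.4": (8, 0),
    "1.8": (9, 0),
    "1.12": (9, 0),
    "1.13": (10, 0),
    "1.14": (10, 0),
    "1.15": (10, 0),
}


def likely_mismatch(gpu, tf_version):
    """Return True if the official wheel's CUDA predates the GPU's minimum CUDA.

    Hypothetical helper for illustration; it only covers the GPUs and
    TensorFlow versions mentioned in this thread.
    """
    major_minor = ".".join(tf_version.split(".")[:2])
    return CUDA_FOR_OFFICIAL_TF[major_minor] < MIN_CUDA_FOR_GPU[gpu]


if __name__ == "__main__":
    print(likely_mismatch("RTX 2080 Ti", "1.8.0"))  # the failing setup reported above
    print(likely_mismatch("GTX 1080 Ti", "1.4.0"))  # the working setup reported above
```

For the Ampere cards, no official TensorFlow 1.x wheel passes this check, which is consistent with the comments recommending NVIDIA's own TensorFlow 1.15 build or the NGC container.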

Yes! Yes!!! Remove the official TensorFlow and use Python 3.8:

pip install nvidia-pyindex
pip install nvidia-tensorflow

I used an A6000, TF 1.15, CUDA 10.0.130, and cuDNN 7.3.1. The TF website told me to use Python 3.6 or 3.7, which is what I did before. But!!! To use nvidia-pyindex and nvidia-tensorflow, I needed to switch to Python 3.8. And I succeeded!!!

Guo986 on Jan 29, 2023

Hey @mfshiu, maybe you can try CUDA 10.0 with tensorflow-gpu 1.14.

kartikwar on Dec 2, 2020

Hey bro, have you figured it out? I met the same issue.

ShuteLee on Mar 18, 2019