tensorflow: failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED on an AWS p2.xlarge instance

Hi,

I have been running Docker images on a CentOS 7.0 AWS p2.xlarge instance. I have previously installed on it CUDA (cuda-repo-rhel7-8.0.44-1.x86_64.rpm) and NVIDIA driver 361.42.

I have also installed nvidia-docker following the instructions.

I have successfully run all the notebooks from the Docker images (so far I have tried tensorflow/tensorflow:latest-devel-gpu and tensorflow/tensorflow:latest-gpu).

TensorFlow version running within the Docker container: 0.11.0rc2
Bazel version: Build label: 0.3.2

    root@de73edc73418:~# nvidia-smi -l
    Wed Nov  2 12:02:54 2016
    +------------------------------------------------------+
    | NVIDIA-SMI 361.42     Driver Version: 361.42         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
    | N/A   57C    P0    70W / 149W |  10948MiB / 11519MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

However, when I try to launch a single-GPU computing example with TensorFlow, I get the following error:

    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_3: /job:localhost/replica:0/task:0/gpu:0
    MatMul_4: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_4: /job:localhost/replica:0/task:0/gpu:0
    MatMul_5: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_5: /job:localhost/replica:0/task:0/gpu:0
    MatMul_6: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_6: /job:localhost/replica:0/task:0/gpu:0
    MatMul_7: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_7: /job:localhost/replica:0/task:0/gpu:0
    MatMul_8: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_8: /job:localhost/replica:0/task:0/gpu:0
    MatMul_9: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_9: /job:localhost/replica:0/task:0/gpu:0
    AddN: /job:localhost/replica:0/task:0/cpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] AddN: /job:localhost/replica:0/task:0/cpu:0
    E tensorflow/stream_executor/cuda/cuda_blas.cc:367] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
    W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
    E tensorflow/stream_executor/cuda/cuda_blas.cc:367] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
    W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

Not sure if it is something related to the NVIDIA drivers, the OS, or some library mismatch. Any ideas?
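
For reference, a minimal single-GPU matmul script of this general shape (an illustrative sketch, not the exact example code) produces the same kind of placement logging as above:

    import tensorflow as tf

    # Several matmuls pinned to the GPU, summed on the CPU, with device placement
    # logging on, mirroring the MatMul_*/AddN placements in the log above.
    with tf.device('/gpu:0'):
        a = tf.random_normal([2000, 2000])
        b = tf.random_normal([2000, 2000])
        products = [tf.matmul(a, b) for _ in range(10)]

    with tf.device('/cpu:0'):
        total = tf.add_n(products)

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(total))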

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 25 (6 by maintainers)

Most upvoted comments

If you're still having trouble, try adding /usr/local/cuda/extras/CUPTI/lib64 to your LD_LIBRARY_PATH. I had the same error and this fixed it (I was on a Mac though, so verify that directory on your system).

On the Mac it was /usr/local/cuda/extras/CUPTI/lib.
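
If you want to confirm the directory is actually on the loader path before launching TensorFlow, a small check along these lines may help (a sketch; the CUPTI path below is the usual Linux default and is an assumption about your layout):

    import os

    # Usual CUPTI location on Linux; on macOS it is .../CUPTI/lib instead of lib64.
    cupti_dir = "/usr/local/cuda/extras/CUPTI/lib64"
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")

    if cupti_dir not in ld_path.split(os.pathsep):
        print("CUPTI dir not on LD_LIBRARY_PATH; export it in the shell before launching Python.")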

Maybe the following command helps:

sudo rm -rf .nv/

Good luck.

I ran into this problem when running https://github.com/davidsandberg/facenet/ inside the Docker image tensorflow/tensorflow:latest-gpu-py3.

However, the Jupyter notebooks run without any problem.

UPDATE:

After I set per_process_gpu_memory_fraction from 1 to 0.5, the error was gone.

        # Pre-allocate only a fraction of the GPU memory (args.gpu_memory_fraction, here 0.5)
        # instead of the whole card, and log where each op is placed.
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=True))

It seems that setting per_process_gpu_memory_fraction to 1 makes TensorFlow try to allocate the entire GPU memory, which fails because my Xorg and Chrome have already used part of it.
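
An alternative to hand-picking a fraction is to let TensorFlow grow its GPU allocation on demand (a sketch using the same TF 1.x-era GPUOptions API as the snippet above; allow_growth is a standard option, not something specific to this thread):

    import tensorflow as tf

    # Allocate GPU memory as needed instead of reserving (a fraction of) it all up front,
    # so TensorFlow can coexist with processes already holding part of the card.
    gpu_options = tf.GPUOptions(allow_growth=True)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))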

  • Can you run the TensorFlow binary from outside the Docker container?
  • CUBLAS_STATUS_NOT_INITIALIZED is a classic symptom of a CUDA runtime that is not set up properly. It's best to remove the complication of Docker if you can.
  • Avoid using latest tags; use specific versions instead. You can also try nightly if you want a bleeding-edge version, but if you do, you probably want to name or tag the container ID once you have a working configuration.

@hzxie A very good suggestion! In my case, it is sudo rm -rf ~/.nv/