tensorflow: failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED on an AWS p2.xlarge instance

Hi,

I have been running Docker images on a CentOS 7.0 AWS p2.xlarge instance. I have previously installed on it CUDA (cuda-repo-rhel7-8.0.44-1.x86_64.rpm) and NVIDIA driver 361.42.

I have also installed nvidia-docker following the instructions.

I have successfully run all the notebooks from the Docker images (so far I have tried tensorflow/tensorflow:latest-devel-gpu and tensorflow/tensorflow:latest-gpu).

TensorFlow version running within the Docker container: 0.11.0rc2
Bazel version: Build label: 0.3.2

    root@de73edc73418:~# nvidia-smi -l
    Wed Nov  2 12:02:54 2016
    +------------------------------------------------------+
    | NVIDIA-SMI 361.42     Driver Version: 361.42         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
    | N/A   57C    P0    70W / 149W |  10948MiB / 11519MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

However, when I try to launch a single-GPU computing example with TensorFlow, I get the following error:

    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_3: /job:localhost/replica:0/task:0/gpu:0
    MatMul_4: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_4: /job:localhost/replica:0/task:0/gpu:0
    MatMul_5: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_5: /job:localhost/replica:0/task:0/gpu:0
    MatMul_6: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_6: /job:localhost/replica:0/task:0/gpu:0
    MatMul_7: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_7: /job:localhost/replica:0/task:0/gpu:0
    MatMul_8: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_8: /job:localhost/replica:0/task:0/gpu:0
    MatMul_9: /job:localhost/replica:0/task:0/gpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] MatMul_9: /job:localhost/replica:0/task:0/gpu:0
    AddN: /job:localhost/replica:0/task:0/cpu:0
    I tensorflow/core/common_runtime/simple_placer.cc:819] AddN: /job:localhost/replica:0/task:0/cpu:0
    E tensorflow/stream_executor/cuda/cuda_blas.cc:367] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
    W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
    E tensorflow/stream_executor/cuda/cuda_blas.cc:367] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
    W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

Not sure if it is something related to the NVIDIA drivers, the OS, or some library mismatch. Any ideas?
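
For reference, a minimal single-GPU matmul script of this general shape (an illustrative sketch, not the exact example code) produces the same kind of placement logging as above:

    import tensorflow as tf

    # Several matmuls pinned to the GPU, summed on the CPU, with device placement
    # logging on, mirroring the MatMul_*/AddN placements in the log above.
    with tf.device('/gpu:0'):
        a = tf.random_normal([2000, 2000])
        b = tf.random_normal([2000, 2000])
        products = [tf.matmul(a, b) for _ in range(10)]

    with tf.device('/cpu:0'):
        total = tf.add_n(products)

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(total))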

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 25 (6 by maintainers)

Most upvoted comments

If you're still having trouble, try adding /usr/local/cuda/extras/CUPTI/lib64 to your LD_LIBRARY_PATH. I had the same error and this fixed it (I was on a Mac though, so verify that directory on your system).

On the Mac it was /usr/local/cuda/extras/CUPTI/lib.
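
If you want to confirm the directory is actually on the loader path before launching TensorFlow, a small check along these lines may help (a sketch; the CUPTI path below is the usual Linux default and is an assumption about your layout):

    import os

    # Usual CUPTI location on Linux; on macOS it is .../CUPTI/lib instead of lib64.
    cupti_dir = "/usr/local/cuda/extras/CUPTI/lib64"
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")

    if cupti_dir not in ld_path.split(os.pathsep):
        print("CUPTI dir not on LD_LIBRARY_PATH; export it in the shell before launching Python.")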

Maybe the following command helps:

sudo rm -rf .nv/

Good luck.

I ran into this problem when running https://github.com/davidsandberg/facenet/ inside the Docker image tensorflow/tensorflow:latest-gpu-py3.

However, the Jupyter notebooks run without any problem.

UPDATE:

After I set per_process_gpu_memory_fraction from 1 to 0.5, the error was gone.

        # Pre-allocate only a fraction of the GPU memory (args.gpu_memory_fraction, here 0.5)
        # instead of the whole card, and log where each op is placed.
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=True))

It seems that setting per_process_gpu_memory_fraction to 1 makes TensorFlow try to allocate the entire GPU memory, which fails because my Xorg and Chrome have already used part of it.
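
An alternative to hand-picking a fraction is to let TensorFlow grow its GPU allocation on demand (a sketch using the same TF 1.x-era GPUOptions API as the snippet above; allow_growth is a standard option, not something specific to this thread):

    import tensorflow as tf

    # Allocate GPU memory as needed instead of reserving (a fraction of) it all up front,
    # so TensorFlow can coexist with processes already holding part of the card.
    gpu_options = tf.GPUOptions(allow_growth=True)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))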

  • Can you run the TensorFlow binary from outside the Docker container?
  • CUBLAS_STATUS_NOT_INITIALIZED is a classic symptom of a CUDA runtime that is not set up properly. It's best to remove the complication of Docker if you can.
  • Avoid using latest tags; use specific versions instead. You can also try nightly if you want a bleeding-edge version, but if you do, you probably want to name or tag the container ID once you have a working configuration.

@hzxie A very good suggestion! In my case, it is sudo rm -rf ~/.nv/