tensorflow: failed call to cuInit: CUDA_ERROR_UNKNOWN in Python programs on Ubuntu with Bumblebee

I have a Quadro K1100M GPU with compute capability 3.0, and I had to install Bumblebee to make CUDA work. After configuring with TF_UNOFFICIAL_SETTING=1 ./configure, I can run tutorials_example_trainer with the command sudo optirun bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu. However, I am not able to run the Python examples directly.

For example, if I run convolutional.py in tensorflow/models/image/mnist with the command optirun python convolutional.py, I get the following error:

tensorflow/tensorflow/models/image/mnist$ optirun python convolutional.py 
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
E tensorflow/stream_executor/cuda/cuda_driver.cc:466] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:98] retrieving CUDA diagnostic information for host: jp-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:106] hostname: jp-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:131] libcuda reported version is: 352.63
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:242] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.63  Sat Nov  7 21:25:42 PST 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:135] kernel reported version is: 352.63
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:211] kernel version seems to match DSO: 352.63
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8

It seems like my GPU is not recognized in Python programs, perhaps because of its 3.0 compute capability. Is there a way to avoid this problem?
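
For reference, a quick way to see the raw driver error outside of TensorFlow is to call cuInit directly through ctypes. This is a minimal sketch, assuming a Linux system where the NVIDIA driver provides libcuda.so.1; the return value is a CUDA driver API status code (0 is CUDA_SUCCESS, 999 is CUDA_ERROR_UNKNOWN):

    import ctypes

    # Load the driver-provided CUDA library; this is the same library
    # TensorFlow's stream executor calls into.
    cuda = ctypes.CDLL("libcuda.so.1")

    # cuInit(0) is the exact call that fails in the log above.
    status = cuda.cuInit(0)
    print("cuInit returned:", status)

Comparing the output with and without optirun should show whether Bumblebee is what makes the difference.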

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 59 (12 by maintainers)

Most upvoted comments

I had the same problem; this fixed it: sudo apt-get install nvidia-modprobe

I had the same issue. Simply rebooting the computer fixed the problem for me 😃 Suggestion: do not suspend your computer (which is what caused the problem in my case).

It’s worth adding that sudo apt-get install nvidia-modprobe fixed it for me too, even though I had already installed it in a previous session, directly before installing TensorFlow.

Same for me. Does anyone know why this fixes it?
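
As far as I know, nvidia-modprobe is a setuid helper whose job is to load the NVIDIA kernel modules and create the /dev/nvidia* device nodes on behalf of unprivileged processes. Without it, cuInit can fail for non-root users unless something else (such as the X server) has already created those nodes, which would explain why sudo runs succeed while plain ones fail. A minimal sketch to check whether the nodes exist:

    import glob
    import os

    # cuInit needs the /dev/nvidia* device nodes to exist and be readable.
    nodes = glob.glob("/dev/nvidia*")
    if not nodes:
        print("No /dev/nvidia* nodes found; cuInit will likely fail.")
    for node in nodes:
        print(node, oct(os.stat(node).st_mode))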

I ran into this issue recently. I upgraded my NVIDIA driver to version 375.26 and Docker to version 1.13.0. When training a network I would get:

cuInit: CUDA_ERROR_UNKNOWN

The problem here is that CUDA fails to initialize the ‘shared GPU context’. For some reason, the nvidia-cuda-mps-control service is not active after the upgrade. I need to investigate more.

In the meantime, try running nvidia-cuda-mps-server on the host machine. That solved it for me.
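
If you suspect the same cause, here is a quick sketch to check whether the MPS daemons are running. It assumes a Linux host with pgrep available and the stock MPS binary names:

    import subprocess

    # MPS consists of a control daemon plus per-GPU server processes;
    # -f matches against the full command line, since these names exceed
    # the 15-character process-name limit that plain pgrep matches on.
    for name in ("nvidia-cuda-mps-control", "nvidia-cuda-mps-server"):
        result = subprocess.run(["pgrep", "-f", name], stdout=subprocess.PIPE)
        state = "running" if result.returncode == 0 else "not running"
        print(name, state)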

In my case, nvidia-modprobe was installed and the paths were correct. What solved it was running the commands here: https://devtalk.nvidia.com/default/topic/760872/ubuntu-12-04-error-cudagetdevicecount-returned-30/

In particular, running the following (with <driver_version_num> being 384 in my case):

$ sudo modinfo nvidia-<driver_version_num>-uvm
$ sudo modprobe --force-modversion nvidia-331-uvm

Hope this helps.
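
For anyone checking whether this applies to their machine, a loaded UVM module shows up in /proc/modules. A minimal sketch; the exact module name varies between driver packages (nvidia_uvm, nvidia_331_uvm, and so on):

    # List any NVIDIA UVM kernel modules that are currently loaded.
    with open("/proc/modules") as f:
        modules = [line.split()[0] for line in f]

    uvm = [m for m in modules if "uvm" in m]
    print("UVM modules loaded:", uvm if uvm else "none")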

This is not working for me… The GPU is available and works until I put the computer to sleep/suspend. After waking the computer, I always get the message below, and the GPU is unavailable when I run code (only the CPU is available).

E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN

I am using nvidia-docker:

nvidia-docker run -it -p 8888:8888 -v /*..../Data/docker:/docker --name TensorFlow gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash

and none of the solutions above work. nvidia-smi and nvidia-debugdump -l both show that the GPU is installed and the driver is up to date.

No, I still can’t fix it with the nvidia-modprobe method:

$ python -m tensorflow.models.image.mnist.convolutional
/usr/bin/python: libcudart.so.7.5: cannot open shared object file: No such file or directory
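
Note that this is a different failure: the dynamic loader cannot find the CUDA runtime at all (usually an LD_LIBRARY_PATH problem), rather than cuInit failing. A quick sketch to check, assuming the standard CUDA 7.5 library name:

    import ctypes
    import os

    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<unset>"))

    try:
        # This is the library the error message says is missing; for
        # CUDA 7.5 it normally lives in /usr/local/cuda-7.5/lib64.
        ctypes.CDLL("libcudart.so.7.5")
        print("libcudart.so.7.5 found by the loader")
    except OSError as err:
        print("loader error:", err)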

sudo apt-get install nvidia-modprobe: this is magic.

I met this error too when the computer woke up after hibernation. Run the 1_Utilities/deviceQuery CUDA sample, which will tell you whether the CUDA card is available or not.

@girving: as discussed offline, fixing that.

Yeah, CUDA_ERROR_UNKNOWN is not very helpful. Hopefully @zheng-xq knows more about what’s going on here.

@LeeKyungMoon A reboot alone works for me, without installing nvidia-modprobe, as @WeitaoVan said.

sudo apt-get install nvidia-modprobe works for me, with a restart.

Could you run nvidia-debugdump -l or nvidia-smi and paste the output? I had a similar problem, and in the end it was a lack of power to the graphics card.

@prasad3130 Thanks a lot, that worked like a charm. Although it is worth noting that the command should be the following: sudo modprobe --force-modversion nvidia-<nvidia-version>-uvm