tensorflow: failed call to cuInit: CUDA_ERROR_UNKNOWN in python programs using Ubuntu bumblebee
I have a Quadro K1100M integrated GPU with compute capability 3.0. I had to install Bumblebee to make CUDA work. After configuring the build with TF_UNOFFICIAL_SETTING=1 ./configure, I am now able to run the tutorials_example_trainer with the command sudo optirun bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu. However, I am not able to run the examples in Python directly.
For example, if I run convolutional.py in tensorflow/models/image/mnist with the command optirun python convolutional.py, I get the following error:
tensorflow/tensorflow/models/image/mnist$ optirun python convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
E tensorflow/stream_executor/cuda/cuda_driver.cc:466] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:98] retrieving CUDA diagnostic information for host: jp-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:106] hostname: jp-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:131] libcuda reported version is: 352.63
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:242] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.63 Sat Nov 7 21:25:42 PST 2015
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:135] kernel reported version is: 352.63
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:211] kernel version seems to match DSO: 352.63
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA:
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
It seems as if my GPU is not recognized in Python programs, perhaps because of its 3.0 compute capability. Is there a way to avoid this problem?
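A minimal way to reproduce the failure outside the MNIST example is to ask TensorFlow to list its local devices. This is only a sketch: it assumes a TF 1.x-style device_lib API, which may not exist on the much older build in this report.

# gpu_check.py (file name is just an example): list the devices TensorFlow can see.
# Listing local devices forces CUDA initialization, so a failed cuInit shows up
# in the same log lines here and only CPU devices are returned.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.device_type, device.name)

Run it the same way as the example (optirun python gpu_check.py); if only a CPU device is printed, the Python process itself cannot initialize CUDA, independent of the model code.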
About this issue
- State: closed
- Created 9 years ago
- Comments: 59 (12 by maintainers)
I had the same problem, this fixed it:
sudo apt-get install nvidia-modprobe

I had the same issue. Simply rebooting the computer fixed the problem for me 😃 Suggestion: do not suspend your computer (which caused the problem in my case).
It’s worth adding that sudo apt-get install nvidia-modprobe fixed it for me too, even though I had already installed it in a previous session directly before installing TensorFlow.

Same for me. Does anyone know why this fixes it?
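One common explanation: nvidia-modprobe is a setuid helper that loads the NVIDIA kernel modules and creates the /dev/nvidia* device files, so an unprivileged CUDA process can initialize the driver even when nothing else has touched the GPU yet; after a suspend or a fresh boot those files can be missing. A quick check, as a Python sketch for a Linux host (the exact device paths are assumptions and can vary by driver):

# Check for the device nodes the CUDA driver needs. Paths are the usual ones
# created by nvidia-modprobe, but may differ by driver version and GPU count.
import os

for node in ("/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"):
    print(node, "present" if os.path.exists(node) else "MISSING")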
I ran into this issue recently. I upgraded my nvidia driver to version 375.26 and Docker to version 1.13.0. When training a network I would get the cuInit: CUDA_ERROR_UNKNOWN error. The problem here is that CUDA fails to initiate the ‘shared GPU context’. For some reason, the ‘nvidia-cuda-mps-control’ service is not active after the upgrade. I need to investigate more. However, try running nvidia-cuda-mps-server on the host machine. This solved it for me.

In my case, nvidia-modprobe was installed and the paths were correct. What solved it was running the commands here: https://devtalk.nvidia.com/default/topic/760872/ubuntu-12-04-error-cudagetdevicecount-returned-30/
Especially, running the following (with driver_version_num as 384 in my case):
$ sudo modinfo nvidia-<driver_version_num>-uvm
$ sudo modprobe --force-modversion nvidia-331-uvm
Hope this helps.
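To confirm the UVM module actually got loaded after running those commands, here is a small Python sketch that reads /proc/modules (Linux only; it just looks for module names starting with nvidia):

# List loaded NVIDIA kernel modules from /proc/modules (Linux only).
# If nvidia_uvm (or nvidia_NNN_uvm on older packaged drivers) is absent,
# the modprobe step above has not taken effect.
with open("/proc/modules") as modules:
    print([line.split()[0] for line in modules if line.startswith("nvidia")])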
This is not working for me … The GPU is available and works until I put the computer to sleep/suspend. After waking up the computer I always get the message below and the GPU is unavailable when I run code (only the CPU is available).
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I am using nvidia-docker:
nvidia-docker run -it -p 8888:8888 -v /*..../Data/docker:/docker --name TensorFlow gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash
and none of the solutions above work. nvidia-smi and nvidia-debugdump -l both show the GPU is installed and the driver is up to date.
No, I still can’t fix it with the nvidia-modprobe method.
sudo apt-get install nvidia-modprobe, this is magic
I ran into this error too when the computer wakes up after hibernation. Run the 1_Utilities/deviceQuery example from the CUDA samples, which will tell you whether the CUDA card is available or not.
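If the CUDA samples are not built, a rough stand-in for deviceQuery is to call the driver API’s cuInit directly through ctypes, which surfaces the same error code TensorFlow logs. A sketch, assuming libcuda.so.1 is on the loader path and the driver is new enough (CUDA 6.0+) to provide cuGetErrorString:

# Call cuInit directly to see the raw driver error, which is what both
# deviceQuery and TensorFlow hit on startup.
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")   # driver API library; path assumed
result = cuda.cuInit(0)              # 0 = no flags; returns 0 (CUDA_SUCCESS) on success

if result != 0:
    message = ctypes.c_char_p()
    cuda.cuGetErrorString(result, ctypes.byref(message))  # CUDA 6.0+ only
    print("cuInit failed:", result, message.value)
else:
    count = ctypes.c_int()
    cuda.cuDeviceGetCount(ctypes.byref(count))
    print("cuInit OK, device count:", count.value)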
@girving: as discussed offline, fixing that.
Yeah, CUDA_ERROR_UNKNOWN is not very helpful. Hopefully @zheng-xq knows more about what’s going on here.
@LeeKyungMoon A reboot alone works for me, without installing nvidia-modprobe, as @WeitaoVan said.

sudo apt-get install nvidia-modprobe works for me, with a restart.
Could you run nvidia-debugdump -l or nvidia-smi and paste the output? I had a similar problem and in the end it was a lack of power for the graphics card.
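If it helps to gather both outputs in one go, a small Python wrapper can collect them for pasting (assumes the tools are on PATH and Python 3.7+ for subprocess.run with capture_output):

# Collect the requested diagnostics in one place.
import subprocess

for cmd in (["nvidia-smi"], ["nvidia-debugdump", "-l"]):
    print("$", " ".join(cmd))
    try:
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)
    except FileNotFoundError:
        print("  (not installed)")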
@prasad3130 Thanks a lot, that worked like a charm. Although it is worth noting that the command should be the following: sudo modprobe --force-modversion nvidia-<nvidia-version>-uvm