tensorflow: fail to load tensorflow 1.1.0 inside docker container - libcuda.so.1 missing
Hi,
I ran into this bug while trying to upgrade TensorFlow to version 1.1.0, using the 'nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04' docker image (I also tried other docker images).
Note that the same code runs fine with tensorflow 1.0.0.
Attached is a Python script that prints the TensorFlow version (see README.txt for instructions on how to build & run the docker image): fail-to-load-tf-bug.zip
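For reference, the attached script is essentially just this (reconstructed here to match the version lines printed in the 1.0.0 log below; the real run.py may differ slightly):

# run.py - print the installed TensorFlow version and git version
import tensorflow as tf

print('tensorflow version %s' % tf.__version__)
print('tensorflow git version %s' % tf.__git_version__)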
Here is the stack trace for tensorflow 1.1.0:
Traceback (most recent call last):
File "/run.py", line 3, in <module>
import tensorflow as tf
File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 51, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Here are the logs for tensorflow 1.0.0 (here it works fine). Please pay attention to the message 'Couldn't open CUDA library libcuda.so.1':
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 522f0c0e9705
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
tensorflow version 1.0.0
tensorflow git version v1.0.0-rc2-15-g47bba63-dirty
Thanks 😃 Erez
P.S.: I built and ran the docker image on 2 machines:
- Mac
  - Docker version 17.03.1-ce, build c6d412e
  - no GPU
- Ubuntu 14.04.5 LTS
  - Docker version 1.13.0, build 49bf474
  - GPU info:
$ nvidia-smi
Sun May 21 11:11:31 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:05:00.0     Off |                  N/A |
| 22%   57C    P8    31W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:06:00.0     Off |                  N/A |
| 22%   54C    P8    21W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:09:00.0     Off |                  N/A |
| 22%   45C    P8    15W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:0A:00.0     Off |                  N/A |
| 22%   36C    P8    16W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
To use GPU docker images, you have to use nvidia-docker. Did you start your docker containers using nvidia-docker?
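The only difference is the launcher; for example (image name below is a placeholder):

# Plain docker: the host driver libraries (including libcuda.so.1) are NOT
# mounted into the container, so a GPU build of TF dlopens libcuda.so.1 and fails.
docker run --rm my-tf-image python /run.py

# nvidia-docker: injects the host driver libraries under /usr/local/nvidia/lib64
# (note that path is already on LD_LIBRARY_PATH in the logs above).
nvidia-docker run --rm my-tf-image python /run.py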
Thanks for the response:
I think that there are 3 options here:
I will have to think about what suits us best…
Thanks guys for your time. Keep up the good work 😃
Hi @gunan & @martinwicke,
My advice is to have both GPU & CPU support in one TensorFlow installation, for 2 reasons:
@ophiry, do you agree?
Thanks, Erez
Hi, thanks for the response.
You are right, running with nvidia-docker works.
But it also used to work with plain docker (just without using the GPU). Is there any way to support running via the plain docker command line?
I want to explain the motivation:
We use CPU for:
We use GPU for:
Enabling one docker image to run on both GPU and CPU keeps our development cycle simple. Otherwise, for each of our services we would need to maintain 2 docker images - one for CPU and one for GPU.
Thanks 😃 Erez
Try this way
Agree - the simplest thing would be to have TF use the GPU if it's available, and the CPU if not. Or possibly have a configuration variable that can force GPU, force CPU, or do automatic selection; something like the sketch below.
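A minimal sketch of what that selection could look like (hypothetical logic; note it only helps once the import itself succeeds, which is exactly what breaks in 1.1.0 without libcuda.so.1):

# Forcing CPU is already possible today by hiding the GPUs before the import:
#   import os; os.environ['CUDA_VISIBLE_DEVICES'] = ''

import tensorflow as tf
from tensorflow.python.client import device_lib

def best_device():
    # Automatic selection: prefer a GPU when one is visible, else fall back to CPU.
    gpus = [d for d in device_lib.list_local_devices() if d.device_type == 'GPU']
    return '/gpu:0' if gpus else '/cpu:0'

with tf.device(best_device()):
    y = tf.constant(1.0) + tf.constant(2.0)

with tf.Session() as sess:
    print(sess.run(y))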
Hi Ophiry,
I found the 3rd option the best.
This post contains the relevant Dockerfile code snippet: http://engineering.taboola.com/deep-learning-from-prototype-to-production
When you want to run on CPU, you have to: (see the CPU run line in the sketch below)
When you want to run on GPU, you have to do 2 things: (see the GPU run lines in the sketch below)
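For completeness, here is a sketch of that approach as I understand it (not the blog's verbatim code; the stub path and image name are assumptions):

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

# The CUDA *devel* images ship a driver stub. Give it the .so.1 name that the
# TF 1.1.0 GPU build dlopens at import time (assumed stub location for CUDA 8.0).
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so \
          /usr/local/cuda/lib64/stubs/libcuda.so.1

# CPU (plain docker): expose the stub so the import succeeds; CUDA init then
# fails harmlessly and TF should fall back to the CPU.
#   docker run -e LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs my-tf-image python /run.py
# GPU (nvidia-docker): do NOT expose the stub; the real libcuda.so.1 is mounted
# under /usr/local/nvidia/lib64, which is already on LD_LIBRARY_PATH.
#   nvidia-docker run my-tf-image python /run.py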