tensorflow: fails to load tensorflow 1.1.0 inside docker container - libcuda.so.1 missing

Hi,

I ran into this bug while trying to upgrade TensorFlow to 1.1.0, using the 'nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04' Docker image (I tried other Docker images as well).

Note that the same code runs fine with TensorFlow 1.0.0.

Attached is a Python script that prints the TensorFlow version (see README.txt for instructions on how to build & run the Docker image): fail-to-load-tf-bug.zip

Here is the stack trace for TensorFlow 1.1.0:

Traceback (most recent call last):
  File "/run.py", line 3, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

Here are the logs for TensorFlow 1.0.0 (here it works fine). Note the message "Couldn't open CUDA library libcuda.so.1":

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 522f0c0e9705
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
tensorflow version 1.0.0
tensorflow git version v1.0.0-rc2-15-g47bba63-dirty

Thanks 😃 Erez

P.S.: I built and ran the Docker image on 2 machines:

  • Mac
    • Docker version 17.03.1-ce, build c6d412e
    • no GPU
  • Ubuntu 14.04.5 LTS
    • Docker version 1.13.0, build 49bf474
    • GPU info:
$nvidia-smi
Sun May 21 11:11:31 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:05:00.0     Off |                  N/A |
| 22%   57C    P8    31W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:06:00.0     Off |                  N/A |
| 22%   54C    P8    21W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:09:00.0     Off |                  N/A |
| 22%   45C    P8    15W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:0A:00.0     Off |                  N/A |
| 22%   36C    P8    16W / 250W |      0MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+  

Most upvoted comments

To use GPU Docker images, you have to use nvidia-docker. Did you start your Docker containers using nvidia-docker?
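
For illustration, the difference is only in the launcher (a minimal sketch; the image name tf-bug is hypothetical and stands for the image built from the attached zip):

# plain docker: the host's driver library (libcuda.so.1) is not
# mounted into the container, so importing tensorflow-gpu fails
docker run --rm tf-bug python /run.py

# nvidia-docker wraps docker run and injects the host's driver
# libraries and /dev/nvidia* device nodes into the container
nvidia-docker run --rm tf-bug python /run.py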

Thanks for the response.

I think that there are 3 options here:

  1. Install CUDA on the machines (the downside is that it is painful on developer machines)
  2. Build separate GPU and CPU Docker images (sketched after this comment)
  3. Build one Docker image, and install both the GPU and CPU TensorFlow packages in separate virtualenvs

I will have to think about what suits us best…

Thanks guys for your time. Keep up the good work 😃
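
For reference, option 2 boils down to building and maintaining two near-identical images per service (a hypothetical sketch; Dockerfile.cpu and Dockerfile.gpu are assumed names, differing only in the base image and in installing tensorflow vs. tensorflow-gpu):

# CPU image: plain Ubuntu base, 'tensorflow' pip package
docker build -f Dockerfile.cpu -t myservice:cpu .

# GPU image: nvidia/cuda base, 'tensorflow-gpu' pip package;
# containers must be launched with nvidia-docker
docker build -f Dockerfile.gpu -t myservice:gpu .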

Hi @gunan & @martinwicke,

My advice is to support both GPU & CPU in a single TensorFlow installation, for 2 reasons:

  1. The workaround is very tedious
  2. Almost everyone going to production with TensorFlow will face this issue (regardless of whether they use Docker)

@ophiry, do you agree?

Thanks, Erez

Hi, thanks for the response.

You are right, running with nvidia-docker works.

But it used to work with plain docker as well (just without using the GPU). Is there any way to support running with the plain docker command line?

I want to explain the motivation:

We use CPU for:

  • inference (serving) in production
  • train & inference unit tests
  • train & inference sanity tests
  • developer machines

We use GPU for:

  • training in production

Enabling one Docker image to run on both GPU and CPU keeps our development cycle simple. Otherwise, for each one of our services we would need to maintain 2 Docker images - one for CPU and one for GPU.

Thanks 😃 Erez

Try this way:

version: '2'
services:
  tensorflow:
    image: tensorflow/tensorflow:latest-gpu-py3
    # expose the host's NVIDIA device nodes inside the container
    devices:
    - /dev/nvidia0:/dev/nvidia0:rwm
    - /dev/nvidiactl:/dev/nvidiactl:rwm
    - /dev/nvidia-uvm:/dev/nvidia-uvm:rwm
    # bind-mount the host's driver libraries (libcuda.so.1 and the
    # matching fatbinaryloader); the 384.130 version must match the
    # driver actually installed on the host
    volumes:
    - /usr/lib/x86_64-linux-gnu/libcuda.so:/usr/lib/x86_64-linux-gnu/libcuda.so
    - /usr/lib/x86_64-linux-gnu/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1
    - /usr/lib/x86_64-linux-gnu/libcuda.so.384.130:/usr/lib/x86_64-linux-gnu/libcuda.so.384.130
    - /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.130:/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.384.130
    # publish the container's Jupyter port on the host
    ports:
    - 8889:8888/tcp
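
With this saved as docker-compose.yml, the container can then be started with the plain docker-compose command, no nvidia-docker needed:

docker-compose up tensorflow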

Agree - the simplest thing would be to have TF use the GPU if it's available, and the CPU if not. Or possibly have a configuration variable that can force GPU, force CPU, or do automatic selection.

Hi Ophiry,

I found the 3rd option to be the best.

This post contains a code snippet of the Dockerfile: http://engineering.taboola.com/deep-learning-from-prototype-to-production

When you want to run on CPU, you have to:

  • override the docker cmd, and use the CPU venv

When you want to run on GPU, you have to do 2 things (sketched below):

  • override the docker cmd, and use the GPU venv
  • use the nvidia-docker command line tool
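
A minimal sketch of such a dual-venv Dockerfile (my own illustration under assumptions, not the actual snippet from the blog post; the /venv-cpu and /venv-gpu paths and the pinned versions are hypothetical):

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

# python 2.7 plus virtualenv for the two isolated environments
RUN apt-get update && apt-get install -y python python-pip && \
    pip install virtualenv

# one virtualenv per TensorFlow build: CPU-only wheel and GPU wheel
RUN virtualenv /venv-cpu && /venv-cpu/bin/pip install tensorflow==1.1.0
RUN virtualenv /venv-gpu && /venv-gpu/bin/pip install tensorflow-gpu==1.1.0

COPY run.py /run.py

# default to the CPU build; override the cmd to select the GPU one
CMD ["/venv-cpu/bin/python", "/run.py"]

Started with docker run <image> it uses the CPU build; started with nvidia-docker run <image> /venv-gpu/bin/python /run.py it uses the GPU build.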