tensorflow: [1.1.0-gpu image] Can't open shared object file libcuda.so.1

Version info

GPU: NVIDIA K40 and K80
Docker: 1.12.6
Image tag: 1.1.0-gpu and latest-gpu

Reproduce

I pulled the tensorflow/tensorflow:1.1.0-gpu Docker image and got the following error when I started running it:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.
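
This is what happens when the image is started with plain docker rather than nvidia-docker: libcuda.so.1 belongs to the host's NVIDIA driver and is not baked into the image, so it is only present inside the container when the NVIDIA runtime mounts it in. A minimal reproduction (a sketch, not necessarily the reporter's exact command):

docker pull tensorflow/tensorflow:1.1.0-gpu
# Plain docker run: no host driver libraries are mounted, so the import fails
docker run --rm tensorflow/tensorflow:1.1.0-gpu python -c "import tensorflow"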

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 24 (11 by maintainers)

Most upvoted comments

Are you using nvidia-docker? e.g. https://hub.docker.com/r/tensorflow/tensorflow/

Make sure to run with nvidia-docker
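
For example (a minimal sketch; nvidia-docker v1 wraps docker run and bind-mounts the host's driver libraries, including libcuda.so.1, into the container):

# Sanity-check that the driver is visible inside the container
nvidia-docker run --rm tensorflow/tensorflow:1.1.0-gpu nvidia-smi
# Then the import should succeed
nvidia-docker run --rm tensorflow/tensorflow:1.1.0-gpu python -c "import tensorflow"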

Fair enough. The real question is whether the GPU install is working or whether this is a TensorFlow bug. Running a simpler GPU program such as nvidia-smi is a common way to isolate the problem. LD_DEBUG is a great debugging tool for seeing why shared libraries aren't loading:

LD_DEBUG=libs python -c "import tensorflow"

Pipe the output to a file and search for where it tries to load libcuda; it will show exactly which locations the loader probes for which libraries.
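
For example (ld_debug.log is an arbitrary file name; LD_DEBUG writes to stderr, hence the 2> redirect):

LD_DEBUG=libs python -c "import tensorflow" 2> ld_debug.log
grep libcuda ld_debug.log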

Maybe Google for others who have gotten TensorFlow running on Kubernetes, e.g. https://medium.com/jim-fleming/running-tensorflow-on-kubernetes-ca00d0e67539

Yes, just append --runtime=nvidia to your docker command. The reason is that nvidia-docker v1 uses the nvidia-docker alias, whereas v2 uses docker --runtime=nvidia.
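
Side by side, the two forms look like this (same image as in the original report):

# nvidia-docker v1: dedicated wrapper command
nvidia-docker run --rm tensorflow/tensorflow:1.1.0-gpu python -c "import tensorflow"
# nvidia-docker v2: plain docker with the NVIDIA runtime selected
docker run --runtime=nvidia --rm tensorflow/tensorflow:1.1.0-gpu python -c "import tensorflow"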


Are you using nvidia-docker? e.g. https://hub.docker.com/r/tensorflow/tensorflow/

docker run --runtime=nvidia -p xxxxxxx -t tensorflow/serving:1.12.0-gpu
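
With the placeholder port mapping made concrete (8500 is TF Serving's default gRPC port and 8501 its REST port; the model name and host path below are illustrative):

docker run --runtime=nvidia -p 8501:8501 \
  -v /path/to/my_model:/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving:1.12.0-gpu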