tensorflow: failed call to cuInit: CUDA_ERROR_UNKNOWN after Docker build on MacBook Pro (Late 2013) with Linux

This issue is similar to #394. I believe I’m seeing it because TensorFlow can’t find libcuda.so.

I know that my libcuda.so is in /usr/lib/x86_64-linux-gnu/, but is that location accessible to the compiler and to Python without further configuration? I tried adding it to my docker user’s $LD_LIBRARY_PATH, but that did not fix the problem.
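One way to see whether a library is reachable through $LD_LIBRARY_PATH is to walk its colon-separated entries in order. The sketch below assumes bash, and find_in_ld_path is a hypothetical helper. It only models LD_LIBRARY_PATH: the real dynamic loader also consults /etc/ld.so.cache (listable with ldconfig -p) and its default directories, so a library that is absent from LD_LIBRARY_PATH may still load.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find_in_ld_path NAME
# Walks the colon-separated entries of $LD_LIBRARY_PATH in order and prints
# the first DIR/NAME that exists. It does NOT model /etc/ld.so.cache or the
# loader's default directories; "ldconfig -p | grep NAME" covers those.
find_in_ld_path() {
  local name=$1 dir
  local IFS=:            # split $LD_LIBRARY_PATH on colons only
  for dir in $LD_LIBRARY_PATH; do
    # Empty entries are skipped in this sketch.
    if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}
```

For example, inside the container: find_in_ld_path libcuda.so.1 || echo "not on LD_LIBRARY_PATH".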

I’m running on:

MacBook Pro (Late 2013) hardware
Linux Mint 17
Compute capability 3.0
CUDA 7.0 and cuDNN 2.0
Docker service running under a docker user and group; the docker user has no sudo access.

On the host OS, the values of $CUDA_SO and $DEVICES that are passed to ./docker_run_gpu.sh are:

$ export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda* | xargs -I{} echo '-v {}:{}')
$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')

$ echo $CUDA_SO
-v /usr/lib/x86_64-linux-gnu/libcuda.so:/usr/lib/x86_64-linux-gnu/libcuda.so -v /usr/lib/x86_64-linux-gnu/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1 -v /usr/lib/x86_64-linux-gnu/libcuda.so.352.68:/usr/lib/x86_64-linux-gnu/libcuda.so.352.68

$ echo $DEVICES
--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl
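A slightly more defensive way to build the same flags is sketched below, assuming bash: it iterates over the glob directly instead of parsing ls output, and drops patterns that match nothing. build_flags is a hypothetical helper, not part of docker_run_gpu.sh. Like the commands above, it only sees device nodes that already exist on the host at the moment it runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_flags FLAG PATH...
# Prints "FLAG path:path" for every PATH that exists, space-separated,
# in the same format as the CUDA_SO and DEVICES values echoed above.
build_flags() {
  local flag=$1
  shift
  local f
  local -a out=()
  for f in "$@"; do
    # An unmatched glob is passed through literally; -e filters it out.
    if [ -e "$f" ]; then
      out+=("$flag" "$f:$f")
    fi
  done
  printf '%s\n' "${out[*]}"
}

CUDA_SO=$(build_flags -v /usr/lib/x86_64-linux-gnu/libcuda*)
DEVICES=$(build_flags --device /dev/nvidia*)
export CUDA_SO DEVICES
```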

I customized Dockerfile.devel-gpu as shown below, prepending TF_CUDA_COMPUTE_CAPABILITIES=3.0 and TF_UNOFFICIAL_SETTING=1 to the ./configure step.

Should I add the libcuda.so path to ENV LD_LIBRARY_PATH?

# ........

RUN git clone --recursive https://github.com/tensorflow/tensorflow.git && \
    cd tensorflow && \
    git checkout 0.6.0
WORKDIR /tensorflow

# Configure the build for our CUDA configuration.
ENV CUDA_TOOLKIT_PATH /usr/local/cuda
ENV CUDNN_INSTALL_PATH /usr/local/cuda
ENV TF_NEED_CUDA 1

RUN TF_CUDA_COMPUTE_CAPABILITIES=3.0 TF_UNOFFICIAL_SETTING=1 ./configure && \
    bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip install --upgrade /tmp/pip/tensorflow-*.whl

WORKDIR /root

# Set up CUDA variables
ENV CUDA_PATH /usr/local/cuda
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64

# TensorBoard
EXPOSE 6006
# IPython
EXPOSE 8888

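# Start an interactive shell when the container runs (CMD, not RUN: RUN would only execute bash once at build time)
CMD ["/bin/bash"]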

Then I built the image and ran it with: ./docker_run_gpu.sh tf/tf

I started TensorFlow with: python -m tensorflow.models.image.mnist.convolutional

TensorFlow then fails at startup with failed call to cuInit: CUDA_ERROR_UNKNOWN, even though it first reports that it successfully opened the CUDA library libcuda.so locally.

I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.19.0-32-generic/modules.dep.bin'
E tensorflow/stream_executor/cuda/cuda_driver.cc:481] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:114] retrieving CUDA diagnostic information for host: 24b008aee65f
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: 24b008aee65f
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.68  Tue Dec  1 17:24:11 PST 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
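The output above includes the NVIDIA kernel module version from the driver version file (on Linux, /proc/driver/nvidia/version), and the host mounts a versioned libcuda.so.352.68. A minimal sketch, assuming bash and the file layout shown above, that compares the two; the helper names are hypothetical. A mismatch between the user-space libcuda and the loaded kernel module is one thing worth ruling out, though in the output above both read 352.68.

```shell
#!/usr/bin/env bash
# Hypothetical helpers comparing the NVIDIA kernel module version with the
# version suffix of the libcuda.so.X.Y file that is visible.

# Print the version after "Kernel Module" in a /proc/driver/nvidia/version-style file.
kernel_module_version() {
  awk '/Kernel Module/ {
         for (i = 1; i < NF; i++) if ($i == "Module") { print $(i + 1); exit }
       }' "$1"
}

# Print the X.Y suffix of the first libcuda.so.X.Y found in directory $1.
libcuda_version() {
  local f
  for f in "$1"/libcuda.so.*.*; do
    [ -e "$f" ] || continue
    printf '%s\n' "${f##*/libcuda.so.}"
    return 0
  done
  return 1
}

# Usage: check_driver_match [VERSION_FILE] [LIB_DIR]
# Returns 0 on match, 1 on mismatch, 2 if either version cannot be read.
check_driver_match() {
  local kver lver
  kver=$(kernel_module_version "${1:-/proc/driver/nvidia/version}" 2>/dev/null) || true
  lver=$(libcuda_version "${2:-/usr/lib/x86_64-linux-gnu}") || true
  if [ -z "$kver" ] || [ -z "$lver" ]; then
    echo "could not read kernel module or libcuda version" >&2
    return 2
  fi
  if [ "$kver" = "$lver" ]; then
    echo "match: $kver"
  else
    echo "mismatch: kernel module $kver, libcuda $lver"
    return 1
  fi
}
```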

About this issue

State: closed
Created 9 years ago
Comments: 23 (7 by maintainers)

Most upvoted comments

I had the same problem running TensorFlow on an Ubuntu machine after I upgraded my driver to 352.63 and 352.93. (I remember it worked with 346.*, but when I try to install 346.*, it installs 352.* automatically for some reason.)

I finally figured out that it’s caused by a permission issue (it runs fine as root). So I changed the permissions of the libcuda.so.352.63 file to make it executable by anyone, and it works well now.

Hope this will be helpful to those still struggling with this issue.

I haven’t tried the Docker setup, but I suspect it’s also caused by a permission setting.

PhoenixDai on Jun 27, 2016
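The permission change described in the comment above can be checked for before changing anything. A minimal sketch, assuming bash and GNU stat (for stat -L -c '%A'); libs_missing_other_rx is a hypothetical helper that lists libcuda files whose "others" permission bits lack read and execute, following symlinks to the real file. The chmod in the usage comment needs root on the host; a+rx grants read, which a non-root process needs to load the file, plus the execute bit the commenter added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files whose "others" bits lack read+execute.
# stat -L follows symlinks (libcuda.so -> libcuda.so.1 -> libcuda.so.<version>).
libs_missing_other_rx() {
  local f perm
  for f in "$@"; do
    [ -e "$f" ] || continue
    perm=$(stat -L -c '%A' "$f")   # e.g. -rwxr-xr-x
    case ${perm:7:3} in
      r?x) ;;                      # others can already read and execute
      *) printf '%s\n' "$f" ;;
    esac
  done
}

# Example (root on the host; the path assumes the 352.63 driver from the comment;
# xargs -r is a GNU extension):
#   libs_missing_other_rx /usr/lib/x86_64-linux-gnu/libcuda.so* | xargs -r sudo chmod a+rx
```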