serving: Docker with GPU failed call to cuInit: CUresult(-1)

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 17.10
  • TensorFlow Serving installed from (source or binary): binary
  • TensorFlow Serving version: 1.9
  • Docker version: 18.03.1-ce
  • Nvidia docker version: 2.0.3

Describe the problem

I’m attempting to run TensorFlow Serving in a container that needs GPU access. When I start the container and send requests to it, I don’t see the process on the host in nvidia-smi. Looking at the log, I saw a few odd errors.

Exact Steps to Reproduce

This is a simple example which reproduces the same error:

docker run -p 8501:8501 \
  -v /tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_three:/models/half_plus_three \
  -e MODEL_NAME=half_plus_three -t tensorflow/serving:1.9.0-devel-gpu \
  tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=half_plus_three \
  --model_base_path=/models/half_plus_three

Source code / logs

2018-07-26 05:55:57.044214: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-26 05:55:57.044874: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUresult(-1)
2018-07-26 05:55:57.045256: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 17

Most upvoted comments

Running rm /usr/local/cuda/lib64/stubs/libcuda.so.1 fixed my problem.
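The stub libcuda.so.1 shipped under /usr/local/cuda/lib64/stubs can shadow the real driver library that nvidia-docker mounts in. A hedged sketch of checking which libcuda the container actually resolves before deleting anything (the container name is a placeholder, not from the original report):

```shell
# Hypothetical container name; substitute your running serving container.
# Lists every libcuda the linker path search can reach; if only the stubs
# copy appears, removing it (as above) lets the driver library injected
# by nvidia-docker be found instead.
docker exec -it <serving-container> sh -c \
  'find / -name "libcuda.so*" 2>/dev/null'
```

If the find only turns up /usr/local/cuda/lib64/stubs/libcuda.so.1, the real driver library was never mounted, which points back at the container runtime rather than the stub itself.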

A good way to test if the GPU drivers of your container are setup correctly before you start building the model server is this script, which should return the details of your video card without any dependencies: https://gist.github.com/f0k/63a664160d016a491b2cbea15913d549
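An even quicker smoke test, before reaching for that script, is to run nvidia-smi from a stock CUDA image; a minimal sketch, assuming nvidia-docker 2 (as listed in the system information) and an nvidia/cuda image compatible with the host driver:

```shell
# If the NVIDIA runtime is wired up correctly, this prints the same
# device table inside the container that nvidia-smi shows on the host.
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```

If this fails, the problem is in the nvidia-docker setup itself and no amount of fiddling with the serving image will help.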

I spent the last day debugging the same error in a similar configuration (Ubuntu 16.04, TFServing 1.9, Tesla P100). The GPU worked fine in tensorflow/tensorflow. Running in tensorflow/serving:nightly-devel-gpu fixed the problem.

https://github.com/tensorflow/serving/commit/4cbac38c307ea11527d0e45a3b18fd41f1b67601#diff-5442e32f8ca43e5ee752e24804404913


You need to use nvidia-docker to run the GPU build.
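With nvidia-docker 2 (version 2.0.3 per the report), that means selecting the NVIDIA runtime when starting the container. A sketch of the repro command from above with the runtime flag added:

```shell
# Same invocation as the repro, but with --runtime=nvidia so nvidia-docker
# mounts the host driver library (libcuda.so) into the container, which is
# exactly what the "unable to find libcuda.so" log line is complaining about.
docker run --runtime=nvidia -p 8501:8501 \
  -v /tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_three:/models/half_plus_three \
  -e MODEL_NAME=half_plus_three -t tensorflow/serving:1.9.0-devel-gpu \
  tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=half_plus_three \
  --model_base_path=/models/half_plus_three
```

Without the flag, the plain runc runtime starts the container with no GPU devices and no driver library, so cuInit fails exactly as shown in the logs.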