serving: Docker with GPU failed call to cuInit: CUresult(-1)
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 17.10
- TensorFlow Serving installed from (source or binary): binary
- TensorFlow Serving version: 1.9
- Docker version: 18.03.1-ce
- Nvidia docker version: 2.0.3
Describe the problem
I’m attempting to run TensorFlow Serving in a container that needs GPU access.
When I start the container and send requests to it, I don’t see the process listed in the host’s nvidia-smi output.
Looking at the log I saw a few odd errors.
Exact Steps to Reproduce
This is a minimal example that reproduces the error:
docker run -p 8501:8501 \
  -v /tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_three:/models/half_plus_three \
  -e MODEL_NAME=half_plus_three \
  -t tensorflow/serving:1.9.0-devel-gpu \
  tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=half_plus_three \
  --model_base_path=/models/half_plus_three
Source code / logs
2018-07-26 05:55:57.044214: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-26 05:55:57.044874: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUresult(-1)
2018-07-26 05:55:57.045256: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 3
- Comments: 17
Running

rm /usr/local/cuda/lib64/stubs/libcuda.so.1

fixed my problem. A good way to test whether the GPU drivers in your container are set up correctly, before you start building the model server, is this script, which should return the details of your video card without any dependencies: https://gist.github.com/f0k/63a664160d016a491b2cbea15913d549
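As a quicker sanity check (a sketch, assuming the NVIDIA runtime is installed on the host and the image tag matches yours), you can run nvidia-smi inside the serving image before launching the model server; if the driver is visible to the container, it prints your GPU details:

```shell
# Sanity check: is the host GPU driver visible inside the container?
# Assumes nvidia-docker 2.x (--runtime=nvidia) and the 1.9.0-devel-gpu tag.
docker run --runtime=nvidia --rm tensorflow/serving:1.9.0-devel-gpu nvidia-smi
```

If this prints "failed to initialize NVML" or no GPU is listed, the container runtime is the problem, not TensorFlow Serving.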
I spent the last day debugging the same error in a similar configuration (Ubuntu 16.04, TFServing 1.9, Tesla P100). The GPU worked fine in tensorflow/tensorflow. Running in tensorflow/serving:nightly-devel-gpu fixed the problem.
https://github.com/tensorflow/serving/commit/4cbac38c307ea11527d0e45a3b18fd41f1b67601#diff-5442e32f8ca43e5ee752e24804404913
You need to use nvidia-docker to run the GPU build.
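The reproduction command above can be corrected accordingly. A minimal sketch, assuming nvidia-docker 2.x, where passing --runtime=nvidia is what makes libcuda.so from the host driver available inside the container:

```shell
# Same invocation as in the issue, but with the NVIDIA runtime enabled
# so cuInit can find the host's libcuda.so.
docker run --runtime=nvidia -p 8501:8501 \
  -v /tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_three:/models/half_plus_three \
  -e MODEL_NAME=half_plus_three \
  -t tensorflow/serving:1.9.0-devel-gpu \
  tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=half_plus_three \
  --model_base_path=/models/half_plus_three
```

With nvidia-docker 1.x the equivalent is `nvidia-docker run ...` instead of `docker run --runtime=nvidia ...`.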