tensorflow: tensorflow-gpu pip package is not compatible with cuda9 docker image
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- TensorFlow installed from (source or binary): binary (pip install tensorflow-gpu)
- TensorFlow version (use command below): 1.6.0
- Python version: 2.7
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA 9, cuDNN 7
- GPU model and memory:
- Exact command to reproduce:
I was trying to build a horovod image, but this would affect anyone using the nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 base image:
docker build -t horovod https://raw.githubusercontent.com/uber/horovod/master/Dockerfile
docker run -it --rm horovod python tensorflow_mnist.py
Describe the problem
When building a docker image based on nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 and doing a pip install tensorflow-gpu==1.6.0, the resulting image causes a crash because the base image contains cuDNN 7.1, while the tensorflow-gpu pip package was built against cuDNN 7.0.
Source code / logs
Error messages:
2018-03-08 17:46:50.845206: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-03-08 17:46:50.845868: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
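The integers in the log above can be decoded by hand. As a minimal sketch, assuming the cuDNN 7 encoding MAJOR*1000 + MINOR*100 + PATCHLEVEL, the helpers below turn 7101 and 7004 into dotted versions and reproduce the "compatibility version" shown in the message. The function names are hypothetical, and the comparison is only a guess at the kind of check the log implies, not TensorFlow's actual code.

```shell
#!/bin/sh
# Decode a cuDNN 7 version integer (MAJOR*1000 + MINOR*100 + PATCHLEVEL).
decode_cudnn_version() {
    v=$1
    echo "$((v / 1000)).$((v % 1000 / 100)).$((v % 100))"
}

# "Compatibility version" as printed in the log: the patch level zeroed out.
cudnn_compat_version() {
    v=$1
    echo "$((v - v % 100))"
}

# Hypothetical check: succeed only if both versions share major and minor.
cudnn_versions_match() {
    [ "$(cudnn_compat_version "$1")" = "$(cudnn_compat_version "$2")" ]
}

decode_cudnn_version 7101   # library loaded at runtime
decode_cudnn_version 7004   # version the binary was compiled with
```

Read this way, the base image ships cuDNN 7.1.1 while the pip package expects 7.0.4, which is the mismatch the first log line reports.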
About this issue
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 34 (24 by maintainers)
Commits related to this issue
- Add cuDNN 7.1 fix Add downgrade option for cuDNN, workaround for https://github.com/tensorflow/tensorflow/issues/17566 — committed to Luke035/nvidia-anaconda-docker by Luke035 6 years ago
- Pin the version of cuDNN used in Dockerfile.gpu (#17723) Related: #17566 Fixes: #17431 Signed-off-by: Felix Abecassis <fabecassis@nvidia.com> — committed to tensorflow/tensorflow by flx42 6 years ago
- Pin the version of cuDNN used in Dockerfile.gpu (#17723) Related: #17566 Fixes: #17431 Signed-off-by: Felix Abecassis <fabecassis@nvidia.com> — committed to StanislawAntol/tensorflow by flx42 6 years ago
- Downgrade cuDNN to 7.0.5.15-1 See https://github.com/tensorflow/tensorflow/issues/17566 — committed to OpenNMT/nmt-wizard-docker by guillaumekln 6 years ago
If you use docker, I think you have 3 options:
- Use a plain CUDA base image (nvidia/cuda:9.0-devel-ubuntu16.04; note this doesn’t have cuDNN), and install cuDNN 7.0 yourself, as I’ve done for horovod (https://github.com/uber/horovod/pull/206).
- Keep the cuDNN 7 base image (nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04), but downgrade cuDNN to 7.0. You need to do apt-get install --allow-downgrades libcudnn7=7.0.5.15-1+cuda9.0.
- Use the TensorFlow image (tensorflow/tensorflow:1.6.0-gpu) as base.
If you don’t use docker, just make sure your machine has cuDNN 7.0, not 7.1.
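For the downgrade option, the command could be wrapped as in the sketch below, for example as the body of a RUN step. The package version is the one quoted in the comment above. The function names, the added -y flag (so the install does not prompt during a non-interactive build), and the apt-get update step are assumptions. On a devel image the matching libcudnn7-dev package may also need pinning, which this sketch does not attempt.

```shell
#!/bin/sh
# cuDNN package version quoted in this thread; other versions are untested here.
CUDNN_PIN="7.0.5.15-1+cuda9.0"

# Print the apt package spec for the pinned cuDNN runtime library.
cudnn_pin_spec() {
    printf 'libcudnn7=%s\n' "$CUDNN_PIN"
}

# Downgrade cuDNN in place. Needs root and NVIDIA's apt repository,
# so it is defined here but deliberately not called.
downgrade_cudnn() {
    apt-get update &&
    apt-get install -y --allow-downgrades "$(cudnn_pin_spec)"
}
```

Calling downgrade_cudnn inside the image build, before TensorFlow is imported for the first time, would apply the pin.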
@adampl Installing tensorflow per these instructions (https://www.tensorflow.org/install/) generates the above error. Typing “pip install update” fixes it. I hope this helps!
Thank you for your reply.
I had just solved it by updating Tensorflow. Type “pip install update”
Hello,
I had to rebuild my computer and am now experiencing one of the errors described in the original post (see below). Is there a recommended workaround?
Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
System information
Hello, here is how I installed cuDNN to get past the error messages discussed in this issue.
I installed the cuDNN v7.0.4 Library for Linux (the oldest version for CUDA 9.0) as follows:
tar xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/lib64/* /usr/local/cuda-9.0/lib64/
sudo cp cuda/include/* /usr/local/cuda-9.0/include/
sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h
No Errors, works well.
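After a manual install like this, it can help to confirm which cuDNN version the copied header declares. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (as cuDNN 7 headers do) and defaults to the install path used above. The function name is hypothetical.

```shell
#!/bin/sh
# Print the cuDNN version declared in a cudnn.h header as MAJOR.MINOR.PATCH.
# The default path matches the copy destination used in this comment.
cudnn_header_version() {
    header="${1:-/usr/local/cuda-9.0/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}
```

Running cudnn_header_version with no argument reads the default path; a result starting with 7.0 would match what the tensorflow-gpu 1.6.0 binary was built against according to the log above.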
@cliffwoolley Thank you. I am opening an internal issue and looking for someone to update the code to match your statement in cuDNN.
I think the biggest problem is that the “latest” nvidia-docker images are CUDA 9.1 and cuDNN 7.1, while our builds look for CUDA 9 and cuDNN 7. In our nightlies and tests, the fix is to avoid using the “latest” nvidia-docker images.
Also, it is too late to change anything for 1.7. RC0 is almost out.
@rongou I implemented your second suggestion in my Dockerfile and I’ve been able to run TF 1.6 along with KERAS 2.15 within the base image nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04. The only thing I had to do was add a RUN layer to my Dockerfile that executes “apt-get install --allow-downgrades libcudnn7=7.0.5.15-1+cuda9.0”.