tensorflow: tensorflow-gpu pip package is not compatible with the CUDA 9 Docker image

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): binary (pip install tensorflow-gpu)
TensorFlow version (use command below): 1.6.0
Python version: 2.7
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: CUDA 9, cuDNN 7
GPU model and memory:
Exact command to reproduce: I was trying to build a Horovod image, but this affects anyone using the nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 base image:

docker build -t horovod https://raw.githubusercontent.com/uber/horovod/master/Dockerfile
docker run -it --rm horovod python tensorflow_mnist.py

Describe the problem

When you build a Docker image based on nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 and run pip install tensorflow-gpu==1.6.0, TensorFlow crashes at runtime. The base image contains cuDNN 7.1, but the tensorflow-gpu pip package was built against cuDNN 7.0.

Source code / logs

Error messages:

2018-03-08 17:46:50.845206: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-03-08 17:46:50.845868: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
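The version numbers in the log are cuDNN's packed integers. The Python sketch below decodes them, assuming cuDNN's encoding MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, and reproduces the comparison that failed. The rule that the "compatibility version" drops the patch level is inferred from the log line itself, not taken from TensorFlow's source, and the helper names (decode, compat_version, versions_compatible) are illustrative.

```python
"""Decode the cuDNN version integers printed in the error log.

Assumption: CUDNN_VERSION = MAJOR * 1000 + MINOR * 100 + PATCHLEVEL.
Assumption (inferred from the log): the "compatibility version" keeps
only major and minor, e.g. 7101 -> 7100 and 7004 -> 7000.
"""

from typing import NamedTuple


class CudnnVersion(NamedTuple):
    major: int
    minor: int
    patch: int


def decode(version: int) -> CudnnVersion:
    """Split an encoded cuDNN version such as 7101 into (7, 1, 1)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return CudnnVersion(major, minor, patch)


def compat_version(version: int) -> int:
    """Drop the patch level (assumed rule that matches the log)."""
    v = decode(version)
    return v.major * 1000 + v.minor * 100


def versions_compatible(loaded: int, compiled: int) -> bool:
    """Mimic the failed check: compatibility versions must be equal."""
    return compat_version(loaded) == compat_version(compiled)


if __name__ == "__main__":
    loaded, compiled = 7101, 7004  # values taken from the log above
    print(decode(loaded), compat_version(loaded))      # CudnnVersion(major=7, minor=1, patch=1) 7100
    print(decode(compiled), compat_version(compiled))  # CudnnVersion(major=7, minor=0, patch=4) 7000
    print(versions_compatible(loaded, compiled))       # False
    print(versions_compatible(7005, compiled))         # True: a 7.0.5 runtime passes
```

Under that assumption, any 7.0.x runtime (for example 7.0.5) maps to the same compatibility version as the 7.0.4 the wheel was compiled with, which is consistent with the downgrade to libcudnn7=7.0.5.15-1 used in the workarounds.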

@flx42

About this issue

State: closed
Created 6 years ago
Reactions: 4
Comments: 34 (24 by maintainers)

Commits related to this issue

Add cuDNN 7.1 fix Add downgrade option for cuDNN, workaround for https://github.com/tensorflow/tensorflow/issues/17566 — committed to Luke035/nvidia-anaconda-docker by Luke035 6 years ago
Pin the version of cuDNN used in Dockerfile.gpu (#17723) Related: #17566 Fixes: #17431 Signed-off-by: Felix Abecassis <fabecassis@nvidia.com> — committed to tensorflow/tensorflow by flx42 6 years ago
Pin the version of cuDNN used in Dockerfile.gpu (#17723) Related: #17566 Fixes: #17431 Signed-off-by: Felix Abecassis <fabecassis@nvidia.com> — committed to StanislawAntol/tensorflow by flx42 6 years ago
Downgrade cuDNN to 7.0.5.15-1 See https://github.com/tensorflow/tensorflow/issues/17566 — committed to OpenNMT/nmt-wizard-docker by guillaumekln 6 years ago

Most upvoted comments

If you use Docker, I think you have three options:

1. Use the CUDA base image (e.g. nvidia/cuda:9.0-devel-ubuntu16.04; note that it doesn't include cuDNN) and install cuDNN 7.0 yourself, as I've done for Horovod (https://github.com/uber/horovod/pull/206).
2. Use the CUDA+cuDNN base image (e.g. nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04), but downgrade cuDNN to 7.0 by running apt-get install --allow-downgrades libcudnn7=7.0.5.15-1+cuda9.0.
3. Use TensorFlow's Docker image (tensorflow/tensorflow:1.6.0-gpu) as the base.

If you don’t use docker, just make sure your machine has cuDNN 7.0, not 7.1.
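One way to check which cuDNN your process actually loads is to call cuDNN's cudnnGetVersion() directly. The Python sketch below does this through ctypes; it assumes the shared library is named libcudnn.so.7 and is on the loader path, and the helper names are hypothetical. The "7.0.x" range follows the packed encoding MAJOR * 1000 + MINOR * 100 + PATCHLEVEL.

```python
import ctypes
from typing import Optional


def loaded_cudnn_version(libname: str = "libcudnn.so.7") -> Optional[int]:
    """Return cuDNN's packed runtime version (e.g. 7005), or None if the
    library cannot be loaded. The default libname is an assumption."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    # cudnnGetVersion takes no arguments and returns a size_t.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetVersion.argtypes = []
    return int(lib.cudnnGetVersion())


def is_cudnn_70(version: Optional[int]) -> bool:
    """True for any 7.0.x runtime (packed 7000-7099)."""
    return version is not None and 7000 <= version < 7100


if __name__ == "__main__":
    v = loaded_cudnn_version()
    if v is None:
        print("libcudnn.so.7 could not be loaded")
    elif is_cudnn_70(v):
        print(f"cuDNN {v}: 7.0.x, matches what tensorflow-gpu 1.6.0 was built against")
    else:
        print(f"cuDNN {v}: not 7.0.x; consider one of the Docker options listed above")
```

Querying the library rather than the package manager reports what the dynamic loader actually resolves, which is what TensorFlow sees at runtime.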


rongou on Mar 12, 2018

@adampl Installing tensorflow per these instructions (https://www.tensorflow.org/install/) generates the above error. Typing “pip install update” fixes it. I hope this helps!


wdma on Mar 12, 2018

Thank you for your reply.

I had just solved it by updating TensorFlow: type "pip install update".

wdma on Mar 12, 2018

Hello,

I had to rebuild my computer and am now seeing one of the errors described in the original post (see below). Is there a recommended workaround?

Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

wdma on Mar 9, 2018

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): pip3 install --upgrade tensorflow-gpu
TensorFlow version (use command below): 1.6.0
Python version: Python 3.5.2
Bazel version (if compiling from source): No
GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
CUDA/cuDNN version: CUDA 9, cuDNN v7.1.1 Library for Linux
GPU model and memory: 1070

Hello, here is my experience installing CUDA and fixing the problem behind the messages below.

2018-03-13 10:19:33.118216: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-03-13 10:19:33.118929: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted

I installed the cuDNN v7.0.4 Library for Linux (the oldest version for CUDA 9.0) (link) as follows:

tar xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/lib64/* /usr/local/cuda-9.0/lib64/
sudo cp cuda/include/* /usr/local/cuda-9.0/include/
sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h

No errors; it works well.
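After a manual install like the one above, the copied header can confirm which version landed in /usr/local/cuda-9.0. In cuDNN 7, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; the Python sketch below reads those #defines. The header path mirrors the commands above and is an assumption for other layouts; header_version is a hypothetical helper name. The header only reflects the library if both came from the same tarball, as they do in these steps.

```python
import re
from pathlib import Path
from typing import Tuple

# Path from the install commands above; adjust if cuDNN lives elsewhere.
HEADER = Path("/usr/local/cuda-9.0/include/cudnn.h")


def header_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the #defines in cudnn.h source text."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Require whitespace after the name so e.g. CUDNN_MAJOR_X is not matched.
        m = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if m is None:
            raise ValueError(f"{name} not found in header")
        values.append(int(m.group(1)))
    return values[0], values[1], values[2]


if __name__ == "__main__":
    if HEADER.exists():
        major, minor, patch = header_version(HEADER.read_text(errors="replace"))
        print(f"cudnn.h reports {major}.{minor}.{patch}")
    else:
        print(f"{HEADER} not found")
```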

ghost on Mar 13, 2018

@cliffwoolley Thank you. I am opening an internal issue and looking for someone to update the code to match your statement about cuDNN.

tfboyd on Mar 13, 2018

I think the biggest problem is that the "latest" nvidia-docker images ship CUDA 9.1 and cuDNN 7.1, while our builds look for CUDA 9 and cuDNN 7. In our nightlies and tests, the fix is to avoid using the "latest" nvidia-docker images.

Also, it is too late to change anything for 1.7. RC0 is almost out.

gunan on Mar 13, 2018

@rongou I implemented your second suggestion in my Dockerfile, and I've been able to run TF 1.6 along with Keras 2.15 on the base image nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04. The only change I needed was a RUN layer in my Dockerfile that executes "apt-get install --allow-downgrades libcudnn7=7.0.5.15-1+cuda9.0".

Luke035 on Mar 13, 2018