tensorflow: tensorflow-gpu pip package is not compatible with cuda9 docker image

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary (pip install tensorflow-gpu)
  • TensorFlow version (use command below): 1.6.0
  • Python version: 2.7
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CUDA 9, cuDNN 7
  • GPU model and memory:
  • Exact command to reproduce: I was trying to build a horovod image, but this would affect anyone using the nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 base image:
docker build -t horovod https://raw.githubusercontent.com/uber/horovod/master/Dockerfile
docker run -it --rm horovod python tensorflow_mnist.py

Describe the problem

When building a Docker image based on nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 and running pip install tensorflow-gpu==1.6.0, TensorFlow crashes at runtime in the resulting image because the base image contains cuDNN 7.1, while the tensorflow-gpu pip package was built against cuDNN 7.0.

Source code / logs

Error messages:

2018-03-08 17:46:50.845206: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-03-08 17:46:50.845868: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
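
For anyone decoding the numbers in the log: cuDNN 7 encodes its version as major*1000 + minor*100 + patchlevel, so 7101 is cuDNN 7.1.1 and 7004 is cuDNN 7.0.4. The sketch below decodes those integers and reproduces the "compatibility version" values from the message by rounding down to the minor release. The function names are hypothetical, and the equality rule in cudnn_matches is inferred from the log text, not copied from the TensorFlow source.

```shell
#!/usr/bin/env bash
# Decode a cuDNN 7 version integer (major*1000 + minor*100 + patchlevel).
cudnn_decode() {
  local v=$1
  echo "$(( v / 1000 )).$(( (v % 1000) / 100 )).$(( v % 100 ))"
}

# "Compatibility version" as shown in the log: the version rounded down
# to its minor release (7101 -> 7100, 7004 -> 7000).
cudnn_compat() {
  local v=$1
  echo $(( v / 100 * 100 ))
}

# Assumption: runtime and build versions are accepted only when their
# compatibility versions are equal, which is what the error text suggests.
cudnn_matches() {
  [ "$(cudnn_compat "$1")" -eq "$(cudnn_compat "$2")" ]
}

cudnn_decode 7101   # prints 7.1.1
cudnn_decode 7004   # prints 7.0.4
if cudnn_matches 7101 7004; then
  echo "compatible"
else
  echo "mismatch"   # this branch runs for the versions in the log
fi
```

Under this reading, any 7.0.x runtime would satisfy a 7.0.4 build, while 7.1.x would not.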

@flx42

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 34 (24 by maintainers)

Most upvoted comments

If you use docker, I think you have 3 options:

  • Use the cuda base image (e.g. nvidia/cuda:9.0-devel-ubuntu16.04; note this doesn’t have cuDNN), and install cuDNN 7.0 yourself, as I’ve done for horovod (https://github.com/uber/horovod/pull/206).
  • Use the cuda+cudnn base image (e.g. nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04), but downgrade cuDNN to 7.0. You need to do apt-get install --allow-downgrades libcudnn7=7.0.5.15-1+cuda9.0.
  • Use Tensorflow’s docker image (tensorflow/tensorflow:1.6.0-gpu) as base.

If you don’t use docker, just make sure your machine has cuDNN 7.0, not 7.1.
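
A minimal sketch of the second option as a shell script, assuming a Debian/Ubuntu-based image where cuDNN comes from the libcudnn7 apt package. The package version string is the one quoted above; the run and pin_cudnn helpers, the DRY_RUN switch, and the apt-mark hold step are additions for illustration, not part of the original suggestion. DRY_RUN defaults to 1, so the script only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: pin cuDNN to the 7.0 build that tensorflow-gpu 1.6.0 expects.
# The version string comes from the comment above; override it with
# CUDNN_PKG_VERSION if your repository carries a different 7.0.x build.
CUDNN_PKG_VERSION="${CUDNN_PKG_VERSION:-7.0.5.15-1+cuda9.0}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumption: holding the package keeps a later "apt-get upgrade"
# from pulling cuDNN 7.1 back in.
pin_cudnn() {
  run apt-get update &&
  run apt-get install -y --allow-downgrades "libcudnn7=${CUDNN_PKG_VERSION}" &&
  run apt-mark hold libcudnn7
}

pin_cudnn
```

Inside a Dockerfile the same commands would run as root in a RUN layer; on a regular machine they would need sudo. In a -devel image, libcudnn7-dev may need the same pin; that is untested here.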

@adampl Installing TensorFlow per these instructions (https://www.tensorflow.org/install/) produces the above error. Upgrading TensorFlow fixes it: I typed “pip install update”, though the usual upgrade command is “pip install --upgrade tensorflow-gpu”. I hope this helps!

Thank you for your reply.

I just solved it by updating TensorFlow. Type “pip install update”.

Hello,

I had to rebuild my computer and am now experiencing one of the errors described in the original post (see below). Is there a recommended workaround?

Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): pip3 install --upgrade tensorflow-gpu
  • TensorFlow version (use command below): 1.6.0
  • Python version: Python 3.5.2
  • Bazel version (if compiling from source): no
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  • CUDA/cuDNN version: CUDA 9, cuDNN v7.1.1 Library for Linux
  • GPU model and memory: 1070

Hello, here is how I got past the errors below while installing CUDA.

2018-03-13 10:19:33.118216: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-03-13 10:19:33.118929: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted

I installed cuDNN v7.0.4 Library for Linux (the oldest version for CUDA 9.0) as follows:

tar xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/lib64/* /usr/local/cuda-9.0/lib64/
sudo cp cuda/include/* /usr/local/cuda-9.0/include/
sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h

No errors; it works well.
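
To confirm which cuDNN version is in place after copying the files, one option is to read the version macros from cudnn.h. This is a minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on separate #define lines (as the cuDNN 7 headers do); the function name and the CUDNN_HEADER variable are hypothetical.

```shell
#!/usr/bin/env bash
# Print the cuDNN version recorded in a cudnn.h header as major.minor.patch.
cudnn_header_version() {
  local header=$1
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$header"
}

# Default path matches the install location used in the commands above.
header="${CUDNN_HEADER:-/usr/local/cuda-9.0/include/cudnn.h}"
if [ -f "$header" ]; then
  cudnn_header_version "$header"
else
  echo "cudnn.h not found at $header" >&2
fi
```

For TensorFlow 1.6.0 binaries the result should be a 7.0.x version; a 7.1.x result means the newer library is still the one installed.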

@cliffwoolley Thank you. I am opening an internal issue and looking for someone to update the code to match your statement in cuDNN.

I think the biggest problem is that the “latest” nvidia-docker images ship CUDA 9.1 and cuDNN 7.1, while our builds look for CUDA 9.0 and cuDNN 7.0. In our nightlies or tests, the fix is to avoid using the “latest” nvidia-docker images.

Also, it is too late to change anything for 1.7. RC0 is almost out.

@rongou I implemented your second suggestion in my Dockerfile and was able to run TF 1.6 along with Keras 2.1.5 on the base image nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04. The only thing I had to do was add a RUN layer to my Dockerfile that executes “apt-get install --allow-downgrades libcudnn7=7.0.5.15-1+cuda9.0”.