gpu-jupyter: cuDNN missing

Hey,

I’m trying to use your Docker container in a Kubernetes cluster with KubeSpawner for JupyterHub. I got everything running except TensorFlow with GPU acceleration, because of this error in the container log:

“Could not load dynamic library ‘libcudnn.so.7’; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory”

Because I’m new to Kubernetes/Docker, I don’t know whether this library has to be inside the container, mounted from the host filesystem, or made available via c.Spawner.env_keep = ['LD_LIBRARY_PATH'], which I have already tried.
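
For context, this is roughly the relevant part of my jupyterhub_config.py (just a sketch; the spawner class line and the explicit library path are placeholders, not something taken from the image):

# jupyterhub_config.py (sketch; the paths below are placeholders)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# what I already tried: pass LD_LIBRARY_PATH from the hub's environment to the notebook pod
c.Spawner.env_keep = ['LD_LIBRARY_PATH']

# alternative: set it explicitly on the spawned container
# (this only helps if libcudnn.so.7 actually exists inside the image)
c.KubeSpawner.environment = {
    'LD_LIBRARY_PATH': '/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu',
}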

With the local spawner everything works and all 6 available GPUs are visible. The container only gets access to one GPU at a time.

But this is the result:

import tensorflow as tf
tf.config.list_physical_devices() 

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
 PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]
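
In case it helps with the diagnosis, here is a minimal check (my own sketch, not taken from the image) that I can run inside the spawned container to see whether the dynamic loader finds the library at all:

import ctypes

# roughly the same dlopen lookup TensorFlow performs at runtime
try:
    ctypes.CDLL('libcudnn.so.7')
    print('libcudnn.so.7 found')
except OSError as err:
    print('libcudnn.so.7 not found:', err)  # mirrors the dlerror from the log above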

So I thought maybe someone here could point me in the right direction.

Thanks in advance

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

This library is required when you want to use the GPU part of TensorFlow.

Here are the requirements for tensorflow-gpu (see the “Software requirements” section): https://www.tensorflow.org/install/gpu

For a Docker container specialized in GPU acceleration, I would recommend including the cuDNN library.
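
Once cuDNN is present, a quick sanity check could look like this (a minimal sketch, assuming a TensorFlow 2.x GPU build as in the report above); the GPU should then show up as a regular device and not only as an XLA_GPU:

import tensorflow as tf

print(tf.test.is_built_with_cuda())            # True for the GPU-enabled build
print(tf.config.list_physical_devices('GPU'))  # should list /physical_device:GPU:0 once libcudnn.so.7 is found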

Greetings

Hi guys,

I have the same problem in my AWS/EKS JupyterHub setup. Apart from adding the cuDNN library to the gpulibs, it might also be possible to change the base image from nvidia/cuda:10.1-base-ubuntu18.04 to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04. I couldn’t try it locally due to some strange Docker build errors, which is why I wanted to ask you.

The resulting image will be bigger, but TensorFlow might work. What do you think?

It seems to work fine. I guess it is better to fetch directly from nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04, as compatibility between CUDA and cuDNN would then be guaranteed. Do you agree?

Solved with commit e5eb8f6

Use nvidia/cuda:10.1-base-ubuntu18.04 as base image.

Thanks to everyone here! 😃

I agree it’s better to use that as the base. Storage isn’t that big a problem, but I think crossing 10 GB puts us into “be careful now” territory (assume you get 25-50 GB of cloud storage with an instance; models can easily take up 10-20 GB even without any datasets).

DNNs are popular, so I agree cuDNN should be included by default, but technically it can be unnecessary even for some very sophisticated programs. I ran a bunch of NLP networks that gobbled up 10 GB of VRAM (for inference!), and those still weren’t using deep network architectures. Half a gigabyte in image savings is potentially half a million pages of rich text corpus to train on.

@ph-lp Installing libcudnn7 takes about 900 MB.

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libcudnn7
The following NEW packages will be installed:
  libcudnn7 libcudnn7-dev
0 upgraded, 2 newly installed, 0 to remove and 35 not upgraded.
Need to get 355 MB of archives.
After this operation, 892 MB of additional disk space will be used.

Maybe it is a good idea to use the nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 base image as you’ve suggested. I’ll test it today.