gpu-jupyter: cuDNN missing

Hey,

I’m trying to use your Docker container in a Kubernetes cluster with KubeSpawner for JupyterHub. I got everything running except TensorFlow with GPU acceleration, because of this error in the container log:

“Could not load dynamic library ‘libcudnn.so.7’; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory”

Because I’m new to Kubernetes/Docker, I don’t know whether this library has to be inside the container, mounted from the host filesystem, or made available via c.Spawner.env_keep = ['LD_LIBRARY_PATH'], which I have already tried.
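
For context, this is roughly the relevant part of my jupyterhub_config.py (just a sketch; the spawner class line and the explicit library path are placeholders, not something taken from the image):

# jupyterhub_config.py (sketch; the paths below are placeholders)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# what I already tried: pass LD_LIBRARY_PATH from the hub's environment to the notebook pod
c.Spawner.env_keep = ['LD_LIBRARY_PATH']

# alternative: set it explicitly on the spawned container
# (this only helps if libcudnn.so.7 actually exists inside the image)
c.KubeSpawner.environment = {
    'LD_LIBRARY_PATH': '/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu',
}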

With the local spawner everything works and all 6 available GPUs are visible. The container only gets access to one GPU at a time.

But this is the result:

import tensorflow as tf
tf.config.list_physical_devices() 

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
 PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]
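
In case it helps with the diagnosis, here is a minimal check (my own sketch, not taken from the image) that I can run inside the spawned container to see whether the dynamic loader finds the library at all:

import ctypes

# roughly the same dlopen lookup TensorFlow performs at runtime
try:
    ctypes.CDLL('libcudnn.so.7')
    print('libcudnn.so.7 found')
except OSError as err:
    print('libcudnn.so.7 not found:', err)  # mirrors the dlerror from the log above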

So I thought maybe someone here could point me in the right direction.

Thanks in advance

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

This library is required when you want to use the GPU part of TensorFlow.

Here are the requirements for tensorflow-gpu (see the “Software requirements” section): https://www.tensorflow.org/install/gpu

For a Docker container specialized in GPU acceleration, I would recommend including the cuDNN library.
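
Once cuDNN is present, a quick sanity check could look like this (a minimal sketch, assuming a TensorFlow 2.x GPU build as in the report above); the GPU should then show up as a regular device and not only as an XLA_GPU:

import tensorflow as tf

print(tf.test.is_built_with_cuda())            # True for the GPU-enabled build
print(tf.config.list_physical_devices('GPU'))  # should list /physical_device:GPU:0 once libcudnn.so.7 is found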

Greetings

Hi guys,

I have the same problem in my AWS/EKS JupyterHub setup. Apart from adding the cuDNN library to the gpulibs, it might also be possible to change the base image from nvidia/cuda:10.1-base-ubuntu18.04 to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04. I couldn’t try it locally due to some strange Docker build errors, which is why I wanted to ask you.

The resulting image will be bigger, but TensorFlow might work. What do you think?

It seems to work fine. I guess it is better to fetch directly from nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04, as compatibility between CUDA and cuDNN would then be guaranteed. Do you agree?

Solved with commit e5eb8f6

Use nvidia/cuda:10.1-base-ubuntu18.04 as base image.

Thanks to everyone here! 😃

I agree it’s better to use that as the base. Storage isn’t that big a problem, but I think crossing 10 GB puts us into “be careful now” territory (assume you get 25-50 GB of cloud storage with an instance; models can easily take up 10-20 GB even without any datasets).

DNNs are popular, so I agree cuDNN should be included by default, but technically it can be unnecessary even for some very sophisticated programs. I ran a bunch of NLP networks that gobbled up 10 GB of VRAM (for inference!), and those still weren’t using deep network architectures. Half a gigabyte in image savings is potentially half a million pages of rich text corpus to train on.

@ph-lp Installing libcudnn7 takes about 900 MB.

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libcudnn7
The following NEW packages will be installed:
  libcudnn7 libcudnn7-dev
0 upgraded, 2 newly installed, 0 to remove and 35 not upgraded.
Need to get 355 MB of archives.
After this operation, 892 MB of additional disk space will be used.

Maybe it is a good idea to use the nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 base image as you’ve suggested. I’ll test it today.