gpu-jupyter: cuDNN missing
Hey,
I’m trying to use your Docker container in a Kubernetes cluster with KubeSpawner for JupyterHub. I got everything running except TensorFlow with GPU acceleration, because of this error in the container log:
"Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory"
Because I’m new to Kubernetes/Docker, I don’t know whether this library has to be inside the container, mounted from the host filesystem, or passed in through c.Spawner.env_keep = ['LD_LIBRARY_PATH'], which I already tried.
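To narrow down whether the library is even present inside the spawned container, a quick probe from a notebook cell can try to load it the same way TensorFlow would. This is only a sketch; the version suffix (.7) assumes a TensorFlow build against CUDA 10.1/cuDNN 7.

```python
import ctypes
import os

# Try to dlopen the cuDNN shared library. If this fails, the file is
# either missing from the image or not on the dynamic loader's path.
try:
    ctypes.CDLL("libcudnn.so.7")
    print("libcudnn.so.7 loaded successfully")
except OSError as err:
    print("libcudnn.so.7 not found:", err)
    # LD_LIBRARY_PATH is what c.Spawner.env_keep would have forwarded.
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(unset)"))
```

If the load fails even with a correct LD_LIBRARY_PATH, the library simply isn’t in the image, and forwarding environment variables from the hub can’t fix that.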
With the LocalSpawner everything works and all six available GPUs are visible. In the Kubernetes setup, the container only gets access to one GPU at a time.
But this is the result:
import tensorflow as tf
tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]
So I thought maybe someone here can point me in the right direction.
Thanks in advance
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 21 (8 by maintainers)
Commits related to this issue
- closes #25 by adding libcudnn7-dev — committed to mathematicalmichael/gpu-jupyter by mathematicalmichael 4 years ago
- Merge pull request #26 from mathematicalmichael/patch-2 closes #25 by adding libcudnn7-dev Thank you all! — committed to iot-salzburg/gpu-jupyter by ChristophSchranz 4 years ago
This library is required whenever you want to use the GPU part of TensorFlow.
Here are the requirements for tensorflow-gpu (see the “Software requirements” section): https://www.tensorflow.org/install/gpu
For a Docker container specialized for GPU acceleration, I would recommend including the cuDNN library.
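Two ways of including cuDNN came up in this thread: switching to NVIDIA’s cuDNN-enabled base image, or installing the libcudnn7 packages on top of the existing CUDA base (the merged fix added libcudnn7-dev). A sketch of both options; the apt package names assume NVIDIA’s Ubuntu 18.04 repositories are already configured in the base image:

```dockerfile
# Option 1: start from a base image that already bundles cuDNN 7.
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

# Option 2: on a plain CUDA 10.1 base, install the cuDNN 7 packages instead:
# RUN apt-get update && \
#     apt-get install -y --no-install-recommends libcudnn7 libcudnn7-dev && \
#     rm -rf /var/lib/apt/lists/*
```

Option 1 has the advantage that the CUDA and cuDNN versions are matched by NVIDIA, at the cost of changing the image lineage.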
Greetings
It seems to work fine. I guess it is better to fetch directly from nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04, as their compatibility would be guaranteed. Do you agree?
I agree it’s better to use that as the base. Storage isn’t that big a problem, but I think crossing 10 GB puts us into “be careful now” territory (assume you get 25–50 GB of cloud storage with an instance; models can easily take up 10–20 GB even without any datasets).
DNNs are popular, so I agree cuDNN should be included by default, but technically it can be unnecessary even for some very sophisticated programs. I ran a bunch of NLP networks that gobbled up 10 GB of VRAM (for inference!), and those still weren’t using deep network architectures. Half a gigabyte of image savings is potentially half a million pages of rich-text corpus to train on.
@ph-lp Installing libcudnn7 takes about 900 MB. Maybe it is a good idea to use the nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 base image as you’ve suggested. I’ll test it today.