tensorflow: Installation broken - Tensorflow 2.1 Cuda 10.1 Ubuntu 18.04

I have followed the installation guide for GPU support multiple times, each time starting from a blank Ubuntu 18.04 LTS instance. Every time, it breaks when installing cuda-10-1 after installing the driver and rebooting.

CUDA 10.1 is the version TensorFlow 2.1 is compiled against, so CUDA 10.1 needs to be installed. See the guide for Ubuntu 18.04 + CUDA 10.1: https://www.tensorflow.org/install/gpu

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS
  • TensorFlow installed from (source or binary): Tensorflow 2.1 binary
  • TensorFlow version: 2.1
  • Python version: 3.6
  • Installed using virtualenv? pip? conda?: virtualenv
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Tesla T4

I restarted the machine and confirmed that the driver recognises the GPU. The installation guide for GPU support then breaks at this section:

sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1  \
    libcudnn7-dev=7.6.4.38-1+cuda10.1

Which results in:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda-10-1 : Depends: cuda-runtime-10-1 (>= 10.1.243) but it is not going to be installed
             Depends: cuda-demo-suite-10-1 (>= 10.1.243) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Having searched Stack Overflow, GitHub and the TensorFlow website, it seems to me that the listed dependencies cannot be installed.
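The apt report above names the packages that block the install. As a rough aid for triage, the following Python sketch pulls those names out of the error text. The function name `unmet_dependencies` and the regular expression are illustrative assumptions, not part of apt or of the TensorFlow guide, and the pattern only covers the "Depends: ... but ..." wording shown in this issue.

```python
import re

# Matches lines such as:
#   cuda-10-1 : Depends: cuda-runtime-10-1 (>= 10.1.243) but it is not going to be installed
# The version constraint in parentheses is optional.
DEPENDS_RE = re.compile(r"Depends:\s+(\S+)(?:\s+\(([^)]*)\))?\s+but")


def unmet_dependencies(apt_output):
    """Return (package, version constraint) pairs from apt's unmet-dependency report."""
    return [(m.group(1), m.group(2) or "") for m in DEPENDS_RE.finditer(apt_output)]


# Error text copied from this issue.
sample = """The following packages have unmet dependencies:
 cuda-10-1 : Depends: cuda-runtime-10-1 (>= 10.1.243) but it is not going to be installed
             Depends: cuda-demo-suite-10-1 (>= 10.1.243) but it is not going to be installed
E: Unable to correct problems, you have held broken packages."""

for package, constraint in unmet_dependencies(sample):
    print(package, constraint)
```

Running `apt-cache policy <package>` for each reported name then shows which candidate version apt would pick, which may help locate where the chain breaks.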

Again, I have rerun this installation multiple times on a blank machine, without running anything else before trying to run this installation.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 31
  • Comments: 15 (3 by maintainers)

Most upvoted comments

Possible solution here, differing from the driver problem above. A few comments from my side, as I’ve been struggling with a similar problem with a slightly different root cause: I couldn’t install cuda-10-1, but the nvidia-driver installation went well for me (TF-nightly (2.2), Ubuntu 18.04, GTX 1660Ti).

  1. Started from a CLEAN MACHINE (delete your ~/.cache/pip/ if you have to).
  2. Created and activated a conda environment with python=3.6
  3. pip install tf-nightly (installs 2.2.0 at the time of writing)
  4. Then followed the entire block in the TensorFlow GPU manual with no problems:
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
  5. Next block:
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi

This worked well for me, unlike for @igorhoogerwoord, because the 418 driver package is automatically fetched as 430.50. For anyone who’s interested: after a reboot, open the Ubuntu “software update” settings, go to the tab “additional drivers”, and check that the correct version is selected (430 should be selected). In a terminal, nvidia-smi should say:

| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |

This is why I think the official instructions are okay here. apt confirms it:

$ sudo apt-get install nvidia-driver-418
nvidia-driver-418 is already the newest version (430.50-0ubuntu0.18.04.2).
  6. But then, the next step from the manual fails:
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends cuda-10-1
cuda-10-1 : Depends: cuda-runtime-10-1 (>= 10.1.243) but it is not going to be installed
            Depends: cuda-demo-suite-10-1 (>= 10.1.243) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Trying to install the dependencies one by one just led to hair loss.

  7. Anyway, I left it at that and continued with the rest of the manual:
sudo apt-get install --no-install-recommends libcudnn7=7.6.4.38-1+cuda10.1
sudo apt-get install --no-install-recommends libcudnn7-dev=7.6.4.38-1+cuda10.1

# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1
sudo apt-get install -y --no-install-recommends libnvinfer-dev=6.0.1-1+cuda10.1
sudo apt-get install -y --no-install-recommends libnvinfer-plugin6=6.0.1-1+cuda10.1

All of these went well. Notice I did them all in clear separate steps.

  8. But how to install CUDA? After hours of cursing I decided to solve it the dirty way: conda install cudatoolkit=10.1.243. Following this, my TensorFlow programs run just fine!

Nonetheless, the manual needs a major overhaul. It says “The following NVIDIA® software must be installed on your system” right before the Linux manual, where everything is installed again. This leaves the user unsure where to start.

ADDENDUM: You might notice that installing the libcudnn7 and libnvinfer packages above actually pulls in some CUDA 10.2 related packages, particularly libcublas. You can check this by typing:

dpkg -l | grep libcublas
libcublas-dev       10.2.2.89-1       amd64    CUBLAS native dev links, headers
libcublas10         10.2.2.89-1       amd64    CUBLAS native runtime libraries

I had problems with this version mix before, so I decided to clean it up once and for all:

sudo apt-get install libcublas-dev=10.2.1.243-1
sudo apt-get install libcublas10=10.2.1.243-1

The version numbers are a mess: 10.2.2 is CUDA 10.2 and 10.2.1 is CUDA 10.1. Downgrading libcublas will also make the cuda-license-10-2 package obsolete.
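To spot this mix without reading version strings by eye, here is a small Python sketch that labels each libcublas package in `dpkg -l` output with the CUDA release it belongs to. The prefix mapping is taken only from the observation in this comment (10.2.2.x for CUDA 10.2, 10.2.1.x for CUDA 10.1), and `cublas_cuda_versions` is a hypothetical helper name, so treat the result as a hint rather than an authoritative check.

```python
# Assumption from this comment: libcublas 10.2.2.x ships with CUDA 10.2
# and libcublas 10.2.1.x ships with CUDA 10.1.
CUBLAS_PREFIX_TO_CUDA = {"10.2.2.": "10.2", "10.2.1.": "10.1"}


def cublas_cuda_versions(dpkg_output):
    """Map each libcublas package in `dpkg -l` output to (version, CUDA release)."""
    found = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # The package name is the first field starting with "libcublas";
        # the field right after it is the installed version.
        for index, field in enumerate(fields[:-1]):
            if field.startswith("libcublas"):
                version = fields[index + 1]
                cuda = "unknown"
                for prefix, release in CUBLAS_PREFIX_TO_CUDA.items():
                    if version.startswith(prefix):
                        cuda = release
                found[field] = (version, cuda)
                break
    return found


# Listing copied from this comment.
listing = """libcublas-dev       10.2.2.89-1       amd64    CUBLAS native dev links, headers
libcublas10         10.2.2.89-1       amd64    CUBLAS native runtime libraries"""

for name, (version, cuda) in cublas_cuda_versions(listing).items():
    print(name, version, "-> CUDA", cuda)
```

Any package reported as CUDA 10.2 on a machine meant to run CUDA 10.1 is a candidate for the downgrade shown above.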

Hi @mihaimaruseac

10.1 is indeed the CUDA version that I am installing, because the current documentation does refer to that version for Tensorflow 2.1: https://www.tensorflow.org/install/gpu

It turns out the Nvidia driver needs to be 430 instead of 418 to make it work on Linux Ubuntu 18.04.

I think the documentation needs to be updated to reflect the new driver version, right? Or did you mean something else with your comment?

Update 2:

When trying to reinstall the driver from the guide with sudo apt-get install --no-install-recommends nvidia-driver-418:

The following packages have unmet dependencies:
 nvidia-driver-418 : Depends: nvidia-driver-430 but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

It appears that nvidia-driver-418 depends on nvidia-driver-430. When trying to install nvidia-driver-430 I ran into the same problem: many broken packages that would not be installed.

After successfully installing nvidia-driver-430 by going down the rabbit hole of broken packages and installing each one individually, nvidia-smi outputs:

nvidia-smi
Mon Jan 27 13:14:10 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
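The banner line of the nvidia-smi output above carries the two values this thread keeps comparing: the driver version and the CUDA version the driver reports. A minimal Python sketch for extracting them follows. The helper names are hypothetical, the 430 threshold is only what this thread observed (not a documented requirement), and note that the CUDA version in the banner is the highest version the driver supports, not proof that the toolkit is installed.

```python
import re

# Matches the banner line printed at the top of nvidia-smi output.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+[\d.]+\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return (driver_version, cuda_version) from nvidia-smi output, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


def driver_is_at_least(driver_version, minimum_major):
    """Compare only the major component, e.g. 430 in '430.50'."""
    return int(driver_version.split(".")[0]) >= minimum_major


# Banner copied from the output in this comment.
banner = "| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |"
driver, cuda = parse_smi_header(banner)
print(driver, cuda, driver_is_at_least(driver, 430))
```

In practice the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`. The function returns None when the banner does not match, for example if a driver release prints a different banner format.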

After this, testing TensorFlow 2.1 by listing the physical devices does seem to work:

Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-01-27 13:15:54.758171: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-01-27 13:15:54.760036: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> tf.config.list_physical_devices('GPU')
2020-01-27 13:16:08.210038: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-27 13:16:08.908023: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-27 13:16:08.908709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2020-01-27 13:16:08.908758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-27 13:16:08.908799: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-27 13:16:08.911539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-01-27 13:16:08.912234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-01-27 13:16:08.914737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-01-27 13:16:08.916157: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-01-27 13:16:08.916220: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-27 13:16:08.916306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-27 13:16:08.916993: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-27 13:16:08.917573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
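The log above is also a quick way to see which CUDA libraries TensorFlow managed to load. As a sketch, the Python below scans such a log for the "Successfully opened dynamic library" lines and reports any expected library that is absent. The expected set is simply copied from this particular TensorFlow 2.1 log, and `missing_libraries` is a hypothetical helper, so other TensorFlow versions will need a different list.

```python
import re

OPENED_RE = re.compile(r"Successfully opened dynamic library (\S+)")

# Library names exactly as they appear in the TensorFlow 2.1 log in this comment.
EXPECTED_LIBRARIES = {
    "libcuda.so.1",
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
}


def missing_libraries(log_text, expected=EXPECTED_LIBRARIES):
    """Return the expected libraries with no 'Successfully opened' line, sorted."""
    opened = set(OPENED_RE.findall(log_text))
    return sorted(set(expected) - opened)


# Two lines copied from the log above; the remaining libraries are reported as missing.
excerpt = """I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1"""

print(missing_libraries(excerpt))
```

An empty result for a full log suggests the driver, CUDA runtime and cuDNN were all found, which matches the GPU device being listed here.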

Does TensorFlow 2.1 require nvidia-driver-430 instead of 418, and should the documentation be updated?