tensorflow: Installation broken - Tensorflow 2.1 Cuda 10.1 Ubuntu 18.04
I have followed the installation guide for GPU support multiple times, each time starting from a blank Ubuntu 18.04 LTS instance. Every time, it breaks when installing cuda-10-1, after installing the driver and rebooting.
See the guide for Ubuntu 18.04 + CUDA 10.1: https://www.tensorflow.org/install/gpu
CUDA 10.1 is the version against which TensorFlow 2.1 is compiled, so CUDA 10.1 needs to be installed.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS
- TensorFlow installed from (source or binary): Tensorflow 2.1 binary
- TensorFlow version: 2.1
- Python version: 3.6
- Installed using virtualenv? pip? conda?: virtualenv
- CUDA/cuDNN version: 10.1
- GPU model and memory: Tesla T4
After restarting the machine and confirming that the driver recognises the GPU, the installation guide for GPU support breaks at this section:
sudo apt-get install --no-install-recommends \
cuda-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1
Which results in:
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda-10-1 : Depends: cuda-runtime-10-1 (>= 10.1.243) but it is not going to be installed
Depends: cuda-demo-suite-10-1 (>= 10.1.243) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
From what I found on Stack Overflow, GitHub and the TensorFlow website, it seems that this dependency list cannot be installed.
Again, I have rerun this installation multiple times on a blank machine, without running anything else before trying to run this installation.
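The blocked packages in the apt output above can be pulled out programmatically, which helps when each one has to be chased individually. This is a minimal sketch, assuming the error text has been captured as a string; the function name `unmet_dependencies` is hypothetical and the sample text is copied from the message above.

```python
import re

# Matches one "Depends:" entry in apt's unmet-dependency report:
# the package name, then an optional version constraint in parentheses.
_DEPENDS = re.compile(r"Depends:\s+(\S+)(?:\s+\(([^)]*)\))?")


def unmet_dependencies(apt_output):
    """Return (package, version_constraint) pairs found in 'Depends:' lines."""
    return _DEPENDS.findall(apt_output)


# Sample text copied from the error message in this report.
sample = """The following packages have unmet dependencies:
cuda-10-1 : Depends: cuda-runtime-10-1 (>= 10.1.243) but it is not going to be installed
Depends: cuda-demo-suite-10-1 (>= 10.1.243) but it is not going to be installed
"""

for package, constraint in unmet_dependencies(sample):
    print(package, constraint)
```

Each reported package can then be inspected or installed on its own to see which dependency further down the chain is actually blocking it.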
About this issue
- State: closed
- Created 4 years ago
- Reactions: 31
- Comments: 15 (3 by maintainers)
Possible solution here, differing from the driver problem above. A few comments from my side, as I’ve been struggling with a similar problem with a slightly different root cause: I couldn’t install cuda-10-1, but the nvidia-driver installation went well for me (TF-nightly (2.2), Ubuntu 18.04, GTX 1660Ti).
python=3.6
pip install tf-nightly (installs 2.2.0 at the time of writing)
This worked well for me, as opposed to for @igorhoogerwoord, as the 418 driver is automatically fetched as 430.50. For anyone who’s interested: after a reboot, open the Ubuntu “software update” settings, go to the tab “additional drivers”, and check whether the correct version is selected (430 should be selected). In a terminal, nvidia-smi should say:
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
This is why I think the official instructions are okay here. Verification comes from this fact:
Trying to install the depends just led to hair loss.
All of these went well. Notice that I did them all in clearly separate steps.
conda install cudatoolkit=10.1.243
Following this, my tensorflow programs run just fine!
Nonetheless, the manual needs a major overhaul. It says “The following NVIDIA® software must be installed on your system” right before the Linux manual where everything is installed again. It just confuses the user where to start.
ADDENDUM You might notice that installing the packages in step 6 actually installs some CUDA 10.2 related things, particularly libcublas. You can check this by typing:
I had problems with this version mix before, so I decided to clean this up once and for all:
The version numbers are a mess: 10.2.2 is CUDA 10.2 and 10.2.1 is CUDA 10.1. Downgrading libcublas will also cause the cuda-license-10-2 package to become obsolete.
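The version mapping described above is easy to get wrong by eye, so it can help to encode it. This is a small sketch that uses only the two mappings stated in this comment; the function name and the sample version strings are illustrative assumptions, not output from a real system.

```python
# Mapping taken from the comment above:
# libcublas 10.2.2.x belongs to CUDA 10.2, libcublas 10.2.1.x to CUDA 10.1.
LIBCUBLAS_TO_CUDA = {"10.2.2": "10.2", "10.2.1": "10.1"}


def cuda_release_for_libcublas(package_version):
    """Map a libcublas package version to its CUDA release, or None if unknown."""
    prefix = ".".join(package_version.split(".")[:3])
    return LIBCUBLAS_TO_CUDA.get(prefix)


# Illustrative version strings; substitute whatever the package manager reports.
for version in ("10.2.1.243-1", "10.2.2.89-1"):
    print(version, "->", cuda_release_for_libcublas(version))
```

A result of "10.2" on a system that is meant to run CUDA 10.1 indicates the version mix mentioned above.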
Hi @mihaimaruseac
10.1 is indeed the CUDA version that I am installing, because the current documentation does refer to that version for Tensorflow 2.1: https://www.tensorflow.org/install/gpu
It turns out the Nvidia driver needs to be 430 instead of 418 to make it work on Linux Ubuntu 18.04.
I think the documentation needs to be updated to reflect the new driver version, right? Or did you mean something else by your comment?
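The driver requirement described here (430 rather than 418) can be checked from the banner line that nvidia-smi prints. This is a minimal sketch, assuming the banner has the form quoted elsewhere in this thread; the function names are hypothetical and the sample line is copied from that comment.

```python
import re


def parse_nvidia_smi_banner(text):
    """Extract (driver_version, cuda_version) from nvidia-smi banner text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


def driver_major(version):
    """Return the major driver number, e.g. 430 for '430.50'."""
    return int(version.split(".")[0])


# Banner line as quoted in this thread.
banner = "| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |"
driver, cuda = parse_nvidia_smi_banner(banner)
print(driver, cuda, driver_major(driver) >= 430)
```

On a real machine the text could come from `subprocess.check_output(["nvidia-smi"], universal_newlines=True)` instead of the hard-coded banner.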
Update 2:
When trying to reinstall from the guide:
sudo apt-get install --no-install-recommends nvidia-driver-418
It appears that Nvidia driver 418 depends on 430. When trying to install nvidia-driver-430, I ran into the same problem of many broken packages which would not be installed.
After successfully installing nvidia-driver-430 by going down the rabbit hole of broken packages and installing each individually, nvidia-smi output:
After this, testing TensorFlow 2.1 with list devices does seem to work:
Does Tensorflow 2.1 require nvidia-driver-430 instead of 418 and should the documentation be updated?
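For anyone wanting to repeat the "list devices" test mentioned above, a check along these lines should do. It is a sketch, not the exact command used in this thread. It assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available, and falls back to the `experimental` variant on older releases. The helper name `list_gpus` is hypothetical.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Stable API from TF 2.1; older releases expose it under tf.config.experimental.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


print(list_gpus())
```

An empty list means TensorFlow imported correctly but found no usable GPU, which usually points back to the driver or CUDA library setup.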