tensorflow: Importing TF 2.12, then torch, hangs, but not the other way around
Issue Type
Bug
Have you reproduced the bug with TF nightly?
Yes
Source
binary
Tensorflow Version
2.12.0
Custom Code
No
OS Platform and Distribution
Linux Ubuntu 20.04.5
Mobile device
No response
Python version
3.8.10
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
11.8
GPU model and memory
No response
Current Behaviour?
If I import tensorflow and then import torch, the torch import line hangs forever without completing. On the other hand, if I import torch first and then import tensorflow there is no problem.
The hang is severe enough that no amount of Ctrl-C can kill it. You have to kill the Python process from a separate terminal to free the hung terminal.
This issue does not exist in tensorflow 2.11.1 or earlier. It also doesn't happen when using older versions of torch like 1.13.1. Since torch followed by tf works but tf followed by torch doesn't, this seems like an issue tf is causing.
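A possible workaround for the unkillable hang (an untested sketch, not something I ran for this report): Python's built-in faulthandler module can arm a watchdog before the deadlocking import, so the process dumps every thread's stack and exits on its own instead of needing a kill from a second terminal:
python
import faulthandler
import tensorflow as tf
# If the next import deadlocks, dump every thread's traceback to stderr after
# 30 seconds and hard-exit the process (Ctrl-C alone cannot break the hang).
faulthandler.dump_traceback_later(30, exit=True)
import torch
faulthandler.cancel_dump_traceback_later()  # only reached if the import finished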
Standalone code to reproduce the issue
docker pull tensorflow/tensorflow:2.12.0-gpu
docker run -it tensorflow/tensorflow:2.12.0-gpu
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
python
import tensorflow as tf
import torch
Relevant log output
No response
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 6
- Comments: 15 (9 by maintainers)
Commits related to this issue
- Import TF after Torch to avoid deadlock (https://github.com/tensorflow/tensorflow/issues/60109) — committed to LEAT-EDGE/qualia-core by piernov a year ago
The hang comes from ABI incompatibility between TF and Torch. PyTorch has compiled their wheels with _GLIBCXX_USE_CXX11_ABI=0 (which is out of date). TensorFlow upgraded to _GLIBCXX_USE_CXX11_ABI=1 with the release of TensorFlow 2.9. TF 2.12 + Torch 2.0 is the first pairing where this happens, and I am not sure why; TF 2.11 + Torch 2.0 and TF 2.12 + Torch 1.13 are both fine.
According to auditwheel inspect, torch 1.13 -> 2.0 adds a dependency on libgomp.so.1, which seems innocuous. TF 2.11 -> 2.12 added a new shared-object dependency on libtensorflow_cc.so.2 with a version called “tensorflow.” It is possible that this was unintended and ultimately exposed this particular breakage; I am not sure. @vam-google and @learning-to-play may be interested in this.
You can verify that CXX11 ABI compatibility is the problem by checking a wheel compiled explicitly with the CXX11 ABI enabled. I’m exceedingly grateful for https://github.com/pytorch/builder/pull/990, which added CXX11-compatible wheels to PyTorch’s CI system, and you can install them both and compare:
The final command hangs, as expected (you can kill it with Ctrl-D).
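As a quick sanity check (a sketch along the same lines, not from the thread), you can also read the ABI each installed wheel reports directly, as long as torch is imported first so the check itself does not deadlock:
python
import torch            # torch first; the reverse order is what hangs
import tensorflow as tf
print("torch built with CXX11 ABI:", torch.compiled_with_cxx11_abi())
print("tf compile flags:", [f for f in tf.sysconfig.get_compile_flags() if "CXX11_ABI" in f])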
@MichaelHudgins FYI
Just to be clear, the TF team’s stance on this is that it’s an issue that needs to be resolved in PyTorch’s builds.
Sure @sachinprasadhs, so starting with the tensorflow docker image to get CUDA 11.8, I then run pip freeze | xargs pip uninstall -y to clear all the existing dependencies, then pip install tensorflow. This gives the following dependency list:
If I then delete everything and install torch, I get:
If I delete everything, then install TF followed by Torch, I get:
Comparing this list to the first one, the process of installing torch did not change any of the dependencies tensorflow had already installed, but it did add cmake, filelock, Jinja2, lit, mpmath, networkx, Pillow, sympy, triton, and torch/torchvision/torchaudio. Given that there doesn’t seem to be any dependency version conflict, and that the two frameworks both work simultaneously if torch is imported before tensorflow, I think this is more likely related to something TF 2.12 is doing to the Python or CUDA environment on import.
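For anyone who wants to redo that comparison, roughly (the output file names are arbitrary choices of mine; the torch pins match the repro at the top of the issue):
pip freeze | xargs pip uninstall -y
pip install tensorflow==2.12.0
pip freeze | sort > tf_only.txt
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip freeze | sort > tf_and_torch.txt
diff tf_only.txt tf_and_torch.txt    # should only show the packages torch pulled in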