tensorflow: Importing TF 2.12, then torch, hangs, but not the other way around
Issue Type
Bug
Have you reproduced the bug with TF nightly?
Yes
Source
binary
Tensorflow Version
2.12.0
Custom Code
No
OS Platform and Distribution
Linux Ubuntu 20.04.5
Mobile device
No response
Python version
3.8.10
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
11.8
GPU model and memory
No response
Current Behaviour?
If I import tensorflow and then import torch, the torch import line hangs forever without completing. On the other hand, if I import torch first and then import tensorflow there is no problem.
The hang is severe enough that no amount of Ctrl-C can kill it. You have to kill the Python process from a separate terminal to free the hung terminal.
This issue does not exist in tensorflow 2.11.1 or earlier. It also doesn't happen when using older versions of torch like 1.13.1. Since torch followed by tf works but tf followed by torch doesn't, this seems like an issue tf is causing.
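A possible workaround for the unkillable hang (an untested sketch, not something I ran for this report): Python's built-in faulthandler module can arm a watchdog before the deadlocking import, so the process dumps every thread's stack and exits on its own instead of needing a kill from a second terminal:
python
import faulthandler
import tensorflow as tf
# If the next import deadlocks, dump every thread's traceback to stderr after
# 30 seconds and hard-exit the process (Ctrl-C alone cannot break the hang).
faulthandler.dump_traceback_later(30, exit=True)
import torch
faulthandler.cancel_dump_traceback_later()  # only reached if the import finished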
Standalone code to reproduce the issue
docker pull tensorflow/tensorflow:2.12.0-gpu
docker run -it tensorflow/tensorflow:2.12.0-gpu
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
python
import tensorflow as tf
import torch
Relevant log output
No response
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 6
- Comments: 15 (9 by maintainers)
Commits related to this issue
- Import TF after Torch to avoid deadlock (https://github.com/tensorflow/tensorflow/issues/60109) — committed to LEAT-EDGE/qualia-core by piernov a year ago
The hang comes from ABI incompatibility between TF and Torch. PyTorch has compiled their wheels with _GLIBCXX_USE_CXX11_ABI=0 (which is out of date). TensorFlow upgraded to _GLIBCXX_USE_CXX11_ABI=1 with the release of TensorFlow 2.9. TF 2.12 + Torch 2.0 is the first pairing where this happens, and I am not sure why; TF 2.11 + Torch 2.0 and TF 2.12 + Torch 1.13 are both fine.
According to auditwheel inspect, torch 1.13 -> 2.0 adds a dependency on libgomp.so.1, which seems innocuous. TF 2.11 -> 2.12 added a new shared-object dependency on libtensorflow_cc.so.2 with a version called “tensorflow.” It is possible that this was unintended and ultimately exposed this particular breakage; I am not sure. @vam-google and @learning-to-play may be interested in this.
You can verify that CXX11 ABI compatibility is the problem by checking a wheel compiled explicitly with the CXX11 ABI enabled. I’m exceedingly grateful for https://github.com/pytorch/builder/pull/990, which added CXX11-compatible wheels to PyTorch’s CI system, and you can install them both and compare:
The final command hangs, as expected (you can kill it with Ctrl-D).
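As a quick sanity check (a sketch along the same lines, not from the thread), you can also read the ABI each installed wheel reports directly, as long as torch is imported first so the check itself does not deadlock:
python
import torch            # torch first; the reverse order is what hangs
import tensorflow as tf
print("torch built with CXX11 ABI:", torch.compiled_with_cxx11_abi())
print("tf compile flags:", [f for f in tf.sysconfig.get_compile_flags() if "CXX11_ABI" in f])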
@MichaelHudgins FYI
Just to be clear, the TF team’s stance on this is that it’s an issue that needs to be resolved in PyTorch’s builds.
Sure @sachinprasadhs, so starting with the tensorflow docker image to get CUDA 11.8, I then run pip freeze | xargs pip uninstall -y to clear all the existing dependencies, then pip install tensorflow. This gives the following dependency list:
If I then delete everything and install torch, I get:
If I delete everything, then install TF followed by Torch, I get:
Comparing this list to the first one, the process of installing torch did not change any of the dependencies tensorflow had already installed, but it did add cmake, filelock, Jinja2, lit, mpmath, networkx, Pillow, sympy, triton, and torch/torchvision/torchaudio. Given that there doesn’t seem to be any dependency version conflict, and that the two frameworks both work simultaneously if torch is imported before tensorflow, I think this is more likely related to something TF 2.12 is doing to the Python or CUDA environment on import.
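For anyone who wants to redo that comparison, roughly (the output file names are arbitrary choices of mine; the torch pins match the repro at the top of the issue):
pip freeze | xargs pip uninstall -y
pip install tensorflow==2.12.0
pip freeze | sort > tf_only.txt
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip freeze | sort > tf_and_torch.txt
diff tf_only.txt tf_and_torch.txt    # should only show the packages torch pulled in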