tensorflow: Build with CUDA support fails with GCC >= 10.3
System information
- OS Platform and Distribution: Ubuntu Linux 21.04
- TensorFlow installed from (source or binary): source
- TensorFlow version: v2.5.0-rc2
- Python version: 3.9
- Bazel version (if compiling from source): 3.7.2
- GCC/Compiler version (if compiling from source): 10.3
- CUDA/cuDNN version: 11.2 / 8.2
Describe the problem
Building TensorFlow with CUDA support using GCC 10.3 fails with the following internal compiler error:
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
428 | _S_gcd(intmax_t __m, intmax_t __n) noexcept
| ^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
Apparently, this is a regression starting with GCC 10.3 (default compiler on Ubuntu 21.04) when using gcc in conjunction with nvcc. Here is the upstream bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100102
Installing gcc-9 and selecting it as the NVCC host compiler in configure still works.
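The workaround can be sketched in Python as below. This is only an illustration, not part of the original report: the helper name `host_compiler_env` is hypothetical, and it assumes that TensorFlow's configure script honours the `GCC_HOST_COMPILER_PATH` environment variable and that the older compiler is installed under the name `gcc-9`.

```python
import os
import shutil


def host_compiler_env(candidates=("gcc-9",), base_env=None, which=shutil.which):
    """Return a copy of the environment that points the CUDA host compiler
    at the first candidate found on PATH.

    Assumption: the configure script reads GCC_HOST_COMPILER_PATH.
    `which` is injectable so the lookup can be replaced in tests.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in candidates:
        path = which(name)
        if path:
            env["GCC_HOST_COMPILER_PATH"] = path
            return env
    raise FileNotFoundError(
        "none of these compilers were found on PATH: " + ", ".join(candidates)
    )
```

The returned mapping could then be passed along when starting the build configuration, for example `subprocess.run(["./configure"], env=host_compiler_env())`.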
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (5 by maintainers)
Commits related to this issue
- upgpkg: gcc10 1:10.2.0-1: Go back to 10.2 because of segfaults in 10.3 See also https://github.com/tensorflow/tensorflow/issues/48890 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100102 git-svn-... — committed to archlinux/svntogit-community by svenstaro 3 years ago
The GCC project has committed a patch:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=5357ab75dedef403b0eebf9277d61d1cbeb5898f (in response to the problem report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100102)
Got a successful build with the following environment:
@sanjoy Yes, this is probably a pure GCC issue. I have no suggestions for handling it on the TensorFlow end other than monitoring what’s happening upstream. A warning emitted by the TensorFlow build for known-bad compiler versions would be nice, but I don’t know how much work that would be. It could be worthwhile, though, since there is no telling when GCC will get a fix or when (if at all) that patch will be applied in your Linux distribution of choice.
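A rough sketch of what such a known-bad compiler warning might look like, in Python. Nothing here comes from the TensorFlow build itself: the names `KNOWN_BAD_GCC`, `parse_gcc_version`, `is_known_bad`, and `warn_if_known_bad` are hypothetical, the list contains only the GCC 10.3.0 release named in this report, and the check assumes a GCC recent enough to support `-dumpfullversion`.

```python
import re
import subprocess
import warnings

# Hypothetical list; only GCC 10.3.0 is taken from this report.
KNOWN_BAD_GCC = {(10, 3, 0)}


def parse_gcc_version(text):
    """Extract (major, minor, patch) from text such as '10.3.0' or '10.2'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_known_bad(version):
    """Return True if the (major, minor, patch) tuple is on the list."""
    return version in KNOWN_BAD_GCC


def warn_if_known_bad(compiler="gcc"):
    """Ask the compiler for its version and warn if it is on the list."""
    result = subprocess.run(
        [compiler, "-dumpfullversion"],
        capture_output=True, text=True, check=True,
    )
    version = parse_gcc_version(result.stdout)
    if is_known_bad(version):
        dotted = ".".join(str(part) for part in version)
        warnings.warn(
            f"{compiler} {dotted} is known to crash on CUDA builds; "
            "consider a different host compiler"
        )
    return version
```

Calling `warn_if_known_bad("gcc")` early in a configure step would surface the problem before a long build starts. The list would have to be maintained by hand as fixes land upstream and in distributions.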