tensorflow: Error on compiling from source
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux archlinux 5.3.11-arch1-1 x86_64 GNU/Linux
- TensorFlow installed from (source or binary): source
- TensorFlow version: commit hash: 872b1ab23f0aac182d5b2051f45d5d003963bfe3
- Python version: 3.7.5
- Installed using virtualenv? pip? conda?: conda
- Bazel version (if compiling from source): bazel 0.29.1- (@non-git)
- GCC/Compiler version (if compiling from source): gcc (GCC) 9.2.0
- CUDA/cuDNN/TensorRT version: 10.1.243-2/7.6.4.38-1/6.0.1.5-1
- GPU model and memory: NVIDIA RTX 2080 8GB
Describe the problem
Compiling with Bazel fails:
ERROR: /home/jaaq/.cache/bazel/_bazel_jaaq/c463894dd2648fc5b64eeed02cc022b5/external/grpc/BUILD:507:1: C++ compilation of rule ‘@grpc//:gpr_base’ failed (Exit 1)
Provide the exact sequence of commands / steps that you executed before running into the problem
git clone repo
cd tensorflow
source /opt/anaconda/bin/activate
conda activate python375env
./configure (Yes on XLA JIT, CUDA, TensorRT, clang)
bazel build //tensorflow/tools/pip_package:build_pip_package
Any other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
ERROR: /home/jaaq/.cache/bazel/_bazel_jaaq/c463894dd2648fc5b64eeed02cc022b5/external/grpc/BUILD:507:1: C++ compilation of rule ‘@grpc//:gpr_base’ failed (Exit 1)
external/grpc/src/core/lib/gpr/log_linux.cc:43:13: error: ambiguating new declaration of ‘long int gettid()’
 static long gettid(void) { return syscall(__NR_gettid); }
             ^~~~~~
In file included from /usr/include/unistd.h:1170,
                 from external/grpc/src/core/lib/gpr/log_linux.cc:41:
/usr/include/bits/unistd_ext.h:34:16: note: old declaration ‘__pid_t gettid()’
 extern __pid_t gettid (void) __THROW;
                ^~~~~~
external/grpc/src/core/lib/gpr/log_linux.cc:43:13: warning: ‘long int gettid()’ defined but not used [-Wunused-function]
 static long gettid(void) { return syscall(__NR_gettid); }
             ^~~~~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 89.815s, Critical Path: 22.19s
INFO: 1215 processes: 1215 local.
FAILED: Build did NOT complete successfully
About this issue
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 15 (1 by maintainers)
The grpc.patch should be applied in tensorflow/bazel-tensorflow/external/grpc/src after the Bazel build has started the download.
Or apply it in the .cache directory (P.S.: this does not seem to work when applied in the cache).
I’m with @IceCryptonym: clearer directions would be greatly appreciated!
Perhaps the source has changed, but I don’t see any linked patch file or even much mention of patching at the page described by @shantanu-gontia (https://gist.github.com/kmhofmann/e368a2ebba05f807fa1a90b3bf9a1e03). Nor does the Arch Linux repo mentioned by @Mithrandir2k18 seem to mention grpc; rather, it’s all about mkl.
But I followed @shantanu-gontia’s instructions and have posted a patch file here: https://gist.github.com/drscotthawley/8eb51af1b4c92c4f18432cb045698af7
It can be applied by going to the main grpc directory and running
What’s still not clear is where this should be applied, i.e. where does Bazel put grpc, and how can I apply the patch after Bazel puts it there? I see tensorflow/third_party/grpc/, but it’s empty except for a zero-length file called BUILD. There’s also tensorflow/tensorflow/contrib/cmake/patches/grpc/, but that only contains the file rand.h. Finally, just putting the patch file in third_party/ doesn’t seem to cause the patch to be applied.
EDIT: It seems Bazel puts it in ~/.cache/bazel/_bazel_($USER)/(big_long_random_directory_name)/external/grpc, but this is not a git directory, so git apply won’t work. So… it’s still unclear how to apply the patch reliably and automatically within the Bazel build.

I have run into this issue as well and I am not sure how to apply the patch. Would someone be able to give me some insight?
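One way to find and patch Bazel's copy of grpc by hand is sketched below. It is a sketch under assumptions, not an official procedure: it assumes the commands run from the root of the TensorFlow checkout, that GNU patch is installed, and that the patch was generated with git diff (a/ and b/ path prefixes, hence -p1). bazel info output_base and bazel fetch are standard Bazel commands; grpc_external_dir and patch_grpc are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TensorFlow or Bazel) for patching the
# grpc sources that Bazel unpacks into its output base.
# Assumptions: run from the TensorFlow checkout root; GNU patch installed;
# the patch was made with `git diff` (a/ b/ prefixes, so -p1 strips them).

# Print the directory where Bazel unpacks the external grpc repository.
# `bazel info output_base` prints the per-workspace output base, which is
# ~/.cache/bazel/_bazel_$USER/<hash> on Linux by default.
grpc_external_dir() {
  base=$(bazel info output_base) || return 1
  printf '%s/external/grpc\n' "$base"
}

# Apply a patch file to Bazel's copy of grpc.
patch_grpc() {
  patch_file=$1
  dir=$(grpc_external_dir) || return 1
  if [ ! -d "$dir" ]; then
    echo "grpc has not been fetched yet: $dir" >&2
    return 1
  fi
  # The redirection belongs to the subshell, so a relative $patch_file
  # is opened before the cd happens.
  (cd "$dir" && patch -p1) < "$patch_file"
}

# Typical use (the patch path is a placeholder):
#   bazel fetch //tensorflow/tools/pip_package:build_pip_package
#   patch_grpc /path/to/grpc.patch
#   bazel build //tensorflow/tools/pip_package:build_pip_package
```

Because Bazel owns this directory, the patch is lost whenever the repository is re-extracted (for example after bazel clean --expunge, which deletes the whole output base) and then has to be reapplied.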
same here
Arch Linux
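The error seems to be due to conflicting declarations of gettid: glibc 2.30 and newer declare gettid() in unistd.h (the “old declaration” from /usr/include/bits/unistd_ext.h in the log above), while grpc defines its own static gettid(). https://github.com/grpc/grpc/issues/20043
The folks at grpc have patched the problem in newer releases. https://github.com/grpc/grpc/pull/20048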
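To check whether a given system is affected, one option is to look for the gettid() declaration that the log points to. This is a sketch assuming a glibc-based system with headers under /usr/include; libc_declares_gettid is a hypothetical helper name, and the grep pattern simply matches the declaration form shown in the log.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given libc header declares gettid(),
# the declaration that collides with grpc's `static long gettid(void)`.
# Defaults to the header named in the build log.
libc_declares_gettid() {
  header=${1:-/usr/include/bits/unistd_ext.h}
  # Older glibc releases may not ship this header at all.
  [ -f "$header" ] || return 1
  # Match the form glibc uses: `extern __pid_t gettid (void) __THROW;`
  grep -q 'gettid *(void)' "$header"
}

if libc_declares_gettid; then
  echo "libc declares gettid(): unpatched grpc will fail to compile"
else
  echo "no gettid() declaration found in /usr/include/bits/unistd_ext.h"
fi
```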
However, during the build process, TensorFlow downloads an older grpc commit from the TensorFlow mirror. At this point, I do not know which commit in the grpc repository fixes this particular issue (I tried the latest one, but it has a problem with upb_proto_library). Therefore, the best course of action, as I found from other sources (https://gist.github.com/kmhofmann/e368a2ebba05f807fa1a90b3bf9a1e03), is to patch the downloaded grpc commit using a patch file.
If the patch in https://gist.github.com/kmhofmann/e368a2ebba05f807fa1a90b3bf9a1e03 doesn’t work, I would suggest cloning the grpc source from GitHub and checking out commit 4566c2a29ebec0835643b972eb99f4306c4234a3. Edit the following files:
- src/core/lib/gpr/log_linux.cc
- src/core/lib/gpr/log_posix.cc
- src/core/lib/iomgr/ev_epollex_linux.cc
In these three files, change every use of the function name gettid to sys_gettid (leave the syscall number __NR_gettid unchanged), then generate a patch for yourself using git diff and use that patch.
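The manual edit can be scripted. This sketch assumes GNU sed (the -i flag and the \b word-boundary escape are GNU extensions) and is meant to run from the root of a grpc checkout at the commit mentioned above; rename_gettid and the output file name grpc-gettid.patch are hypothetical. The word boundaries are the important design choice: a plain textual replacement would also turn the syscall number __NR_gettid into __NR_sys_gettid, which does not exist.

```shell
#!/bin/sh
# Hypothetical helper: rename grpc's private gettid() to sys_gettid() so it
# no longer clashes with the gettid() declared by newer glibc.
# Assumes GNU sed (-i and \b are GNU extensions).
rename_gettid() {
  root=${1:-.}
  for f in src/core/lib/gpr/log_linux.cc \
           src/core/lib/gpr/log_posix.cc \
           src/core/lib/iomgr/ev_epollex_linux.cc; do
    # \b needs a word boundary on both sides, so __NR_gettid (preceded by
    # "_") is left alone, and rerunning is harmless because the "g" in
    # "sys_gettid" is also preceded by "_".
    sed -i 's/\bgettid\b/sys_gettid/g' "$root/$f" || return 1
  done
}

# Typical use:
#   git clone https://github.com/grpc/grpc.git && cd grpc
#   git checkout 4566c2a29ebec0835643b972eb99f4306c4234a3
#   rename_gettid .
#   git diff > grpc-gettid.patch
```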