serving: bazel GPU build error with fatal error: external/nccl_archive/src/nccl.h: No such file or directory
We are trying to build TensorFlow Serving 0.5.1 with TensorFlow 1.0.0@07bb8ea,
based on CUDA 7.5, cuDNN 5, and Bazel 0.4.4.
cd serving && bazel build -c opt --config=cuda tensorflow_serving/...
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:23:1: C++ compilation of rule '@org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 76 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc:15:0:
external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h:23:44: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
#include "external/nccl_archive/src/nccl.h"
^
compilation terminated.
INFO: Elapsed time: 147.378s, Critical Path: 107.11s
I’m able to find nccl.h on disk, but it can’t be found during the bazel build. Any suggestions? Thanks in advance.
find / -name nccl.h
/root/.cache/bazel/_bazel_root/5071e8dca1385fb776f72b33971bf157/external/nccl_archive/src/nccl.h
/root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/nccl_archive/src/nccl.h
About this issue
- State: closed
- Created 7 years ago
- Comments: 44 (4 by maintainers)
git clone https://github.com/NVIDIA/nccl.git
cd nccl/
make CUDA_HOME=/usr/local/cuda
sudo make install
sudo mkdir -p /usr/local/include/external/nccl_archive/src
sudo ln -s /usr/local/include/nccl.h /usr/local/include/external/nccl_archive/src/nccl.h
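A quick way to confirm that the symlink workaround above took effect is to check that nccl.h is reachable at the path the failing #include expects. This is only a sketch: check_nccl_header is a hypothetical helper, not part of TensorFlow or NCCL, and the demonstration runs against a scratch directory that mimics the /usr/local/include layout rather than the real one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds only if nccl.h is reachable under
# <include_root>/external/nccl_archive/src/, the path used by the #include.
check_nccl_header() {
  local include_root="$1"
  local header="$include_root/external/nccl_archive/src/nccl.h"
  if [ -e "$header" ]; then
    echo "found: $header"
  else
    echo "missing: $header" >&2
    return 1
  fi
}

# Demonstration in a scratch directory (stands in for /usr/local/include).
root="$(mktemp -d)"
touch "$root/nccl.h"                          # stand-in for the installed header
mkdir -p "$root/external/nccl_archive/src"
ln -s "$root/nccl.h" "$root/external/nccl_archive/src/nccl.h"
check_nccl_header "$root"
```

On a real machine the call would be check_nccl_header /usr/local/include. Whether the compiler actually searches that directory still depends on the include paths Bazel passes to it.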
Hi, @perdasilva
I have successfully compiled TensorFlow 1.8 with NCCL2. The problem is that if you installed NCCL from the deb package, its files are split across different locations:
However, the TensorFlow configuration expects a single root path for all of this content, which is why the compilation fails.
To solve this you can:
To get around it, you can comment out the dep for nccl in tensorflow/tensorflow/contrib/BUILD
Line 42 iirc
@skonto removing the prefix /external/nccl_archive in the files nccl_ops.cc and nccl_manager.h, which are in the folder tensorflow/tensorflow/contrib/nccl/kernels, fixes the issue
From time to time NVIDIA changes the locations of its packages (because they think it’s funny) 😃 If you investigate a little, depending on your CUDA version, some files go to some places and others go elsewhere… I believe NVIDIA doesn’t have a stable idea of exactly where to put these things, and TensorFlow can’t follow them into that hell.
I solved it by removing the prefix /external/nccl_archive.
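For anyone who prefers to script the prefix removal, here is a minimal sketch using sed. strip_nccl_prefix is a hypothetical helper, and the demonstration edits a scratch copy instead of a real checkout; in practice you would point it at nccl_ops.cc and nccl_manager.h under tensorflow/tensorflow/contrib/nccl/kernels. Whether the shortened include then resolves depends on the include paths your build passes to the compiler, so treat this as an assumption to verify.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: drop the external/nccl_archive/ prefix from quoted
# include paths in one file. A .bak copy is kept so the edit can be reverted.
strip_nccl_prefix() {
  local file="$1"
  sed -i.bak 's#"external/nccl_archive/#"#' "$file"
}

# Demonstration on a scratch file containing the include from the error log.
demo_dir="$(mktemp -d)"
printf '#include "external/nccl_archive/src/nccl.h"\n' > "$demo_dir/nccl_manager.h"
strip_nccl_prefix "$demo_dir/nccl_manager.h"
cat "$demo_dir/nccl_manager.h"   # prints: #include "src/nccl.h"
```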
65: “//tensorflow/contrib/nccl:nccl_py”,
I believe…
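The "comment out the nccl dep" workaround can also be scripted instead of hunting for the line number, which differs between the comments above. The sketch below matches the dep by its label. comment_out_nccl_dep is a hypothetical helper, and the second dep in the demo file is made up purely to show that other lines are left alone; on a real checkout the target would be tensorflow/tensorflow/contrib/BUILD.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prefix the nccl_py dep line with "# " while keeping
# its indentation. A .bak copy of the BUILD file is kept for easy revert.
comment_out_nccl_dep() {
  local build_file="$1"
  sed -i.bak 's|^\([[:space:]]*\)\("//tensorflow/contrib/nccl:nccl_py",\)|\1# \2|' "$build_file"
}

# Demonstration on a scratch file; "//some/other:dep" is a made-up label.
demo_build="$(mktemp)"
printf '        "//some/other:dep",\n        "//tensorflow/contrib/nccl:nccl_py",\n' > "$demo_build"
comment_out_nccl_dep "$demo_build"
cat "$demo_build"
```

Matching on the label rather than a fixed line number keeps the sketch usable across TensorFlow revisions where the dep has moved.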
Seems that now there’s a --config=nonccl option you can add to a bazel command, e.g.
bazel build --config=opt --config=cuda --config=nonccl //tensorflow/tools/pip_package:build_pip_package
(dunno if this will work entirely, but it seems to get me past this error …)
Thanks @jlertle.
Thanks, @jlertle
We don’t have any official support for macOS and nccl builds currently, though feel free to file a new issue specifically for macOS, we welcome any community support here!
Wouldn’t the right approach be for TensorFlow to just look in the right directories?
/usr/include/ is the place for header files on Linux; I don’t get why it looks somewhere else…?