tensorflow: TF 2.2: Build failure on Win10 (Bad address issue)
Dear experts,
I am trying to compile TensorFlow 2.2.0 from source on a Windows 10 system, including GPU support. My actual goal is to compile the DLL (I also tried building the pip package, which makes no difference for this issue). I have followed the instructions on the official website (https://www.tensorflow.org/install/source_windows) as closely as possible, starting from a completely fresh Windows install. However, at some point the build always aborts with a strange “Bad address” failure of some tool (see below). I have no idea what else to try or how to get a more meaningful hint about the problem. Please help me out with advice here. Thanks in advance!
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Pro 1809
- TensorFlow installed from (source or binary): source
- TensorFlow version: 2.2.0
- Python version: 3.5.6 (also tried 3.7.7, same issue)
- Installed using virtualenv? pip? conda?: conda
- Bazel version (if compiling from source): 2.0.0
- GCC/Compiler version (if compiling from source): MS Visual Studio Buildtools 2019 (v 14.25.28610)
- CUDA/cuDNN version: CUDA 10.1 Update 2 / cuDNN 7.6.5.32
- GPU model and memory: NVidia Tesla K80
- MSYS2: packages as required in the TF installation instructions, all updated to the latest version
Describe the problem
The build aborts with a message like the following:
INFO: Analyzed target //tensorflow:tensorflow.dll (174 packages loaded, 15684 targets configured).
INFO: Found 1 target…
ERROR: C:/users/admin.ml/_bazel_admin.ml/mamyapdv/external/llvm-project/llvm/BUILD:45:1: Executing genrule @llvm-project//llvm:config_gen failed (Exit 126)
/usr/bin/bash: bazel-out/x64_windows-opt/bin/third_party/llvm/expand_cmake_vars.exe: Bad address
Target //tensorflow:tensorflow.dll failed to build
During multiple attempts, I saw different executables failing with the “Bad address” issue, so it seems to be related to some non-deterministic behaviour in the tool chain.
Provide the exact sequence of commands / steps that you executed before running into the problem
>python configure.py
You have bazel 2.0.0 installed.
Please specify the location of python. [Default is C:\ProgramData\Miniconda3\python.exe]:
Found possible Python library paths:
C:\ProgramData\Miniconda3\lib\site-packages
Please input the desired Python library path to use. Default is [C:\ProgramData\Miniconda3\lib\site-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]: 10.1
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: C:/cuDNN/cuda,C:/cuDNN/cuda/bin,C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1,C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin
Found CUDA 10.1 in:
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/lib/x64
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include
Found cuDNN 7 in:
C:/cuDNN/cuda/lib/x64
C:/cuDNN/cuda/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
[Default is: 3.5,7.0]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]:
Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]:
Eigen strong inline overridden.
>bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true --copt=-nvcc_options=disable-warnings tensorflow:tensorflow.dll
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 18 (3 by maintainers)
Actually, I don’t think it is LLVM-related. As I wrote above, I am getting this kind of error while building all kinds of rules, seemingly at random. For example, I just got:
So I believe some part of the build toolchain has a problem, but I have no idea which part. Has anyone ever seen something like this, or could anyone give me a hint on how to proceed with debugging?
I do not think this is related to anything in the TF build at all. Could you first confirm that your OS and MSYS installations are both 64-bit? Next, could you try just running git under MSYS? If you see the same failure, it may be related to this: https://stackoverflow.com/questions/41699029/cant-run-git-in-git-bash-bash-mingw32-bin-git-bad-address
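A minimal sketch of the two checks suggested above, meant to be run from the MSYS2 shell. It assumes `uname` is on the PATH (it is part of the usual MSYS2 base tools); the mapping from machine type to bitness is an illustrative simplification and not part of the original advice.

```shell
#!/usr/bin/env bash
# Report the machine type the shell itself sees.
arch=$(uname -m)

# Illustrative classification only: x86_64 indicates a 64-bit environment,
# i386/i686 a 32-bit one. Anything else is reported as unknown.
case "$arch" in
  x86_64|amd64) bits=64 ;;
  i?86)         bits=32 ;;
  *)            bits=unknown ;;
esac
echo "machine type: $arch (bitness: $bits)"

# Try to start git from this shell and report how it exits.
if command -v git >/dev/null 2>&1; then
  git --version || echo "git failed with exit status $?"
else
  echo "git not found on PATH"
fi
```

If git fails here with the same “Bad address” message, the problem would be reproducible outside of the TensorFlow build, which is what the question above is trying to establish.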
Ok, I made one more observation after making Bazel print its executed commands. The failing commands so far all seem to start with the following:
C:/msys64/usr/bin/bash.exe -c source external/bazel_tools/tools/genrule/genrule-setup.sh; <some_other_command>
When I try to run this line interactively, it does not work either. The reason seems to be that quotes are missing; the following variant, with quotes, does work:
C:/msys64/usr/bin/bash.exe -c "source external/bazel_tools/tools/genrule/genrule-setup.sh"
Who is responsible for assembling this command? Is it Bazel, or the TF build instructions? And could anybody with a working Windows build confirm whether quotes are used there, or are not required for some reason? Thanks!
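The difference between the two invocations can be reproduced with a small stand-in script. This is only a sketch of how `bash -c` treats its arguments: `setup.sh` and the `SETUP_DONE` variable are hypothetical placeholders for `genrule-setup.sh`, and the thread does not establish whether Bazel really passes the command unquoted or whether its log output merely omits the quotes.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for genrule-setup.sh: it only sets one variable.
tmpdir=$(mktemp -d)
printf 'SETUP_DONE=yes\n' > "$tmpdir/setup.sh"

# Unquoted form: bash -c takes only the word "source" as the command string.
# The path that follows becomes $0 of that command, so "source" runs
# without a filename argument and fails.
unquoted_status=0
bash -c source "$tmpdir/setup.sh" 2>/dev/null || unquoted_status=$?

# Quoted form: the whole string is a single argument to -c, so the script
# is sourced and the next command sees the variable it defines.
# The path is handed over as $1 to avoid nested quoting.
quoted_output=$(bash -c 'source "$1"; echo "setup=$SETUP_DONE"' _ "$tmpdir/setup.sh")

echo "unquoted exit status: $unquoted_status"
echo "quoted output: $quoted_output"

rm -rf "$tmpdir"
```

When a program starts a process with an argument array rather than through a shell, the command string can reach `bash -c` as one argument even though a printed log line shows no quotes, so the missing quotes in the log are not necessarily the cause of the failure.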