tensorflow: Missing input file mpi:mpio.h
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution: Linux Ubuntu 16.04
- TensorFlow installed from: source
- TensorFlow version: master
- Python version: 3.5.2
- Bazel version (if compiling from source): 0.5.2
- CUDA/cuDNN version: 8.0
- GPU model and memory: K80
- Exact command to reproduce:
#!/usr/bin/env bash
# Only the compilation step for tensorflow is in this script, for clarity.
git clone https://github.com/tensorflow/tensorflow
cd ./tensorflow
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-8.0/targets/x86_64-linux/lib:/usr/lib/x86_64-linux-gnu/
export PYTHON_BIN_PATH="/home/ubuntu/anaconda3/bin/python"
export PYTHON_LIB_PATH="/home/ubuntu/anaconda3/lib/python3.6/site-packages"
export TF_NEED_JEMALLOC=1
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_VERBS=0
export TF_NEED_OPENCL=0
export TF_NEED_CUDA=1
export TF_CUDA_CLANG=0
export TF_NEED_MPI=1
export MPI_HOME="/usr/lib/openmpi"
export GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
export CUDA_VERSION='8.0'
export CUDNN_VERSION='6'
export CUDNN_INSTALL_PATH=/usr/local/cuda
export CUDA_COMPUTE_CAPABILITIES='3.7'
export CUDA_PATH='/usr/local/cuda'
export CUDA_PATH_LINUX='/opt/cuda'
yes "" | ./configure
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package && \
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Describe the problem
With MPI support enabled (TF_NEED_MPI=1), the build fails with the following error:
INFO: Found 1 target...
ERROR: missing input file '//third_party/mpi:mpicxx.h'.
ERROR: missing input file '//third_party/mpi:mpi.h'.
ERROR: missing input file '//third_party/mpi:mpio.h'.
ERROR: /home/ubuntu/scripts/tensorflow/third_party/mpi/BUILD:18:1: //third_party/mpi:mpi: missing input file '//third_party/mpi:mpicxx.h'.
ERROR: /home/ubuntu/scripts/tensorflow/third_party/mpi/BUILD:18:1: //third_party/mpi:mpi: missing input file '//third_party/mpi:mpio.h'.
ERROR: /home/ubuntu/scripts/tensorflow/third_party/mpi/BUILD:18:1: //third_party/mpi:mpi: missing input file '//third_party/mpi:mpi.h'.
Source code / logs
Before running the build script I install Bazel 0.5.2 (0.5.3 breaks the build).
About this issue
- State: closed
- Created 7 years ago
- Comments: 19 (9 by maintainers)
@episodeyang
I’m using the latest OpenMPI and MVAPICH versions, built from source. I’m not sure what the Ubuntu pre-built packages do and do not install. In this case it looks like the configuration script cannot find the ‘mpi_portable_platform.h’ file and incorrectly assumes you are using MVAPICH. I would get the latest MVAPICH version, which is confirmed to work (http://mvapich.cse.ohio-state.edu/downloads/).
Note that there has been a recent change which breaks the compilation until pull request #11935 is merged.
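The detection rule described above can be sketched as a small shell check to run before ./configure. This is only an illustration of the behavior reported in this thread, not the actual configure code: the function name detect_mpi_flavor and its output labels are hypothetical, and the include/ subdirectory layout is an assumption.

```shell
# Hypothetical helper (not part of TensorFlow): guess which MPI flavor the
# headers under "$1/include" would be taken for, following the rule described
# in this thread: no mpi_portable_platform.h means MVAPICH is assumed.
detect_mpi_flavor() {
  _inc="$1/include"
  if [ ! -f "$_inc/mpi.h" ]; then
    echo "no-mpi-headers"
  elif [ -f "$_inc/mpi_portable_platform.h" ]; then
    echo "openmpi"
  else
    echo "assumed-mvapich"
  fi
}

# Falls back to the MPI_HOME value from the report when MPI_HOME is unset.
detect_mpi_flavor "${MPI_HOME:-/usr/lib/openmpi}"
```

If this prints assumed-mvapich for an OpenMPI install, the headers are probably somewhere other than where MPI_HOME points.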
Yes, give it a shot by setting:
export MPI_HOME=/usr/lib/openmpi
That should result in the correct symlinks.
There is no need to pass any extra flags during the build. I downloaded the OpenMPI 3.0.0 source and compiled it with C++ support, although C++ support is deprecated. I then pointed TensorFlow at the new OpenMPI installation, and after that it compiled fine.
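One way to confirm that the symlinks are usable after running ./configure is to look for headers that are absent or dangling in third_party/mpi. The sketch below is a hypothetical helper, not something TensorFlow ships. The header names in the usage line are copied from the error log in the report, and an OpenMPI install may legitimately not provide all of them.

```shell
# Hypothetical helper: print every header name that is missing from
# directory "$1" or that is a dangling symlink there.
find_missing_headers() {
  _dir="$1"
  shift
  for _h in "$@"; do
    # -e follows symlinks, so it is also false for a dangling link.
    [ -e "$_dir/$_h" ] || echo "$_h"
  done
}

# Run from the TensorFlow source root; names taken from the error log.
find_missing_headers third_party/mpi mpi.h mpio.h mpicxx.h
```

An empty output means every listed header resolves to a real file.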
@jbedorf @eLvErDe @ahaider3 @xgh45 @episodeyang I have successfully built the trunk version of TensorFlow by setting the OMPI_SKIP_MPICXX macro. However, it seems we have to set it through the environment variable CC_OPT_FLAGS="-DOMPI_SKIP_MPICXX=1 -march=native". Only this way is the definition passed down to the source code level, where it bypasses the original definition of the OMPI_SKIP_MPICXX macro (in this case in mpi.h). CC_OPT_FLAGS is the variable that configure.py (in the trunk version of TensorFlow) reads to set the values of --copt and --cxxopt for the bazel build.
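As a rough sketch, the workaround above would slot into the original build script just before the configure step. The flag value is the one quoted in the comment. The guard around ./configure is an addition made here so the snippet is harmless outside a TensorFlow checkout.

```shell
# Pass the macro through CC_OPT_FLAGS so that configure forwards it to the
# compiler options, as described in the comment above.
export CC_OPT_FLAGS="-DOMPI_SKIP_MPICXX=1 -march=native"

# Run from the TensorFlow source root; skipped when ./configure is absent.
if [ -x ./configure ]; then
  yes "" | ./configure
fi
```

The remaining environment variables and the bazel build command from the original script stay unchanged.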