tensorflow: XLA tools build broken (//tensorflow/compiler/xla/tools:replay_computation_gpu) on tensorflow master branch

Issue Type

Bug

Source

source

Tensorflow Version

build from commit: a3f3bb0eeda308acb970478034449d4d272cabae

Custom Code

Yes

OS Platform and Distribution

Ubuntu 20.04.5 LTS (Focal Fossa)

Mobile device

No response

Python version

3.8

Bazel version

5.3.0

GCC/Compiler version

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

The XLA tool build fails at the link step. Error log:

ERROR: /home/scratch.shawnw_inf/git/tensorflow-upstream/tensorflow-parser-compute/tensorflow/tensorflow/compiler/xla/tools/BUILD:110:14: Linking tensorflow/compiler/xla/tools/replay_computation_gpu failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /tmp/bazel_cache_2/_bazel_shawnw/6d5d44a3e4597a9a56f32dc526ea10a5/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda-11.8 \
    GCC_HOST_COMPILER_PATH=/usr/bin/x86_64-linux-gnu-gcc-9 \
    LD_LIBRARY_PATH=/usr/local/cuda/compat/lib.real:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
    PATH=/usr/local/nvm/versions/node/v16.15.1/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3.8/dist-packages \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=3.5,7.0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/tensorflow/compiler/xla/tools/replay_computation_gpu-2.params)
gemm_algorithm_picker.cc:(.text._ZNSt6vectorIN10tensorflow14AutotuneResultESaIS1_EE17_M_realloc_insertIJEEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_[_ZNSt6vectorIN10tensorflow14AutotuneResultESaIS1_EE17_M_realloc_insertIJEEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_]+0x92): undefined reference to `tensorflow::AutotuneResult::AutotuneResult()'



### Standalone code to reproduce the issue

```shell
bazel build --verbose_failures //tensorflow/compiler/xla/tools:replay_computation_gpu
```

### Relevant log output

```shell
ERROR: /home/scratch.shawnw_inf/git/tensorflow-upstream/tensorflow-parser-compute/tensorflow/tensorflow/compiler/xla/tools/BUILD:110:14: Linking tensorflow/compiler/xla/tools/replay_computation_gpu failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /tmp/bazel_cache_2/_bazel_shawnw/6d5d44a3e4597a9a56f32dc526ea10a5/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda-11.8 \
    GCC_HOST_COMPILER_PATH=/usr/bin/x86_64-linux-gnu-gcc-9 \
    LD_LIBRARY_PATH=/usr/local/cuda/compat/lib.real:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
    PATH=/usr/local/nvm/versions/node/v16.15.1/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3.8/dist-packages \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=3.5,7.0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/tensorflow/compiler/xla/tools/replay_computation_gpu-2.params)


gemm_algorithm_picker.cc:(.text._ZNSt6vectorIN10tensorflow14AutotuneResultESaIS1_EE17_M_realloc_insertIJEEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_[_ZNSt6vectorIN10tensorflow14AutotuneResultESaIS1_EE17_M_realloc_insertIJEEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_]+0xbc): undefined reference to `tensorflow::AutotuneResult::AutotuneResult()'


gpu_conv_algorithm_picker.cc:(.text._ZNSt8__detail9__variant17__gen_vtable_implILb1ENS0_12_Multi_arrayIPFNS0_16__variant_cookieEOZNS0_16_Variant_storageILb0EJSt9monostateSt10unique_ptrIN15stream_executor3dnn12LazyOpRunnerINS8_11FusedConvOpEEESt14default_deleteISB_EES6_INS9_INS8_6ConvOpEEESC_ISG_EEEE13_M_reset_implEvEUlOT_E_RSt7variantIJS5_SE_SI_EEEJEEESt5tupleIJSQ_EESt16integer_sequenceImJLm1EEEE14__visit_invokeESN_SQ_[_ZNSt8__detail9__variant17__gen_vtable_implILb1ENS0_12_Multi_arrayIPFNS0_16__variant_cookieEOZNS0_16_Variant_storageILb0EJSt9monostateSt10unique_ptrIN15stream_executor3dnn12LazyOpRunnerINS8_11FusedConvOpEEESt14default_deleteISB_EES6_INS9_INS8_6ConvOpEEESC_ISG_EEEE13_M_reset_implEvEUlOT_E_RSt7variantIJS5_SE_SI_EEEJEEESt5tupleIJSQ_EESt16integer_sequenceImJLm1EEEE14__visit_invokeESN_SQ_]+0x3a): undefined reference to `stream_executor::dnn::AlgorithmProto::~AlgorithmProto()'
```

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

@chr1sj0nes Could you take a look at this issue? It seems the breakage is related to your commits to tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc.

Apologies for the slow reply. I can’t see how my commit might have caused this error. @d0k's theory looks a lot more likely to me.
