tensorflow: Compiler test cases using tf-mlir-translate pass or crash depending on a specific build flag on s390x architecture

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 2.3.1
Python version: 3.6.9
Bazel version (if compiling from source): 3.4.1
GCC/Compiler version (if compiling from source): Ubuntu 7.5.0-3ubuntu1~18.04
CUDA/cuDNN version: N/A
GPU model and memory: N/A

Describe the current behavior

When running the test case //tensorflow/compiler/mlir/xla/tests/translate:while.hlotxt.test on an s390x machine, the test passes if I include the build flag --per_file_copt=mlir,llvm-project@-UNDEBUG, but fails with a bad alloc crash if I remove the flag. The backtrace is attached below. Another test case, //tensorflow/compiler/tf2xla:fused_batchnorm_reserve_space_test, behaves the opposite way: it fails with this build flag but passes without it.

Multiple test cases in //tensorflow/compiler/... fail with a similar crash. The command I am using to run the tests:

bazel --host_jvm_args="-Xms1024m" --host_jvm_args="-Xmx2048m" test --host_javabase="@local_jdk//:jdk" --test_tag_filters=-gpu,-benchmark-test,-v1only,-no_oss,-oss_serial  -k --test_timeout 300,450,1200,3600 --build_tests_only --test_output=errors --per_file_copt=mlir,llvm-project@-UNDEBUG -- //tensorflow/compiler/...

Please note that there is no regression in the compiler test cases with --per_file_copt=mlir,llvm-project@-UNDEBUG on an x86 machine.

Describe the expected behavior

The test cases should pass, and their behaviour should not vary with the build flag.

Other info / logs

while.hlotxt.test.log

About this issue

State: closed
Created 4 years ago
Comments: 21 (18 by maintainers)

Most upvoted comments

@smit-hinsu @kun-lu20, thank you for the confirmation. Glad the issue is resolved for you. Please feel free to move this to closed status.

tilakrayal on Nov 2, 2022

Hi @tilakrayal ,

I am @skribm9 's colleague. Thanks for your response.

Yes, the test case //tensorflow/compiler/mlir/xla/tests/translate:while.hlotxt.test passes with both the optimized binary and the debug binary. We've verified this on v2.9.1.

kun-lu20 on Nov 1, 2022