tensorflow: armeabi-v7a assembler error

### Issue Type

Bug

### Have you reproduced the bug with TF nightly?

No

### Source

source

### Tensorflow Version

v2.12.0-rc1

### Custom Code

No

### OS Platform and Distribution

Ubuntu 22.04

### Mobile device

N/A

### Python version

N/A

### Bazel version

Using CMake

### GCC/Compiler version

Clang, NDK 25b

### CUDA/cuDNN version

No response

### GPU model and memory

No response

### Current Behaviour?

Building TensorFlow Lite (v2.12.0-rc1) for Android armeabi-v7a with CMake and NDK 25b fails with the following inline-assembly error:

tflite-runtime/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/xnnpack/src/xnnpack/math.h:311:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)

The offending code is here (the link points to a more recent snapshot, so the line number differs): https://github.com/google/XNNPACK/blob/test_515720556/src/xnnpack/math.h#L332

The Android arm64-v8a build compiles and runs without error. With an earlier TensorFlow Lite version (v2.8.0), both armeabi-v7a and arm64-v8a built and ran without error.

As I read it, '=t' is documented as a valid constraint for the “ARM family”, but the compiler rejects it. See https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints and https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

XNNPACK support said that the compiler flag -mfpu=vfp is required to enable this assembly code (https://github.com/google/XNNPACK/issues/4348#issuecomment-1445437613) and that the flag was set. They then suggested, without a reference, that this was a Clang bug, and did not offer a workaround.
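For context, the GCC machine-constraints page linked above describes `t` as a single-precision VFP register and `w` as a VFP register for 64-bit values. The sketch below shows one way inline assembly using those constraints could be guarded so that it is only compiled when the compiler reports double-precision VFP support, with a portable fallback otherwise. It is an illustration only, not XNNPACK's actual code: the function name `sat_cvt_f64_to_u32` and the choice of guarding on the ACLE macro `__ARM_FP` are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not XNNPACK's code): convert a double to uint32_t,
 * truncating toward zero and saturating at the range limits. */
static inline uint32_t sat_cvt_f64_to_u32(double x) {
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x8)
  /* 32-bit ARM with double-precision VFP enabled (ACLE: bit 3 of __ARM_FP).
   * "t" requests a single-precision VFP register for the result, "w" a VFP
   * register for the 64-bit input; %P prints that operand as a D register. */
  float raw; /* declared float only because "t" wants an S register */
  __asm__("vcvt.u32.f64 %[i], %P[x]" : [i] "=t"(raw) : [x] "w"(x));
  uint32_t bits;
  memcpy(&bits, &raw, sizeof bits); /* reinterpret the integer bit pattern */
  return bits;
#else
  /* Portable fallback with the same saturating behaviour. */
  if (!(x > 0.0)) return 0;                 /* negatives and NaN map to 0 */
  if (x >= 4294967296.0) return UINT32_MAX; /* 2^32 and above saturate */
  return (uint32_t)x;                       /* in range: truncate toward zero */
#endif
}
```

With a guard like this, a translation unit built without VFP support would take the plain C path instead of failing on the constraint. Whether that matches the conditions in this build is not established here.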

Further investigation suggested Clang was not the issue https://github.com/google/XNNPACK/issues/4348#issuecomment-1465060489

The CMake build script covers eight conditions for various 32-bit ARM devices. Only two of these (both -march=armv6) set the required flag. The -mfpu=vfp flag is not set for -march=armv7-a, which I suspect is the cause of this issue. https://github.com/google/XNNPACK/blob/master/CMakeLists.txt#L546-L553
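One way to test this suspicion is to ask the compiler which floating-point features it actually enabled for a given set of flags. The sketch below relies on the ACLE predefined macros `__ARM_ARCH`, `__ARM_FP` and `__ARM_NEON`; the helper name `describe_fp_target` and the output format are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes a one-line summary of the floating-point
 * features the compiler enabled for this translation unit into buf.
 * Returns the value reported by snprintf. */
static int describe_fp_target(char *buf, size_t size) {
#if defined(__arm__)
  int arch = 0;
  unsigned fp_mask = 0; /* ACLE: 0x2 half, 0x4 single, 0x8 double precision */
  int neon = 0;
#if defined(__ARM_ARCH)
  arch = __ARM_ARCH;
#endif
#if defined(__ARM_FP)
  fp_mask = __ARM_FP;
#endif
#if defined(__ARM_NEON)
  neon = 1;
#endif
  return snprintf(buf, size, "arm32: arch=%d fp_mask=0x%x neon=%d",
                  arch, fp_mask, neon);
#else
  /* Not compiled for 32-bit ARM, so the ARM macros above do not apply. */
  return snprintf(buf, size, "not a 32-bit ARM target");
#endif
}
```

Compiling this once with the flags from the failing build and once with -mfpu=vfp added, then comparing the reported `fp_mask`, should show whether the flag changes what the compiler enables. The same macros can also be dumped without a source file by adding the target flags to `clang -dM -E -x c /dev/null`.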

XNNPACK support responded, but we did not communicate successfully (as shown by https://github.com/google/XNNPACK/issues/4348#issuecomment-1465103944 and https://github.com/google/XNNPACK/issues/4348#issuecomment-1465259707), and we did not reach a resolution. Since TFLite depends on XNNPACK, I am looking for a resolution here. Thank you.



### Standalone code to reproduce the issue

```shell
This is a build issue, no extra code.
```

### Relevant log output

tflite-runtime/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/xnnpack/src/xnnpack/math.h:311:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 21 (3 by maintainers)

Most upvoted comments

I am trying to compile v2.15.0 for armeabi-v7a with NDK 25c (officially supported now, I believe) and am running into the same problem. Applying the fixes from https://github.com/microsoft/onnxruntime/commit/8d298f6f78f3280ece8d42ddb50caa6e81a6826f to XNNPACK’s CMakeLists.txt gets me past the first issue, but the build later fails during XNNPACK microkernel compilation with the following (rather cryptic) error:

Long cryptic error:
[ 23%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o
fatal error: error in backend: Cannot select: 0x83465a8: v4bf16 = ARMISD::VEXT 0x836ab38, 0x836ab38, Constant:i32<2>, xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c:119:22
  0x836ab38: v4bf16,ch = CopyFromReg 0x84e41b8, Register:v4bf16 %54, xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c:117:9
    0x851e1f0: v4bf16 = Register %54
  0x836ab38: v4bf16,ch = CopyFromReg 0x84e41b8, Register:v4bf16 %54, xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c:117:9
    0x851e1f0: v4bf16 = Register %54
  0x836afb0: i32 = Constant<2>
In function: xnn_bf16_gemm_minmax_ukernel_1x4c8__neonbf16_bfdot
PLEASE submit a bug report to https://github.com/android-ndk/ndk/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --target=armv7-none-linux-androideabi26 --sysroot=/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/sysroot -DEIGEN_MPL2_ONLY -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNOMINMAX=1 -DPTHREADPOOL_NO_DEPRECATED_API=1 -DXNN_ENABLE_ARM_BF16=1 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ARM_I8MM=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_JIT=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -I/home/peter/dev/sandbox/tensorflow/tensorflow-2.15.0/third_party/xla/third_party/tsl -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/opencl_headers -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/vulkan_headers/include -I/home/peter/dev/sandbox/tensorflow/tensorflow-2.15.0/tensorflow/lite/delegates/gpu/common -I/home/peter/dev/sandbox/tensorflow/tensorflow-2.15.0/tensorflow/lite/delegates/gpu/common/task -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/xnnpack/src -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/pthreadpool-source/include -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/FXdiv-source/include -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/FP16-source/include -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security -O3 -DNDEBUG -std=c99 -fPIC -O2 -pthread -fno-math-errno -marm -march=armv8.2-a+bf16 -mfpu=neon-fp-armv8 -MD -MT _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o -MF 
CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o.d -o CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o -c /home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c'.
4.	Running pass 'ARM Instruction Selection' on function '@xnn_bf16_gemm_minmax_ukernel_1x4c8__neonbf16_bfdot'
 #0 0x00000000047d91d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47d91d8)
 #1 0x00000000047d8340 llvm::sys::RunSignalHandlers() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47d8340)
 #2 0x00000000047a3dc3 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47a3dc3)
 #3 0x00000000047a3d7b (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47a3d7b)
 #4 0x00000000047d7a87 llvm::sys::Process::Exit(int, bool) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47d7a87)
 #5 0x00000000040dc70a (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x40dc70a)
 #6 0x0000000003083072 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x3083072)
 #7 0x000000000282b5f5 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x282b5f5)
 #8 0x0000000006cf4e77 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x6cf4e77)
 #9 0x000000000641f425 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x641f425)
#10 0x0000000005e86b63 llvm::SelectionDAGISel::DoInstructionSelection() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5e86b63)
#11 0x0000000005e8710a llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5e8710a)
#12 0x0000000006417d3c llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x6417d3c)
#13 0x0000000006457ad3 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x6457ad3)
#14 0x00000000064572df (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x64572df)
#15 0x0000000005d9faea llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5d9faea)
#16 0x0000000005da0113 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5da0113)
#17 0x0000000005d9fc6f llvm::FPPassManager::runOnModule(llvm::Module&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5d9fc6f)
#18 0x00000000063aa794 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63aa794)
#19 0x00000000065d6968 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x65d6968)
#20 0x00000000060524d5 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x60524d5)
#21 0x0000000005ea25a9 clang::ParseAST(clang::Sema&, bool, bool) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5ea25a9)
#22 0x00000000063c128d clang::FrontendAction::Execute() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63c128d)
#23 0x00000000063c112d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63c112d)
#24 0x00000000063c1541 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63c1541)
#25 0x00000000066a9f54 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a9f54)
#26 0x00000000066a6de3 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a6de3)
#27 0x00000000066a6c92 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a6c92)
#28 0x00000000066a6c61 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a6c61)
#29 0x00000000066a69f4 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a69f4)
#30 0x00000000066a685f clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a685f)
#31 0x00000000066a66f2 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a66f2)
#32 0x00000000066752ee main (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66752ee)
#33 0x00007f6833640cd0 (/usr/lib/libc.so.6+0x27cd0)
#34 0x00007f6833640d8a __libc_start_main (/usr/lib/libc.so.6+0x27d8a)
#35 0x00000000064cce69 _start (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x64cce69)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Android (9352603, based on r450784d1) clang version 14.0.7 (https://android.googlesource.com/toolchain/llvm-project 4c603efb0cca074e9238af8b4106c30add4418f6)
Target: armv7-none-linux-android26
Thread model: posix
InstalledDir: /home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/bf16-gemm-1x4c8-minmax-neonbf16-bfdot-463081.c
clang: note: diagnostic msg: /tmp/bf16-gemm-1x4c8-minmax-neonbf16-bfdot-463081.sh
clang: note: diagnostic msg: 

********************
make[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:49874: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o] Error 70
make[1]: *** [CMakeFiles/Makefile2:6653: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Any update regarding this?

Update: building with NDK 21e also fails, with the following error:

[  1%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neoni8mm.c.o
clang: error: the clang compiler does not support '-march=armv8.2-a+i8mm'

@pkgoogle Five weeks have passed. Is there any clarity on this issue?