grpc: Tests that load "main" from a shared library don't work on ARM64 (under bazel)
Tests known to be impacted:
- fuzzer tests (that import “main” from fuzzer_corpus_test.cc). https://github.com/grpc/grpc/blob/73978ad0471832932517876da9d38a14b8f1d9a6/test/core/util/fuzzer_corpus_test.cc#L157
- //test/cpp/end2end:end2end_test (where main is in the end2end_test_lib)
Problem:
- When the test binary starts, the _start function is invoked first, and it calls __libc_start_main with the address of main as the first parameter.
- Since the main function is in a shared library, the test binary has an undefined symbol for main (= null), which is supposed to get resolved when the shared library is loaded.
- On x86_64, the address of main is resolved correctly: the right address is passed to __libc_start_main and the binary works just fine.
- On ARM64, the address is not resolved correctly (it never gets resolved) and 0 is passed as main’s address. That leads to a crash.
x86_64 disassembly of the _start entry point:
00000000004033a0 <_start>:
4033a0: 31 ed xor %ebp,%ebp
4033a2: 49 89 d1 mov %rdx,%r9
4033a5: 5e pop %rsi
4033a6: 48 89 e2 mov %rsp,%rdx
4033a9: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
4033ad: 50 push %rax
4033ae: 54 push %rsp
4033af: 4c 8d 05 1a 44 02 00 lea 0x2441a(%rip),%r8 # 4277d0 <__libc_csu_fini>
4033b6: 48 8d 0d b3 43 02 00 lea 0x243b3(%rip),%rcx # 427770 <__libc_csu_init>
4033bd: 48 8b 3d 3c da 02 00 mov 0x2da3c(%rip),%rdi # 430e00 <main>
^ this is where the address of "main" is loaded so that it can be passed to __libc_start_main
4033c4: ff 15 3e da 02 00 callq *0x2da3e(%rip) # 430e08 <__libc_start_main@GLIBC_2.2.5>
4033ca: f4 hlt
4033cb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
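As a sanity check on the x86_64 listing, the RIP-relative operands can be recomputed by hand. The sketch below takes the instruction addresses, byte lengths, and displacements straight from the listing; the helper name `rip_target` is made up for illustration. Note that the `lea` instructions only compute an address, whereas the `mov` at 4033bd reads main's address from a memory slot at 0x430e00. That slot is data, so it is presumably something the dynamic loader can fill in at load time, which would explain why x86_64 works.

```python
# (instruction address, length in bytes, displacement, description)
# All values are copied from the x86_64 _start listing above.
RIP_RELATIVE = [
    (0x4033AF, 7, 0x2441A, "lea   -> r8  (__libc_csu_fini)"),
    (0x4033B6, 7, 0x243B3, "lea   -> rcx (__libc_csu_init)"),
    (0x4033BD, 7, 0x2DA3C, "mov   -> rdi (slot holding main's address)"),
    (0x4033C4, 6, 0x2DA3E, "callq *      (slot holding __libc_start_main)"),
]


def rip_target(address, length, displacement):
    """RIP-relative operands are relative to the address of the NEXT instruction."""
    return address + length + displacement


TARGETS = [rip_target(addr, length, disp) for addr, length, disp, _ in RIP_RELATIVE]

for (_, _, _, description), target in zip(RIP_RELATIVE, TARGETS):
    print(f"{target:#x}  {description}")
```

The computed targets match the addresses objdump prints in its trailing comments (4277d0, 427770, 430e00, 430e08).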
ARM64 disassembly of the _start entry point:
0000000000403190 <_start>:
403190: d280001d mov x29, #0x0 // #0
403194: d280001e mov x30, #0x0 // #0
403198: aa0003e5 mov x5, x0
40319c: f94003e1 ldr x1, [sp]
4031a0: 910023e2 add x2, sp, #0x8
4031a4: 910003e6 mov x6, sp
4031a8: d2e00000 movz x0, #0x0, lsl #48
4031ac: f2c00000 movk x0, #0x0, lsl #32
4031b0: f2a00000 movk x0, #0x0, lsl #16
4031b4: f2800000 movk x0, #0x0
^ PROBLEM HERE
This is where the address of "main" is supposed to be set (first argument of __libc_start_main), but the x0 register is
actually being set to 0.
4031b8: d2e00003 movz x3, #0x0, lsl #48
4031bc: f2c00003 movk x3, #0x0, lsl #32
4031c0: f2a00843 movk x3, #0x42, lsl #16
4031c4: f29e8d03 movk x3, #0xf468
4031c8: d2e00004 movz x4, #0x0, lsl #48
4031cc: f2c00004 movk x4, #0x0, lsl #32
4031d0: f2a00844 movk x4, #0x42, lsl #16
4031d4: f29e9d04 movk x4, #0xf4e8
4031d8: 97ffff12 bl 402e20 <__libc_start_main@plt>
4031dc: 97ffff15 bl 402e30 <abort@plt>
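To double-check the reading of the ARM64 listing, the movz/movk machine words can be decoded by hand. The following is a minimal sketch, assuming the standard A64 "move wide immediate" encoding (sf in bit 31, opc in bits 30:29, hw in bits 22:21, imm16 in bits 20:5, Rd in bits 4:0). The function names are made up for illustration, and the words are copied from the listing. It shows that the absolute addresses are baked into the instruction stream as immediates, and that the immediates for x0 are all zero.

```python
# Machine words copied from the ARM64 _start listing above.
WORDS = [
    0xD2E00000, 0xF2C00000, 0xF2A00000, 0xF2800000,  # x0 (should be main)
    0xD2E00003, 0xF2C00003, 0xF2A00843, 0xF29E8D03,  # x3
    0xD2E00004, 0xF2C00004, 0xF2A00844, 0xF29E9D04,  # x4
]

MOVZ, MOVK = 0b10, 0b11


def decode_wide_move(word):
    """Split a 64-bit MOVZ/MOVK encoding into (opc, shift, imm16, rd)."""
    assert word >> 31 == 1, "expected a 64-bit (sf=1) instruction"
    assert (word >> 23) & 0x3F == 0b100101, "not a move-wide-immediate instruction"
    opc = (word >> 29) & 0x3
    shift = ((word >> 21) & 0x3) * 16  # hw field selects the 16-bit lane
    imm16 = (word >> 5) & 0xFFFF
    rd = word & 0x1F
    return opc, shift, imm16, rd


def run(words):
    """Emulate the MOVZ/MOVK sequence and return the final register values."""
    regs = {}
    for word in words:
        opc, shift, imm16, rd = decode_wide_move(word)
        if opc == MOVZ:
            # MOVZ zeroes the whole register, then sets one 16-bit lane.
            regs[rd] = imm16 << shift
        elif opc == MOVK:
            # MOVK replaces one 16-bit lane and keeps the other bits.
            kept = regs.get(rd, 0) & ~(0xFFFF << shift)
            regs[rd] = kept | (imm16 << shift)
        else:
            raise ValueError(f"unsupported opc {opc:#b} in word {word:#x}")
    return regs


REGS = run(WORDS)
for rd, value in sorted(REGS.items()):
    print(f"x{rd} = {value:#x}")
```

The decoded values are x0 = 0x0, x3 = 0x42f468 and x4 = 0x42f4e8. x3 and x4 hold real addresses fixed at link time, while x0 stays zero. Since these immediates live in the code itself rather than in a data slot, there is presumably nothing for the dynamic loader to patch once the shared library is loaded.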
This problem only occurs if the “main” function is not in the binary itself but in a shared library (otherwise everything works fine).
Reproduction:
On an ARM64 machine, run:
tools/bazel test --config=dbg //test/core/slice:b64_encode_fuzzer
It will segfault with the problem described above.
I still haven’t figured out whether this is a bazel problem or a general limitation on Linux aarch64.
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (11 by maintainers)
@gnossen full objdump -D (for b64_encode_fuzzer binary) is here: https://paste.c-net.org/SmokeySprayed