grpc: Tests that load "main" from a shared library don't work on ARM64 (under bazel)

Tests known to be impacted:

Problem:

  • When the test binary starts, the _start function is invoked first, and it calls __libc_start_main with the address of main as the first argument.
  • Since the main function lives in a shared library, the test binary has an undefined symbol for main (=null), which is supposed to be resolved when the shared library is loaded.
  • On x86_64, the address of main is resolved correctly, the right address is passed to __libc_start_main, and the binary works just fine.
  • On ARM64, the address is never resolved, so 0 is passed as main’s address. That leads to a crash.

x86_64 disassembly of the _start entry point:

00000000004033a0 <_start>:
  4033a0:       31 ed                   xor    %ebp,%ebp
  4033a2:       49 89 d1                mov    %rdx,%r9
  4033a5:       5e                      pop    %rsi
  4033a6:       48 89 e2                mov    %rsp,%rdx
  4033a9:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  4033ad:       50                      push   %rax
  4033ae:       54                      push   %rsp
  4033af:       4c 8d 05 1a 44 02 00    lea    0x2441a(%rip),%r8        # 4277d0 <__libc_csu_fini>
  4033b6:       48 8d 0d b3 43 02 00    lea    0x243b3(%rip),%rcx        # 427770 <__libc_csu_init>
  4033bd:       48 8b 3d 3c da 02 00    mov    0x2da3c(%rip),%rdi        # 430e00 <main>

^ This is where the address of "main" is loaded so it can be passed to __libc_start_main.

  4033c4:       ff 15 3e da 02 00       callq  *0x2da3e(%rip)        # 430e08 <__libc_start_main@GLIBC_2.2.5>
  4033ca:       f4                      hlt    
  4033cb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
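
To make the x86_64 listing easier to follow, here is a minimal Python sketch that recomputes the RIP-relative targets shown in the objdump comments. The addresses, instruction lengths, and displacements are copied from the listing above; the helper name rip_relative_target and the table labels are made up for illustration. The point to notice is that main is fetched with a mov (a load from the slot at 0x430e00, presumably a GOT entry, which is an assumption), not computed with lea, so whatever value is stored in that slot at load time is what reaches __libc_start_main.

```python
# Each entry: (instruction address, instruction length in bytes, displacement).
# All values are copied from the x86_64 disassembly of _start above.
RIP_RELATIVE_OPERANDS = {
    "lea -> __libc_csu_fini": (0x4033AF, 7, 0x2441A),
    "lea -> __libc_csu_init": (0x4033B6, 7, 0x243B3),
    "mov <- slot labelled <main>": (0x4033BD, 7, 0x2DA3C),
    "callq * <- slot for __libc_start_main": (0x4033C4, 6, 0x2DA3E),
}


def rip_relative_target(address, length, displacement):
    """Return the address a RIP-relative operand refers to.

    While an instruction executes, %rip already points at the next
    instruction, so the base is address + length, not address.
    """
    return address + length + displacement


if __name__ == "__main__":
    for label, operand in RIP_RELATIVE_OPERANDS.items():
        print(f"{label}: {rip_relative_target(*operand):#x}")
```

The computed targets match the addresses objdump prints after the # markers (0x4277d0, 0x427770, 0x430e00, 0x430e08).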

ARM64 disassembly of the _start entry point:

0000000000403190 <_start>:
  403190:       d280001d        mov     x29, #0x0                       // #0
  403194:       d280001e        mov     x30, #0x0                       // #0
  403198:       aa0003e5        mov     x5, x0
  40319c:       f94003e1        ldr     x1, [sp]
  4031a0:       910023e2        add     x2, sp, #0x8
  4031a4:       910003e6        mov     x6, sp
  4031a8:       d2e00000        movz    x0, #0x0, lsl #48
  4031ac:       f2c00000        movk    x0, #0x0, lsl #32
  4031b0:       f2a00000        movk    x0, #0x0, lsl #16
  4031b4:       f2800000        movk    x0, #0x0

^ PROBLEM HERE
   This is where the address of "main" is supposed to be set (first argument of __libc_start_main), but the x0 register is
   actually being set to 0.

  4031b8:       d2e00003        movz    x3, #0x0, lsl #48
  4031bc:       f2c00003        movk    x3, #0x0, lsl #32
  4031c0:       f2a00843        movk    x3, #0x42, lsl #16
  4031c4:       f29e8d03        movk    x3, #0xf468
  4031c8:       d2e00004        movz    x4, #0x0, lsl #48
  4031cc:       f2c00004        movk    x4, #0x0, lsl #32
  4031d0:       f2a00844        movk    x4, #0x42, lsl #16
  4031d4:       f29e9d04        movk    x4, #0xf4e8
  4031d8:       97ffff12        bl      402e20 <__libc_start_main@plt>
  4031dc:       97ffff15        bl      402e30 <abort@plt>
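
The ARM64 listing can be checked the same way. The sketch below decodes the raw movz/movk words from the listing above and replays them to see what ends up in each register. The function names decode_move_wide and run are hypothetical, and the decoder only handles the 64-bit movz/movk forms that appear here. Unlike the x86_64 code, these instructions build the address from immediates embedded in the instruction stream, so there is no memory slot that could be filled in when the shared library is loaded; reading x3 and x4 as the init/fini arguments is an assumption based on the x86_64 listing.

```python
MOVZ_OPC = 0b10  # move wide with zero: clears the rest of the register
MOVK_OPC = 0b11  # move wide with keep: replaces only one 16-bit field
MASK64 = 0xFFFFFFFFFFFFFFFF

# Raw instruction words copied from the ARM64 disassembly of _start above.
START_WORDS = [
    0xD2E00000, 0xF2C00000, 0xF2A00000, 0xF2800000,  # x0 (main)
    0xD2E00003, 0xF2C00003, 0xF2A00843, 0xF29E8D03,  # x3
    0xD2E00004, 0xF2C00004, 0xF2A00844, 0xF29E9D04,  # x4
]


def decode_move_wide(word):
    """Decode a 64-bit MOVZ/MOVK word into (mnemonic, rd, imm16, shift)."""
    sf = (word >> 31) & 1
    opc = (word >> 29) & 0b11
    fixed = (word >> 23) & 0b111111
    if sf != 1 or fixed != 0b100101 or opc not in (MOVZ_OPC, MOVK_OPC):
        raise ValueError(f"not a 64-bit movz/movk: {word:#010x}")
    shift = 16 * ((word >> 21) & 0b11)  # hw field selects the 16-bit lane
    imm16 = (word >> 5) & 0xFFFF
    rd = word & 0x1F
    return ("movz" if opc == MOVZ_OPC else "movk", rd, imm16, shift)


def run(words):
    """Replay the movz/movk words and return {register number: value}."""
    regs = {}
    for word in words:
        mnemonic, rd, imm16, shift = decode_move_wide(word)
        if mnemonic == "movz":
            regs[rd] = imm16 << shift
        else:
            keep = regs.get(rd, 0) & ~(0xFFFF << shift) & MASK64
            regs[rd] = keep | (imm16 << shift)
    return regs


if __name__ == "__main__":
    for reg, value in sorted(run(START_WORDS).items()):
        print(f"x{reg} = {value:#x}")
```

Replaying the words gives x0 = 0, x3 = 0x42f468 and x4 = 0x42f4e8: the two other pointer arguments carry real addresses, while the one for main is zero in all four 16-bit fields.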

This problem only occurs when the “main” function is not in the binary itself but in a shared library (otherwise everything works fine).

Reproduction: on an ARM64 machine, run tools/bazel test --config=dbg //test/core/slice:b64_encode_fuzzer. It will segfault with the problem described above.

I still haven’t figured out whether this is a bazel problem or a general limitation of Linux on aarch64.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@gnossen full objdump -D (for b64_encode_fuzzer binary) is here: https://paste.c-net.org/SmokeySprayed