runtime: dotnet crashes on rhel8 arm64 while unwinding stack
Our CI job that runs the master branch on RHEL8 arm64 crashes in every test process.
Building .NET works fine. The build still uses the .NET 6.0 SDK, so this is probably a regression.
Interestingly, the CI job that runs on RHEL9 arm64 doesn’t have these crashes. One notable difference is that RHEL8 uses a 64k page size rather than the usual 4k, which has caused issues with .NET in the past.
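To confirm which page size a given machine actually uses, the value can be queried at run time. The following is a minimal sketch and not code from the runtime or libunwind; the helper names `query_page_size` and `uses_64k_pages` are hypothetical and introduced only for illustration.

```c
#include <unistd.h>

/* Hypothetical helper: returns the kernel page size in bytes,
 * or -1 if sysconf() reports an error. */
long query_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: returns 1 when the system uses 64k pages
 * (the RHEL8 arm64 configuration described above), 0 otherwise. */
int uses_64k_pages(void)
{
    return query_page_size() == 64L * 1024L;
}
```

Calling `query_page_size()` on the RHEL8 and RHEL9 arm64 machines should show whether the page size really differs between the two CI jobs. The same value is available from a shell with `getconf PAGESIZE`.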
Running a simple program like:
// See https://aka.ms/new-console-template for more information
Console.WriteLine("Hello, World!");
throw new Exception();
gdb /root/repos/runtime/artifacts/bin/testhost/net7.0-Linux-Debug-arm64/dotnet
gives the following stack trace, which shows a null pointer (addr=0) being dereferenced in access_mem.
Hello, World!
Thread 1 "dotnet" received signal SIGSEGV, Segmentation fault.
access_mem (as=<optimized out>, addr=0, val=0xffffffffb090, write=<optimized out>, arg=<optimized out>)
at /root/repos/runtime/src/native/external/libunwind/src/aarch64/Ginit.c:337
337 *val = *(unw_word_t *) addr;
(gdb) bt
#0 access_mem (as=<optimized out>, addr=0, val=0xffffffffb090, write=<optimized out>, arg=<optimized out>)
at /root/repos/runtime/src/native/external/libunwind/src/aarch64/Ginit.c:337
#1 0x0000ffffbdfb4728 in is_plt_entry (c=c@entry=0xffffffffb0e0) at /root/repos/runtime/src/native/external/libunwind/src/aarch64/Gstep.c:43
#2 0x0000ffffbdfb4670 in _ULaarch64_step (cursor=0xffffffffb0e0) at /root/repos/runtime/src/native/external/libunwind/src/aarch64/Gstep.c:171
#3 0x0000ffffbdf7aeec in PAL_VirtualUnwind (context=0xffffffffbd80, contextPointers=0xffffffffbce0)
at /root/repos/runtime/src/coreclr/pal/src/exception/seh-unwind.cpp:566
#4 0x0000ffffbdd8e480 in LazyMachState::unwindLazyState (baseState=<optimized out>, unwoundstate=0xffffffffc160, threadId=<optimized out>, funCallDepth=0,
hostCallPreference=<optimized out>) at /root/repos/runtime/src/coreclr/vm/arm64/stubs.cpp:340
#5 0x0000ffffbdbe1cac in HelperMethodFrame::InsureInit (this=0xffffffffdaf8, initialInit=false, unwindState=<optimized out>,
hostCallPreference=AllowHostCalls) at /root/repos/runtime/src/coreclr/vm/frames.cpp:1813
#6 0x0000ffffbdbe1c0c in HelperMethodFrame::GetFunction (this=0xffffffffdaf8) at /root/repos/runtime/src/coreclr/vm/frames.cpp:1732
#7 0x0000ffffbdd940fc in ExceptionTracker::InitializeCrawlFrameForExplicitFrame (pcfThisFrame=pcfThisFrame@entry=0xffffffffcd10,
pFrame=pFrame@entry=0xffffffffdaf8, pMD=pMD@entry=0xffff4432ff70) at /root/repos/runtime/src/coreclr/vm/exceptionhandling.cpp:1329
#8 0x0000ffffbdd93a58 in ExceptionTracker::ProcessOSExceptionNotification (this=this@entry=0xaaaaaab92a90,
pExceptionRecord=pExceptionRecord@entry=0xaaaaaac8dee0, pContextRecord=<optimized out>, pDispatcherContext=pDispatcherContext@entry=0xffffffffd288,
dwExceptionFlags=dwExceptionFlags@entry=0, sf=sf@entry=..., pThread=pThread@entry=0xaaaaaab94a90, STState=<optimized out>,
pICFSetAsLimitFrame=<optimized out>) at /root/repos/runtime/src/coreclr/vm/exceptionhandling.cpp:1863
#9 0x0000ffffbdd92cd0 in ProcessCLRException (pExceptionRecord=0xaaaaaac8dee0, MemoryStackFp=281474976701536, pContextRecord=0xaaaaaac8db50,
pDispatcherContext=pDispatcherContext@entry=0xffffffffd288) at /root/repos/runtime/src/coreclr/vm/exceptionhandling.cpp:1072
#10 0x0000ffffbdd96f6c in UnwindManagedExceptionPass1 (ex=..., frameContext=0xffffffffd6e0) at /root/repos/runtime/src/coreclr/vm/exceptionhandling.cpp:4624
#11 0x0000ffffbdd9725c in DispatchManagedException (ex=..., isHardwareException=<optimized out>)
at /root/repos/runtime/src/coreclr/vm/exceptionhandling.cpp:4810
#12 0x0000ffffbdcfc188 in IL_Throw (obj=<optimized out>) at /root/repos/runtime/src/coreclr/vm/jithelpers.cpp:4024
#13 0x0000ffff445aa6a8 in ?? ()
#14 0x0000ffbf3680c858 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
@janvorli do you know what could be the cause of this regression?
About this issue
- State: closed
- Created 2 years ago
- Comments: 28 (28 by maintainers)
I can confirm that updating libunwind to the latest main fixes the issue. The page size fix landed in libunwind about a month ago, see https://github.com/libunwind/libunwind/commit/e85b65cec757ef589f28957d0c6c21c498a03bdf
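As an illustration of the general class of page size bug, consider code that rounds an address down to a page boundary. This sketch is not the libunwind code and does not reproduce the linked commit; `HARDCODED_PAGE_SIZE`, `page_start`, and `runtime_page_start` are hypothetical names. Rounding with a fixed 4k mask yields an address that is not aligned to the real page size on a 64k system, so calls that require a page-aligned address (mincore(2), for example, fails with EINVAL otherwise) can misbehave.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical constant: the assumption that breaks on 64k-page kernels. */
#define HARDCODED_PAGE_SIZE ((uintptr_t) 4096)

/* Round addr down to the start of its page.
 * page_size must be a power of two. */
uintptr_t page_start(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Round addr down using the page size reported by the kernel,
 * which is correct for both 4k and 64k configurations. */
uintptr_t runtime_page_start(uintptr_t addr)
{
    return page_start(addr, (uintptr_t) sysconf(_SC_PAGESIZE));
}
```

For the address 0x12345, `page_start` returns 0x12000 with a 4k page size but 0x10000 with a 64k page size, so the hard-coded variant produces a value that is not page-aligned on a 64k kernel.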