runtime: .NET 6 application crash with unmanaged PAL_SEHException
Description
This crash happens after a .NET 6 application loads a specific .NET Framework 4.5 plugin that works fine on Windows. The .NET Framework 4.5 assembly does not use anything considered unsafe, e.g. P/Invoke, unsafe code, IntPtr, or C++/CLI.
Earlier issue #13685 appears to use lldb to capture a stack trace or dump, but setting a breakpoint on exit seems to stop too late, and most frames show up as ___lldb_unnamed_symbol:
❯ lldb -- dotnet run --project name.csproj
(lldb) target create "dotnet"
Current executable set to 'dotnet' (x86_64).
(lldb) settings set -- target.run-args "run" "--project" "name.csproj"
(lldb) breakpoint set -n exit
Breakpoint 1: where = libc.so.6`exit, address = 0x00007ffff7acb100
(lldb) run
Process 2029 launched: '/usr/sbin/dotnet' (x86_64)
...
... (some console output)
...
terminate called after throwing an instance of 'PAL_SEHException'
Process 2136 stopped and restarted: thread 1 received signal: SIGCHLD
Process 2136 stopped
* thread #1, name = 'dotnet', stop reason = breakpoint 1.1
frame #0: 0x00007ffff7acb100 libc.so.6`exit
libc.so.6`exit:
-> 0x7ffff7acb100 <+0>: endbr64
0x7ffff7acb104 <+4>: pushq %rax
0x7ffff7acb105 <+5>: popq %rax
0x7ffff7acb106 <+6>: movl $0x1, %ecx
(lldb) bt
* thread #1, name = 'dotnet', stop reason = breakpoint 1.1
* frame #0: 0x00007ffff7acb100 libc.so.6`exit
frame #1: 0x00007ffff7ab3297 libc.so.6`___lldb_unnamed_symbol3141 + 135
frame #2: 0x00007ffff7ab334a libc.so.6`__libc_start_main + 138
frame #3: 0x00005555555578b5 dotnet`___lldb_unnamed_symbol232 + 37
(lldb) continue
Process 2136 resuming
Process 2136 exited with status = 134 (0x00000086)
Reproduction Steps
The crashing application is an open-source server-side app, but triggering the crash requires a (proprietary) client-side app to connect. The client is not strictly restricted, but it is quite large and may not be necessary to reproduce the issue.
Expected behavior
Earlier issue #13685 suggests that this kind of exception is expected to be handled.
Actual behavior
It crashes without any managed exception being caught; the output ends with: terminate called after throwing an instance of 'PAL_SEHException'
Regression?
It works fine on Windows but crashes on Linux in every configuration tried: with or without lldb attached, Debug or Release build, and both dotnet run and single-file publish.
Known Workarounds
No response
Configuration
Originally reported by a CentOS 7 user (runtime 6.0.9 or 6.0.10), and reproduced on Arch Linux x86_64 with kernel 5.15.
❯ dotnet --list-runtimes
Microsoft.NETCore.App 6.0.11 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Other information
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (9 by maintainers)
Over the past few days, I have been attempting to understand the Linux exception handling format to allow me to create a helper library for this (in NASM, so that a full source build of MonoMod doesn’t depend on having a full Linux toolchain available). While that’s not going all that well, I did come across something interesting:
__register_frame. I found it through this article (archive) from 2016, which presents it as an approximate libgcc counterpart to Windows' RtlInstallFunctionTableCallback. Further investigation turned up references to it as far back as 2004 on the GCC mailing list (I recommend looking at the archive…). I also came across several PRs to LLVM that discuss it, including this commit, which seems to suggest that the libgcc and libunwind implementations of __register_frame behave differently. There's also this PR to LLVM's libunwind implementation.
All this to say, I suspect it should be possible for the runtime to support unwinding native exceptions across P/Invoke boundaries, though perhaps difficult. Either way, I still need to finish this, since I still need to support released versions…
@sgkoishi thank you! Now it makes sense. CILJit::compileMethod is not expected to be called by managed code; when the runtime calls it in the usual cases, the exception is caught and processed by the runtime code. On Unix, exceptions cannot be propagated from native code to a managed caller across P/Invoke boundaries, so the exception goes unhandled and the process aborts. That means the problem is really in MonoMod. MonoMod could be fixed to work correctly in the same way our NativeAOT compiler works: it would need to call the JIT through a native wrapper function that catches the exception. Our code that is used in NativeAOT is here: https://github.com/dotnet/runtime/blob/943474ca16db7c65ba6cff4a89c3ebd219dde3e5/src/coreclr/tools/aot/jitinterface/jitwrapper.cpp
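The native-wrapper approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in jitwrapper.cpp: native_compile, wrapped_compile and the WrapperStatus codes are hypothetical names, and native_compile only stands in for a call such as CILJit::compileMethod that may throw a C++ exception (on Unix, PAL_SEHException is a C++ exception type, as the terminate message shows). The wrapper has C linkage, catches every exception on the native side, and reports failure through a return code that a managed caller can check after the P/Invoke call returns.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for a native routine (such as the JIT's compileMethod)
// that reports failure by throwing a C++ exception.
static int native_compile(int input) {
    if (input < 0) {
        throw std::runtime_error("compilation failed");
    }
    return input * 2;
}

// Hypothetical status codes handed back to the managed caller.
enum WrapperStatus : int {
    WRAPPER_OK = 0,
    WRAPPER_STD_EXCEPTION = 1,
    WRAPPER_UNKNOWN_EXCEPTION = 2,
};

// C linkage so managed code can bind to it with P/Invoke. Every exception is
// caught here, so none ever tries to unwind into the managed frames above.
extern "C" int wrapped_compile(int input, int* result) noexcept {
    assert(result != nullptr && "result must point to writable storage");
    try {
        *result = native_compile(input);  // not written if native_compile throws
        return WRAPPER_OK;
    } catch (const std::exception&) {
        return WRAPPER_STD_EXCEPTION;
    } catch (...) {
        // Any other type, e.g. a PAL_SEHException-like class the caller
        // cannot name, is also stopped at this boundary.
        return WRAPPER_UNKNOWN_EXCEPTION;
    }
}
```

A managed caller such as MonoMod would then P/Invoke a wrapper like this instead of the JIT entry point and turn a nonzero status into a managed exception. The catch (...) clause is the important part: the managed side cannot name native exception types, so the wrapper must not let anything escape.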