runtime: .NET 6 application crash with unmanaged PAL_SEHException

Description

This crash happens after a .NET 6 application loads a specific .NET Framework 4.5 plugin. The same plugin works fine on Windows. The .NET Framework 4.5 assembly does not use anything considered unsafe, such as P/Invoke, unsafe code, IntPtr, or C++/CLI.

An earlier issue, #13685, used lldb to capture a stack trace or dump. Here, however, a breakpoint on exit seems to fire too late, and most frames show up only as ___lldb_unnamed_symbol:

❯ lldb -- dotnet run --project name.csproj
(lldb) target create "dotnet"
Current executable set to 'dotnet' (x86_64).
(lldb) settings set -- target.run-args  "run" "--project" "name.csproj"
(lldb) breakpoint set -n exit
Breakpoint 1: where = libc.so.6`exit, address = 0x00007ffff7acb100
(lldb) run
Process 2029 launched: '/usr/sbin/dotnet' (x86_64)
...
... (some console output)
...
terminate called after throwing an instance of 'PAL_SEHException'
Process 2136 stopped and restarted: thread 1 received signal: SIGCHLD
Process 2136 stopped
* thread #1, name = 'dotnet', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff7acb100 libc.so.6`exit
libc.so.6`exit:
->  0x7ffff7acb100 <+0>: endbr64
    0x7ffff7acb104 <+4>: pushq  %rax
    0x7ffff7acb105 <+5>: popq   %rax
    0x7ffff7acb106 <+6>: movl   $0x1, %ecx
(lldb) bt
* thread #1, name = 'dotnet', stop reason = breakpoint 1.1
  * frame #0: 0x00007ffff7acb100 libc.so.6`exit
    frame #1: 0x00007ffff7ab3297 libc.so.6`___lldb_unnamed_symbol3141 + 135
    frame #2: 0x00007ffff7ab334a libc.so.6`__libc_start_main + 138
    frame #3: 0x00005555555578b5 dotnet`___lldb_unnamed_symbol232 + 37
(lldb) continue
Process 2136 resuming
Process 2136 exited with status = 134 (0x00000086)

Reproduction Steps

The crashing application is an open-source server-side app, but triggering the crash requires a proprietary client-side app to connect. The client is not strictly restricted, but it is quite large and might not be necessary to reproduce the issue.

Expected behavior

The earlier issue #13685 seems to expect this kind of exception to be handled.

Actual behavior

The process crashed without any managed exception being caught. The only diagnostic was: terminate called after throwing an instance of 'PAL_SEHException'
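The message "terminate called after throwing an instance of ..." is what libstdc++'s default terminate handler prints when a C++ exception finds no matching catch handler. std::terminate then calls std::abort, which raises SIGABRT (signal 6 on Linux). That is consistent with the exit status 134 (128 + 6) in the lldb log. The following is a minimal sketch of that chain, assuming Linux/POSIX and libstdc++; the function names are hypothetical and have nothing to do with the .NET runtime itself:

```cpp
#include <csignal>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for native code whose C++ exception is never caught,
// like the PAL_SEHException in the log above. Illustrative only.
[[noreturn]] static void throw_without_handler() {
    throw std::runtime_error("simulated native failure");
}

// Runs throw_without_handler() in a forked child and returns the signal that
// killed the child, 0 if it exited normally, or -1 if fork/waitpid failed.
int signal_from_uncaught_throw() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // No handler exists in the child, so std::terminate is called,
        // which by default calls std::abort and raises SIGABRT.
        throw_without_handler();
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

// Shells report a signal-terminated child as 128 + signal number.
int shell_style_status(int sig) { return 128 + sig; }
```

Running it prints the same kind of "terminate called after throwing an instance of 'std::runtime_error'" line on the child's stderr.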

Regression?

It works fine on Windows, but crashes on Linux in every configuration tried: with or without lldb attached, in Debug or Release builds, and with both dotnet run and a single-file publish.

Known Workarounds

No response

Configuration

Originally reported by a CentOS 7 user (on 6.0.9 or 6.0.10), and reproduced on Arch Linux x86_64 with kernel 5.15.

❯ dotnet --list-runtimes
Microsoft.NETCore.App 6.0.11 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Other information

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

Over the past few days, I have been trying to understand the Linux exception handling format so that I can create a helper library for this (in NASM, so that a full source build of MonoMod doesn’t depend on having a full Linux toolchain available). While that’s not going all that well, I did come across something interesting: __register_frame. I found it through this article (archive) from 2016, which presents it as an approximate libgcc counterpart to Windows’ RtlInstallFunctionTableCallback. Further investigation turned up references to it as far back as 2004 on the GCC mailing list (I recommend looking at the archive…). I also came across several LLVM PRs that discuss it, including this commit, which seems to suggest that the libgcc and libunwind implementations of __register_frame behave differently. There’s also this PR to LLVM’s libunwind implementation.

All this to say, I suspect it should be possible for the runtime to support unwinding native exceptions across P/Invoke boundaries, though perhaps difficult. Either way, I still need to finish this, since I also have to support already-released versions…

@sgkoishi thank you! Now it makes sense. CILJit::compileMethod is not expected to be called by managed code; when the runtime calls it in the usual cases, the exception is caught and processed by runtime code. On Unix, exceptions cannot propagate from native code to a managed caller across P/Invoke boundaries, so the exception goes unhandled and the process aborts. That means the problem is really in MonoMod. MonoMod could be fixed to work correctly in the same way our NativeAOT compiler does: it would need to call the JIT through a native wrapper function that catches the exception. Our code used in NativeAOT is here: https://github.com/dotnet/runtime/blob/943474ca16db7c65ba6cff4a89c3ebd219dde3e5/src/coreclr/tools/aot/jitinterface/jitwrapper.cpp
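The following is a minimal C++ sketch of that wrapper pattern. It assumes that the wrapper is exported with C linkage and P/Invoked from managed code. The names compile_method_impl and compile_method_wrapper and the status-code convention are hypothetical. This is not the code in jitwrapper.cpp; it only illustrates that no C++ exception may leave the exported function:

```cpp
#include <cstddef>
#include <cstring>
#include <exception>
#include <stdexcept>

// Hypothetical native routine that signals failure by throwing,
// standing in for a JIT entry point such as CILJit::compileMethod.
static int compile_method_impl(int ilSize) {
    if (ilSize <= 0) {
        throw std::runtime_error("invalid IL");
    }
    return ilSize * 2;  // placeholder "result"
}

// Copies msg into dst, truncating if needed and always null-terminating.
static void copy_message(char* dst, std::size_t len, const char* msg) {
    if (dst == nullptr || len == 0) return;
    std::strncpy(dst, msg, len - 1);
    dst[len - 1] = '\0';
}

// Hypothetical C ABI entry point for managed code to P/Invoke.
// Every C++ exception is caught here and converted into a status code,
// so nothing unwinds across the native/managed boundary.
// Returns 0 on success, 1 for std::exception, 2 for any other exception.
extern "C" int compile_method_wrapper(int ilSize, int* result,
                                      char* err, std::size_t errLen) {
    try {
        *result = compile_method_impl(ilSize);
        return 0;
    } catch (const std::exception& e) {
        copy_message(err, errLen, e.what());
        return 1;
    } catch (...) {
        copy_message(err, errLen, "unknown native exception");
        return 2;
    }
}
```

The catch (...) clause is the essential part, because it also stops exception types that do not derive from std::exception. The managed caller would then check the return value and raise a managed exception itself, instead of letting the native one reach the P/Invoke boundary.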