runtime: Segfault in AdjustContextForVirtualStub

I have an application that consistently crashes, only on Linux/ARM64, with the following stack trace:

* thread #1, name = 'dotnet', stop reason = signal SIGSEGV
  * frame #0: 0x0000ffffb76ccc40 libcoreclr.so`AdjustContextForVirtualStub(pExceptionRecord=0x0000000000000000, pContext=0x0000ffff077f38e0) at stubs.cpp:1173:40
    frame #1: 0x0000ffffb76d4a18 libcoreclr.so`UnwindManagedExceptionPass1(ex=<unavailable>, frameContext=<unavailable>) at exceptionhandling.cpp:4579:17
    frame #2: 0x0000ffffb76d4c08 libcoreclr.so`DispatchManagedException(ex=0x0000ffff077f3cd0, isHardwareException=<unavailable>) at exceptionhandling.cpp:4686:17
    frame #3: 0x0000ffffb763c0cc libcoreclr.so`IL_Throw(obj=<unavailable>) at jithelpers.cpp:4195:5
[managed frames]

This is an application that throws a bunch of exceptions, on top of lengthy callstacks (this one is 153 frames long).

It’s important to note that the crash only occurs when our profiler is attached and we’ve ReJITted some methods. So it’s entirely possible we’re corrupting something. Still, the consistent location of the error and the fact that it happens only on ARM64 is fishy.

Using .NET 5.0.3.

I’ll keep digging on my side to figure out if the issue is in the runtime or in our profiler.

Attachment: coredump.zip

About this issue

State: closed
Created 3 years ago
Comments: 16 (15 by maintainers)

Most upvoted comments

@kevingosse re: the first segfault, do you happen to have the call and data that trigger this? It might be a bug in the JSON serializer (Utf8Json fork in 7.x) in the Elasticsearch .NET client, which would be good for us to fix 🙂 In the interest of not derailing this issue, an issue can be opened at https://github.com/elastic/elasticsearch-net/issues/new/choose

That’s in version 6 of the client, so still based on JSON.NET. In the end, the segfault happens in ExceptionDispatchInfo.Throw, so I doubt the Elasticsearch client is to blame, but if I pinpoint it to something in your code I’ll make sure to report it 👍

kevingosse on Mar 5, 2021

Frames (the coreclr data structure, confusingly named the same as stack frames) are what we use to track native code that is used by the runtime but needs to act like managed code. Each thread has a list of Frames that the StackWalker can use to determine whether the code it is walking is one of our FCalls/Helpers/etc. that play by the same rules as managed code.
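To make the per-thread Frame list a bit more concrete, here is a rough C++ sketch of the idea. Every name in it (Frame, FrameKind, Thread, FrameHolder, FindFrameForIp) is a simplified stand-in invented for illustration, not the real coreclr definitions, and the real stack walker does far more than a linear lookup.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-ins; NOT the actual coreclr types.
enum class FrameKind { HelperMethod, PInvoke, Other };

struct Frame {
    FrameKind kind;
    std::uintptr_t ip;  // instruction pointer of the native code this Frame describes
    Frame* next;        // next (older) Frame on the same thread
};

struct Thread {
    Frame* topFrame = nullptr;  // head of this thread's Frame list
};

// RAII helper: links a Frame into the thread's list on entry to runtime
// native code and unlinks it again on exit.
class FrameHolder {
public:
    FrameHolder(Thread& thread, FrameKind kind, std::uintptr_t ip)
        : thread_(thread), frame_{kind, ip, thread.topFrame} {
        thread_.topFrame = &frame_;
    }
    ~FrameHolder() { thread_.topFrame = frame_.next; }
    FrameHolder(const FrameHolder&) = delete;
    FrameHolder& operator=(const FrameHolder&) = delete;

private:
    Thread& thread_;
    Frame frame_;
};

// What a stack walker could do in this toy model: check whether an IP it
// encounters is described by one of the thread's Frames.
const Frame* FindFrameForIp(const Thread& thread, std::uintptr_t ip) {
    for (const Frame* f = thread.topFrame; f != nullptr; f = f->next) {
        if (f->ip == ip) {
            return f;
        }
    }
    return nullptr;
}
```

In this model, a JIT helper would construct a FrameHolder with FrameKind::HelperMethod at its start, which is roughly why a HelperMethodFrame shows up in the thread's Frame list while the helper is running.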

The HelperMethodFrame means that it is a JIT helper. You should be able to look at the assembly of the helper by disassembling at the IP (0000ffff7d1fbee8), and you can inspect the HelperMethodFrame by looking at the address of the frame (0000ffff457f0f38). E.g. in lldb, expr (HelperMethodFrame *)0x0000ffff457f0f38 should let you look at it. I’m typing this all from memory so beware of typos.

Just because it segfaults doesn’t mean it’s a bug though. You’d have to determine what the helper is and what it’s supposed to be doing. NullReferenceExceptions in managed code are achieved by letting the native jitted code run and then if a segfault happens we look at the address of the segfault, and if it’s in managed code we translate it to a managed NullReferenceException. The code that does that is what you added a NULL check to. So long story short, this could very well be normal operation.
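As a rough illustration of that decision (not the runtime's actual code), the sketch below classifies a fault by checking whether the faulting instruction pointer falls inside a known range of jitted code. CodeRange, FaultAction, IsManagedCode and ClassifyFault are hypothetical names made up for this example, and the real handler performs additional checks that are left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a region of jitted (managed) code: [start, end).
struct CodeRange {
    std::uintptr_t start;
    std::uintptr_t end;
};

enum class FaultAction {
    ThrowNullReferenceException,  // fault happened in managed code
    TreatAsNativeCrash            // fault happened somewhere else
};

// Returns true if ip lies inside any of the known jitted code ranges.
bool IsManagedCode(const std::vector<CodeRange>& jitted, std::uintptr_t ip) {
    for (const CodeRange& r : jitted) {
        if (ip >= r.start && ip < r.end) {
            return true;
        }
    }
    return false;
}

// Simplified decision a SIGSEGV handler could make: a fault in managed code
// is translated into a managed exception, anything else is a real crash.
FaultAction ClassifyFault(const std::vector<CodeRange>& jitted,
                          std::uintptr_t faultIp) {
    return IsManagedCode(jitted, faultIp)
               ? FaultAction::ThrowNullReferenceException
               : FaultAction::TreatAsNativeCrash;
}
```

Under this model, a SIGSEGV seen in a debugger while the process is inside jitted code is not by itself evidence of a runtime bug; only a fault outside the managed ranges would be treated as a genuine crash.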

davmason on Mar 5, 2021