runtime: Unexpected segfault after running for a long time
My application hits a strange segfault without any exception being thrown; it always occurs after several hours of running. I suspect the issue lies in the CLR.
Analyzing the core dump with dotnet-dump shows:
Loading core dump: CoreDump ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> clrstack
Failed to request Module data from assembly.
OS Thread Id: 0x18f2 (0)
Child SP IP Call Site
00007F41D1FF64B8 00007f42d7889b05 [InlinedCallFrame: 00007f41d1ff64b8]
> pe
There is no current managed exception on this thread
Please help me.
The core dump file has been uploaded to:
http://104.207.146.131:8000/crash/log/CoreDump
About this issue
- State: closed
- Created 5 years ago
- Comments: 44 (44 by maintainers)
Sorry this took a while to get to. I see this is running on top of this version of the runtime:
@(#)Version 4.700.19.46205 @Commit: 922429db0144dd6f3b4324805464dae82857512a. This is 3.0.0. Is it possible to test the behavior under a 3.1 (say 3.1.3) runtime? 3.0 is no longer supported, and any fix that needs to be done on our side would not make it to 3.0. A lot of fixes have also gone in since then.
Back to the dump, what I can see so far is that something within `BitcoinAdv.Arbitrage.dll` starts a call into one of the `TryStartNoGCRegion` functions and ends up calling `StartNoGCRegion` in the runtime. Then the callstack looks something like this before calling the sigsegv handler and creating the dump:
Relevant native callstack
The object that was getting promoted
The token matches `AsyncStateMachineBox` in `AsyncTaskMethodBuilder`
We start marking the object. Looking at the `Object*` directly, I see that the method table in it is `0x00007f46a51b9059` (differs by one from the one reported in SOS). Both output the same:
After that there’s a frame in libpthread (`libpthread.so.0___lldb_unnamed_symbol1$$libpthread.so.0 + 1`) that can’t be read. Then the next frame is a segv handler where the context points at:
Looks like all that inlining was tricking the debugger. The faulting call comes from inlining happening here: https://github.com/dotnet/coreclr/blob/4b5ae70e341bad3c9f25d33cfee58d2bb93d3db7/src/gc/gc.cpp#L18770 where it tries to get the flags that say whether the method table has pointers. I can’t see how this can fail, unless the MethodTable or Object pointers that I’m looking at are just bogus. Manually performing this on the faulting thread’s object returns that it has no pointers.
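To make the check discussed above concrete, here is a minimal C++ sketch of reading a "has pointers" flag through an object's method table pointer. It is a toy model, not the code in gc.cpp: the type layout, the names (`kMarkBit`, `kContainsPointers`, `method_table_of`, `contains_pointers`) and the flag value are all assumptions made for illustration. It also assumes that the GC tags the low bit of the header word while marking, which would be one possible reading of a method table value that "differs by one" from what SOS reports.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the runtime's types. Names, layout and flag
// values are assumptions for illustration, not copied from gc.cpp.
constexpr std::uintptr_t kMarkBit = 0x1;                 // assumed: low bit of the header word = "marked"
constexpr std::uint32_t kContainsPointers = 0x01000000;  // assumed flag value

struct MethodTable {
    std::uint32_t flags;  // per-type flags, including the "has pointers" bit
};

struct Object {
    // Header word: the MethodTable address, possibly with the mark bit set.
    std::uintptr_t raw_method_table;
};

// Strip the mark bit to recover the real MethodTable address.
inline MethodTable* method_table_of(const Object* obj) {
    assert(obj != nullptr);
    return reinterpret_cast<MethodTable*>(obj->raw_method_table & ~kMarkBit);
}

inline bool is_marked(const Object* obj) {
    return (obj->raw_method_table & kMarkBit) != 0;
}

inline void set_marked(Object* obj) {
    obj->raw_method_table |= kMarkBit;
}

// Dereferences the MethodTable to read its flags. If the header word is
// garbage, this read is where an access violation would surface.
inline bool contains_pointers(const Object* obj) {
    return (method_table_of(obj)->flags & kContainsPointers) != 0;
}
```

In this model a marked object's header word is exactly one greater than the untagged method table address, and the flag lookup only succeeds if both the object pointer and the masked method table pointer are valid. That is consistent with the suspicion that a fault at this point implies a bogus `Object` or `MethodTable` pointer.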