netcoredbg: Debugger stopping on exceptions
Using the latest release, 2.0.0-895 (linux amd64, net5), with --interpreter=vscode --engineLogging --server=5678.
Randomly, but quite often, the debugger seems to stop on handled exceptions, which shouldn't be thrown in the first place 🤔; they don't occur when not debugging. This reminds me of https://github.com/Samsung/netcoredbg/issues/72.
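For reference, the behaviour the report expects can be sketched as a small C++ decision function. This is a minimal illustration under stated assumptions, not netcoredbg's code: the ExceptionStage names loosely mirror the stages of the CLR debugging API's CorDebugExceptionCallbackType, while ExceptionStopMode and ShouldStop() are hypothetical. Under an "unhandled only" policy, an exception whose catch handler has been found should never stop the debugger.

```cpp
// Stages at which the runtime can report an exception. Names loosely mirror
// CorDebugExceptionCallbackType; the enum itself is a local stand-in.
enum class ExceptionStage {
    FirstChance,        // exception just thrown
    UserFirstChance,    // first chance, reported for user code
    CatchHandlerFound,  // a catch block will handle it
    Unhandled           // no handler was found
};

// Hypothetical stop policies, roughly like the exception filters an IDE exposes.
enum class ExceptionStopMode {
    Never,      // never break on exceptions
    Thrown,     // break whenever an exception is thrown
    Unhandled   // break only if nothing catches it
};

// Decide whether the debugger should break for one exception event.
constexpr bool ShouldStop(ExceptionStopMode mode, ExceptionStage stage) {
    switch (mode) {
        case ExceptionStopMode::Never:
            return false;
        case ExceptionStopMode::Thrown:
            // Break once, at the earliest notification.
            return stage == ExceptionStage::FirstChance;
        case ExceptionStopMode::Unhandled:
            // A handled exception (CatchHandlerFound) must not stop.
            return stage == ExceptionStage::Unhandled;
    }
    return false;
}
```

With this policy, an exception that the debuggee catches internally produces no stop in the "unhandled only" mode.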
Example 1:
Logs don’t show anything significant
Example 2:
Ref https://github.com/QuantConnect/Lean/blob/master/Engine/DataFeeds/WorkScheduling/WeightedWorkQueue.cs#L98
About this issue
- State: closed
- Created 2 years ago
- Comments: 25 (12 by maintainers)
@Martin-Molinero here is a new patch (make sure you revert the previous patch before applying this one): 0001-Fix-callbacks-return-code-check.txt
I need more time to analyze the second backtrace in https://github.com/Samsung/netcoredbg/issues/89#issuecomment-1118007059. It looks like some error code from select() at quit (related to a closed FD somewhere else?) leads to throw std::runtime_error() (on Linux this looks like a SIGABRT signal sent to the process). That signal is handled by the CoreCLR signal handler code (our managed part uses it, so CoreCLR is part of the debugger process), but CoreCLR is probably already in its "shutdown" process, and we get SIGSEGV here… I was not able to reproduce this: even if I put throw std::runtime_error() into the debugger code, during debugging I see SIGABRT from the debugger native code (not SIGSEGV from the CoreCLR signal handler). In any case, this should be investigated to understand why we get this error on the select() call at quit at all.
About the debugger hang: I analyzed the backtraces and found that the debugger code works fine: the CLI protocol part is waiting for input, and the callbacks part is waiting for a callback call from the debuggee process. Could you please check whether it really hangs, or whether it just doesn't print the "prompt" and you can still type a command? Another point: when it hangs, please wait 6+ minutes (we have already faced deadlock issues in the debuggee process runtime / debug API; usually the debuggee process runtime returns error code 0x80131c08 (CORDBG_E_TIMEOUT) after 6+ minutes).
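When waiting out a suspected hang, the error to watch for is the 0x80131c08 code quoted above. Here is a minimal C++ sketch of how that HRESULT decodes and how a caller might tell a runtime timeout apart from other failures. The helper names and the DebugCallResult enum are hypothetical, and plain std::uint32_t stands in for the Windows HRESULT type.

```cpp
#include <cstdint>

// HRESULT layout: bit 31 = severity (1 = failure), bits 16-28 = facility,
// bits 0-15 = code. Plain uint32_t is used instead of the Windows HRESULT type.
constexpr std::uint32_t kCordbgETimeout = 0x80131C08u;  // value quoted in the thread
constexpr std::uint32_t kFacilityUrt = 0x13;            // FACILITY_URT (19), used by CLR errors

constexpr bool IsFailure(std::uint32_t hr) { return (hr & 0x80000000u) != 0; }
constexpr std::uint32_t Facility(std::uint32_t hr) { return (hr >> 16) & 0x1FFFu; }
constexpr std::uint32_t Code(std::uint32_t hr) { return hr & 0xFFFFu; }

// Hypothetical classification of a debug-API result.
enum class DebugCallResult { Ok, Timeout, OtherFailure };

// Tell a runtime-side timeout apart from other failures when a debug-API
// call finally returns after a long wait.
constexpr DebugCallResult Classify(std::uint32_t hr) {
    return !IsFailure(hr)          ? DebugCallResult::Ok
         : hr == kCordbgETimeout   ? DebugCallResult::Timeout
                                   : DebugCallResult::OtherFailure;
}

// Compile-time check that the quoted code really is a CLR (FACILITY_URT) failure.
static_assert(IsFailure(kCordbgETimeout) && Facility(kCordbgETimeout) == kFacilityUrt,
              "CORDBG_E_TIMEOUT should be a FACILITY_URT failure");
```

A result classified as Timeout after the long wait would suggest a deadlock inside the debuggee runtime rather than in the debugger itself.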
Thanks a lot!
I already see some points that we didn't take into account, for example pFrame=0x0: we never expected a null frame in this callback from the CLR… MS Docs say nothing about this. I will analyze these backtraces; extremely interesting.
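Since the CLR can apparently deliver this callback with pFrame=0x0, the handler must not dereference the frame unconditionally. The following is a minimal C++ sketch of such a guard, assuming hypothetical Frame and StopLocation types in place of the real ICorDebugFrame interface. It illustrates the defensive check only and is not the fix that netcoredbg actually applied.

```cpp
// Hypothetical, simplified stand-ins: the real callback receives an
// ICorDebugFrame* from the CLR debugging API, which is not modelled here.
struct Frame {
    const char* methodName;
    unsigned ilOffset;
};

struct StopLocation {
    bool known;          // false when the runtime supplied no frame
    const char* method;
    unsigned ilOffset;
};

// Derive the reported location without assuming the frame pointer is valid.
constexpr StopLocation LocationFromFrame(const Frame* pFrame) {
    return pFrame == nullptr
        ? StopLocation{false, "<unknown>", 0u}   // case seen in the backtrace: pFrame=0x0
        : StopLocation{true, pFrame->methodName, pFrame->ilOffset};
}
```

The caller can then report an "unknown location" stop instead of crashing the debugger process.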
Here is a fix for the managed part build: 0001-Fix-managed-part-build-type.txt. Note that this is a separate patch and doesn't include the previous fixes.
Hmm, https://github.com/Samsung/netcoredbg/blob/a8bd3b95328f19dfe5519973b8176f40d3b4f509/src/CMakeLists.txt#L207 looks like it doesn't care about the CMake build type; I will check this at work.
Ahh… I just checked everything one more time and had almost finished writing to you about "please check that the patch is applied". 😄
This could be the netcoredbg part or the CoreCLR part (our debugger uses a managed part too). Unfortunately, the only way to analyze a hang is to build a debug version of the debugger, attach with gdb while the debugger hangs, and print all backtraces (for all threads)…
SIGSEGV inside libcoreclr.so, interesting. Could you please build a debug netcoredbg version and share the bt output, so we can probably see some netcoredbg-related part?
Just noticed that error CS1056 was related to $e evaluation, but anyway I will check $exception evaluation.
Hmm… error CS1056: Unexpected character '$'... looks like an issue to me. We have ReplaceInternalNames(), which should take care of internal variables, but it looks like it was not called. It looks like $exception evaluation was broken during the evaluation code refactoring. I also just noticed that $exception evaluation is not covered by tests. I will check this at work (4 May). In any case, we need more info about this exception, and direct $exception evaluation is the only way I know.
I'm the user who initially reported this. It occurred consistently for me, with the inner exception in both the LEAN code and the Newtonsoft code, and for an algo that runs fine when not debugging. I think @Martin-Molinero fully listed what I hit, but if there's anything I can do to help, please let me know.