netcoredbg: Debugger stopping on exceptions

Using the latest release, 2.0.0-895 (linux amd64, net5), started with --interpreter=vscode --engineLogging --server=5678

Randomly, but fairly often, the debugger stops on handled exceptions. The code in question shouldn’t be throwing an exception in the first place 🤔; it doesn’t when not debugging. Reminds me of https://github.com/Samsung/netcoredbg/issues/72 .

Example 1: [image]

Logs don’t show anything significant: [image]

Example 2 (ref https://github.com/QuantConnect/Lean/blob/master/Engine/DataFeeds/WorkScheduling/WeightedWorkQueue.cs#L98): [image]

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 25 (12 by maintainers)

Most upvoted comments

@Martin-Molinero here is a new patch (make sure you revert the previous patch before applying this one): 0001-Fix-callbacks-return-code-check.txt

I need more time to analyze the second backtrace in https://github.com/Samsung/netcoredbg/issues/89#issuecomment-1118007059. It looks like select returns some error code at quit (related to an FD closed in another place?), which leads to throw std::runtime_error() (on Linux this looks like SIGABRT being sent to the process). That signal is handled by the CoreCLR signal handler code (our managed part uses CoreCLR, so it is part of the debugger process), but CoreCLR is probably already shutting down, and we get SIGSEGV here… I was not able to reproduce this: even if I put throw std::runtime_error() into the debugger code, during debugging I see SIGABRT from the debugger native code (not SIGSEGV from the CoreCLR signal handler). Anyway, this should be investigated in order to understand why we get this error from the select call at quit at all.
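To make the suspected mechanism concrete, here is a minimal sketch (Linux/POSIX only) of how select can fail on a descriptor that was closed elsewhere, and how that failure becomes a std::runtime_error. The helper names IsReadable and ProbeClosedDescriptor are hypothetical and are not taken from the netcoredbg sources. If such an exception is never caught, std::terminate calls abort, which is where SIGABRT comes from.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: poll a descriptor for readability with a zero timeout.
// Throws std::runtime_error when select() itself fails.
bool IsReadable(int fd) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    timeval timeout{0, 0};  // do not block, just poll
    const int rc = select(fd + 1, &readSet, nullptr, nullptr, &timeout);
    if (rc < 0) {
        throw std::runtime_error(std::string("select failed: ") + std::strerror(errno));
    }
    return rc > 0;
}

// Hypothetical probe: close a pipe's read end "in another place", then poll it.
// Returns the exception text, or "no error" if select unexpectedly succeeded.
std::string ProbeClosedDescriptor() {
    int fds[2];
    if (pipe(fds) != 0) {
        return "pipe failed";
    }
    close(fds[1]);
    close(fds[0]);  // the descriptor number is now stale
    try {
        IsReadable(fds[0]);
    } catch (const std::runtime_error& e) {
        return e.what();  // caught here; uncaught, it would end in abort()
    }
    return "no error";
}
```

In this sketch the exception is caught so the probe can report it. Whether the debugger's quit path really polls a stale descriptor is exactly what still has to be confirmed.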

About the debugger hang: I analyzed the backtraces and found that the debugger code works fine: the CLI protocol part is waiting for input, and the callbacks part is waiting for a callback call from the debuggee process. Could you please check whether it really hangs, or whether it just doesn’t print the “prompt” and you can still type a command? Another point: when it hangs, please wait 6+ minutes (we have already faced deadlock issues in the debuggee process runtime / debug API; usually the debuggee process runtime returns error code 0x80131c08 - CORDBG_E_TIMEOUT after 6+ minutes).
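For illustration, a return-code check that singles out the timeout value mentioned above could look roughly like this. It is only a sketch: HRESULT is replaced by a plain 32-bit stand-in so the snippet compiles without the CLR headers, and kCordbgTimeout, Failed, CallOutcome and Classify are hypothetical names, not netcoredbg or CLR API.

```cpp
#include <cstdint>

// Stand-in for the real HRESULT typedef from the platform headers.
using HRESULT = std::int32_t;

// Value quoted in the thread for CORDBG_E_TIMEOUT.
constexpr HRESULT kCordbgTimeout = static_cast<HRESULT>(0x80131c08u);

// Same idea as the FAILED() macro: failure codes have the high bit set.
constexpr bool Failed(HRESULT hr) { return hr < 0; }

enum class CallOutcome { Ok, RuntimeTimeout, OtherFailure };

// Hypothetical classifier: separate "runtime timed out" from other errors,
// so a caller can report a debuggee-side deadlock instead of a generic failure.
constexpr CallOutcome Classify(HRESULT hr) {
    if (!Failed(hr)) {
        return CallOutcome::Ok;
    }
    if (hr == kCordbgTimeout) {
        return CallOutcome::RuntimeTimeout;
    }
    return CallOutcome::OtherFailure;
}
```

A caller would then treat RuntimeTimeout as a hint that the debuggee runtime was stuck for minutes, rather than as a debugger bug.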

Thanks a lot!

I already see some points that we didn’t take into account, for example:

#5  0x00000000008d541a in netcoredbg::ManagedCallback::Exception (this=0x7f7ce406efd0, pAppDomain=0x7f7cdc000d88, pThread=0x7f7c38047a68, pFrame=0x0, nOffset=12, dwEventType=DEBUG_EXCEPTION_FIRST_CHANCE, dwFlags=1)
    at /home/netcoredbg/src/debugger/managedcallback.cpp:769

pFrame=0x0 - we never accounted for a null frame in this callback from the CLR… The MS docs say nothing about this.

Will analyze these backtraces, extremely interesting.
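As a rough idea of what accounting for a null frame means, here is a minimal sketch of a guard. Frame is a hypothetical stand-in for the real ICorDebugFrame interface, and DescribeExceptionLocation is an invented helper, not the code in managedcallback.cpp.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for ICorDebugFrame; the real type is a COM-style
// interface from the CLR debugging API and is not reproduced here.
struct Frame {
    std::uint32_t functionToken;
};

// Hypothetical helper: describe where a first-chance exception was raised.
// A null frame is reported as "unknown" instead of being dereferenced.
std::string DescribeExceptionLocation(const Frame* pFrame, std::uint32_t nOffset) {
    if (pFrame == nullptr) {
        return "unknown frame, offset " + std::to_string(nOffset);
    }
    return "function token " + std::to_string(pFrame->functionToken) +
           ", offset " + std::to_string(nOffset);
}
```

With the arguments from the backtrace above (pFrame=0x0, nOffset=12), this helper takes the first branch and never touches the pointer.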

Here is a fix for the managed part build: 0001-Fix-managed-part-build-type.txt. Note that this is a separate patch that doesn’t include the previous fixes.

Note: I think the managed dll being used is in a debug build

Hmm, https://github.com/Samsung/netcoredbg/blob/a8bd3b95328f19dfe5519973b8176f40d3b4f509/src/CMakeLists.txt#L207 looks like it doesn’t care about the cmake build type; will check this at work.

Whoops! Sorry, I hadn’t applied the patch yet 😅, after doing so:

Ahh… I just checked everything one more time and had almost finished writing to you “please check that the patch is applied”. 😄

I did see the debugging session hang 1/20

This could be in the netcoredbg part or in the coreclr part (our debugger uses a managed part too). Unfortunately, the only way to analyze the hang is to build a debug version of the debugger, attach to it with gdb while it hangs, and print all backtraces (for all threads)…

Seeing the same seg fault as posted above, plus this one sometimes

SIGSEGV inside libcoreclr.so, interesting. Could you please build a debug version of netcoredbg and share the bt? Then we will probably see some netcoredbg-related part.

Just noticed that error CS1056 was related to $e evaluation, but anyway I will check $exception evaluation.

Hmm… error CS1056: Unexpected character '$'... looks like an issue to me. We have ReplaceInternalNames(), which must take care of internal variables, but it looks like it was not called. It looks like $exception evaluation was broken during the evaluation code refactoring. I also just noticed that $exception evaluation is not covered by tests. I will check this at work (4 May).

Anyway, we need more info about this exception, and direct $exception evaluation is the only way I know.
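To show why the rewrite step matters: '$' is not valid in a C# identifier, so an expression containing $exception has to be rewritten before it reaches the compiler, otherwise CS1056 is the result. Below is a minimal sketch of such a rewrite. It is not netcoredbg's ReplaceInternalNames(); the function name ReplaceInternalNamesSketch and the placeholder identifier are made up for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch: replace the internal name "$exception" with an
// identifier a C# compiler would accept. The placeholder name is invented.
std::string ReplaceInternalNamesSketch(std::string expression) {
    const std::string internalName = "$exception";
    const std::string placeholder = "__internal_exception";
    std::size_t pos = 0;
    while ((pos = expression.find(internalName, pos)) != std::string::npos) {
        const std::size_t end = pos + internalName.size();
        // Skip matches that are only a prefix of a longer name.
        const bool continuesIdentifier =
            end < expression.size() &&
            (std::isalnum(static_cast<unsigned char>(expression[end])) ||
             expression[end] == '_');
        if (continuesIdentifier) {
            pos = end;
            continue;
        }
        expression.replace(pos, internalName.size(), placeholder);
        pos += placeholder.size();
    }
    return expression;
}
```

In this sketch, any other name starting with '$' is left untouched, so it would still reach the compiler unchanged.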

I’m the user who initially reported this. It occurred consistently for me, with the inner exception in both the LEAN code and the Newtonsoft code, and for an algo that runs fine when not debugging. I think @Martin-Molinero fully listed what I hit, but if there’s anything I can do to help, please let me know.