runtime: dotnet fails to exit with thread stuck in coreclr!WatsonLastChance
I built dotnet/runtime main in release, then ran the outerloop tests. The process hung while still consuming CPU, so I tried to capture a dump.
Running as admin (though it shouldn’t matter; the target process is not elevated):
C:\temp>dotnet-dump --version
6.0.257301+27172ce4d05e8a3b0ffdefd65f073d40a1b1fe54
C:\temp>dotnet-dump ps | \t\grep 53884
53884 dotnet D:\git\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\dotnet.exe
C:\temp>dotnet-dump collect --process-id 53884
Writing full to C:\temp\dump_20220312_094512.dmp
Cannot process request because the process (53884) has exited.
C:\temp>dotnet-dump ps | \t\grep 53884
53884 dotnet D:\git\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\dotnet.exe
C:\temp>\t\filever -v D:\git\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\dotnet.exe | \t\head -1
--a-- WAMD64 DLL ENU 7.0.22.15701 shp 126,464 03-07-2022 dotnet.exe
cc @mikem8361
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (18 by maintainers)
Commits related to this issue
- Fix jit attach hang at Shutdown Fixes #66715 We are seeing exceptions thrown at shutdown turn into hangs because the debugger lock suspends threads at that point. We are mitigating that problem by d... — committed to noahfalk/runtime by noahfalk 2 years ago
- Fix jit attach hang at Shutdown (#67166) Fixes #66715 We are seeing exceptions thrown at shutdown turn into hangs because the debugger lock suspends threads at that point. We are mitigating that ... — committed to dotnet/runtime by noahfalk 2 years ago
Ah, I misunderstood and thought you were trying to track down the dotnet-dump issue. I see now. I think the process is hanging because of this claim here: https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L380
This thread is neither the finalizer nor the debugger helper thread, but it does need to keep executing for the process to exit. A reasonable fix would probably be for PreJitAttach to return immediately, without doing any work, when it detects that m_fShutdownMode is true.
OK, it’s reproing consistently for me with the OleDB tests. For some reason, !pe shows a window title with Office in it:
WindowTitle: 'Microsoft.Office.dotnet.exe.15'
@tommcdon https://microsoft-my.sharepoint.com/:u:/p/danmose/EdfrQ9SlIeRCp-JmGXJ1om4BQNQX83pxpFa-FCHM33lt8w?e=8Zgywx is the dump.
I can probably repro this if needed, since it’s happened twice now. LMK.
Aha, hit it again, and this time got a dump from Task Manager. It’s got one thread left, and it’s stuck here.
This module, mso20win32client, seems to have injected itself into the process; it’s part of Office: C:\Program Files\Common Files\Microsoft Shared\Office16\mso20win32client.dll. I do not know why. It seems to have registered itself with the CRT to be called on exit. Something caused kernel32!FatalExit to be called, which led to the onexit functions being invoked, including this one. In the course of doing whatever it does, it throws an exception, which eventually reaches coreclr!COMUnhandledExceptionFilter. That tries to invoke Watson and then enters some interestingly named functions that carry warnings about deadlocks:
https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L374
https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L6953
Thoughts? Should I move this to the runtime repo? I have the dump if you need it.
BTW, after taking this dump, I tried running
dotnet dump collect
again and got the same result as before. The stack explains why: inside RtlExitUserProcess, the exit code that GetExitCodeProcess reads has already been set, but the process has not quite finished exiting, so it is still possible to get a handle to it.