runtime: dotnet fails to exit with thread stuck in coreclr!WatsonLastChance

I built dotnet/runtime main in release, then ran outerloop tests. It jammed with active CPU, so I tried to get a dump.

Running as admin (although it shouldn’t matter - target process is not elevated)

C:\temp>dotnet-dump --version
6.0.257301+27172ce4d05e8a3b0ffdefd65f073d40a1b1fe54

C:\temp>dotnet-dump ps | \t\grep 53884
     53884 dotnet     D:\git\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\dotnet.exe

C:\temp>dotnet-dump collect --process-id 53884

Writing full to C:\temp\dump_20220312_094512.dmp
Cannot process request because the process (53884) has exited.

C:\temp>dotnet-dump ps | \t\grep 53884
     53884 dotnet     D:\git\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\dotnet.exe

C:\temp>\t\filever -v D:\git\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\dotnet.exe | \t\head -1
--a-- WAMD64 DLL ENU    7.0.22.15701 shp    126,464 03-07-2022 dotnet.exe

cc @mikem8361

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 18 (18 by maintainers)

Commits related to this issue

Most upvoted comments

Ah I misunderstood and thought you were trying to track down the dotnet-dump issue. I see now. I think the reason the process is hanging is from this claim here: https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L380

This thread appears to be neither the finalizer nor the debugger helper thread, but it does need to keep executing in order for the process to exit. Probably a reasonable fix will be for PreJitAttach to return immediately without doing any work if we detect that m_fShutdownMode is true.

OK, it’s reproing consistently for me with the OleDB tests (for some reason). For some reason, !pe shows a window title with Office in it.

CommandLine:  '"C:\git\runtime\artifacts\bin\testhost\net7.0-windows-Debug-x64\dotnet.exe"  exec --runtimeconfig System.Data.OleDb.Tests.runtimeconfig.json --depsfile System.Data.OleDb.Tests.deps.json xunit.console.dll System.Data.OleDb.Tests.dll -xml testResults.xml -nologo -notrait category=failing '

WindowTitle: ‘Microsoft.Office.dotnet.exe.15’

@tommcdon https://microsoft-my.sharepoint.com/:u:/p/danmose/EdfrQ9SlIeRCp-JmGXJ1om4BQNQX83pxpFa-FCHM33lt8w?e=8Zgywx is the dump.

I can probably repro this if needed, since it’s happened twice now. LMK.

aha, hit it again and this time got a dump from task manager. it’s got one thread left, and it’s stuck here

0:000> ~*k

.  0  Id: 42d4.a24c Suspend: 0 Teb: 0000007a`f0ef2000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 0000007a`f117dfd8 00007fff`d7c2cb20     ntdll!ZwWaitForMultipleObjects+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 907] 
01 0000007a`f117dfe0 00007fff`754b718e     KERNELBASE!WaitForMultipleObjectsEx+0xf0 [minkernel\kernelbase\synch.c @ 1551] 
02 (Inline Function) --------`--------     coreclr!Thread::DoAppropriateAptStateWait+0x45 [C:\git\runtime\src\coreclr\vm\threads.cpp @ 3398] 
03 0000007a`f117e2d0 00007fff`754b6ffd     coreclr!Thread::DoAppropriateWaitWorker+0x152 [C:\git\runtime\src\coreclr\vm\threads.cpp @ 3530] 
04 0000007a`f117e3c0 00007fff`75860f29     coreclr!Thread::DoAppropriateWait+0x89 [C:\git\runtime\src\coreclr\vm\threads.cpp @ 3247] 
05 (Inline Function) --------`--------     coreclr!CLREventBase::WaitEx+0x2f [C:\git\runtime\src\coreclr\vm\synch.cpp @ 459] 
06 (Inline Function) --------`--------     coreclr!CLREventBase::Wait+0x2f [C:\git\runtime\src\coreclr\vm\synch.cpp @ 412] 
07 0000007a`f117e440 00007fff`757f885d     coreclr!WaitForEndOfShutdown+0x61 [C:\git\runtime\src\coreclr\vm\ceemain.cpp @ 1162] 
08 (Inline Function) --------`--------     coreclr!CrstBase::ReleaseAndBlockForShutdownIfNotSpecialThread+0x2b [C:\git\runtime\src\coreclr\vm\crst.cpp @ 134] 
09 (Inline Function) --------`--------     coreclr!Debugger::DoNotCallDirectlyPrivateLock+0x9a [C:\git\runtime\src\coreclr\debug\ee\debugger.cpp @ 385] 
0a (Inline Function) --------`--------     coreclr!Debugger::AcquireDebuggerLock+0x9a [C:\git\runtime\src\coreclr\debug\ee\debugger.h @ 2307] 
0b (Inline Function) --------`--------     coreclr!FunctionBase<Debugger *,&Debugger::AcquireDebuggerLock,&Debugger::ReleaseDebuggerLock>::DoAcquire+0x9a [C:\git\runtime\src\coreclr\inc\holder.h @ 711] 
0c (Inline Function) --------`--------     coreclr!BaseHolder<Debugger *,FunctionBase<Debugger *,&Debugger::AcquireDebuggerLock,&Debugger::ReleaseDebuggerLock>,0,&CompareDefault<Debugger *> >::Acquire+0x9a [C:\git\runtime\src\coreclr\inc\holder.h @ 283] 
0d (Inline Function) --------`--------     coreclr!BaseHolder<Debugger *,FunctionBase<Debugger *,&Debugger::AcquireDebuggerLock,&Debugger::ReleaseDebuggerLock>,0,&CompareDefault<Debugger *> >::{ctor}+0xab [C:\git\runtime\src\coreclr\inc\holder.h @ 250] 
0e (Inline Function) --------`--------     coreclr!Holder<Debugger *,&Debugger::AcquireDebuggerLock,&Debugger::ReleaseDebuggerLock,0,&CompareDefault<Debugger *>,1>::{ctor}+0xab [C:\git\runtime\src\coreclr\inc\holder.h @ 746] 
0f 0000007a`f117e490 00007fff`756c539b     coreclr!Debugger::PreJitAttach+0xdd [C:\git\runtime\src\coreclr\debug\ee\debugger.cpp @ 6958] 
10 0000007a`f117e4e0 00007fff`756c0e01     coreclr!WatsonLastChance+0x103 [C:\git\runtime\src\coreclr\vm\excep.cpp @ 3797] 
11 0000007a`f117e560 00007fff`756c0a1e     coreclr!InternalUnhandledExceptionFilter_Worker+0x381 [C:\git\runtime\src\coreclr\vm\excep.cpp @ 4851] 
12 0000007a`f117e680 00007fff`7586a040     coreclr!InternalUnhandledExceptionFilter+0xe [C:\git\runtime\src\coreclr\vm\excep.cpp @ 4948] 
13 0000007a`f117e6c0 00007fff`d7ceb857     coreclr!COMUnhandledExceptionFilter+0x40 [C:\git\runtime\src\coreclr\vm\excep.cpp @ 5133] 
14 0000007a`f117e6f0 00007fff`da1951b0     KERNELBASE!UnhandledExceptionFilter+0x1e7 [minkernel\kernelbase\xcpt.c @ 575] 
15 (Inline Function) --------`--------     ntdll!RtlpThreadExceptionFilter+0x91 [minkernel\ntdll\rtlstrt.c @ 957] 
16 0000007a`f117e810 00007fff`da17c766     ntdll!RtlUserThreadStart$filt$0+0xa2 [minkernel\ntdll\rtlstrt.c @ 1154] 
17 0000007a`f117e850 00007fff`da1920cf     ntdll!__C_specific_handler+0x96 [minkernel\crts\crtw32\misc\riscchandler.c @ 352] 
18 0000007a`f117e8c0 00007fff`da141454     ntdll!RtlpExecuteHandlerForException+0xf [minkernel\ntos\rtl\amd64\xcptmisc.asm @ 132] 
19 0000007a`f117e8f0 00007fff`da190bfe     ntdll!RtlDispatchException+0x244 [minkernel\ntos\rtl\amd64\exdsptch.c @ 595] 
1a 0000007a`f117f000 00007fff`5a3bb9d1     ntdll!KiUserExceptionDispatch+0x2e [minkernel\ntos\rtl\amd64\trampoln.asm @ 751] 
1b (Inline Function) --------`--------     mso20win32client!std::_Hash<std::_Umap_traits<_orkey const *,std::_List_iterator<std::_List_val<std::_List_simple_types<std::pair<_orkey const *,Mso::TCntPtr<Mso::Registry::Key> > > > >,std::_Uhash_compare<_orkey const *,std::hash<_orkey const *>,std::equal_to<_orkey const *> >,std::allocator<std::pair<_orkey const * const,std::_List_iterator<std::_List_val<std::_List_simple_types<std::pair<_orkey const *,Mso::TCntPtr<Mso::Registry::Key> > > > > > >,0> >::_Find_last+0xe [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\xhash @ 1506] 
....] 
.............
32 0000007a`f117f9e0 00007fff`5a50dbe1     mso20win32client!Mso::Experiment::ConfigContext::`scalar deleting destructor'+0x14
33 0000007a`f117fa10 00007fff`5a38392c     mso20win32client!std::default_delete<Mso::Experiment::IConfigContext>::operator()+0x29 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\memory @ 3169] 
34 0000007a`f117fa40 00007fff`5a4e08f3     mso20win32client!std::_Ref_count_base::_Decref+0x28 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\memory @ 1110] 
35 (Inline Function) --------`--------     mso20win32client!std::_Ptr_base<Mso::Experiment::IConfigContext>::_Decref+0x11 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\memory @ 1348] 
36 (Inline Function) --------`--------     mso20win32client!std::shared_ptr<Mso::Experiment::IConfigContext>::{dtor}+0x11 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\memory @ 1653] 
37 (Inline Function) --------`--------     mso20win32client!std::_Default_allocator_traits<std::allocator<std::_List_node<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,void *> > >::destroy+0x11 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\xmemory @ 738] 
38 0000007a`f117fa70 00007fff`5a4e08ce     mso20win32client!std::_List_node<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,void *>::_Freenode<std::allocator<std::_List_node<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,void *> > >+0x17 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\list @ 322] 
39 0000007a`f117faa0 00007fff`5a4fddf4     mso20win32client!std::_List_node<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,void *>::_Free_non_head<std::allocator<std::_List_node<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,void *> > >+0x1e [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\list @ 333] 
3a (Inline Function) --------`--------     mso20win32client!std::list<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,std::allocator<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> > > >::_Tidy+0x9 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\list @ 1440] 
3b (Inline Function) --------`--------     mso20win32client!std::list<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> >,std::allocator<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> > > >::{dtor}+0x9 [.A\microsoft.office.vccompiler.hidvswk-ilw78ebaieg-t0clo7j9fofx7yuctgailq8\lib\native\include\list @ 1046] 
3c 0000007a`f117fad0 00007fff`5a4fdd85     mso20win32client!std::_Hash<std::_Umap_traits<int,std::shared_ptr<Mso::Experiment::IConfigContext>,std::_Uhash_compare<int,std::hash<int>,std::equal_to<int> >,std::allocator<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> > >,0> >::~_Hash<std::_Umap_traits<int,std::shared_ptr<Mso::Experiment::IConfigContext>,std::_Uhash_compare<int,std::hash<int>,std::equal_to<int> >,std::allocator<std::pair<int const ,std::shared_ptr<Mso::Experiment::IConfigContext> > >,0> >+0x34
3d 0000007a`f117fb00 00007fff`d79e42d6     mso20win32client!Mso::Experiment::ExperimentLiblet::~ExperimentLiblet+0x135
3e 0000007a`f117fb30 00007fff`d79e41fb     ucrtbase!<lambda_f03950bc5685219e0bcd2087efbe011e>::operator()+0xa6 [minkernel\crts\ucrt\src\appcrt\startup\onexit.cpp @ 206] 
3f 0000007a`f117fb80 00007fff`d79e41b4     ucrtbase!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda_f03950bc5685219e0bcd2087efbe011e> &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c> >+0x3b [VCCRT\vcruntime\inc\internal_shared.h @ 204] 
40 (Inline Function) --------`--------     ucrtbase!__acrt_lock_and_call+0x1c [minkernel\crts\ucrt\inc\corecrt_internal.h @ 976] 
41 0000007a`f117fbb0 00007fff`5a514791     ucrtbase!_execute_onexit_table+0x34 [minkernel\crts\ucrt\src\appcrt\startup\onexit.cpp @ 231] 
42 0000007a`f117fbe0 00007fff`5a5148ba     mso20win32client!dllmain_crt_process_detach+0x45 [d:\a01\_work\11\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 182] 
43 0000007a`f117fc20 00007fff`da109a1d     mso20win32client!dllmain_dispatch+0xe6 [d:\a01\_work\11\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 293] 
44 0000007a`f117fc80 00007fff`da14db91     ntdll!LdrpCallInitRoutine+0x61 [minkernel\ntdll\ldr.c @ 212] 
45 0000007a`f117fcf0 00007fff`da14da2d     ntdll!LdrShutdownProcess+0x141 [minkernel\ntdll\ldrinit.c @ 6216] 
46 0000007a`f117fdf0 00007fff`d9b4e0ab     ntdll!RtlExitUserProcess+0xad [minkernel\ntdll\rtlstrt.c @ 1571] 
47 0000007a`f117fe20 00007fff`d79f05bc     kernel32!FatalExit+0xb [clientcore\base\win32\client\process.c @ 2421] 
48 0000007a`f117fe50 00007fff`d79f045f     ucrtbase!exit_or_terminate_process+0x44 [minkernel\crts\ucrt\src\appcrt\startup\exit.cpp @ 143] 
49 0000007a`f117fe80 00007ff6`b1aefeaf     ucrtbase!common_exit+0x6f [minkernel\crts\ucrt\src\appcrt\startup\exit.cpp @ 280] 
4a 0000007a`f117fed0 00007fff`d9b47034     dotnet!__scrt_common_main_seh+0x173 [d:\a01\_work\12\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 295] 
4b 0000007a`f117ff10 00007fff`da142651     kernel32!BaseThreadInitThunk+0x14 [clientcore\base\win32\client\thread.c @ 64] 
4c 0000007a`f117ff40 00000000`00000000     ntdll!RtlUserThreadStart+0x21 [minkernel\ntdll\rtlstrt.c @ 1153] 

This mso20win32client seems to have injected itself into the process - it’s part of office: C:\Program Files\Common Files\Microsoft Shared\Office16\mso20win32client.dll. I do not know why. It seems to have registered itself with the CRT to be called on exit. Something caused kernel32!FatalExit to be called, which led to the onexit functions being called including this one. In the process of doing whatever it’s doing, it throws an exception, which eventually invokes coreclr!COMUnhandledExceptionFilter, which tries to invoke Watson and then enters some interestingly named functions that have warnings about deadlocks

https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L374

https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L6953

Thoughts? Should I move this to the runtime repo? I have the dump if you need it.

BTW, after taking this dump, I tried running dotnet dump collect and got the same results as before. The stack explains why – inside RtlExitUserProcess it has already set the exit code that GetExitCodeProcess reads, but the process has not yet quite exited so it’s still possible to get a handle.