runtime: Hang while waiting for runtime to suspend thread in busy loop

Ran perf tests overnight on Windows 7, WOW64, 7.0 preview 2 at 1615de6c and found one busy hung in the morning.

One thread is trying to allocate an object, and GC has decided to run (fwiw the heap is not large)

0:005> kp
# ChildEBP RetAddr      
00 0829f448 75221629     ntdll!ZwWaitForSingleObject(void)+0x15 [d:\win7.obj.x86fre\minkernel\ntdll\wow6432\objfre\i386\usrstubs.asm @ 135] 
01 0829f448 75111194     KERNELBASE!WaitForSingleObjectEx(void * hHandle = 0x00000088, unsigned long dwMilliseconds = 1, int bAlertable = 0n0)+0x98 [d:\win7\minkernel\kernelbase\synch.c @ 1301] 
02 0829f460 73346fc4     kernel32!WaitForSingleObjectExImplementation(void * hHandle = 0x00000088, unsigned long dwMilliseconds = 1, int bAlertable = 0n0)+0x75 [d:\win7\base\win32\client\synch.c @ 345] 
03 (Inline) --------     coreclr!CLREventWaitHelper2(void)+0x9 [D:\a\_work\1\s\src\coreclr\vm\synch.cpp @ 372] 
04 0829f4b4 73346f7b     coreclr!CLREventWaitHelper(void * handle = <Value unavailable error>, unsigned long dwMilliseconds = <Value unavailable error>, int alertable = 0n0)+0x2d [D:\a\_work\1\s\src\coreclr\vm\synch.cpp @ 397] 
05 0829f4c4 733469c0     coreclr!CLREventBase::WaitEx(unsigned long dwMilliseconds = 1, WaitMode mode = WaitMode_None (0n0), struct PendingSync * syncState = 0x00000000)+0x32 [D:\a\_work\1\s\src\coreclr\vm\synch.cpp @ 469] 
06 0829f4dc 73335628     coreclr!CLREventBase::Wait(unsigned long dwMilliseconds = 1, int alertable = 0n0, struct PendingSync * syncState = 0x00000000)+0x1a [D:\a\_work\1\s\src\coreclr\vm\synch.cpp @ 413] 
07 0829f560 733705ac     coreclr!ThreadSuspend::SuspendRuntime(ThreadSuspend::SUSPEND_REASON reason = <Value unavailable error>)+0x54a [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp @ 3563] 
08 0829f60c 73370452     coreclr!ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON reason = SUSPEND_FOR_GC (0n1))+0xf9 [D:\a\_work\1\s\src\coreclr\vm\threadsuspend.cpp @ 5714] 
09 0829f634 733708cc     coreclr!GCToEEInterface::SuspendEE(SUSPEND_REASON reason = SUSPEND_FOR_GC (0n1))+0x26 [D:\a\_work\1\s\src\coreclr\vm\gcenv.ee.cpp @ 28] 
0a 0829f634 73370415     coreclr!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen = 0, gc_reason reason = reason_alloc_soh (0n0))+0xa8 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 45101] 
0b 0829f658 732ef1b3     coreclr!WKS::gc_heap::trigger_gc_for_alloc(int gen_number = <Value unavailable error>, gc_reason gr = <Value unavailable error>, struct WKS::GCDebugSpinLock * msl = 0x73681a18, bool loh_p = false)+0x19 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 17075] 
0c 0829f68c 733c3bb4     coreclr!WKS::gc_heap::try_allocate_more_space(struct alloc_context * acontext = 0x0255f270, unsigned int size = 0x10, unsigned int flags = 2, int gen_number = 0n0)+0x1e4 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 17221] 
0d 0829f6a4 732ee473     coreclr!WKS::gc_heap::allocate_more_space(struct alloc_context * acontext = 0x0255f270, unsigned int size = 0x10, unsigned int flags = 2, int alloc_generation_number = 0n0)+0x18 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 17685] 
0e (Inline) --------     coreclr!WKS::gc_heap::allocate(void)+0x40 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 17716] 
0f 0829f6c4 732d989b     coreclr!WKS::GCHeap::Alloc(struct gc_alloc_context * context = 0x0255f270, unsigned int size = 0x10, unsigned int flags = 2)+0x53 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 44097] 
10 (Inline) --------     coreclr!Alloc(GC_ALLOC_FLAGS flags = <Value unavailable error>)+0x108 [D:\a\_work\1\s\src\coreclr\vm\gchelpers.cpp @ 226] 
11 0829f724 732ee14a     coreclr!AllocateObject(class MethodTable * pMT = 0x0079e774, bool fHandleCom = <Value unavailable error>)+0x175 [D:\a\_work\1\s\src\coreclr\vm\gchelpers.cpp @ 979] 
12 0829f7a8 007abb6b     coreclr!JIT_New(struct CORINFO_CLASS_STRUCT_ * typeHnd_ = 0x0079e774)+0xaa [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 2312] 
13 0829f7c0 718474f5     MicroBenchmarks!System.Collections.Concurrent.AddRemoveFromDifferentThreads`1+<>c__DisplayClass8_0[[System.Int32, System.Private.CoreLib]].<SetupConcurrentStackIteration>b__0(<Win32 error 0n318>)+0x4b

It seems to be waiting for this thread to suspend

0:006> kp
 # ChildEBP RetAddr      
00 05f5f880 718474f5     MicroBenchmarks!System.Collections.Concurrent.AddRemoveFromDifferentThreads`1+<>c__DisplayClass8_0[[System.Int32, System.Private.CoreLib]].<SetupConcurrentStackIteration>b__1(<Win32 error 0n318>)+0x43
01 05f5f890 7184b61d     System_Private_CoreLib!System.Threading.Tasks.Task.InnerInvoke(System.Threading.Tasks.Task this = <Value unavailable error>)+0x35 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2411] 

and that thread is in a busy loop trying to pop from a ConcurrentStack. the stack is empty, so I believe TryPop is returning false immediately. itโ€™s partially inlined

                while (count < Size)
                {
                    if (stack.TryPop(out T _))
                    {
                        count++;
                    }
                }

Should we have emitted interruptible code here? Or is it by design that if you write such code you may hang at an unpredicatable moment?

Dumps shared offline.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

is it by design that if you write such code you may hang at an unpredicatable moment?

That would be a fun thing to document ๐Ÿ˜ƒ

No. In pure managed code suspension latency (stopping the runtime at safe points) should be in 0.1-1 msec range regardless of what user code is doing. There could be rare outliers going into 10-50 msec range, but anything above that would be worth investigating.