runtime: Access violations around unmanaged LoaderAllocator when using dynamic (emitted with AssemblyBuilder) collectible assemblies
After migrating from .NET Core 2.2 to 3.1 we started hitting a .NET runtime crash during our test runs:
The active test run was aborted. Reason: Test host process crashed : Fatal error. Internal CLR error. (0x80131506)
The problem also occurs with .NET Core 3.0, both on Windows and Linux.
The crash is not 100% reproducible (the tests have to be rerun multiple times to hit it), and unfortunately we were not able to narrow the problem down to a small repro. However, we have several crash dumps showing different aspects of the problem.
We’re using AssemblyBuilder to emit collectible (AssemblyBuilderAccess.RunAndCollect) assemblies dynamically. Under some circumstances, an attempt to use a type or object from such an assembly produces an access violation (reading memory near zero). With AssemblyBuilderAccess.Run instead of RunAndCollect the issue does not reproduce, so it looks related to the collectible-assembly support introduced in .NET Core 3.0.
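For reference, here is a minimal sketch of the emit pattern we use (the assembly, module, and type names below are illustrative, not taken from our actual code):

using System;
using System.Reflection;
using System.Reflection.Emit;

class CollectibleEmitSketch
{
    static Type EmitType(AssemblyBuilderAccess access)
    {
        var asmName = new AssemblyName("DynamicFormatters");
        // RunAndCollect makes the emitted assembly collectible; with
        // AssemblyBuilderAccess.Run the crash does not reproduce.
        AssemblyBuilder asmBuilder = AssemblyBuilder.DefineDynamicAssembly(asmName, access);
        ModuleBuilder modBuilder = asmBuilder.DefineDynamicModule("MainModule");
        TypeBuilder typeBuilder = modBuilder.DefineType("GeneratedFormatter", TypeAttributes.Public);
        return typeBuilder.CreateTypeInfo();
    }

    static void Main()
    {
        Type t = EmitType(AssemblyBuilderAccess.RunAndCollect);
        // Reflecting over the emitted type (e.g. looking up its methods) is the
        // kind of call that intermittently crashed with an access violation.
        Console.WriteLine(t.GetMethods(BindingFlags.Public | BindingFlags.Instance).Length);
    }
}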
There are several crash dumps collected on Windows that we can share. Below are some thoughts from the analysis performed so far.
1. coreclr!LoaderAllocator::GetHandleValueFastPhase2
Dump: https://drive.google.com/file/d/1bZ80x7fQ_kkc3mus9KI1BT8Qi1fiNLpq/view?usp=sharing
Managed stack trace (only the .NET part is copied):
WinDbg output
0000001D68C3E1A8 00007fff9e7d6ab5 [HelperMethodFrame_PROTECTOBJ: 0000001d68c3e1a8] System.Signature.GetSignature(Void*, Int32, System.RuntimeFieldHandleInternal, System.IRuntimeMethodInfo, System.RuntimeType)
0000001D68C3E2E0 00007fff400c727e System.Reflection.RuntimeMethodInfo.FetchNonReturnParameters() [/_/src/System.Private.CoreLib/src/System/Reflection/RuntimeMethodInfo.cs @ 115]
0000001D68C3E330 00007fff4007be10 System.Reflection.RuntimeMethodInfo.GetParametersNoCopy() [/_/src/System.Private.CoreLib/src/System/Reflection/RuntimeMethodInfo.cs @ 357]
0000001D68C3E360 00007fff40079b5e System.RuntimeType.FilterApplyMethodBase(System.Reflection.MethodBase, System.Reflection.BindingFlags, System.Reflection.BindingFlags, System.Reflection.CallingConventions, System.Type[]) [/_/src/System.Private.CoreLib/src/System/RuntimeType.CoreCLR.cs @ 2236]
0000001D68C3E3C0 00007fff400c7e4b System.RuntimeType.GetMethodCandidates(System.String, Int32, System.Reflection.BindingFlags, System.Reflection.CallingConventions, System.Type[], Boolean) [/_/src/System.Private.CoreLib/src/System/RuntimeType.CoreCLR.cs @ 2455]
0000001D68C3E470 00007fff400ca4b2 System.RuntimeType.GetMethodImplCommon(System.String, Int32, System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[]) [/_/src/System.Private.CoreLib/src/System/RuntimeType.CoreCLR.cs @ 2750]
0000001D68C3E500 00007fff4007b488 System.RuntimeType.GetMethodImpl(System.String, System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[]) [/_/src/System.Private.CoreLib/src/System/RuntimeType.CoreCLR.cs @ 2736]
On the unmanaged side this crash was produced by the following line (handleTable is null): https://github.com/dotnet/runtime/blob/6d39d5df3cc94cc7204ac187bc84da992dbe9d77/src/coreclr/src/vm/loaderallocator.inl#L135
By redoing some address arithmetic starting from mov rax, qword ptr [rdi+18h], it can be determined that the LoaderAllocatorObject is at 000001d8`1b940568:
WinDbg output
00007fff`9e7d6a86 488b4718 mov rax, qword ptr [rdi+18h]
00007fff`9e7d6a8a 488b4830 mov rcx, qword ptr [rax+30h]
00007fff`9e7d6a8e 488b4140 mov rax, qword ptr [rcx+40h]
00007fff`9e7d6a92 4885d2 test rdx, rdx
00007fff`9e7d6a95 0f846c24ecff je coreclr!TypeHandle::GetManagedClassObject+0x3f (00007fff`9e698f07)
00007fff`9e7d6a9b 488b8040020000 mov rax, qword ptr [rax+240h]
00007fff`9e7d6aa2 488b00 mov rax, qword ptr [rax]
00007fff`9e7d6aa5 4885c0 test rax, rax
00007fff`9e7d6aa8 0f845924ecff je coreclr!TypeHandle::GetManagedClassObject+0x3f (00007fff`9e698f07)
00007fff`9e7d6aae 488b4010 mov rax, qword ptr [rax+10h]
00007fff`9e7d6ab2 48d1ea shr rdx, 1
ACCESS VIOLATION -> 00007fff`9e7d6ab5 488b5cd008 mov rbx, qword ptr [rax+rdx*8+8] ds:00000000`00000230=????????????????
0:159> dt coreclr!LoaderAllocatorObject 000001d8`1b940568
+0x000 m_pMethTab : 0x00007fff`3ed1dc60 MethodTable
+0x008 m_pLoaderAllocatorScout : 0x000001d8`1b19f198 LoaderAllocatorScoutObject
+0x010 m_pSlots : (null)
+0x018 m_slotsUsed : 0n0
+0x020 m_methodInstantiationsTable : (null)
Going further into m_pLoaderAllocatorScout:
0:159> dt coreclr!LoaderAllocatorScoutObject 0x1d81b19f198
+0x000 m_pMethTab : 0x00007fff`3ec144c0 MethodTable
+0x008 m_nativeLoaderAllocator : 0x000001d8`1afa2c08 LoaderAllocator
and then into m_nativeLoaderAllocator, where the memory looks a bit strange:
WinDbg output
0:159> dt coreclr!LoaderAllocator 0x000001d8`1afa2c08
. . .
+0x288 m_fGCPressure : 58 -- boolean?
+0x289 m_fUnloaded : 15 -- boolean?
+0x28a m_fTerminated : 74 -- boolean?
+0x28b m_fMarked : 16 -- boolean?
+0x28c m_nGCCount : 0n472
+0x290 m_IsCollectible : 0
. . .
Judging from the CLR stack, this dynamic assembly does not look like one that should have been unloaded. Here is the AssemblyBuilder that owns the type on which the crash occurred:
0:159> !gcroot 000001d81afa46c8
Thread 1748:
0000001D68C3E5E0 00007FFF3F3D3452 Nezaboodka.ClientTypeBinder..ctor(Nezaboodka.ClientTypeSystem, System.Collections.Generic.IEnumerable`1<System.Type>, System.Collections.Generic.IEnumerable`1<System.Type>, Nezaboodka.Nzon.NzonSettings)
rdi:
-> 000001D81AFA46A8 Nezaboodka.Nzon.FormattersGenerator
-> 000001D81AFA46C8 System.Reflection.Emit.AssemblyBuilder
2. coreclr!MethodTable::GetLoaderAllocatorObjectForGC
Dump: https://drive.google.com/file/d/1RSjdjAJ0TTjpH5QmcmGIheoHCk-il2al/view?usp=sharing
The line that produced the crash (it looks like it tries to dereference an invalid handle returned from GetLoaderAllocatorObjectHandle()): https://github.com/dotnet/runtime/blob/6d39d5df3cc94cc7204ac187bc84da992dbe9d77/src/coreclr/src/vm/methodtable.cpp#L9889
There is an interesting unmanaged stack trace here showing GC and JIT parts:
WinDbg output
00 0000002d`379bccf8 00007ff9`19ba335c coreclr!MethodTable::GetLoaderAllocatorObjectForGC+0x1e [f:\workspace\_work\1\s\src\vm\methodtable.cpp @ 10161]
01 (Inline Function) --------`-------- coreclr!GCToEEInterface::GetLoaderAllocatorObjectForGC+0xc [f:\workspace\_work\1\s\src\vm\gcenv.ee.cpp @ 359]
02 0000002d`379bcd00 00007ff9`19a4760e coreclr!WKS::gc_heap::mark_through_cards_for_segments+0x1628ec [f:\workspace\_work\1\s\src\gc\gc.cpp @ 29019]
03 0000002d`379bce50 00007ff9`19a43402 coreclr!WKS::gc_heap::mark_phase+0x9be [f:\workspace\_work\1\s\src\gc\gc.cpp @ 20795]
04 0000002d`379be1c0 00007ff9`19a44a5c coreclr!WKS::gc_heap::gc1+0xb2 [f:\workspace\_work\1\s\src\gc\gc.cpp @ 16346]
05 (Inline Function) --------`-------- coreclr!GCToOSInterface::GetLowPrecisionTimeStamp+0x5 [f:\workspace\_work\1\s\src\vm\gcenv.os.cpp @ 946]
06 0000002d`379be290 00007ff9`19a65dfc coreclr!WKS::gc_heap::garbage_collect+0x25c [f:\workspace\_work\1\s\src\gc\gc.cpp @ 17976]
07 0000002d`379be320 00007ff9`19ad41fe coreclr!WKS::GCHeap::GarbageCollectGeneration+0x1fc [f:\workspace\_work\1\s\src\gc\gc.cpp @ 36597]
08 (Inline Function) --------`-------- coreclr!WKS::gc_heap::trigger_gc_for_alloc+0x12 [f:\workspace\_work\1\s\src\gc\gc.cpp @ 13832]
09 0000002d`379be370 00007ff9`19ad6d58 coreclr!WKS::gc_heap::try_allocate_more_space+0x24e [f:\workspace\_work\1\s\src\gc\gc.cpp @ 13934]
0a (Inline Function) --------`-------- coreclr!WKS::gc_heap::allocate_more_space+0x11 [f:\workspace\_work\1\s\src\gc\gc.cpp @ 14369]
0b (Inline Function) --------`-------- coreclr!WKS::gc_heap::allocate+0x58 [f:\workspace\_work\1\s\src\gc\gc.cpp @ 14400]
0c 0000002d`379be3f0 00007ff9`19a7c46e coreclr!WKS::GCHeap::Alloc+0x88 [f:\workspace\_work\1\s\src\gc\gc.cpp @ 35827]
0d (Inline Function) --------`-------- coreclr!Alloc+0x18b [f:\workspace\_work\1\s\src\vm\gchelpers.cpp @ 240]
0e (Inline Function) --------`-------- coreclr!AllocateObject+0x22d [f:\workspace\_work\1\s\src\vm\gchelpers.cpp @ 1209]
0f 0000002d`379be420 00007ff8`baf6819f coreclr!JIT_New+0x31e [f:\workspace\_work\1\s\src\vm\jithelpers.cpp @ 2724]
The LoaderAllocator address seems to be 00007ff8`ba020a88 (and its boolean flags section looks suspicious too). m_hLoaderAllocatorObjectHandle has the value 0x00000000`00000080, and dereferencing it produces the crash:
WinDbg output
00007ff9`19cae933 488b4830 mov rcx, qword ptr [rax+30h]
00007ff9`19cae937 488b4140 mov rax, qword ptr [rcx+40h]
00007ff9`19cae93b 488b8040020000 mov rax, qword ptr [rax+240h]
ACCESS VIOLATION -> 00007ff9`19cae942 488b00 mov rax, qword ptr [rax] ds:00000000`00000080=????????????????
0:165> dt coreclr!LoaderAllocator 00007ff8`ba020a88
. . .
+0x240 m_hLoaderAllocatorObjectHandle : 0x00000000`00000080 OBJECTHANDLE__
. . .
+0x288 m_fGCPressure : 0
+0x289 m_fUnloaded : 0
+0x28a m_fTerminated : 16
+0x28b m_fMarked : 1
+0x28c m_nGCCount : 0n1
+0x290 m_IsCollectible : ffffffffffffffe8
. . .
3. coreclr!LoaderAllocatorObject::GetHandleTable
Dump: https://drive.google.com/file/d/1n8p-JeOCrUS5qjBkOyiuQmP16kJYfAR4/view?usp=sharing
The crash is produced by the following line (loaderAllocator, of type LoaderAllocatorObject, is null): https://github.com/dotnet/runtime/blob/6d39d5df3cc94cc7204ac187bc84da992dbe9d77/src/coreclr/src/vm/loaderallocator.cpp#L936
WinDbg output
00007ffc`54d2b8e8 488b8640020000 mov rax, qword ptr [rsi+240h]
00007ffc`54d2b8ef 488b08 mov rcx, qword ptr [rax]
00007ffc`54d2b8f2 488b4110 mov rax, qword ptr [rcx+10h] ds:00000000`00000010=????????????????
0:020> ? rsi
Evaluate expression: 1846619047024 = 000001ad`f3128470
0:020> dt coreclr!LoaderAllocator 000001ad`f3128470
. . .
+0x240 m_hLoaderAllocatorObjectHandle : 0x000001ad`ee457a10 OBJECTHANDLE__
. . .
+0x288 m_fGCPressure : 1
+0x289 m_fUnloaded : 0
+0x28a m_fTerminated : 0
+0x28b m_fMarked : 0
+0x28c m_nGCCount : 0n32763
+0x290 m_IsCollectible : 1
. . .
The strangest thing here is that this allocator is still referenced from a managed LoaderAllocatorScout. By dumping all LoaderAllocatorScout objects from the managed heap and searching their m_nativeLoaderAllocator fields for the address 000001ad`f3128470 (the unmanaged LoaderAllocator), it can be determined that the LoaderAllocatorScout is present in the managed heap and has not been collected. However, the handle from unmanaged to managed (m_hLoaderAllocatorObjectHandle) is wrong.
WinDbg output
0:020> .foreach ($obj {!dumpheap -type LoaderAllocatorScout -short}){.echo $obj; !do $obj}
. . .
000001ad847a5dc8
Name: System.Reflection.LoaderAllocatorScout
MethodTable: 00007ffbf59dfb10
EEClass: 00007ffbf59e39a8
Size: 24(0x18) bytes
File: C:\Program Files\dotnet\shared\Microsoft.NETCore.App\3.1.1\System.Private.CoreLib.dll
Fields:
MT Field Offset Type VT Attr Value Name
00007ffbf516ecc0 4000ec9 8 System.IntPtr 1 instance 000001ADF3128470 m_nativeLoaderAllocator
. . .
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 42 (30 by maintainers)
Commits related to this issue
- Fix allocation of RuntimeTypeCache GC handle When there is a race calling RuntimeType.InitializeCache, each of the racing threads creates a new GC handle using new RuntimeTypeHandle(this).GetGCHandle... — committed to janvorli/runtime by janvorli 4 years ago
- Fix allocation of RuntimeTypeCache GC handle (#33243) When there is a race calling RuntimeType.InitializeCache, each of the racing threads creates a new GC handle using new RuntimeTypeHandle(this).... — committed to dotnet/runtime by janvorli 4 years ago
- Port to 3.1: Fix allocation of RuntimeTypeCache GC handle When there is a race calling RuntimeType.InitializeCache, each of the racing threads creates a new GC handle using new RuntimeTypeHandle(this... — committed to janvorli/coreclr by janvorli 4 years ago
- Port to 3.1: Fix allocation of RuntimeTypeCache GC handle (#28025) * Port to 3.1: Fix allocation of RuntimeTypeCache GC handle When there is a race calling RuntimeType.InitializeCache, each of the... — committed to dotnet/coreclr by janvorli 4 years ago
@vanashimko it got approved for the 3.1.4 release.
I also wanted to thank you a lot for your cooperation in investigating this nasty issue. I merged the proper fix to the master branch yesterday (PR #33243) and I am going to create a PR to port it to 3.1 today, and I hope to get it approved. I am not sure when the next release slot for 3.1 is, though; I’ll let you know once I have more details.
So thanks to the last dump, I’ve found what causes the double registration of the handle. RuntimeType.InitializeCache uses the Interlocked.CompareExchange pattern to ensure the cache handle gets created just once even if multiple threads are racing for it. It calls RuntimeTypeHandle.GetGCHandle (which registers the handle) and then tries to set m_cache to that value using Interlocked.CompareExchange. If m_cache was already set, it destroys the just-created handle again, as that means some other thread won the initialization race. And the destruction is the problem: it only destroys the handle, but does not remove it from the cleanup list on the AssemblyLoaderAllocator. Some later call to RuntimeTypeHandle.GetGCHandle gets the same handle and adds it to the cleanup list again. And when AssemblyLoaderAllocator::CleanupHandles() is called at loader allocator destruction time, we call DestroyHandle on it twice, as it appears on that list twice.
Here is the piece of code I am talking about: https://github.com/dotnet/coreclr/blob/v3.1.1/src/System.Private.CoreLib/src/System/RuntimeType.CoreCLR.cs#L2413-L2421
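To make the pattern concrete, here is a small standalone sketch of the same CompareExchange-based one-time handle initialization. Plain GCHandle.Alloc/Free is used only to keep it self-contained; the real code goes through RuntimeTypeHandle.GetGCHandle, which also puts the handle on the AssemblyLoaderAllocator cleanup list.

using System;
using System.Runtime.InteropServices;
using System.Threading;

// Standalone illustration of the racy one-time initialization pattern used by
// RuntimeType.InitializeCache. Not the actual CoreCLR source.
class CachedHandleHolder
{
    private IntPtr _cache; // IntPtr.Zero until the first successful initialization

    public IntPtr GetOrCreateHandle(object target)
    {
        if (_cache == IntPtr.Zero)
        {
            IntPtr newHandle = GCHandle.ToIntPtr(
                GCHandle.Alloc(target, GCHandleType.WeakTrackResurrection));
            IntPtr existing = Interlocked.CompareExchange(ref _cache, newHandle, IntPtr.Zero);
            if (existing != IntPtr.Zero)
            {
                // This thread lost the race, so it frees the handle it just created.
                // In CoreCLR this free (GCHandle.InternalFree) destroys the handle
                // but does NOT remove it from the loader allocator's cleanup list,
                // which is what leads to the later double DestroyHandle.
                GCHandle.FromIntPtr(newHandle).Free();
            }
        }
        return _cache;
    }
}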
Now I need to figure out how to fix that properly. It seems that we need to make sure that handles obtained by RuntimeTypeHandle.GetGCHandle are never freed by GCHandle.InternalFree, which doesn’t know anything about the loader allocator. I am not sure yet what the right fix for that would be. Adding a RuntimeTypeHandle.FreeGCHandle and a method on AssemblyLoaderAllocator to remove an entry from the cleanup list would be one way.
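Just to sketch the idea (this is only an illustration; a managed HashSet stands in for the native cleanup list, and none of these names are an actual CoreCLR API):

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Illustration of the fix idea: freeing a handle must also unregister it from
// the cleanup list, so the list can never destroy the same handle twice.
class HandleCleanupList
{
    private readonly HashSet<IntPtr> _pending = new HashSet<IntPtr>();
    private readonly object _lock = new object();

    // Stand-in for RuntimeTypeHandle.GetGCHandle: allocate and register.
    public IntPtr AllocRegistered(object target)
    {
        IntPtr handle = GCHandle.ToIntPtr(GCHandle.Alloc(target, GCHandleType.WeakTrackResurrection));
        lock (_lock) { _pending.Add(handle); }
        return handle;
    }

    // Stand-in for the proposed RuntimeTypeHandle.FreeGCHandle:
    // destroy the handle AND drop it from the cleanup list.
    public void FreeRegistered(IntPtr handle)
    {
        lock (_lock) { _pending.Remove(handle); }
        GCHandle.FromIntPtr(handle).Free();
    }

    // Stand-in for AssemblyLoaderAllocator::CleanupHandles().
    public void CleanupAll()
    {
        lock (_lock)
        {
            foreach (IntPtr handle in _pending)
                GCHandle.FromIntPtr(handle).Free();
            _pending.Clear();
        }
    }
}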
@vanashimko to verify that this fixes the issue you are hitting completely, I’ll create a patched version of System.Private.CoreLib.dll with the call to GCHandle.InternalFree removed from the InitializeCache method. It would just waste one or a few GC handles, but the problem should be gone, as the handles on the cleanup list would never be freed anywhere other than from the cleanup list.
Great, that moved my understanding of the problem further. What happens is that the LoaderAllocator ends up being on the cleanup list twice. This is the relevant part of the stress log that shows that:
I am going to add some more instrumentation to catch the spot when the double-adding happens.
Sure, you can take a look at it here. The changes are just adding more stress log events and a hashtable to keep track of allocated handles. They are meant only to debug the issue and are a bit hacky. https://github.com/janvorli/coreclr/pull/new/loader-allocator-handle-debugging
This is the real stack trace:
Unfortunately it shows a path where the cleanup is expected to happen, so the problem seems to be that something else destroyed the handle incorrectly before that. Another possibility could be a race between multiple threads running AssemblyLoaderAllocator::CleanupHandles and popping the same handle from the cleanup list twice, but I don’t think that is the case, given that in the stress log the calls to DestroyHandle were quite far apart. I’ll need to figure out how to add some new stress log events / instrumentation to the runtime to find out more details. I’ll share another patched version of coreclr with you once I have it.
Most of the method names you are seeing on the stack trace are not correct, as I have not shared the PDB file for coreclr.dll. Let me open your dump locally and see.
Ok, great, thank you! I’ll try to get you something on Monday.
@davidwrighton has also told me today that he has recently seen a similar bug caused by double free of a GC handle that later results in double allocation of the same handle. So I’ve looked at the stress log again and I can see exactly that. Extending the list of relevant items from the stress log that I’ve shown before:
As you can see, at 19.530733500 and 20.132816500 the same handle was destroyed twice. The handle was pointing to a LoaderAllocator at the first call to destroy and to nothing at the second call, so it seems we have an issue with double-freeing the LoaderAllocator handle in some cases in the runtime.
I’ll keep digging into it to pinpoint the case when it happens.
@vanashimko I’ll be OOF next week, so I won’t be able to look into it more before I get back. If you get more dumps with the stress log enabled and stack traces similar to cases 2 and 3, please share them so that we can get more data.
Ok, thank you for your answer, I’ll keep digging in the dumps.
The interesting thing is that it is not a LoaderAllocator (called LoaderAllocatorObject on the native side), so what you considered to be a LoaderAllocatorScout was something else too, and the same was true for the m_nativeLoaderAllocator:
But the MethodTable at frame 3 points to that address as if it were a LoaderAllocator instance (you can find the MethodTable address from frame 4: this->m_asMT):
My guess is that the real LoaderAllocator was somehow collected and a new object of type System.RuntimeType+RuntimeTypeCache got allocated at the same address. Obviously, it should not have been collected, as the MethodTable of an existing object still references it. But getting all objects from the heap with this MethodTable shows:
So the object of that type is still referenced by a register in one of the stack frames of the current thread (frame 0b), yet the LoaderAllocator was unloaded as if there were no more references to it. That seems to indicate that during a single GC collection the object was not found to be referenced anywhere and later resurfaced in the rbx register in that frame, or that the GC somehow didn’t call MethodTable::GetLoaderAllocatorObjectForGC for this object. I keep looking; the other failure modes are also quite strange.