runtime: Access Violation on x86
We have a scenario in which we are getting an access violation exception during a particular load in our system. I have a full dump of the process when the crash happened, available here: https://drive.google.com/file/d/11oqZaegxKcoNT8Xj1u9YDIcH7LcmMBsW/view?usp=sharing
The scenario we have is a few servers running and communicating with one another on the same process.
This is part of our test setup. We recently started seeing hard failures, such as this one: (d2c.3390): Access violation - code c0000005 (first chance)
The event log reports:
Application: dotnet.exe
CoreCLR Version: 4.6.27317.3
Description: The process was terminated due to an internal error in the .NET Runtime at IP 5A29AA9B (5A230000) with exit code 80131506.
Faulting application name: dotnet.exe, version: 2.1.27130.1, time stamp: 0x5c007ea0
Faulting module name: coreclr.dll, version: 4.6.27317.3, time stamp: 0x5c40c18e
Exception code: 0xc0000005
Fault offset: 0x0006aa9b
Faulting process id: 0x52d4
Faulting application start time: 0x01d4cc3f7d83fa59
Faulting application path: C:\Program Files (x86)\dotnet\dotnet.exe
Faulting module path: C:\Users\ayende\.nuget\packages\runtime.win-x86.microsoft.netcore.app\2.1.8\runtimes\win-x86\native\coreclr.dll
Report Id: ff99edfe-baf1-431a-9ce3-06e987219e6c
Faulting package full name:
Faulting package-relative application ID:
This machine has the following hot fixes applies:
PS C:\Windows\SysWOW64> Get-HotFix
Source Description HotFixID InstalledBy InstalledOn
------ ----------- -------- ----------- -----------
OREN-PC Update KB4100347 NT AUTHORITY\SYSTEM 2/18/2019 12:00:00 AM
OREN-PC Update KB4343669 NT AUTHORITY\SYSTEM 7/11/2018 12:00:00 AM
OREN-PC Update KB4456655 NT AUTHORITY\SYSTEM 9/13/2018 12:00:00 AM
OREN-PC Security Update KB4465663 NT AUTHORITY\SYSTEM 11/14/2018 12:00:00 AM
OREN-PC Security Update KB4471331 NT AUTHORITY\SYSTEM 12/6/2018 12:00:00 AM
OREN-PC Security Update KB4477137 NT AUTHORITY\SYSTEM 12/12/2018 12:00:00 AM
OREN-PC Security Update KB4480979 NT AUTHORITY\SYSTEM 1/9/2019 12:00:00 AM
OREN-PC Security Update KB4485449 NT AUTHORITY\SYSTEM 2/19/2019 12:00:00 AM
OREN-PC Security Update KB4487038 NT AUTHORITY\SYSTEM 2/19/2019 12:00:00 AM
OREN-PC Security Update KB4480966 HRHINOS\Ayende 2/21/2019 12:00:00 AM
OREN-PC Security Update KB4487017
The actual stack we are seeing is always something similar to:
0:043> kp
# ChildEBP RetAddr
00 (Inline) -------- coreclr!VolatileLoad+0x3 [e:\a\_work\335\s\src\inc\volatile.h @ 153]
01 (Inline) -------- coreclr!Volatile<unsigned long>::Load+0x3 [e:\a\_work\335\s\src\inc\volatile.h @ 292]
02 (Inline) -------- coreclr!Volatile<unsigned long>::operator unsigned long+0x3 [e:\a\_work\335\s\src\inc\volatile.h @ 346]
03 (Inline) -------- coreclr!RelativePointer<Module *>::GetValue+0x3 [e:\a\_work\335\s\src\inc\fixuppointer.h @ 68]
04 (Inline) -------- coreclr!RelativePointer<Module *>::GetValueAtPtr+0x3 [e:\a\_work\335\s\src\inc\fixuppointer.h @ 85]
05 (Inline) -------- coreclr!ReadPointer+0x3 [e:\a\_work\335\s\src\inc\fixuppointer.h @ 954]
06 (Inline) -------- coreclr!ReadPointer+0x3 [e:\a\_work\335\s\src\inc\fixuppointer.h @ 954]
07 (Inline) -------- coreclr!MethodTable::GetLoaderModule+0x3 [e:\a\_work\335\s\src\vm\methodtable.inl @ 176]
08 (Inline) -------- coreclr!MethodTable::GetLoaderAllocator+0x3 [e:\a\_work\335\s\src\vm\methodtable.inl @ 182]
09 0e60d0a0 5a299c42 coreclr!VirtualCallStubManager::ResolveWorker(struct StubCallSite * pCallSite = 0x0e60d13c, class Object ** protectedObj = 0x0e60d180, struct DispatchToken token = struct DispatchToken, VirtualCallStubManager::StubKind stubKind = SK_DISPATCH (0n2))+0x6b [e:\a\_work\335\s\src\vm\virtualcallstub.cpp @ 1719]
0a 0e60d168 5a34a26b coreclr!VSD_ResolveWorker(struct TransitionBlock * pTransitionBlock = 0x0e60d17c, unsigned long siteAddrForRegisterIndirect = 0, unsigned int token = 0xa0000)+0x24f [e:\a\_work\335\s\src\vm\virtualcallstub.cpp @ 1611]
0b 0e60d190 0e995fca coreclr!ResolveWorkerAsmStub(void)+0x1b [e:\a\_work\335\s\src\vm\i386\virtualcallstubcpu.hpp @ 525]
WARNING: Frame IP not in any known module. Following frames may be wrong.
0c 0e60d194 5a33afeb 0xe995fca
0d 0e60d1a8 5a47fc7f coreclr!CallJitEHFinallyHelper(void)+0x28 [E:\A\_work\335\s\src\vm\i386\asmhelpers.asm @ 390]
0e 0e60d200 5a3a2acc coreclr!CallJitEHFinally(class CrawlFrame * pCf = 0x43a07fdc, unsigned char * startPC = <Value unavailable error>, struct EE_ILEXCEPTION_CLAUSE * EHClausePtr = 0x0e60d260, unsigned long nestingLevel = 0)+0xb8 [e:\a\_work\335\s\src\vm\i386\excepx86.cpp @ 3385]
0f 0e60d2d0 5a249cf5 coreclr!COMPlusUnwindCallback+0x15a70c [e:\a\_work\335\s\src\vm\i386\excepx86.cpp @ 2996]
10 (Inline) -------- coreclr!Thread::MakeStackwalkerCallback+0x151 [e:\a\_work\335\s\src\vm\stackwalk.cpp @ 877]
11 0e60d59c 5a2525c1 coreclr!Thread::StackWalkFramesEx(struct REGDISPLAY * pRD = 0x0e60d5c8, <function> * pCallback = 0x0ddb401c, void * pData = 0x0e60d954, unsigned int flags = 4, class Frame * pStartFrame = 0x00000000)+0x1d4 [e:\a\_work\335\s\src\vm\stackwalk.cpp @ 958]
12 0e60d8d0 5a251f60 coreclr!Thread::StackWalkFrames(<function> * pCallback = 0x5a2483c0, void * pData = 0x0e60d954, unsigned int flags = 4, class Frame * pStartFrame = 0x00000000)+0xa1 [e:\a\_work\335\s\src\vm\stackwalk.cpp @ 1042]
13 0e60d8f0 5a252a52 coreclr!UnwindFrames(class Thread * pThread = 0x0ddb3ea8, struct ThrowCallbackType * tct = 0x0e60d954)+0x3e [e:\a\_work\335\s\src\vm\excep.cpp @ 2228]
14 (Inline) -------- coreclr!COMPlusAfterUnwind+0x98 [e:\a\_work\335\s\src\vm\i386\excepx86.cpp @ 482]
15 0e60db48 5a2512fa coreclr!CPFH_RealFirstPassHandler(struct _EXCEPTION_RECORD * pExceptionRecord = 0x0e60dcb0, struct _EXCEPTION_REGISTRATION_RECORD * pEstablisherFrame = <Value unavailable error>, struct _CONTEXT * pContext = 0x0e60dd00, int bAsynchronousThreadStop = 0n0)+0x459 [e:\a\_work\335\s\src\vm\i386\excepx86.cpp @ 1263]
16 0e60db88 5a250ef3 coreclr!CPFH_FirstPassHandler(struct _EXCEPTION_RECORD * pExceptionRecord = 0x0e60dcb0, struct _EXCEPTION_REGISTRATION_RECORD * pEstablisherFrame = 0x0e60e650, struct _CONTEXT * pContext = 0x0e60dd00)+0xc3 [e:\a\_work\335\s\src\vm\i386\excepx86.cpp @ 1401]
17 0e60dbac 7772f1a2 coreclr!COMPlusFrameHandler(struct _EXCEPTION_RECORD * pExceptionRecord = 0x0e60dcb0, struct _EXCEPTION_REGISTRATION_RECORD * pEstablisherFrame = 0x0e60e650, struct _CONTEXT * pContext = 0x0e60dd00, struct _DISPATCHER_CONTEXT * pDispatcherContext = 0x0e60dc38)+0x83 [e:\a\_work\335\s\src\vm\i386\excepx86.cpp @ 1821]
18 0e60dbd0 7772f174 ntdll!ExecuteHandler2+0x26
19 0e60dc98 7771cd86 ntdll!ExecuteHandler+0x24
1a 0e60dc98 76f21812 ntdll!KiUserExceptionDispatcher+0x26
1b 0e60e1bc 5a252d94 KERNELBASE!RaiseException+0x62
1c 0e60e264 5a3248fb coreclr!RaiseTheExceptionInternalOnly(class Object * throwable = <Value unavailable error>, int rethrow = <Value unavailable error>, int fForStackOverflow = 0n0)+0x11d [e:\a\_work\335\s\src\vm\excep.cpp @ 3039]
1d 0e60e32c 2ed2b3e4 coreclr!IL_Throw(class Object * obj = <Value unavailable error>)+0x11b [e:\a\_work\335\s\src\vm\jithelpers.cpp @ 4860]
1e 0e60e33c 0e3bd604 0x2ed2b3e4
1f 0e60e354 0e99a747 0xe3bd604
20 0e60e384 0e999f92 0xe99a747
21 0e60e3b0 0eecd3c9 0xe999f92
22 0e60e400 0e995ad1 0xeecd3c9
23 0e60e578 0e9934e6 0xe995ad1
24 0e60e5b0 0c3c7770 0xe9934e6
25 0e60e5d8 593868dd 0xc3c7770
26 0e60e5f4 59cfa31d System_Threading_Thread+0x68dd
27 0e60e624 59cfc0cc System_Private_CoreLib+0x4ca31d
28 0e60e638 5a33b0ef System_Private_CoreLib+0x4cc0cc
29 0e60e644 5a26fbf1 coreclr!CallDescrWorkerInternal(unsigned long pParams = 0xe60e6b8)+0x34 [E:\A\_work\335\s\src\vm\i386\asmhelpers.asm @ 618]
2a (Inline) -------- coreclr!CallDescrWorkerWithHandler+0x52 [e:\a\_work\335\s\src\vm\callhelpers.cpp @ 78]
2b 0e60e6e4 5a32d4e4 coreclr!MethodDescCallSite::CallTargetWorker(unsigned int64 * pArguments = 0x0e60e730, unsigned int64 * pReturnValue = 0x00000000, int cbReturnValue = 0n0)+0x235 [e:\a\_work\335\s\src\vm\callhelpers.cpp @ 620]
2c 0e60e7bc 5a342b06 coreclr!ThreadNative::KickOffThread_Worker(void * ptr = 0x0e60e948)+0x104 [e:\a\_work\335\s\src\vm\comsynchronizable.cpp @ 260]
2d 0e60e7d4 5a26f86a coreclr!ManagedThreadBase_DispatchInner(struct ManagedThreadCallState * pCallState = <Value unavailable error>)+0x70 [e:\a\_work\335\s\src\vm\threads.cpp @ 8852]
2e 0e60e880 5a26f7bb coreclr!ManagedThreadBase_DispatchMiddle(struct ManagedThreadCallState * pCallState = <Value unavailable error>)+0x65 [e:\a\_work\335\s\src\vm\threads.cpp @ 8902]
2f 0e60e8e4 5a3352b9 coreclr!ManagedThreadBase_DispatchOuter(struct ManagedThreadCallState * pCallState = 0x0e60e8ec)+0x78 [e:\a\_work\335\s\src\vm\threads.cpp @ 9161]
30 0e60e908 5a2d9e7c coreclr!ManagedThreadBase_FullTransitionWithAD(struct ADID pAppDomain = struct ADID, <function> * pTarget = <Value unavailable error>, void * args = <Value unavailable error>, UnhandledExceptionLocation filterType = ManagedThread (0n2))+0x2f [e:\a\_work\335\s\src\vm\threads.cpp @ 9200]
31 (Inline) -------- coreclr!ManagedThreadBase::KickOff+0x15 [e:\a\_work\335\s\src\vm\threads.cpp @ 9234]
32 0e60e984 5a2d9d90 coreclr!ThreadNative::KickOffThread(void * pass = 0x0ad364d0)+0xcc [e:\a\_work\335\s\src\vm\comsynchronizable.cpp @ 380]
33 0e60f824 76388484 coreclr!Thread::intermediateThreadProc(void * arg = 0x0abac020)+0x50 [e:\a\_work\335\s\src\vm\threads.cpp @ 2255]
34 0e60f838 77713ab8 KERNEL32!BaseThreadInitThunk+0x24
35 0e60f880 77713a88 ntdll!__RtlUserThreadStart+0x2f
36 0e60f890 00000000 ntdll!_RtlUserThreadStart+0x1b
The managed stack, FWIW, is:
0:043> !clrstack
OS Thread Id: 0x3390 (43)
Child SP IP Call Site
0e60d0d8 5a29aa9b [GCFrame: 0e60d0d8]
0e60d118 5a29aa9b [StubDispatchFrame: 0e60d118] System.IDisposable.Dispose()
0e60d198 0e995fca Raven.Server.Rachis.FollowerAmbassador.Run()
0e60e580 0e9934e6 Raven.Server.Rachis.FollowerAmbassador.b__58_0(System.Object)
0e60e584 0c3c8000 Raven.Server.Utils.PoolOfThreads+PooledThread.DoWork()
0e60e5b8 0c3c7770 Raven.Server.Utils.PoolOfThreads+PooledThread.Run()
0e60e5e0 593868dd System.Threading.Thread.ThreadMain_ThreadStart() [E:\A\_work\321\s\corefx\src\System.Threading.Thread\src\System\Threading\Thread.cs @ 93]
0e60e5e8 59cfc00c System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [E:\A\_work\335\s\src\mscorlib\src\System\Threading\Thread.cs @ 62]
0e60e5fc 59cfa31d System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [E:\A\_work\335\s\src\mscorlib\shared\System\Threading\ExecutionContext.cs @ 167]
0e60e630 59cfc0cc System.Threading.ThreadHelper.ThreadStart() [E:\A\_work\335\s\src\mscorlib\src\System\Threading\Thread.cs @ 91]
0e60e718 5a33b0ef [GCFrame: 0e60e718]
0e60e8a8 5a33b0ef [DebuggerU2MCatchHandlerFrame: 0e60e8a8]
We are using unsafe code, but we are pretty sure that we aren’t corrupting the heap in any manner (lots of tests cover that) and if we were, I would expect to see the failure in different locations.
From trying to figure out what is going on, a few really strange things seem to be happening here:
Here is the actual failure:
FAULTING_IP:
KERNELBASE!RaiseException+62
76f21812 8b4c2454 mov ecx,dword ptr [esp+54h]
And the full register usage is:
(d2c.3390): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00e527a0 ecx=00000010 edx=15554140 esi=5a339280 edi=00e527a0
eip=5a29aa9b esp=0e60ce44 ebp=0e60d0a0 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
As you can see, the esp
value has a non null value, but checking the memory location with the offset provided to the instruction shows just zeros.
While troubleshooting this, we found a NullReferenceException
in our code. We fixed it, but that made the problem go away.
We suspect that this is some issue related to error handling inside the CoreCLR during JIT generation.
We have run into a different issue with KB4487017 (See: https://github.com/dotnet/coreclr/issues/22597), but we are reproducing this on different versions of Windows and without the KB in question.
We aren’t able to reproduce this issue in 64 bits.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 91 (78 by maintainers)
Commits related to this issue
- JIT: fix filter liveness computation When a filter is finished executing, control can logically pass to the associated handler, any enclosing handler or filter, or any finally or fault handler nested... — committed to AndyAyersMS/coreclr by AndyAyersMS 5 years ago
- JIT: fix filter liveness computation When a filter is finished executing, control can logically pass to the associated handler, any enclosing handler or filter, or any finally or fault handler nested... — committed to AndyAyersMS/coreclr by AndyAyersMS 5 years ago
- JIT: fix filter liveness computation (#23044) When a filter is finished executing, control can logically pass to the associated handler, any enclosing handler or filter, or any finally or fault han... — committed to dotnet/coreclr by AndyAyersMS 5 years ago
- JIT: fix filter liveness computation Port of #23044 to release/2.1. When a filter is finished executing, control can logically pass to the associated handler, any enclosing handler or filter, or any... — committed to AndyAyersMS/coreclr by AndyAyersMS 5 years ago
Yes, it is a GC hole. And I’ve just verified it is still present in 3.0. I’ve ended up placing try catch around the test too and so I can repro the issue reliably in the first successful iteration (with COMPlus_GCStress= 4 set). Now I need to investigate whether the issue causing the hole is in JIT or in EH code.
I’m afraid that
LockAlreadyDisposedException
is actually expected in this scenario. I changed my code to justtry/catch
around the test run inProgram.cs
and keep running it. That allow me to ignore test failures / errors while searching for the actual issue. Really happy that you were able to reproduce this on your side. It is a GC hole, right?