runtime: [Arm32/Linux] GS cookie corruption when running some corefx tests.
While running a release build of the corefx tests on a checked build of coreclr on ARM32 (tested on my RPi3 with Raspbian), I found that a couple of corefx test suites fail due to GS cookie corruption detected at GC stack walk time. This happens:
- in System.Globalization.Extensions.Tests for GS cookie in the frame of System.Globalization.IdnMapping.GetAsciiCore
- in System.Collections.Specialized.Tests and System.Data.Common.Tests for GS cookie in the frame of System.Globalization.CompareInfo.GetHashCodeOfStringCore
The issue repros in 80…100% of runs of the test suites. I tried to debug both cases, but the functions with corrupted GS cookies are called many times before the issue repros, so I cannot use something as simple as a memory watchpoint to find what is corrupting the cookie.
Unfortunately, LLDB and the sos plugin are quite unstable together on this platform, so e.g. the clrstack sos command kills LLDB. At least ip2md works, so I can see what's on the managed stack.
Here is a call stack of the thread with the System.Globalization.IdnMapping.GetAsciiCore on the stack when another thread runs GC and finds the corrupted cookie:
* thread dotnet/runtime#3859, name = 'dotnet'
* frame #0: 0x76f9b9a4 libpthread.so.0`__pthread_cond_wait(cond=0x001317a0, mutex=0x00131788) at pthread_cond_wait.c:188
frame #1: 0x768692a8 libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x00131788, dwTimeout=4294967295, ptwrWakeupReason=0x674ffac8, pdwSignaledObject=0x674ffac4) at synchmanager.cpp:479:28
frame #2: 0x76868814 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x0005e4c8, pthrCurrent=0x001315e8, dwTimeout=4294967295, fAlertable=false, fIsSleep=false, ptwrWakeupReason=0x674ffd68, pdwSignaledObject=0x674ffd90) at synchmanager.cpp:302:22
frame #3: 0x7687580c libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x001315e8, nCount=1, lpHandles=0x674ffe90, bWaitAll=NO, dwMilliseconds=4294967295, bAlertable=NO, bPrioritize=NO) at wait.cpp:636:45
frame #4: 0x768761be libcoreclr.so`::WaitForSingleObjectEx(hHandle=0x00000084, dwMilliseconds=4294967295, bAlertable=NO) at wait.cpp:139:13
frame #5: 0x764b1a34 libcoreclr.so`CLREventWaitHelper2(handle=0x00000084, dwMilliseconds=4294967295, alertable=NO) at synch.cpp:377:12
frame #6: 0x764b1924 libcoreclr.so`CLREventWaitHelper(this=0x674fff3c, pParam=0x674fff44)::$_1::operator()(CLREventWaitHelper(void*, unsigned int, int)::Param*) const at synch.cpp:402:26
frame #7: 0x764b073e libcoreclr.so`CLREventWaitHelper(handle=0x00000084, dwMilliseconds=4294967295, alertable=NO) at synch.cpp:404:5
frame #8: 0x764b06a6 libcoreclr.so`CLREventBase::WaitEx(this=0x000bbb68, dwMilliseconds=4294967295, mode=WaitMode_None, syncState=0x00000000) at synch.cpp:471:20
frame #9: 0x764b0550 libcoreclr.so`CLREventBase::Wait(this=0x000bbb68, dwMilliseconds=4294967295, alertable=NO, syncState=0x00000000) at synch.cpp:417:12
frame #10: 0x765d080a libcoreclr.so`GCEvent::Impl::Wait(this=0x000bbb68, timeout=4294967295, alertable=false) at gcenv.os.cpp:1153:24
frame #11: 0x765cff6c libcoreclr.so`GCEvent::Wait(this=0x000bbb50, timeout=4294967295, alertable=false) at gcenv.os.cpp:1231:20
frame #12: 0x766893b6 libcoreclr.so`WKS::GCHeap::WaitUntilGCComplete(this=0x00060c60, bConsiderGCStart=false) at gcee.cpp:309:40
frame #13: 0x764b53e0 libcoreclr.so`Thread::RareDisablePreemptiveGC(this=0x00130860) at threadsuspend.cpp:2576:60
frame #14: 0x7642109c libcoreclr.so`::JIT_PInvokeEndRarePath() at jithelpers.cpp:5456:13
frame #15: 0x6fff238c - this is System.Globalization.IdnMapping.GetAsciiCore(System.String, Char*, Int32)
frame #16: 0x6ffeed3e - this is System.Globalization.IdnMapping.GetAscii(System.String, Int32, Int32)
frame #17: 0x6ffeec2e - this is System.Globalization.IdnMapping.GetAscii(System.String, Int32)
frame #18: 0x6ffeec00 - this is System.Globalization.IdnMapping.GetAscii(System.String)
frame #19: 0x66c4dc8c - this is Xunit.Assert.All[[System.__Canon, System.Private.CoreLib]](System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Action`1<System.__Canon>)
frame #20: 0x66c4c4c2 - this is System.Globalization.Tests.IdnMappingIdnaConformanceTests.GetAscii_Success()
frame #21: 0x764fab22 libcoreclr.so`CallDescrWorkerInternal at asmhelpers.S:79
Disassembling System.Globalization.IdnMapping.GetAsciiCore, I can see that the GS cookie location matches what the stack walker expects. But instead of 0x12345678, the cookie holds a "random" value at the point of failure.
The same is true for System.Globalization.CompareInfo.GetHashCodeOfStringCore.
The failure in both test suites and the stack traces (at least the frame with the corrupted GS cookie and all other frames towards the TOS) are always the same.
About this issue
- State: closed
- Created 5 years ago
- Comments: 35 (35 by maintainers)
I have found the culprit. First, I found that the cookie location we compute is in the middle of the stackalloc-ed buffer in both the System.Globalization.IdnMapping.GetAsciiCore and System.Globalization.CompareInfo.GetHashCodeOfStringCore cases. The GS cookie offset is decoded in EECodeManager::GetGSCookieAddr relative to the caller SP. For methods with stackalloc, R9 is used to save the SP value at the end of the prolog. That way, the unwinder can compute the caller SP based on R9. When we start stack walking at an InlinedCallFrame, though, we don't have R9 stored in it, and when we extract the REGDISPLAY from the InlinedCallFrame in InlinedCallFrame::UpdateRegDisplay, we set R9 to InlinedCallFrame::m_pCallSiteSP, the same value we set the SP to. That value, though, is the SP at the call site of the pinvoke, which is the wrong value for R9 in functions with stackalloc, as the SP was already updated by the stackalloc at that point. We then use the unwinder to get the caller's SP as a base for getting the GS cookie address. The unwinder starts from the wrong R9 and so it obtains the wrong caller's SP.
It seems that we will need to add a new field and save R9 to the InlinedCallFrame for arm32 and update the arm32 version of InlinedCallFrame::UpdateRegDisplay accordingly.
I am back from my vacation, so I can continue looking into the issue.
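The R9 mismatch described in the analysis above can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not CoreCLR code: FrameModel, InlinedCallFrameModel, RegDisplayModel, the savedR9 field, the two UpdateRegDisplay variants, and all sizes and offsets are hypothetical names and values chosen for illustration. It assumes the unwinder recovers the caller SP as R9 plus a fixed prolog size, and that the GS cookie sits at a fixed offset from the caller SP.

```cpp
#include <cstdint>

// Hypothetical model of an ARM32 method frame that uses stackalloc.
// The stack grows down; none of these names or layouts are CoreCLR's.
struct FrameModel {
    uint32_t callerSP;        // SP in the caller, just before the call
    uint32_t prologSize;      // bytes the prolog moves SP down by
    uint32_t stackallocSize;  // bytes reserved by stackalloc after the prolog
    int32_t  cookieOffset;    // GS cookie offset relative to the caller SP
};

// Simplified stand-ins for InlinedCallFrame and REGDISPLAY.
struct InlinedCallFrameModel {
    uint32_t callSiteSP;  // SP at the pinvoke call site (after stackalloc)
    uint32_t savedR9;     // assumed new field: SP at the end of the prolog
};
struct RegDisplayModel { uint32_t sp; uint32_t r9; };

InlinedCallFrameModel MakeInlinedCallFrame(const FrameModel& f) {
    uint32_t r9 = f.callerSP - f.prologSize;      // SP at end of prolog
    return { r9 - f.stackallocSize, r9 };         // stackalloc lowers SP further
}

// Behavior described in the issue: R9 is set to the call-site SP.
RegDisplayModel UpdateRegDisplayBuggy(const InlinedCallFrameModel& icf) {
    return { icf.callSiteSP, icf.callSiteSP };
}

// Proposed direction: restore R9 from a value saved in the frame.
RegDisplayModel UpdateRegDisplayFixed(const InlinedCallFrameModel& icf) {
    return { icf.callSiteSP, icf.savedR9 };
}

// Model of the unwinder: caller SP is derived from R9, then the cookie
// address is computed relative to that caller SP.
uint32_t CookieAddr(const RegDisplayModel& rd, const FrameModel& f) {
    uint32_t unwoundCallerSP = rd.r9 + f.prologSize;
    return unwoundCallerSP + static_cast<uint32_t>(f.cookieOffset);
}

// True when addr falls inside the stackalloc-ed buffer of the frame.
bool PointsIntoStackalloc(uint32_t addr, const FrameModel& f) {
    uint32_t top = f.callerSP - f.prologSize;     // end of prolog SP
    return addr >= top - f.stackallocSize && addr < top;
}
```

In this model, a frame with caller SP 0x7fff0000, a 0x40-byte prolog, a 0x100-byte stackalloc and a cookie offset of -0x10 gives a cookie address of 0x7ffefff0 when R9 is restored from the saved value, and 0x7ffefef0 when R9 is set to the call-site SP. The second address is off by exactly the stackalloc size and lands inside the stackalloc-ed buffer, which is consistent with the "random" cookie value seen at stack walk time.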