runtime: CLRJit Access Violation on when attempting to run Windows ARM64 app

Description

Also tracked: https://github.com/dotnet/performance/issues/2421 In performance runs for Windows ARM64 we are failing due to run benchmarks due to this error:

[2022/05/09 06:29:58][INFO] Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
[2022/05/09 06:29:58][INFO]    at System.Environment.FailFast(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo..cctor()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.get_CachedCulturesByName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetCultureInfo(System.String)
[2022/05/09 06:29:58][INFO]    at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
[2022/05/09 06:29:58][INFO]    at System.Resources.ResourceManager.CommonAssemblyInit()
[2022/05/09 06:29:58][INFO]    at System.SR.get_ResourceManager()
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.AppContext.Setup(Char**, Char**, Int32)

Looking deeper into this error and testing our built CoreRun.exe and the hello world console app locally, we were able to get the following error and stack trace:

(32cc.44a8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\clrjit.dll
clrjit!getLikelyClasses+0x18134:
00007ffb`153bdaf4 b86e78ea ldr         w10,[x7,x14 lsl #2]
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::calculateCoversSets+0xb8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 11546] 
01 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::try_COVERS+0xb8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 11096] 
02 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::select+0x890 [D:\a\_work\1\s\src\coreclr\jit\lsra_score.h @ 27] 
03 000000ba`d61ca990 00007ffb`153c01a0     clrjit!LinearScan::allocateReg+0x8bc [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 2754] 
04 000000ba`d61caa50 00007ffb`153bc1b8     clrjit!LinearScan::allocateRegisters+0xd08 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 5381] 
05 000000ba`d61cab40 00007ffb`1532eccc     clrjit!LinearScan::doLinearScan+0x1c8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 1246] 
06 (Inline Function) --------`--------     clrjit!Compiler::compCompile::__l2::<lambda_27ba776e5f59c8b10b5aa52fbc87aa8b>::operator()+0x24 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5059] 
07 000000ba`d61cab80 00007ffb`153ebdbc     clrjit!ActionPhase<<lambda_27ba776e5f59c8b10b5aa52fbc87aa8b> >::DoPhase+0x2c [D:\a\_work\1\s\src\coreclr\jit\phase.h @ 65] 
08 000000ba`d61cab90 00007ffb`1532c588     clrjit!Phase::Run+0x4c [D:\a\_work\1\s\src\coreclr\jit\phase.cpp @ 62] 
09 (Inline Function) --------`--------     clrjit!DoPhase+0x34 [D:\a\_work\1\s\src\coreclr\jit\phase.h @ 78] 
0a 000000ba`d61cabb0 00007ffb`1532d540     clrjit!Compiler::compCompile+0x830 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5063] 
0b 000000ba`d61cae00 00007ffb`1532ce48     clrjit!Compiler::compCompileHelper+0x6d8 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 6729] 
0c 000000ba`d61caec0 00007ffb`1532dbdc     clrjit!Compiler::compCompile+0x4c8 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5861] 
0d 000000ba`d61caf90 00007ffb`15335c58     clrjit!jitNativeCode+0x17c [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 7362] 
0e 000000ba`d61cb180 00007ffb`0f0609b8     clrjit!CILJit::compileMethod+0x78 [D:\a\_work\1\s\src\coreclr\jit\ee_il_dll.cpp @ 279] 
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\coreclr.dll
0f (Inline Function) --------`--------     coreclr!invokeCompileMethodHelper+0x1f4 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12367] 
10 (Inline Function) --------`--------     coreclr!invokeCompileMethod+0x250 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12430] 
11 000000ba`d61cb1e0 00007ffb`0f0916dc     coreclr!UnsafeJitFunction+0x10c8 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12904] 
12 000000ba`d61cb600 00007ffb`0f0908dc     coreclr!MethodDesc::JitCompileCodeLocked+0x1ec [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 946] 
13 (Inline Function) --------`--------     coreclr!MethodDesc::JitCompileCodeLockedEventWrapper+0x850 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 817] 
14 (Inline Function) --------`--------     coreclr!MethodDesc::JitCompileCode+0xcfc [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 757] 
15 (Inline Function) --------`--------     coreclr!MethodDesc::PrepareILBasedCode+0xf68 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 426] 
16 000000ba`d61cb7a0 00007ffb`0f092c18     coreclr!MethodDesc::PrepareCode+0xf94 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 323] 
17 (Inline Function) --------`--------     coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary+0x29c [D:\a\_work\1\s\src\coreclr\vm\codeversion.cpp @ 1698] 
18 000000ba`d61cba40 00007ffb`0f092714     coreclr!MethodDesc::DoPrestub+0x490 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 2106] 
19 000000ba`d61ce6a0 00007ffb`0efd11f4     coreclr!PreStubWorker+0x1c4 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 1932] 
1a 000000ba`d61ce790 00007ffb`0cd49200     coreclr!ThePreStub+0x50 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\AsmHelpers.asm @ 5028] 
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\System.Private.CoreLib.dll
1b 000000ba`d61ce8c0 00007ffb`0cd5fe84     System_Private_CoreLib+0x209200
1c 000000ba`d61ce8e0 00007ffb`0efd1f34     System_Private_CoreLib+0x21fe84
1d 000000ba`d61ce940 00007ffb`0f0e0a60     coreclr!CallDescrWorkerInternal+0x84 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\CallDescrWorkerARM64.asm @ 4539] 
1e (Inline Function) --------`--------     coreclr!CallDescrWorkerWithHandler+0x34 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 67] 
1f 000000ba`d61ce960 00007ffb`0f00f9c0     coreclr!MethodDescCallSite::CallTargetWorker+0x2e8 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 568] 
20 (Inline Function) --------`--------     coreclr!MethodDescCallSite::Call+0x14 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.h @ 458] 
21 000000ba`d61cebb0 00007ffb`0f36a394     coreclr!CorHost2::CreateAppDomainWithManager+0x1f0 [D:\a\_work\1\s\src\coreclr\vm\corhost.cpp @ 630] 
22 000000ba`d61cf000 00007ff6`107d5588     coreclr!coreclr_initialize+0x4c4 [D:\a\_work\1\s\src\coreclr\dlls\mscoree\exports.cpp @ 254] 
23 000000ba`d61cf0e0 00007ff6`107d6778     corerun+0x5588
24 000000ba`d61cf670 00007ff6`107e65e0     corerun+0x6778
25 000000ba`d61cf780 00007ff6`107e6684     corerun!GetCurrentClrDetails+0xfe40
26 000000ba`d61cf7c0 00007ffb`744584a8     corerun!GetCurrentClrDetails+0xfee4
27 000000ba`d61cf7d0 00007ffb`74643068     KERNEL32!BaseThreadInitThunk+0x38
28 000000ba`d61cf810 00000000`00000000     ntdll!RtlUserThreadStart+0x48

Run with this error can be found here: https://dev.azure.com/dnceng/internal/_build/results?buildId=1765074&view=results, on either of the windows runs.

Reproduction Steps

Using the latest dotnet SDK on a Windows ARM64 machine:

Expected behavior

App should print hello world.

Actual behavior

Failure occurs, printing:

[2022/05/09 06:29:58][INFO] Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
[2022/05/09 06:29:58][INFO]    at System.Environment.FailFast(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo..cctor()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.get_CachedCulturesByName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetCultureInfo(System.String)
[2022/05/09 06:29:58][INFO]    at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
[2022/05/09 06:29:58][INFO]    at System.Resources.ResourceManager.CommonAssemblyInit()
[2022/05/09 06:29:58][INFO]    at System.SR.get_ResourceManager()
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.AppContext.Setup(Char**, Char**, Int32)

Regression?

This last worked with runtime commit 3e5517be and first failed with f8fa9f6d, this was made up of these PR’s from what I could tell: https://github.com/dotnet/runtime/pull/67771, https://github.com/dotnet/runtime/pull/69013, and https://github.com/dotnet/runtime/pull/68748.

Known Workarounds

No response

Configuration

This occurs on Windows ARM64 machines when running with the latest dotnet version. This does not seem to be an issue on non-Windows ARM64 configurations.

Other information

No response

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 31 (31 by maintainers)

Most upvoted comments

TLDR

Because of C++ optimization, the code in genLog2() becomes functionally incorrect which leads to AV. It transforms the code:

unsigned lo32 = ulo32(value);  // == 0 for input 0x100000000
unsigned hi32 = uhi32(value); // == 1 for input 0x100000000

if (lo32 != 0) {
    return genLog2(lo32);
} else {
    return genLog2(hi32) + 32;  // returns 32
}

into

lo32 = ...  ; // == 0 for input 0x100000000
hi32 = ... ; // == 1 for input 0x100000000

if (hi32 == 0) {
  return genLog32(hi32) + 32;
} else {
 return genLog32(lo32);  // returns 0xFF
}

Details

Consider the genRegNumFromMask() method which invokes genLog2() method with value= 0x100000000.

https://github.com/dotnet/runtime/blob/88fb9667ca71505ccdbcbdfe0dec1208873725b4/src/coreclr/jit/compiler.hpp#L521-L534

Pasting here for quick reference:

genLog2() and surrounding code
// Given an unsigned 64-bit value, returns the lower 32-bits in unsigned format
//
inline unsigned ulo32(unsigned __int64 value)
{
    return static_cast<unsigned>(value);
}

// Given an unsigned 64-bit value, returns the upper 32-bits in unsigned format
//
inline unsigned uhi32(unsigned __int64 value)
{
    return static_cast<unsigned>(value >> 32);
}

/*****************************************************************************
 *
 *  Given a value that has exactly one bit set, return the position of that
 *  bit, in other words return the logarithm in base 2 of the given value.
 */

inline unsigned genLog2(unsigned __int64 value)
{
    unsigned lo32 = ulo32(value);
    unsigned hi32 = uhi32(value);

    if (lo32 != 0)
    {
        assert(hi32 == 0);
        return genLog2(lo32);
    }
    else
    {
        return genLog2(hi32) + 32;
    }
}

Here is the assembly code of the relevant portion:

...
 ; This code loads the magic number for % operation. See https://godbolt.org/z/o7hEq4feK 
00007ffe`8998da70 529229a2 mov         w2,#0x914D
00007ffe`8998da74 72b759e2 movk        w2,#0xBACF,lsl #0x10  

00007ffe`8998da78 b9407b26 ldr         w6,[x25,#0x78]
00007ffe`8998da7c b9405325 ldr         w5,[x25,#0x50]
00007ffe`8998da80 f9402724 ldr         x4,[x25,#0x48]
00007ffe`8998da84 f94017fb ldr         x27,[sp,#0x28]

; coversCandidates &= ~coversCandidateBit;
00007ffe`8998da88 cb0f03e8 neg         x8,x15
00007ffe`8998da8c ea0f010d ands        x13,x8,x15
00007ffe`8998da90 aa2d03e9 mvn         x9,x13             ; x13 = 0x100000000
00007ffe`8998da94 8a0f012f and         x15,x9,x15

; ------ START OF genLog2(). x13= 0x100000000 (value of coversCandidateBit)
; It directly does `value >> 32` operation which is part of uhi32() method
; and uses its result to check if it is 0 or not.
00007ffe`8998da98 d360fdae lsr         x14,x13,#0x20      ;  x14 becomes 1
00007ffe`8998da9c 54000140 beq         clrjit!LinearScan::allocateReg+0x88c (00007ffe`8998dac4)
00007ffe`8998daa0 9ba27da8 umull       x8,w13,w2
00007ffe`8998daa4 d360fd09 lsr         x9,x8,#0x20
00007ffe`8998daa8 4b0901aa sub         w10,w13,w9
00007ffe`8998daac 0b4a0528 add         w8,w9,w10,lsr #1
00007ffe`8998dab0 53057d09 lsr         w9,w8,#5
00007ffe`8998dab4 528004a8 mov         w8,#0x25
00007ffe`8998dab8 1b08b52c msub        w12,w9,w8,w13 ; <--- w12= 0

; Below instruction loads the element hashTable[hash], where hash's value is in w12==0
; x21 is the address of hashTable (see contents below)
; Here we load hashTable[0] which is 0xff.
00007ffe`8998dabc 38ec4aa8 ldrsb       w8,[x21,w12 uxtw #0]   ; x21= 00007ffe89a1e5a8
00007ffe`8998dac0 1400000a b           clrjit!LinearScan::allocateReg+0x8b0 (00007ffe`8998dae8)
00007ffe`8998dac4 9ba27dc8 umull       x8,w14,w2            ; <--- else part starts
00007ffe`8998dac8 d360fd09 lsr         x9,x8,#0x20
00007ffe`8998dacc 4b0901ca sub         w10,w14,w9
00007ffe`8998dad0 0b4a0528 add         w8,w9,w10,lsr #1
00007ffe`8998dad4 53057d09 lsr         w9,w8,#5
00007ffe`8998dad8 528004a8 mov         w8,#0x25
00007ffe`8998dadc 1b08b92c msub        w12,w9,w8,w14
00007ffe`8998dae0 38ec4aa8 ldrsb       w8,[x21,w12 uxtw #0]
00007ffe`8998dae4 11008108 add         w8,w8,#0x20           ; + 32 present in expr `genLog2(hi32) + 32`.
; ------ END OF genLog2()
00007ffe`8998dae8 d3407d0c uxtw        x12,w8
...

Contents of hashTable https://github.com/dotnet/runtime/blob/88fb9667ca71505ccdbcbdfe0dec1208873725b4/src/coreclr/inc/bitposition.h#L34-L40

hashtable:
0:000> db /c 4 00007ffe89a1e5a8
00007ffe`89a1e5a8  ff 00 01 1a  ....
00007ffe`89a1e5ac  02 17 1b ff  ....
00007ffe`89a1e5b0  03 10 18 1e  ....
00007ffe`89a1e5b4  1c 0b ff 0d  ....
00007ffe`89a1e5b8  04 07 11 ff  ....
00007ffe`89a1e5bc  19 16 1f 0f  ....
00007ffe`89a1e5c0  1d 0a 0c 06  ....
00007ffe`89a1e5c4  ff 15 0e 09  ....
00007ffe`89a1e5c8  05 14 08 13  ....
00007ffe`89a1e5cc  12

Next, we try to access nextIntervalRef with returned value from genLog2() which is 0xffffff and resulting in AV. I will follow up with C++ team.