runtime: Reloc failures with NativeAOT on Apple Silicon

I am trying to enable NativeAOT on OSX arm64. With this patch https://github.com/dotnet/runtime/compare/main...am11:feature/nativeaot/osx-arm64 (tested with both @GOTPAGE and @PAGE assembler directives), it builds the nupkg. Consuming that package results in the following errors during the ilc step:

# with `<add key="TestSource" value="/Users/am11/projects/runtime/artifacts/packages/Release/Shipping" />`
# in NuGet.config
$ dotnet nuget locals all --clear && rm -rf obj bin && dotnet publish --use-current-runtime -v:diag ...
... snip ...
21:06:05.007   1:7>Target "IlcCompile: (TargetId:181)" in file "/Users/am11/.nuget/packages/microsoft.dotnet.ilcompiler/7.0.0-dev/build/Microsoft.NETCore.Native.targets" from project "/Users/am11/projects/naot1/naot1.csproj" (target "LinkNative" depends on it):
                   Building target "IlcCompile" completely.
                   Output file "obj/release/net7.0/osx-arm64/native/naot1.o" does not exist.
                   Task "Message" skipped, due to false condition; ($(_BuildingInCompatibleMode) != 'true') was evaluated as (true != 'true').
                   Task "Message" (TaskId:126)
                     Task Parameter:Text=Generating compatible native code. To optimize for size or speed, visit https://aka.ms/OptimizeCoreRT (TaskId:126)
                     Task Parameter:Importance=high (TaskId:126)
                     Generating compatible native code. To optimize for size or speed, visit https://aka.ms/OptimizeCoreRT (TaskId:126)
                   Done executing task "Message". (TaskId:126)
                   Task "Exec" (TaskId:127)
                     Task Parameter:Command="/Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/tools/ilc" @"obj/release/net7.0/osx-arm64/native/naot1.ilc.rsp" (TaskId:127)
                     "/Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/tools/ilc" @"obj/release/net7.0/osx-arm64/native/naot1.ilc.rsp" (TaskId:127)
                     <unknown>:0: error: ADR/ADRP relocations must be GOT relative (TaskId:127)
                     <unknown>:0: error: unknown AArch64 fixup kind! (TaskId:127)
                     <unknown>:0: error: unknown AArch64 fixup kind! (TaskId:127)
                     <unknown>:0: error: fixup value out of range (TaskId:127)
                     <unknown>:0: error: ADR/ADRP relocations must be GOT relative (TaskId:127)
                     <unknown>:0: error: unknown AArch64 fixup kind! (TaskId:127)
                     <unknown>:0: error: unknown AArch64 fixup kind! (TaskId:127)
                     <unknown>:0: error: fixup value out of range (TaskId:127)
... repeats 1000s of times ...

These errors are emitted somewhere after the objwriter has succeeded (https://github.com/dotnet/runtime/blob/071e772d9d3bd8b50a5380bce6214277a1e61c98/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/ObjectWriter.cs#L1183) and before the clang command is executed. The ilc task itself does not fail, but MSBuild then fails on the clang step:

                 Set Property: _IgnoreLinkerWarnings=false
                   Set Property: _IgnoreLinkerWarnings=true
                   Task "Exec" (TaskId:129)
                     Task Parameter:IgnoreStandardErrorWarningFormat=True (TaskId:129)
                     Task Parameter:Command=clang "obj/release/net7.0/osx-arm64/native/naot1.o" -o "bin/release/net7.0/osx-arm64/native/naot1" /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/sdk/libbootstrapper.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/sdk/libRuntime.WorkstationGC.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Globalization.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.IO.Compression.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Net.Security.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Security.Cryptography.Native.Apple.a -g -Wl,-rpath,'@executable_path' -lstdc++ -ldl -lm -lz -licucore -framework CoreFoundation -framework Foundation -framework Security -framework GSS (TaskId:129)
                     clang "obj/release/net7.0/osx-arm64/native/naot1.o" -o "bin/release/net7.0/osx-arm64/native/naot1" /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/sdk/libbootstrapper.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/sdk/libRuntime.WorkstationGC.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Globalization.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.IO.Compression.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Net.Security.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Security.Cryptography.Native.Apple.a -g -Wl,-rpath,'@executable_path' -lstdc++ -ldl -lm -lz -licucore -framework CoreFoundation -framework Foundation -framework Security -framework GSS (TaskId:129)
                     ld: malformed __LD,__compact_unwind section, bad length file 'obj/release/net7.0/osx-arm64/native/naot1.o' (TaskId:129)
                     clang: error: linker command failed with exit code 1 (use -v to see invocation) (TaskId:129)
21:06:12.873   1:7>/Users/am11/.nuget/packages/microsoft.dotnet.ilcompiler/7.0.0-dev/build/Microsoft.NETCore.Native.targets(337,5): error MSB3073: The command "clang "obj/release/net7.0/osx-arm64/native/naot1.o" -o "bin/release/net7.0/osx-arm64/native/naot1" /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/sdk/libbootstrapper.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/sdk/libRuntime.WorkstationGC.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Globalization.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.IO.Compression.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Net.Security.Native.a /Users/am11/.nuget/packages/runtime.osx-arm64.microsoft.dotnet.ilcompiler/7.0.0-dev/framework/libSystem.Security.Cryptography.Native.Apple.a -g -Wl,-rpath,'@executable_path' -lstdc++ -ldl -lm -lz -licucore -framework CoreFoundation -framework Foundation -framework Security -framework GSS" exited with code 1. [/Users/am11/projects/naot1/naot1.csproj]
                   Done executing task "Exec" -- FAILED. (TaskId:129)
21:06:12.873   1:7>Done building target "LinkNative" in project "naot1.csproj" -- FAILED.: (TargetId:182)

With objdump, that __LD,__compact_unwind section looks like:

Disassembly of section __LD,__compact_unwind:

00000000003b2858 <ltmp8>:
  3b2858: 40 4b 00 00   udf     #19264
  3b285c: 00 00 00 00   udf     #0
  3b2860: 74 00 00 00   udf     #116
  3b2864: 00 00 00 03   <unknown>
                ...
  3b2874: c0 4b 00 00   udf     #19392
  3b2878: 00 00 00 00   udf     #0
  3b287c: 74 00 00 00   udf     #116
  3b2880: 00 00 00 03   <unknown>
                ...
  3b2890: 40 4c 00 00   udf     #19520
  3b2894: 00 00 00 00   udf     #0
  3b2898: 74 00 00 00   udf     #116
  3b289c: 00 00 00 03   <unknown>
                ...
  3b28ac: c0 4c 00 00   udf     #19648
  3b28b0: 00 00 00 00   udf     #0
  3b28b4: 74 00 00 00   udf     #116
  3b28b8: 00 00 00 03   <unknown>
... repeats ...
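
To make the dump above easier to read, here is a small sketch that decodes the first entry's bytes and measures the distance between entries. It assumes the usual 64-bit compact unwind entry layout (8-byte function start, 4-byte length, 4-byte encoding, then 8-byte personality and 8-byte LSDA pointers, 32 bytes in total) and the ARM64 mode constants from compact_unwind_encoding.h; the byte values and addresses are copied from the objdump output, nothing else is measured.

```python
import struct

# First 16 bytes of the first entry, as printed by objdump (0x3b2858..0x3b2867).
first_entry_head = bytes.fromhex("404b0000" "00000000" "74000000" "00000003")

# Assumed head of a 64-bit entry: function start (u64), length (u32), encoding (u32).
func_start, func_len, encoding = struct.unpack("<QII", first_entry_head)

# Constants as defined in compact_unwind_encoding.h for arm64.
UNWIND_ARM64_MODE_MASK = 0x0F000000
UNWIND_ARM64_MODE_DWARF = 0x03000000

print(hex(func_start), func_len, hex(encoding))
print("DWARF mode:", encoding & UNWIND_ARM64_MODE_MASK == UNWIND_ARM64_MODE_DWARF)

# Distance between consecutive entries, using the addresses shown in the dump.
entry_addrs = [0x3B2858, 0x3B2874, 0x3B2890, 0x3B28AC]
strides = [b - a for a, b in zip(entry_addrs, entry_addrs[1:])]
print(strides)

# Assumed size of a well-formed 64-bit entry.
EXPECTED_ENTRY_SIZE = 8 + 4 + 4 + 8 + 8
print("stride matches expected size:", all(s == EXPECTED_ENTRY_SIZE for s in strides))
```

If the layout assumption holds, each entry says "use DWARF for this function", and the entries appear to be 28 bytes apart rather than 32, which may be related to the "bad length" complaint from ld.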

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 103 (103 by maintainers)

Most upvoted comments

Pushed a commit to objwriter (https://github.com/dotnet/llvm-project/pull/185/commits/7280b550bb8ac3cce72a1ee288dc744b1ab9b1b6) which fixes the type 15 error. After that, dotnet publish produced the binary successfully, but it does not print Hello World! yet. 😃

 % lldb bin/release/net7.0/osx-arm64/publish/naot1                
Added Microsoft public symbol server
Added symbol directory path: /usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.2
Added symbol directory path: /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Host.osx-arm64/6.0.2/runtimes/osx-arm64/native
(lldb) target create "../naot1/bin/release/net7.0/osx-arm64/publish/naot1"
Current executable set to '/Users/am11/projects/naot1/bin/release/net7.0/osx-arm64/publish/naot1' (arm64).
(lldb) r
Process 38291 launched: '/Users/am11/projects/naot1/bin/release/net7.0/osx-arm64/publish/naot1' (arm64)
Process 38291 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100460988)
    frame #0: 0x0000000100460988 naot1`tls_CurrentThread
naot1`tls_CurrentThread:
->  0x100460988 <+0>:  ldp    x16, x17, [x9, #-0xd0]
    0x10046098c <+4>:  udf    #0x1
    0x100460990 <+8>:  udf    #0x102
    0x100460994 <+12>: udf    #0x0
Target 0: (naot1) stopped.
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100460988)
  * frame #0: 0x0000000100460988 naot1`tls_CurrentThread
    frame #1: 0x00000001001a6db0 naot1`InitializeModules + 80
    frame #2: 0x0000000100006438 naot1`main [inlined] InitializeRuntime() at main.cpp:169:5 [opt]
    frame #3: 0x00000001000063ac naot1`main(argc=1, argv=0x000000016fdff798) at main.cpp:201:19 [opt]
    frame #4: 0x0000000100c350f4 dyld`start + 520
  thread #2
    frame #0: 0x00000001a96f1eac libsystem_kernel.dylib`mach_absolute_time + 108
    frame #1: 0x00000001a96f3838 libsystem_kernel.dylib`__commpage_gettimeofday_internal + 44
    frame #2: 0x00000001a95f9534 libsystem_c.dylib`gettimeofday + 52
    frame #3: 0x000000010004fcb4 naot1`::QueryPerformanceCounter(lpPerformanceCount=0x000000016fe86f68) at PalRedhawkUnix.cpp:1090:9 [opt]
    frame #4: 0x0000000100012090 naot1`EnsureYieldProcessorNormalizedInitialized() [inlined] PalQueryPerformanceCounter(arg1=0x000000016fe86f68) at PalRedhawkFunctions.h:131:12 [opt]
    frame #5: 0x0000000100012088 naot1`EnsureYieldProcessorNormalizedInitialized() at yieldprocessornormalized.cpp:76:9 [opt]
    frame #6: 0x0000000100012024 naot1`EnsureYieldProcessorNormalizedInitialized() at yieldprocessornormalized.cpp:118:9 [opt]
    frame #7: 0x000000010000801c naot1`FinalizerStart(pContext=0x0000600003000090) at FinalizerHelpers.cpp:54:5 [opt]
    frame #8: 0x00000001a972d240 libsystem_pthread.dylib`_pthread_start + 148

You may be seeing #75298

Yep, that matches what I get. Not every time though.

This assertion is failing:

* thread #87, stop reason = hit program assert
    frame #4: 0x00000001000818bc UnitTests`WKS::gc_heap::background_promote_callback(ppObject=0x000000017018eb90, sc=0x0000000170fc6a00, flags=1) at gc.cpp:35589:5
   35586	    UNREFERENCED_PARAMETER(sc);
   35587	    //in order to save space on the array, mark the object,
   35588	    //knowing that it will be visited later
-> 35589	    assert (settings.concurrent);
   35590	
   35591	    THREAD_NUMBER_FROM_CONTEXT;
   35592	#ifndef MULTIPLE_HEAPS
Target 0: (UnitTests) stopped.

Disabling FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP fixed it and a few others.

$ src/tests/run.sh --runnativeaottests Debug
...
Time [secs] | Total | Passed | Failed | Skipped | Assembly Execution Summary
============================================================================
      2.193 |    13 |      4 |      9 |       0 | nativeaot.SmokeTests.XUnitWrapper.dll
----------------------------------------------------------------------------
      2.193 |    13 |      4 |      9 |       0 | (total)

I got way further (e.g. printing “Hello World” works), but exceptions failed to unwind. That’s why I started poking into it.

Right, it says to subtract an address from the address the reloc points to. That should be exactly what we need here.

Here is an example of how LLVM creates relative relocs using the subtractor: https://github.com/llvm/llvm-project/blob/6c9f6812523a706c11a12e6cb4119b0cf67bbb21/lld/MachO/EhFrame.cpp#L108-L130
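
As a rough illustration of the subtractor idea, the sketch below computes the value a linker would store for an ARM64_RELOC_SUBTRACTOR / ARM64_RELOC_UNSIGNED pair: the second symbol's address minus the first symbol's address, plus the addend. The relocation type numbers are the ones from <mach-o/arm64/reloc.h>; the function name, the symbol names, and the addresses are made up for the example and are not taken from the object writer.

```python
ARM64_RELOC_UNSIGNED = 0     # values as defined in <mach-o/arm64/reloc.h>
ARM64_RELOC_SUBTRACTOR = 1


def apply_subtractor_pair(relocs, symbol_addrs, addend=0):
    """Resolve a (SUBTRACTOR A, UNSIGNED B) pair to B - A + addend.

    Hypothetical helper: `relocs` is a list of two (type, symbol) tuples in
    the order they appear in the relocation table.
    """
    (kind_a, sym_a), (kind_b, sym_b) = relocs
    if kind_a != ARM64_RELOC_SUBTRACTOR or kind_b != ARM64_RELOC_UNSIGNED:
        raise ValueError("SUBTRACTOR must be immediately followed by UNSIGNED")
    return symbol_addrs[sym_b] - symbol_addrs[sym_a] + addend


# Hypothetical layout: a 4-byte field at 0x3000 that must hold the distance
# from itself to a function at 0x1000.
symbols = {"field": 0x3000, "func": 0x1000}
pair = [(ARM64_RELOC_SUBTRACTOR, "field"), (ARM64_RELOC_UNSIGNED, "func")]

value = apply_subtractor_pair(pair, symbols)
encoded = value.to_bytes(4, "little", signed=True)
print(value, encoded.hex())
```

The result depends only on the distance between the two symbols, so it stays valid wherever the image is loaded, which is the property wanted for relative references.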

Ok, looks like we have a case of a wrong relocation. When we’re in GCHandle__get_IsAllocated, the this pointer in x0 is 0x00000001004c3618, but according to the memory map you posted above, that address is part of naot1.__DATA_CONST.__got. So we’re in the global offset table, but the code doesn’t expect to have to dereference the pointer (the GOT is a table of indirections).

So the object writer generated a GOT relocation for something that should just store the address of the destination directly. A GOT relocation requires an extra dereference to reach what the reloc points to.
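
A toy model of that mismatch, in case it helps. Everything here is hypothetical (the addresses, the dictionary standing in for memory, the helper names); it only shows why using a GOT slot address as if it were the object itself reads the wrong data.

```python
# Toy address space: address -> 64-bit value stored there (all hypothetical).
memory = {
    0x2000: 0xCAFE,   # the real object; its first field holds 0xCAFE
    0x1000: 0x2000,   # GOT slot; it holds the address of the real object
}


def load(addr: int) -> int:
    """Read one 64-bit value from the toy memory."""
    return memory[addr]


def resolve_direct(symbol_addr: int) -> int:
    """Direct relocation: the fixed-up value already is the symbol's address."""
    return symbol_addr


def resolve_via_got(got_slot_addr: int) -> int:
    """GOT relocation: the fixed-up value is a slot; one extra load gives the address."""
    return load(got_slot_addr)


# Both schemes work when the code and the relocation agree.
direct_field = load(resolve_direct(0x2000))
got_field = load(resolve_via_got(0x1000))

# The mismatch: the relocation produced a GOT slot address, but the code
# treats it as the object's address and skips the extra dereference.
wrong_field = load(resolve_direct(0x1000))

print(hex(direct_field), hex(got_field), hex(wrong_field))
```

In the last case the "field" that gets read is really the GOT entry, i.e. a pointer to the object rather than the object's contents.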

This might be responsible for the problem: https://github.com/dotnet/llvm-project/pull/185/files#diff-3dd92a728cef8bf36a3e8104cbfcf2b9b901abff46340d48f5cf0a02d3274a2aR455-R456

That looks related to the TLS access. We’re here:

https://github.com/dotnet/runtime/blob/9b2e2a830a4e2e67c920aa200329533baba5c363/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S#L193-L213

My suspicion is that INLINE_GETTHREAD loaded a bogus address into x3. It’s supposed to load the tls_CurrentThread thread-local static.

https://github.com/dotnet/runtime/blob/9b2e2a830a4e2e67c920aa200329533baba5c363/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm64.inc#L215-L217

I would put a breakpoint here:

https://github.com/dotnet/runtime/blob/9b2e2a830a4e2e67c920aa200329533baba5c363/src/coreclr/nativeaot/Runtime/threadstore.inl#L6-L10

and see what value the variable has (and how the compiler got to it in assembly). Then compare with what INLINE_GETTHREAD came up with. (Make sure you’re looking at the same thread, we already have the finalizer thread running at this point in startup).

Digging in Apple’s source code, the error seems to be:

https://github.com/apple-oss-distributions/ld64/blob/dbf8f7feb5579761f1623b004bd468bdea7c6225/src/ld/parsers/macho_relocatable_file.cpp#L5631-L5632

In other words, the size of the __compact_unwind section is not divisible by the size of the unwind entry. That’s odd because we don’t generate the Apple weird thing, we generate DWARF CFI.
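
The check is simple enough to sketch. This is not ld64's code, just the same idea in a few lines, assuming a 64-bit entry is 32 bytes (8-byte function start, 4-byte length, 4-byte encoding, 8-byte personality, 8-byte LSDA). The function name and the section sizes are made up; the 28 comes from the distance between entries in the objdump output earlier in this issue (0x3b2874 - 0x3b2858).

```python
# Assumed size of one 64-bit entry in __LD,__compact_unwind.
ENTRY_SIZE_64 = 8 + 4 + 4 + 8 + 8


def check_compact_unwind_size(section_size: int, entry_size: int = ENTRY_SIZE_64) -> int:
    """Return the entry count, or raise if the section is not a whole number of entries."""
    count, remainder = divmod(section_size, entry_size)
    if remainder:
        raise ValueError(
            f"bad length: {section_size} bytes is not a multiple of {entry_size}"
        )
    return count


# Hypothetical section sizes, for illustration only.
print(check_compact_unwind_size(3 * 32))   # three well-formed entries
try:
    check_compact_unwind_size(3 * 28)      # three entries written 28 bytes apart
except ValueError as err:
    print(err)
```

With 28-byte entries the check only passes by accident, when the entry count happens to be a multiple of 8, and even then the fields would be read at the wrong offsets.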

However, looking at LLVM source code, I think this is kicking in:

https://github.com/dotnet/llvm-project/blob/f1120a92d05f1c57e75af7d16504012570ef3409/llvm/lib/MC/MCObjectFileInfo.cpp#L30-L32

And LLVM does generate something on our behalf. Probably broken from the sound of it.

I would dig around that - can we still make do without a __compact_unwind section? If it’s not present in the executable, maybe ld would still do the right thing and convert it from CFI to the compact unwind scheme for us.

Maybe the right thing would be to start generating compact unwinding info, because Apple tends to unceremoniously cut off things they don’t like anymore after a couple of years of supporting both the old thing and the new shiny thing. Unwinding codes are currently generated in RyuJIT.

Ah, so those messages are still generated by the object writer in ILC.

For example here: https://github.com/dotnet/llvm-project/blob/f1120a92d05f1c57e75af7d16504012570ef3409/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MachObjectWriter.cpp#L102-L103.

Looks like we need to decide what kind of relocation to generate when we’re generating it.
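
For reference, this is roughly the arithmetic behind a page-relative pair (adrp plus a 12-bit page offset, i.e. the @PAGE / @PAGEOFF forms, or @GOTPAGE / @GOTPAGEOFF when the target is a GOT slot). It is only a sketch: the function names are made up, and the two addresses are borrowed from the lldb backtrace above purely as sample inputs, not because an adrp is known to sit at that pc.

```python
ADRP_PAGE_BITS = 12          # adrp works in 4 KiB units
ADRP_IMM_BITS = 21           # signed 21-bit page count, about +/-4 GiB of reach


def adrp_page_delta(pc: int, target: int) -> int:
    """Signed number of 4 KiB pages between the instruction and the target."""
    delta = (target >> ADRP_PAGE_BITS) - (pc >> ADRP_PAGE_BITS)
    limit = 1 << (ADRP_IMM_BITS - 1)
    if not -limit <= delta < limit:
        raise ValueError("fixup value out of range")
    return delta


def page_offset(target: int) -> int:
    """Low 12 bits of the target, applied by the add/ldr that follows adrp."""
    return target & ((1 << ADRP_PAGE_BITS) - 1)


# Sample inputs taken from the backtrace (InitializeModules + 80, tls_CurrentThread).
pc = 0x1001A6DB0
target = 0x100460988

delta = adrp_page_delta(pc, target)
offset = page_offset(target)

# What adrp + add would reconstruct at run time.
rebuilt = (((pc >> ADRP_PAGE_BITS) + delta) << ADRP_PAGE_BITS) + offset
print(delta, hex(offset), hex(rebuilt))
```

The arithmetic is the same for the direct and the GOT flavours; what differs is whether the target is the symbol itself or its GOT slot, which is the decision that has to be made when the relocation is emitted.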