diagnostics: dotnet dump analyze's `gcroot -all` crashes on arm64

Description

I created a hello-world ASP.NET Core application (literally dotnet new web; dotnet build; dotnet bin/Debug/net*/App.dll), then created a dump via dotnet-dump.

I then used dotnet-dump analyze to examine it:

$ dotnet dump analyze coredump.34662 --command dso --command exit                                                                              
Loading core dump: coredump.34662 ...                                                                                                          
OS Thread Id: 0x8766 (0)                     
SP/REG           Object           Name                                                  
x14              0000ffbf65437160 System.Object                                   
0000FFFFD8B0D480 0000ffbf65437160 System.Object                                         
0000FFFFD8B0D560 0000ffbf65437160 System.Object     
0000FFFFD8B0D700 0000ffbf65437160 System.Object                              
0000FFFFD8B0D708 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres
0000FFFFD8B0D770 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres       
0000FFFFD8B0D7A0 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres
0000FFFFD8B0D7A8 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D7C0 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D7D8 0000ffbf64806bb8 System.Threading.Tasks.TplEventSource
0000FFFFD8B0D800 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D878 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D890 0000ffbf6609b808 Program+<>c                 
0000FFFFD8B0D898 0000ffbf6609b820 System.Func`1[[System.String, System.Private.CoreLib]]                 
0000FFFFD8B0D8A0 0000ffbf6609e200 Microsoft.AspNetCore.Builder.RouteHandlerBuilder
0000FFFFD8B0D8A8 0000ffbf6609b820 System.Func`1[[System.String, System.Private.CoreLib]]
0000FFFFD8B0D8B0 0000ffbf66010180 System.String    /
0000FFFFD8B0D8B8 0000ffbf660512e0 Microsoft.AspNetCore.Builder.WebApplication
0000FFFFD8B0D8C8 0000ffbf660512e0 Microsoft.AspNetCore.Builder.WebApplication
0000FFFFD8B0D8D0 0000ffbf660101d0 Microsoft.AspNetCore.Builder.WebApplicationBuilder
0000FFFFD8B0D8D8 0000ffbf660512e0 Microsoft.AspNetCore.Builder.WebApplication
0000FFFFD8B0D8E0 0000ffbf660101d0 Microsoft.AspNetCore.Builder.WebApplicationBuilder
0000FFFFD8B0D8E8 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DA00 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DBF8 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DC20 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DD90 0000ffbf6600efa0 System.String[]
0000FFFFD8B0E068 0000ffbf6600efa0 System.String[]
$ dotnet dump analyze coredump.34662 --command 'gcroot -all 0000ffbf65437050' --command exit

This crashes.

Backtrace from lldb:

Process 37613 stopped                                                  
* thread #1, name = 'dotnet-dump', stop reason = signal SIGSEGV: invalid address (fault address: 0xffffffffffffffff)
    frame #0: 0x0000ffbec3e4aa9c libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION) + 64 libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS:
->  0xffbec3e4aa9c <+64>: ldr    x22, [x21]
    0xffbec3e4aaa0 <+68>: mov    x21, xzr
    0xffbec3e4aaa4 <+72>: tbnz   w19, #0x0, 0xffbec3e4aaf4 ; <+152>                                                                                          
    0xffbec3e4aaa8 <+76>: b      0xffbec3e4ab20            ; <+196>
(lldb) bt   
* thread #1, name = 'dotnet-dump', stop reason = signal SIGSEGV: invalid address (fault address: 0xffffffffffffffff)                                          
  * frame #0: 0x0000ffbec3e4aa9c libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION) + 64 
    frame #1: 0x0000ffbec3e61150 libmscordaccore.so`GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION), void*) + 4984
    frame #2: 0x0000ffbec3e37720 libmscordaccore.so`EECodeManager::EnumGcRefs(REGDISPLAY*, EECodeInfo*, unsigned int, void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION), void*, unsigned int) + 280
    frame #3: 0x0000ffbec3e4b3f4 libmscordaccore.so`DacStackReferenceWalker::Callback(CrawlFrame*, void*) + 352                                              
    frame #4: 0x0000ffbec3e31820 libmscordaccore.so`Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, __VPtr<Frame>) + 372
    frame #5: 0x0000ffbec3e31b8c libmscordaccore.so`Thread::StackWalkFrames(StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, __VPtr<Frame>) + 13$
    frame #6: 0x0000ffbec3e4bd3c libmscordaccore.so`unsigned int DacStackReferenceWalker::WalkStack<unsigned int, _SOS_StackRefData>(unsigned int, _SOS_StackRefData*, void (*)(__DPtr<__DPtr<Object> >, ScanContext*, unsigned int), void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION)) + 212            
    frame #7: 0x0000ffbec3e4a6bc libmscordaccore.so`DacStackReferenceWalker::GetCount(unsigned int*) + 180                                                   
    frame #8: 0x0000ffbec8218900 libsos.so`___lldb_unnamed_symbol895 + 152
    frame #9: 0x0000ffbec81ce6a4 libsos.so`___lldb_unnamed_symbol391 + 88
    frame #10: 0x0000ffbec81cc384 libsos.so`___lldb_unnamed_symbol371 + 236
    frame #11: 0x0000ffbec81cbe5c libsos.so`___lldb_unnamed_symbol368 + 408
    frame #12: 0x0000ffbec81f6424 libsos.so`GCRoot + 432
    frame #13: 0x0000ffff73b12a2c
    frame #14: 0x0000ffff73b1286c
    frame #15: 0x0000ffff73b12728
    frame #16: 0x0000ffff73b123ac
    frame #17: 0x0000ffffb21f2d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #18: 0x0000ffffb204d8f0 libcoreclr.so`CallDescrWorkerWithHandler(CallDescrData*, int) + 132                                                        
    frame #19: 0x0000ffffb20f2ff4 libcoreclr.so`RuntimeMethodHandle::InvokeMethod(Object*, void**, SignatureNative*, bool) + 1672                            
    frame #20: 0x0000ffff711291dc
    frame #21: 0x0000ffff7113e2f8
    frame #22: 0x0000ffff73b120ec
    frame #23: 0x0000ffff71b007b8
    frame #24: 0x0000ffff71b00700
    frame #25: 0x0000ffff71aec5ec
    frame #26: 0x0000ffff71aec0bc
    frame #27: 0x0000ffff71aebab8
    frame #28: 0x0000ffff71aeaf54
    frame #29: 0x0000ffff71ade610
    frame #30: 0x0000ffff71ac4b9c
    frame #31: 0x0000ffffb21f2d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #32: 0x0000ffffb204d8f0 libcoreclr.so`CallDescrWorkerWithHandler(CallDescrData*, int) + 132
    frame #33: 0x0000ffffb20f2ff4 libcoreclr.so`RuntimeMethodHandle::InvokeMethod(Object*, void**, SignatureNative*, bool) + 1672
    frame #34: 0x0000ffff711291dc
    frame #35: 0x0000ffff7113e4b0
    frame #36: 0x0000ffff70f08d50
    frame #37: 0x0000ffff71ac0f8c
    frame #38: 0x0000ffff71ac0bfc
    frame #39: 0x0000ffff71ac0b48
    frame #40: 0x0000ffff71ac0ae4
    frame #41: 0x0000ffff71ac08c0
    frame #42: 0x0000ffff71ac06e4
    frame #43: 0x0000ffff71ac0630
    frame #44: 0x0000ffff71ac05d0
    frame #45: 0x0000ffff71abc0d4
    frame #46: 0x0000ffff71ac03a4
    frame #47: 0x0000ffff71ac01d4
    frame #48: 0x0000ffff71ac0120
    frame #49: 0x0000ffff71ac00c0
    frame #50: 0x0000ffff71abc0d4
    frame #51: 0x0000ffff71abfd88
    frame #52: 0x0000ffff71abfc14
    frame #53: 0x0000ffff71abfb60
    frame #54: 0x0000ffff71abfb00
    frame #55: 0x0000ffff71abc0d4
    frame #56: 0x0000ffff71abf694
    frame #57: 0x0000ffff71abf484
    frame #58: 0x0000ffff71abf3d0
    frame #59: 0x0000ffff71abf370
    frame #60: 0x0000ffff71abc0d4
    frame #61: 0x0000ffff71abf13c
    frame #62: 0x0000ffff71abeea4
    frame #63: 0x0000ffff71abedf0
    frame #64: 0x0000ffff71abed90
    frame #65: 0x0000ffff71abc0d4
    frame #66: 0x0000ffff71abeb90                             
    frame #67: 0x0000ffff71abe8dc                                              
    frame #68: 0x0000ffff71abe828                                                                                                                            
    frame #69: 0x0000ffff71abe7c8                        
    frame #70: 0x0000ffff71abc0d4                                   
    frame #71: 0x0000ffff71abe5d0   
    frame #72: 0x0000ffff71abe3f4
    frame #73: 0x0000ffff71abe340
    frame #74: 0x0000ffff71abe2e0
    frame #75: 0x0000ffff71abc0d4
    frame #76: 0x0000ffff71abe084
    frame #77: 0x0000ffff71abdcd4
    frame #78: 0x0000ffff71abdc20
    frame #79: 0x0000ffff71abdbc0
    frame #80: 0x0000ffff71abc0d4
    frame #81: 0x0000ffff71abce70
    frame #82: 0x0000ffff71abcb94
    frame #83: 0x0000ffff71abcae0
    frame #84: 0x0000ffff71abca80
    frame #85: 0x0000ffff71abc0d4
    frame #86: 0x0000ffff71abc888
    frame #87: 0x0000ffff71abc0d4
    frame #88: 0x0000ffff71abc3a8
    frame #89: 0x0000ffff71abc284
    frame #90: 0x0000ffff71abc1d0
    frame #91: 0x0000ffff71abc170
    frame #92: 0x0000ffff71abc0d4
    frame #93: 0x0000ffff71abbce4
    frame #94: 0x0000ffff71abbacc
    frame #95: 0x0000ffff71abba18
    frame #96: 0x0000ffff71abb9b8
    frame #97: 0x0000ffff71abb8dc
    frame #98: 0x0000ffff71abb8dc
    frame #99: 0x0000ffff71abb8dc
    frame #100: 0x0000ffff71abb8dc
    frame #101: 0x0000ffff71abb8dc
    frame #102: 0x0000ffff71abb8dc
    frame #103: 0x0000ffff71abb8dc
    frame #104: 0x0000ffff71abb8dc
    frame #105: 0x0000ffff71abb8dc
    frame #106: 0x0000ffff71abb8dc
    frame #107: 0x0000ffff71abb8dc
    frame #108: 0x0000ffff71aba1cc
    frame #109: 0x0000ffff71ab9f5c
    frame #110: 0x0000ffff71ab9ea8
    frame #111: 0x0000ffff71ab9e44
    frame #112: 0x0000ffff71ab9b7c
    frame #113: 0x0000ffff71ab9a3c
    frame #114: 0x0000ffff71ab9988
    frame #115: 0x0000ffff71ab9924
    frame #116: 0x0000ffff71aa9d40
    frame #117: 0x0000ffff71aa9c04
    frame #118: 0x0000ffff71aa9b50
    frame #119: 0x0000ffff71aa9ab8
    frame #120: 0x0000ffff71a90f48
    frame #121: 0x0000ffff71a90d78
    frame #122: 0x0000ffffb21f2d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #123: 0x0000ffffb204dfb8 libcoreclr.so`MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 816                       
    frame #124: 0x0000ffffb1f3fc8c libcoreclr.so`RunMain(MethodDesc*, short, int*, PtrArray**) + 756                                                         
    frame #125: 0x0000ffffb1f3ff6c libcoreclr.so`Assembly::ExecuteMainMethod(PtrArray**, int) + 408                                                          
    frame #126: 0x0000ffffb1f6a110 libcoreclr.so`CorHost2::ExecuteAssembly(unsigned int, char16_t const*, int, char16_t const**, unsigned int*) + 636        
    frame #127: 0x0000ffffb1f2cbac libcoreclr.so`coreclr_execute_assembly + 240
    frame #128: 0x0000ffffb24a4d70 libhostpolicy.so`run_app_for_context(hostpolicy_context_t const&, int, char const**) + 1368                               
    frame #129: 0x0000ffffb24a5084 libhostpolicy.so`run_app(int, char const**) + 72                                                                          
    frame #130: 0x0000ffffb24a5a0c libhostpolicy.so`corehost_main + 200
    frame #131: 0x0000ffffb253002c libhostfxr.so`fx_muxer_t::handle_exec_host_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, host_startup_info_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<known_options, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 1256                                                             
    frame #132: 0x0000ffffb252f2a4 libhostfxr.so`fx_muxer_t::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 704
    frame #133: 0x0000ffffb252bab0 libhostfxr.so`hostfxr_main_startupinfo + 172
    frame #134: 0x0000aaaab08444a0 dotnet-dump`exe_start(int, char const**) + 1252                                                                           
    frame #135: 0x0000aaaab0844748 dotnet-dump`main + 144
    frame #136: 0x0000ffffb25b4384 libc.so.6`__libc_start_main + 220
    frame #137: 0x0000aaaab0837c14 dotnet-dump`_start + 52

The full test is here: https://github.com/redhat-developer/dotnet-regular-tests/blob/1b7774d6367751500e440a78909606cab0303a18/debugging-via-dotnet-dump/test.sh

Configuration

  • Is this related to a specific tool? Yes: dotnet dump analyze <corefile>, then dso, then gcroot -all <object>, where <object> is the first entry in the dso output that is not a System.Object. In my case that is 0000FFFFD8B0D708 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres
  • What OS and version, and what distro if applicable? RHEL 8, arm64, .NET runtime built via source-build
  • What is the architecture (x64, x86, ARM, ARM64)? arm64
  • Do you know whether it is specific to that configuration? It doesn’t happen on x64
  • Are you running in any particular type of environment? (e.g. Containers, a cloud scenario, app you are trying to target is a different user) Running this in a freshly provisioned VM. .NET was self-built using source-build
  • Is it a self-contained published application? No
  • What’s the output of dotnet --info?
$ dotnet --info
.NET SDK:
 Version:   7.0.103
 Commit:    6359034b09

Runtime Environment:
 OS Name:     rhel
 OS Version:  8
 OS Platform: Linux
 RID:         rhel.8-arm64
 Base Path:   /usr/lib64/dotnet/sdk/7.0.103/

Host:
  Version:      7.0.3
  Architecture: arm64
  Commit:       0a2bda10e8

.NET SDKs installed:
  7.0.103 [/usr/lib64/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.3 [/usr/lib64/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.3 [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/usr/lib64/dotnet]

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download


Regression?

IIRC, this works on x64 without any issues and only fails on arm64. It also worked on arm64 in a previous release of .NET, though I am not sure whether that was .NET 6 or .NET 7.

Other information

cc @tmds @aslicerh

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

Ok, saw your new comment on the PR. x64 should be fixed by another change.

Thanks!

So the big question here is why isn’t the catch block handling this? DacStackReferenceWalker::GetCount is wrapped in an SOS enter/leave:

https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/daccess/daccess.cpp#L7859-L7873

Which is defined here:

https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/daccess/dacimpl.h#L3984-L4004

We should be hitting that EX_END_CATCH(SwallowAllExceptions), and not bringing down the debugger.

Obviously it’s not good that something is causing the underlying crash (I’m working on a fix in .NET 8), but regardless, this should have manifested as a failed function call, not as a crash that brings down the debugger.