runtime: GC suspension & UNIX signals corrupt each other's memory on the stack on macOS

Hi! I found an interesting issue with runtime suspension (aka PAL_InjectActivation()) on macOS.

Sometimes my application crashes in arbitrary places like:

Exception Type:        EXC_BREAKPOINT (SIGTRAP)
Exception Codes:       0x0000000000000002, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Trace/BPT trap: 5
Termination Reason:    Namespace SIGNAL, Code 0x5
Terminating Process:   exc handler [33538]

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libcoreclr.dylib              	0x000000010c044c13 DBG_DebugBreak + 1
1   ???                           	0x00007ffee3cf48b0 0 + 140732720433328
2   ???                           	0x000000011ae3403a 0 + 4746068026
3   ???                           	0x00000001223a1488 0 + 4869198984
4   ???                           	0x0000000125648e75 0 + 4922314357
5   ???                           	0x000000011db90293 0 + 4793631379
6   ???                           	0x000000012141c11b 0 + 4852924699
7   ???                           	0x000000012141c094 0 + 4852924564
8   ???                           	0x000000012141cd1d 0 + 4852927773
9   ???                           	0x000000011db30319 0 + 4793238297
10  ???                           	0x000000011db30279 0 + 4793238137
11  ???                           	0x000000011dbfd70d 0 + 4794078989
12  libcoreclr.dylib              	0x000000010c3b947b CallDescrWorkerInternal + 124
[...]

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x1f80aed1ad2b0076  rbx: 0x00000001b502b008  rcx: 0x00007f89a4018400  rdx: 0x0000000000000000
  rdi: 0x00007ffee3cf4310  rsi: 0x0000000000000000  rbp: 0x00007ffee3cf47e0  rsp: 0x00007ffee3cf4308
   r8: 0x00000000ffffffff   r9: 0x00007f89a4910748  r10: 0x0000001900000001  r11: 0x000000011a4200b0
  r12: 0x0000000000000000  r13: 0x00000001b502b0c8  r14: 0x000000018c056510  r15: 0x00000001b502b008
  rip: 0x000000010c044c13  rfl: 0x0000000000000202  cr2: 0x0000000115a0b058
  
Logical CPU:     0
Error Code:      0x02000131
Trap Number:     133

or like:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       EXC_I386_GPFLT
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [62194]

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libcoreclr.dylib              	0x00000001013ad553 RtlRestoreContext + 12
1   libcoreclr.dylib              	0x00000001013ae86d ActivationHandler + 93
2   ???                           	0x00007ffeee980a80 0 + 140732901362304
3   ???                           	0x0000000110190d95 0 + 4565044629
4   libcoreclr.dylib              	0x000000010170c57b CallDescrWorkerInternal + 124
[...]

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x60597e4c28880097  rbx: 0x00007ffeee980560  rcx: 0x00007fcc86c12d90  rdx: 0x0000000000000000
  rdi: 0x00007ffeee980560  rsi: 0x0000000000000000  rbp: 0x00007ffeee980550  rsp: 0x00007ffeee980500
   r8: 0x00000000000130a8   r9: 0x0000000000000000  r10: 0x00007fff93e0abf8  r11: 0x00007fff93e0abf0
  r12: 0x00007ffeee980c90  r13: 0x0000000181368b28  r14: 0x0000000000000000  r15: 0x00007ffeee980d18
  rip: 0x00000001013ad553  rfl: 0x0000000000010202  cr2: 0x000070000cbd9ff8
  
Logical CPU:     0
Error Code:      0x02000005
Trap Number:     133

or even like:

Exception Type:        EXC_BAD_ACCESS (SIGBUS)
Exception Codes:       KERN_PROTECTION_FAILURE at 0x0000000104e1cc7e
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Bus error: 10
Termination Reason:    Namespace SIGNAL, Code 0xa
Terminating Process:   exc handler [65838]

VM Regions Near 0x104e1cc7e:
    MALLOC_LARGE           0000000104dc2000-0000000104dd4000 [   72K] rw-/rwx SM=PRV  
--> __TEXT                 0000000104dd4000-0000000105258000 [ 4624K] r-x/rwx SM=COW  /opt/buildAgent/*/*.dylib
    __TEXT                 0000000105258000-0000000105259000 [    4K] r--/rwx SM=COW  /opt/buildAgent/*/*.dylib

Thread 70 Crashed:
0   libcoreclr.dylib              	0x0000000105175228 WKS::GCHeap::Relocate(Object**, ScanContext*, unsigned int) + 120
1   libcoreclr.dylib              	0x00000001050ff56b GcInfoDecoder::ReportUntrackedSlots(GcSlotDecoder&, REGDISPLAY*, unsigned int, void (*)(void*, Object**, unsigned int), void*) + 235
2   libcoreclr.dylib              	0x00000001050fe235 GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, Object**, unsigned int), void*) + 4341
3   libcoreclr.dylib              	0x0000000104f04b9e EECodeManager::EnumGcRefs(REGDISPLAY*, EECodeInfo*, unsigned int, void (*)(void*, Object**, unsigned int), void*, unsigned int) + 254
4   libcoreclr.dylib              	0x0000000105035d43 GcStackCrawlCallBack(CrawlFrame*, void*) + 643
5   libcoreclr.dylib              	0x0000000104f8bffd Thread::MakeStackwalkerCallback(CrawlFrame*, StackWalkAction (*)(CrawlFrame*, void*), void*) + 157
6   libcoreclr.dylib              	0x0000000104f8c261 Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*) + 465
7   libcoreclr.dylib              	0x0000000104f8c783 Thread::StackWalkFrames(StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*) + 211
8   libcoreclr.dylib              	0x00000001050335a6 ScanStackRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) + 326
9   libcoreclr.dylib              	0x00000001050333f5 GCToEEInterface::GcScanRoots(void (*)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) + 261
10  libcoreclr.dylib              	0x000000010517c9c9 WKS::gc_heap::relocate_phase(int, unsigned char*) + 89
11  libcoreclr.dylib              	0x000000010516d013 WKS::gc_heap::plan_phase(int) + 10835
12  libcoreclr.dylib              	0x000000010516748d WKS::gc_heap::gc1() + 893
13  libcoreclr.dylib              	0x0000000105171357 WKS::gc_heap::garbage_collect(int) + 2007
14  libcoreclr.dylib              	0x00000001051633cd WKS::GCHeap::GarbageCollectGeneration(unsigned int, gc_reason) + 909
15  libcoreclr.dylib              	0x00000001051652e8 WKS::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) + 664
16  libcoreclr.dylib              	0x000000010518aaa0 WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 80
17  libcoreclr.dylib              	0x0000000105038756 AllocateObject(MethodTable*) + 182
18  libcoreclr.dylib              	0x00000001050549e6 JIT_New(CORINFO_CLASS_STRUCT_*) + 134
19  ???                           	0x000000011fd5f1d5 0 + 4829082069
20  ???                           	0x000000011fd5f17d 0 + 4829081981
21  ???                           	0x000000011fd5e759 0 + 4829079385
22  ???                           	0x000000011fd5ffa4 0 + 4829085604
23  ???                           	0x000000011fd5fee4 0 + 4829085412
24  ???                           	0x000000011a96ab34 0 + 4741049140
25  ???                           	0x000000011fd5fdd4 0 + 4829085140
26  ???                           	0x000000011fd5fd29 0 + 4829084969
27  ???                           	0x0000000117cd3272 0 + 4694291058
28  ???                           	0x000000011a20eb1d 0 + 4733332253
29  ???                           	0x000000011fd5f45a 0 + 4829082714
30  ???                           	0x000000011dfc64d6 0 + 4798047446
31  ???                           	0x000000011b187b33 0 + 4749556531
32  ???                           	0x000000011b181535 0 + 4749530421
33  ???                           	0x000000011b17bf71 0 + 4749508465
34  ???                           	0x000000011b1822ea 0 + 4749533930
35  ???                           	0x000000011b182200 0 + 4749533696
36  ???                           	0x000000011b1820b5 0 + 4749533365
37  ???                           	0x000000011b187765 0 + 4749555557
38  ???                           	0x000000011e4866bf 0 + 4803028671
39  ???                           	0x000000011a96ac74 0 + 4741049460
40  ???                           	0x000000011a20eb1d 0 + 4733332253
41  ???                           	0x000000011fd6053a 0 + 4829087034
42  ???                           	0x000000011dfc64d6 0 + 4798047446
43  ???                           	0x000000011b187740 0 + 4749555520
44  ???                           	0x000000011b1830e4 0 + 4749537508
45  ???                           	0x000000011fd5eabc 0 + 4829080252
46  ???                           	0x000000011a20eb1d 0 + 4733332253
47  ???                           	0x000000011fd6038e 0 + 4829086606
48  ???                           	0x000000011dfc64d6 0 + 4798047446
49  ???                           	0x000000011b187740 0 + 4749555520
50  ???                           	0x000000011b1830e4 0 + 4749537508
51  ???                           	0x000000011a96b94d 0 + 4741052749
52  ???                           	0x000000011a20ec51 0 + 4733332561
53  ???                           	0x000000011fd5d9ac 0 + 4829075884
54  ???                           	0x000000011a20d599 0 + 4733326745
55  libcoreclr.dylib              	0x000000010519147b CallDescrWorkerInternal + 124
[...]

Thread 70 crashed with X86 Thread State (64-bit):
  rax: 0x00000001051751b0  rbx: 0x83d0010fc9310000  rcx: 0x00007ffeeaf1b670  rdx: 0x0000000000000000
  rdi: 0x00007000158cd808  rsi: 0x00007000158cf510  rbp: 0x00007000158cd830  rsp: 0x00007000158cd800
   r8: 0x0000000105035a20   r9: 0x00007000158cf470  r10: 0x0000000000000000  r11: 0x0000000117848ba8
  r12: 0x83d0010fc9310000  r13: 0x000000000000000a  r14: 0x0000000104e1cc7e  r15: 0x0000000000000000
  rip: 0x0000000105175228  rfl: 0x0000000000010286  cr2: 0x0000000112760000
  
Logical CPU:     0
Error Code:      0x02000131
Trap Number:     133

Looking at the EXC_BREAKPOINT failure, I found the place it was called from: https://github.com/dotnet/coreclr/blob/v3.1.8/src/pal/src/exception/machexception.cpp#L1533 (the same code in dotnet/runtime: https://github.com/dotnet/runtime/blob/cf258a14b70ad9069470a108f13765e0e5988f51/src/coreclr/src/pal/src/exception/machexception.cpp#L1255-L1261)

It was hit because the instruction pointer in the saved context was NULL:

Thread::SuspendRuntime(reason=0x1)
118596103: InjectActivationInternal thread 878759 sp 0x7ffeef92b158 rbp 0x7ffeef92b630 ctx 0x7ffeef92b160 { rip 0x110c87d6f } watch8 0x7ffeef92b258
118596248: ActivationHandler stack 0x7ffeef92b13f frame 0x7ffeef92b150 ctx 0x7ffeef92b160 { rip 0x0 }

(Please note that sp/rbp differ from the crash report above because this log was taken from another crash.)

Setting hardware watchpoints on &pContext->Rip didn’t help, but I did catch the stack memory being changed after the target thread was suspended:

Thread::SuspendRuntime(reason=0x1)
72878468: InjectActivationInternal thread 3574946 cleaning mem 0x7ffeea151040 - 0x7ffeea1519e0
72879327: InjectActivationInternal thread 3574946 new_value 0x00000000000000DE at 0x7ffeea1514b8 <-- struct mcontext_avx64
72879390: InjectActivationInternal thread 3574946 new_value 0x0000000201635000 at 0x7ffeea1514c0
72879411: InjectActivationInternal thread 3574946 new_value 0x000000000EC3EBEF at 0x7ffeea1514c8
72879433: InjectActivationInternal thread 3574946 new_value 0x000000020160EEE8 at 0x7ffeea1514d0
72879473: InjectActivationInternal thread 3574946 new_value 0x000000000000000F at 0x7ffeea1514d8
72879487: InjectActivationInternal thread 3574946 new_value 0x000000006CAD2534 at 0x7ffeea1514e0
72879498: InjectActivationInternal thread 3574946 new_value 0x0000000189050D54 at 0x7ffeea1514e8
72879510: InjectActivationInternal thread 3574946 new_value 0x0000000000000098 at 0x7ffeea1514f0
72879521: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1519E0 at 0x7ffeea1514f8 <-- mctxp->mctx_avx64.ss.__rbp
72879531: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1519E0 at 0x7ffeea151500 <-- mctxp->mctx_avx64.ss.__rsp
72879541: InjectActivationInternal thread 3574946 new_value 0x000000000076006F at 0x7ffeea151508
72879553: InjectActivationInternal thread 3574946 new_value 0x000000011B67D0B8 at 0x7ffeea151510
72879562: InjectActivationInternal thread 3574946 new_value 0x0000000300000001 at 0x7ffeea151518
72879572: InjectActivationInternal thread 3574946 new_value 0x000000011C3D4C50 at 0x7ffeea151520
72879582: InjectActivationInternal thread 3574946 new_value 0x000000020160EEE8 at 0x7ffeea151528
72879592: InjectActivationInternal thread 3574946 new_value 0x0000000195281F58 at 0x7ffeea151530
72879601: InjectActivationInternal thread 3574946 new_value 0x0000000189050D28 at 0x7ffeea151538
72879614: InjectActivationInternal thread 3574946 new_value 0x0000000185B83540 at 0x7ffeea151540
72879627: InjectActivationInternal thread 3574946 new_value 0x000000011C907318 at 0x7ffeea151548
72879637: InjectActivationInternal thread 3574946 new_value 0x0000000000000212 at 0x7ffeea151550
72879647: InjectActivationInternal thread 3574946 new_value 0x000000000000002B at 0x7ffeea151558
72879659: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151560
72879671: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151568
72879684: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151570
72879708: InjectActivationInternal thread 3574946 new_value 0x05FD00000000037F at 0x7ffeea151578
72879723: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151580
72879748: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151588
72879771: InjectActivationInternal thread 3574946 new_value 0x0000FFFF00001FA3 at 0x7ffeea151590
72879783: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151598
72879792: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea1515a0
72879803: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515a8
72879811: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515b0
72879821: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515b8
72879831: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515c0
72879842: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515c8
72879851: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515d0
72879860: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515d8
72879870: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515e0
72879881: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515e8
72879890: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea1515f0
72879901: InjectActivationInternal thread 3574946 new_value 0x00000000025EC0C2 at 0x7ffeea1515f8
72879914: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea151600
72879925: InjectActivationInternal thread 3574946 new_value 0x00000000FFFFD15C at 0x7ffeea151608
72879934: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea151610
72879945: InjectActivationInternal thread 3574946 new_value 0x0000000189050920 at 0x7ffeea151618
72879955: InjectActivationInternal thread 3574946 new_value 0x00000002011E48A8 at 0x7ffeea151620
72879966: InjectActivationInternal thread 3574946 new_value 0x40967C0000000000 at 0x7ffeea151628
72879977: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151630
72879986: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151638
72879996: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151640
72880005: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151648
72880017: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151650
72880026: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151658
72880037: InjectActivationInternal thread 3574946 new_value 0x689A6FB900000000 at 0x7ffeea151660
72880047: InjectActivationInternal thread 3574946 new_value 0x00000000432B0000 at 0x7ffeea151668
72880058: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151670
72880067: InjectActivationInternal thread 3574946 new_value 0x4106BF5BDA34CBDF at 0x7ffeea151678
72880078: InjectActivationInternal thread 3574946 new_value 0x4644AC6B477AD54E at 0x7ffeea151680
72880088: InjectActivationInternal thread 3574946 new_value 0xDAA409EE7E78CF9C at 0x7ffeea151688
72880099: InjectActivationInternal thread 3574946 new_value 0xFC9D514261FB9789 at 0x7ffeea151690
72880109: InjectActivationInternal thread 3574946 new_value 0x8B98B3B77629F005 at 0x7ffeea151698
72880121: InjectActivationInternal thread 3574946 new_value 0x9073B2A0D1D92BE2 at 0x7ffeea1516a0
72880130: InjectActivationInternal thread 3574946 new_value 0x0DF575D78915FE45 at 0x7ffeea1516a8
72880142: InjectActivationInternal thread 3574946 new_value 0x9C00965081ADA67A at 0x7ffeea1516b0
72880152: InjectActivationInternal thread 3574946 new_value 0x65F48E1F1CE3921B at 0x7ffeea1516b8
72880162: InjectActivationInternal thread 3574946 new_value 0x7F15DB8B8E6F0FEC at 0x7ffeea1516c0
72880171: InjectActivationInternal thread 3574946 new_value 0xE7765055C60B4C6E at 0x7ffeea1516c8
72880182: InjectActivationInternal thread 3574946 new_value 0x1BEB0117A7F0DBE7 at 0x7ffeea1516d0
72880192: InjectActivationInternal thread 3574946 new_value 0xD6FE60F4B30AEEEB at 0x7ffeea1516d8
72880202: InjectActivationInternal thread 3574946 new_value 0x2784B49358916F18 at 0x7ffeea1516e0
72880212: InjectActivationInternal thread 3574946 new_value 0xFDB143B21AC713E7 at 0x7ffeea1516e8
72880223: InjectActivationInternal thread 3574946 new_value 0x41AA99425A419855 at 0x7ffeea1516f0
72880233: InjectActivationInternal thread 3574946 new_value 0xE658603330A600C7 at 0x7ffeea1516f8
72880244: InjectActivationInternal thread 3574946 new_value 0x994DBBB8BEC90F2B at 0x7ffeea151700
72880254: InjectActivationInternal thread 3574946 new_value 0x6EE7A070DE8F036E at 0x7ffeea151708
72880265: InjectActivationInternal thread 3574946 new_value 0xED6702021215E4E5 at 0x7ffeea151710
72880276: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151718
72880285: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151720
72880295: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151728
72880304: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151730
72880315: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151738
72880325: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151740
72880336: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151748
72880345: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151750
72880356: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151758
72880367: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151760
72880378: InjectActivationInternal thread 3574946 new_value 0x0000000400000001 at 0x7ffeea151768
72880388: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151770
72880398: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151778
72880407: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151780
72880418: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151788
72880428: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151790
72880439: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151798
72880449: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517a0
72880459: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517a8
72880468: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517b0
72880478: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517b8
72880488: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517c0
72880498: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517c8
72880508: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517d0
72880519: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517d8
72880527: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517e0
72880537: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517e8
72880547: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517f0
72880557: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517f8
72880568: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151800
72880579: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151808
72880589: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151810
72880599: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151818
72880608: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151820
72880618: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151828
72880627: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151830
72880637: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151838
72880648: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151840
72880658: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151848
72880668: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151850
72880678: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151858
72880689: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151860
72880699: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151868
72880710: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151870
72880719: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151878
72880730: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151880
72880741: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151888
72880751: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151890
72880762: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151898
72880772: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518a0
72880783: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518a8
72880793: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518b0
72880804: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518b8
72880814: InjectActivationInternal thread 3574946 new_value 0x0000000000000014 at 0x7ffeea1518c0
72880826: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518c8
72880837: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518d0
72880849: InjectActivationInternal thread 3574946 new_value 0x000000011C907318 at 0x7ffeea1518d8
72880860: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518e0
72880871: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518e8
72880882: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1519E0 at 0x7ffeea1518f0
72880892: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518f8
72880903: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151900
72880913: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151908
72880923: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151910
72880933: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151918
72880943: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151920
72880953: InjectActivationInternal thread 3574946 new_value 0x0000000400000000 at 0x7ffeea151928
72880963: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA151498 at 0x7ffeea151930
72880974: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151938
72880984: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151940
72880995: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151948
72881004: InjectActivationInternal thread 3574946 new_value 0x0000000000000408 at 0x7ffeea151950
72881015: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1514B8 at 0x7ffeea151958 <-- sp + C_64_REDZONE_LEN + sizeof(new_val)

(also taken from another crash)

So, while we are stopping the target thread and saving its context to its stack (https://github.com/dotnet/coreclr/blob/v3.1.8/src/pal/src/exception/machexception.cpp#L1601, https://github.com/dotnet/runtime/blob/cf258a14b70ad9069470a108f13765e0e5988f51/src/coreclr/src/pal/src/exception/machexception.cpp#L1329-L1338), dotnet can receive a signal (SIGCHLD in my case).

sendsig() in the macOS kernel then also injects _sigtramp onto that stack and overwrites our context: https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/bsd/dev/i386/unix_signal.c#L257-L267 (ver. xnu-4903.221.2).

Demo app:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

namespace traffic_csharp
{
    class Program
    {
        static void Main(string[] args)
        {
            Process.Start("/usr/bin/true").WaitForExit(); // makes the runtime install handlers for SIGCONT, SIGCHLD, SIGWINCH
            //Console.CancelKeyPress += (sender, eventArgs) => eventArgs.Cancel = true; // same for SIGINT, SIGQUIT

            var t = new Thread(AllocTraffic);
            t.Start();

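            // Spin on the main thread: i is even at every check, so the loop never exits.
            // The loop makes no calls, so the runtime has to interrupt this thread to suspend it for GC.
            for (ulong i = 0; i != 1; i += 1)
                i += 1;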
        }

        static void AllocTraffic()
        {
            for (var i = 0; i < 1000000; ++i)
            {
                if (i % 10000 == 0)
                    Console.WriteLine("alloc traffic round {0}", i / 10000);

                var list = new List<int>();
                for (var j = 0; j < 100000; ++j)
                    list.Add(j);
            }
        }
    }
}

Build & run:

$ dotnet exec bin/Debug/netcoreapp3.1/traffic_csharp.dll
alloc traffic round 0
alloc traffic round 1
alloc traffic round 2
alloc traffic round 3
Trace/BPT trap: 5
$ dotnet exec bin/Debug/netcoreapp3.1/traffic_csharp.dll
alloc traffic round 0
alloc traffic round 1
Bus error: 10
$ dotnet exec bin/Debug/netcoreapp3.1/traffic_csharp.dll
alloc traffic round 0
alloc traffic round 1
alloc traffic round 2
RestoreState: 1332: thread_set_state(thread) (os/kern) aborted
Abort trap: 6

While it is running, do this in another console:

$ while kill -SIGCHLD <dotnet PID>; do true; done

The same crashes can be reproduced with any of the following signals: SIGCONT, SIGCHLD, and SIGWINCH (https://github.com/dotnet/corefx/blob/v3.1.8/src/System.Diagnostics.Process/src/System/Diagnostics/Process.Unix.cs#L371), as well as SIGINT and SIGQUIT (https://github.com/dotnet/corefx/blob/v3.1.8/src/System.Console/src/System/Console.cs#L337).

BTW: blocking signals before saving the context solves this problem.

Linked issues: https://github.com/dotnet/runtime/issues/3947, https://github.com/dotnet/coreclr/pull/1610, https://github.com/dotnet/runtime/issues/11906

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (16 by maintainers)

Most upvoted comments

That sounds like it would not be nice perf-wise, since the syscall would have to be called at every such transition. And keeping them always blocked would be a problem for interop with libraries that internally use signals.

As for the signal approach, I don’t think there is a problem with using non-realtime signals. The properties of realtime signals are that they are queued and never coalesced and that they have priorities based on their number and relative to non-realtime ones when multiple signals arrive at the same time. We don’t need either of these. This approach would have other benefits:

  • It would eliminate most of the macOS specific code from PAL
  • It would fix another issue with activation injection we have now with running under Rosetta 2

I am currently experimenting with this approach and it looks promising so far. I will definitely test GC suspension performance with this approach too.