runtime: GC suspension & UNIX signals corrupt each other's memory on the stack on macOS
Hi! I found an interesting issue with runtime suspension (aka PAL_InjectActivation()) on macOS.
Sometimes my application crashes in arbitrary places like:
Exception Type: EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Trace/BPT trap: 5
Termination Reason: Namespace SIGNAL, Code 0x5
Terminating Process: exc handler [33538]
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libcoreclr.dylib 0x000000010c044c13 DBG_DebugBreak + 1
1 ??? 0x00007ffee3cf48b0 0 + 140732720433328
2 ??? 0x000000011ae3403a 0 + 4746068026
3 ??? 0x00000001223a1488 0 + 4869198984
4 ??? 0x0000000125648e75 0 + 4922314357
5 ??? 0x000000011db90293 0 + 4793631379
6 ??? 0x000000012141c11b 0 + 4852924699
7 ??? 0x000000012141c094 0 + 4852924564
8 ??? 0x000000012141cd1d 0 + 4852927773
9 ??? 0x000000011db30319 0 + 4793238297
10 ??? 0x000000011db30279 0 + 4793238137
11 ??? 0x000000011dbfd70d 0 + 4794078989
12 libcoreclr.dylib 0x000000010c3b947b CallDescrWorkerInternal + 124
[...]
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x1f80aed1ad2b0076 rbx: 0x00000001b502b008 rcx: 0x00007f89a4018400 rdx: 0x0000000000000000
rdi: 0x00007ffee3cf4310 rsi: 0x0000000000000000 rbp: 0x00007ffee3cf47e0 rsp: 0x00007ffee3cf4308
r8: 0x00000000ffffffff r9: 0x00007f89a4910748 r10: 0x0000001900000001 r11: 0x000000011a4200b0
r12: 0x0000000000000000 r13: 0x00000001b502b0c8 r14: 0x000000018c056510 r15: 0x00000001b502b008
rip: 0x000000010c044c13 rfl: 0x0000000000000202 cr2: 0x0000000115a0b058
Logical CPU: 0
Error Code: 0x02000131
Trap Number: 133
or like:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: EXC_I386_GPFLT
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [62194]
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libcoreclr.dylib 0x00000001013ad553 RtlRestoreContext + 12
1 libcoreclr.dylib 0x00000001013ae86d ActivationHandler + 93
2 ??? 0x00007ffeee980a80 0 + 140732901362304
3 ??? 0x0000000110190d95 0 + 4565044629
4 libcoreclr.dylib 0x000000010170c57b CallDescrWorkerInternal + 124
[...]
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x60597e4c28880097 rbx: 0x00007ffeee980560 rcx: 0x00007fcc86c12d90 rdx: 0x0000000000000000
rdi: 0x00007ffeee980560 rsi: 0x0000000000000000 rbp: 0x00007ffeee980550 rsp: 0x00007ffeee980500
r8: 0x00000000000130a8 r9: 0x0000000000000000 r10: 0x00007fff93e0abf8 r11: 0x00007fff93e0abf0
r12: 0x00007ffeee980c90 r13: 0x0000000181368b28 r14: 0x0000000000000000 r15: 0x00007ffeee980d18
rip: 0x00000001013ad553 rfl: 0x0000000000010202 cr2: 0x000070000cbd9ff8
Logical CPU: 0
Error Code: 0x02000005
Trap Number: 133
or even like:
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000104e1cc7e
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Bus error: 10
Termination Reason: Namespace SIGNAL, Code 0xa
Terminating Process: exc handler [65838]
VM Regions Near 0x104e1cc7e:
MALLOC_LARGE 0000000104dc2000-0000000104dd4000 [ 72K] rw-/rwx SM=PRV
--> __TEXT 0000000104dd4000-0000000105258000 [ 4624K] r-x/rwx SM=COW /opt/buildAgent/*/*.dylib
__TEXT 0000000105258000-0000000105259000 [ 4K] r--/rwx SM=COW /opt/buildAgent/*/*.dylib
Thread 70 Crashed:
0 libcoreclr.dylib 0x0000000105175228 WKS::GCHeap::Relocate(Object**, ScanContext*, unsigned int) + 120
1 libcoreclr.dylib 0x00000001050ff56b GcInfoDecoder::ReportUntrackedSlots(GcSlotDecoder&, REGDISPLAY*, unsigned int, void (*)(void*, Object**, unsigned int), void*) + 235
2 libcoreclr.dylib 0x00000001050fe235 GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, Object**, unsigned int), void*) + 4341
3 libcoreclr.dylib 0x0000000104f04b9e EECodeManager::EnumGcRefs(REGDISPLAY*, EECodeInfo*, unsigned int, void (*)(void*, Object**, unsigned int), void*, unsigned int) + 254
4 libcoreclr.dylib 0x0000000105035d43 GcStackCrawlCallBack(CrawlFrame*, void*) + 643
5 libcoreclr.dylib 0x0000000104f8bffd Thread::MakeStackwalkerCallback(CrawlFrame*, StackWalkAction (*)(CrawlFrame*, void*), void*) + 157
6 libcoreclr.dylib 0x0000000104f8c261 Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*) + 465
7 libcoreclr.dylib 0x0000000104f8c783 Thread::StackWalkFrames(StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*) + 211
8 libcoreclr.dylib 0x00000001050335a6 ScanStackRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) + 326
9 libcoreclr.dylib 0x00000001050333f5 GCToEEInterface::GcScanRoots(void (*)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) + 261
10 libcoreclr.dylib 0x000000010517c9c9 WKS::gc_heap::relocate_phase(int, unsigned char*) + 89
11 libcoreclr.dylib 0x000000010516d013 WKS::gc_heap::plan_phase(int) + 10835
12 libcoreclr.dylib 0x000000010516748d WKS::gc_heap::gc1() + 893
13 libcoreclr.dylib 0x0000000105171357 WKS::gc_heap::garbage_collect(int) + 2007
14 libcoreclr.dylib 0x00000001051633cd WKS::GCHeap::GarbageCollectGeneration(unsigned int, gc_reason) + 909
15 libcoreclr.dylib 0x00000001051652e8 WKS::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) + 664
16 libcoreclr.dylib 0x000000010518aaa0 WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 80
17 libcoreclr.dylib 0x0000000105038756 AllocateObject(MethodTable*) + 182
18 libcoreclr.dylib 0x00000001050549e6 JIT_New(CORINFO_CLASS_STRUCT_*) + 134
19 ??? 0x000000011fd5f1d5 0 + 4829082069
20 ??? 0x000000011fd5f17d 0 + 4829081981
21 ??? 0x000000011fd5e759 0 + 4829079385
22 ??? 0x000000011fd5ffa4 0 + 4829085604
23 ??? 0x000000011fd5fee4 0 + 4829085412
24 ??? 0x000000011a96ab34 0 + 4741049140
25 ??? 0x000000011fd5fdd4 0 + 4829085140
26 ??? 0x000000011fd5fd29 0 + 4829084969
27 ??? 0x0000000117cd3272 0 + 4694291058
28 ??? 0x000000011a20eb1d 0 + 4733332253
29 ??? 0x000000011fd5f45a 0 + 4829082714
30 ??? 0x000000011dfc64d6 0 + 4798047446
31 ??? 0x000000011b187b33 0 + 4749556531
32 ??? 0x000000011b181535 0 + 4749530421
33 ??? 0x000000011b17bf71 0 + 4749508465
34 ??? 0x000000011b1822ea 0 + 4749533930
35 ??? 0x000000011b182200 0 + 4749533696
36 ??? 0x000000011b1820b5 0 + 4749533365
37 ??? 0x000000011b187765 0 + 4749555557
38 ??? 0x000000011e4866bf 0 + 4803028671
39 ??? 0x000000011a96ac74 0 + 4741049460
40 ??? 0x000000011a20eb1d 0 + 4733332253
41 ??? 0x000000011fd6053a 0 + 4829087034
42 ??? 0x000000011dfc64d6 0 + 4798047446
43 ??? 0x000000011b187740 0 + 4749555520
44 ??? 0x000000011b1830e4 0 + 4749537508
45 ??? 0x000000011fd5eabc 0 + 4829080252
46 ??? 0x000000011a20eb1d 0 + 4733332253
47 ??? 0x000000011fd6038e 0 + 4829086606
48 ??? 0x000000011dfc64d6 0 + 4798047446
49 ??? 0x000000011b187740 0 + 4749555520
50 ??? 0x000000011b1830e4 0 + 4749537508
51 ??? 0x000000011a96b94d 0 + 4741052749
52 ??? 0x000000011a20ec51 0 + 4733332561
53 ??? 0x000000011fd5d9ac 0 + 4829075884
54 ??? 0x000000011a20d599 0 + 4733326745
55 libcoreclr.dylib 0x000000010519147b CallDescrWorkerInternal + 124
[...]
Thread 70 crashed with X86 Thread State (64-bit):
rax: 0x00000001051751b0 rbx: 0x83d0010fc9310000 rcx: 0x00007ffeeaf1b670 rdx: 0x0000000000000000
rdi: 0x00007000158cd808 rsi: 0x00007000158cf510 rbp: 0x00007000158cd830 rsp: 0x00007000158cd800
r8: 0x0000000105035a20 r9: 0x00007000158cf470 r10: 0x0000000000000000 r11: 0x0000000117848ba8
r12: 0x83d0010fc9310000 r13: 0x000000000000000a r14: 0x0000000104e1cc7e r15: 0x0000000000000000
rip: 0x0000000105175228 rfl: 0x0000000000010286 cr2: 0x0000000112760000
Logical CPU: 0
Error Code: 0x02000131
Trap Number: 133
Looking at the EXC_BREAKPOINT failure, I found the place it was called from:
https://github.com/dotnet/coreclr/blob/v3.1.8/src/pal/src/exception/machexception.cpp#L1533
https://github.com/dotnet/runtime/blob/cf258a14b70ad9069470a108f13765e0e5988f51/src/coreclr/src/pal/src/exception/machexception.cpp#L1255-L1261
It was hit because the instruction pointer was NULL:
Thread::SuspendRuntime(reason=0x1)
118596103: InjectActivationInternal thread 878759 sp 0x7ffeef92b158 rbp 0x7ffeef92b630 ctx 0x7ffeef92b160 { rip 0x110c87d6f } watch8 0x7ffeef92b258
118596248: ActivationHandler stack 0x7ffeef92b13f frame 0x7ffeef92b150 ctx 0x7ffeef92b160 { rip 0x0 }
(please note that sp/rbp differ because these lines were taken from another crash)
Setting h/w watchpoints at &pContext->Rip didn't help, but I caught the stack memory being changed after the target thread was suspended:
Thread::SuspendRuntime(reason=0x1)
72878468: InjectActivationInternal thread 3574946 cleaning mem 0x7ffeea151040 - 0x7ffeea1519e0
72879327: InjectActivationInternal thread 3574946 new_value 0x00000000000000DE at 0x7ffeea1514b8 <-- struct mcontext_avx64
72879390: InjectActivationInternal thread 3574946 new_value 0x0000000201635000 at 0x7ffeea1514c0
72879411: InjectActivationInternal thread 3574946 new_value 0x000000000EC3EBEF at 0x7ffeea1514c8
72879433: InjectActivationInternal thread 3574946 new_value 0x000000020160EEE8 at 0x7ffeea1514d0
72879473: InjectActivationInternal thread 3574946 new_value 0x000000000000000F at 0x7ffeea1514d8
72879487: InjectActivationInternal thread 3574946 new_value 0x000000006CAD2534 at 0x7ffeea1514e0
72879498: InjectActivationInternal thread 3574946 new_value 0x0000000189050D54 at 0x7ffeea1514e8
72879510: InjectActivationInternal thread 3574946 new_value 0x0000000000000098 at 0x7ffeea1514f0
72879521: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1519E0 at 0x7ffeea1514f8 <-- mctxp->mctx_avx64.ss.__rbp
72879531: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1519E0 at 0x7ffeea151500 <-- mctxp->mctx_avx64.ss.__rsp
72879541: InjectActivationInternal thread 3574946 new_value 0x000000000076006F at 0x7ffeea151508
72879553: InjectActivationInternal thread 3574946 new_value 0x000000011B67D0B8 at 0x7ffeea151510
72879562: InjectActivationInternal thread 3574946 new_value 0x0000000300000001 at 0x7ffeea151518
72879572: InjectActivationInternal thread 3574946 new_value 0x000000011C3D4C50 at 0x7ffeea151520
72879582: InjectActivationInternal thread 3574946 new_value 0x000000020160EEE8 at 0x7ffeea151528
72879592: InjectActivationInternal thread 3574946 new_value 0x0000000195281F58 at 0x7ffeea151530
72879601: InjectActivationInternal thread 3574946 new_value 0x0000000189050D28 at 0x7ffeea151538
72879614: InjectActivationInternal thread 3574946 new_value 0x0000000185B83540 at 0x7ffeea151540
72879627: InjectActivationInternal thread 3574946 new_value 0x000000011C907318 at 0x7ffeea151548
72879637: InjectActivationInternal thread 3574946 new_value 0x0000000000000212 at 0x7ffeea151550
72879647: InjectActivationInternal thread 3574946 new_value 0x000000000000002B at 0x7ffeea151558
72879659: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151560
72879671: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151568
72879684: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151570
72879708: InjectActivationInternal thread 3574946 new_value 0x05FD00000000037F at 0x7ffeea151578
72879723: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151580
72879748: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151588
72879771: InjectActivationInternal thread 3574946 new_value 0x0000FFFF00001FA3 at 0x7ffeea151590
72879783: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151598
72879792: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea1515a0
72879803: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515a8
72879811: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515b0
72879821: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515b8
72879831: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515c0
72879842: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515c8
72879851: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515d0
72879860: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515d8
72879870: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515e0
72879881: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1515e8
72879890: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea1515f0
72879901: InjectActivationInternal thread 3574946 new_value 0x00000000025EC0C2 at 0x7ffeea1515f8
72879914: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea151600
72879925: InjectActivationInternal thread 3574946 new_value 0x00000000FFFFD15C at 0x7ffeea151608
72879934: InjectActivationInternal thread 3574946 new_value 0x000000000000FFFF at 0x7ffeea151610
72879945: InjectActivationInternal thread 3574946 new_value 0x0000000189050920 at 0x7ffeea151618
72879955: InjectActivationInternal thread 3574946 new_value 0x00000002011E48A8 at 0x7ffeea151620
72879966: InjectActivationInternal thread 3574946 new_value 0x40967C0000000000 at 0x7ffeea151628
72879977: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151630
72879986: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151638
72879996: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151640
72880005: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151648
72880017: InjectActivationInternal thread 3574946 new_value 0xFFFFFFFFFFFFFFFF at 0x7ffeea151650
72880026: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151658
72880037: InjectActivationInternal thread 3574946 new_value 0x689A6FB900000000 at 0x7ffeea151660
72880047: InjectActivationInternal thread 3574946 new_value 0x00000000432B0000 at 0x7ffeea151668
72880058: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151670
72880067: InjectActivationInternal thread 3574946 new_value 0x4106BF5BDA34CBDF at 0x7ffeea151678
72880078: InjectActivationInternal thread 3574946 new_value 0x4644AC6B477AD54E at 0x7ffeea151680
72880088: InjectActivationInternal thread 3574946 new_value 0xDAA409EE7E78CF9C at 0x7ffeea151688
72880099: InjectActivationInternal thread 3574946 new_value 0xFC9D514261FB9789 at 0x7ffeea151690
72880109: InjectActivationInternal thread 3574946 new_value 0x8B98B3B77629F005 at 0x7ffeea151698
72880121: InjectActivationInternal thread 3574946 new_value 0x9073B2A0D1D92BE2 at 0x7ffeea1516a0
72880130: InjectActivationInternal thread 3574946 new_value 0x0DF575D78915FE45 at 0x7ffeea1516a8
72880142: InjectActivationInternal thread 3574946 new_value 0x9C00965081ADA67A at 0x7ffeea1516b0
72880152: InjectActivationInternal thread 3574946 new_value 0x65F48E1F1CE3921B at 0x7ffeea1516b8
72880162: InjectActivationInternal thread 3574946 new_value 0x7F15DB8B8E6F0FEC at 0x7ffeea1516c0
72880171: InjectActivationInternal thread 3574946 new_value 0xE7765055C60B4C6E at 0x7ffeea1516c8
72880182: InjectActivationInternal thread 3574946 new_value 0x1BEB0117A7F0DBE7 at 0x7ffeea1516d0
72880192: InjectActivationInternal thread 3574946 new_value 0xD6FE60F4B30AEEEB at 0x7ffeea1516d8
72880202: InjectActivationInternal thread 3574946 new_value 0x2784B49358916F18 at 0x7ffeea1516e0
72880212: InjectActivationInternal thread 3574946 new_value 0xFDB143B21AC713E7 at 0x7ffeea1516e8
72880223: InjectActivationInternal thread 3574946 new_value 0x41AA99425A419855 at 0x7ffeea1516f0
72880233: InjectActivationInternal thread 3574946 new_value 0xE658603330A600C7 at 0x7ffeea1516f8
72880244: InjectActivationInternal thread 3574946 new_value 0x994DBBB8BEC90F2B at 0x7ffeea151700
72880254: InjectActivationInternal thread 3574946 new_value 0x6EE7A070DE8F036E at 0x7ffeea151708
72880265: InjectActivationInternal thread 3574946 new_value 0xED6702021215E4E5 at 0x7ffeea151710
72880276: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151718
72880285: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151720
72880295: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151728
72880304: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151730
72880315: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151738
72880325: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151740
72880336: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151748
72880345: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151750
72880356: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151758
72880367: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151760
72880378: InjectActivationInternal thread 3574946 new_value 0x0000000400000001 at 0x7ffeea151768
72880388: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151770
72880398: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151778
72880407: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151780
72880418: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151788
72880428: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151790
72880439: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151798
72880449: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517a0
72880459: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517a8
72880468: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517b0
72880478: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517b8
72880488: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517c0
72880498: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517c8
72880508: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517d0
72880519: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517d8
72880527: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517e0
72880537: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517e8
72880547: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517f0
72880557: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1517f8
72880568: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151800
72880579: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151808
72880589: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151810
72880599: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151818
72880608: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151820
72880618: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151828
72880627: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151830
72880637: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151838
72880648: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151840
72880658: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151848
72880668: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151850
72880678: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151858
72880689: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151860
72880699: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151868
72880710: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151870
72880719: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151878
72880730: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151880
72880741: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151888
72880751: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151890
72880762: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151898
72880772: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518a0
72880783: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518a8
72880793: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518b0
72880804: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518b8
72880814: InjectActivationInternal thread 3574946 new_value 0x0000000000000014 at 0x7ffeea1518c0
72880826: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518c8
72880837: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518d0
72880849: InjectActivationInternal thread 3574946 new_value 0x000000011C907318 at 0x7ffeea1518d8
72880860: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518e0
72880871: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518e8
72880882: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1519E0 at 0x7ffeea1518f0
72880892: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea1518f8
72880903: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151900
72880913: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151908
72880923: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151910
72880933: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151918
72880943: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151920
72880953: InjectActivationInternal thread 3574946 new_value 0x0000000400000000 at 0x7ffeea151928
72880963: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA151498 at 0x7ffeea151930
72880974: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151938
72880984: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151940
72880995: InjectActivationInternal thread 3574946 new_value 0x0000000000000000 at 0x7ffeea151948
72881004: InjectActivationInternal thread 3574946 new_value 0x0000000000000408 at 0x7ffeea151950
72881015: InjectActivationInternal thread 3574946 new_value 0x00007FFEEA1514B8 at 0x7ffeea151958 <-- sp + C_64_REDZONE_LEN + sizeof(new_val)
(also taken from another crash)
So, while we are stopping the target thread and saving its context to the stack:
https://github.com/dotnet/coreclr/blob/v3.1.8/src/pal/src/exception/machexception.cpp#L1601
https://github.com/dotnet/runtime/blob/cf258a14b70ad9069470a108f13765e0e5988f51/src/coreclr/src/pal/src/exception/machexception.cpp#L1329-L1338
dotnet can receive a signal (SIGCHLD in my case), and sendsig() in the macOS kernel will also inject _sigtramp and overwrite our context: https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/bsd/dev/i386/unix_signal.c#L257-L267 (ver. xnu-4903.221.2).
Demo app:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

namespace traffic_csharp
{
    class Program
    {
        static void Main(string[] args)
        {
            Process.Start("/usr/bin/true").WaitForExit(); // for SIGCONT, SIGCHLD, SIGWINCH
            //Console.CancelKeyPress += (sender, eventArgs) => eventArgs.Cancel = true; // for SIGINT, SIGQUIT
            var t = new Thread(AllocTraffic);
            t.Start();
            // Busy loop on the main thread: i stays even, so it never equals 1.
            for (ulong i = 0; i != 1; i += 1)
                i += 1;
        }

        static void AllocTraffic()
        {
            for (var i = 0; i < 1000000; ++i)
            {
                if (i % 10000 == 0)
                    Console.WriteLine("alloc traffic round {0}", i / 10000);
                var list = new List<int>();
                for (var j = 0; j < 100000; ++j)
                    list.Add(j);
            }
        }
    }
}
Build & run:
$ dotnet exec bin/Debug/netcoreapp3.1/traffic_csharp.dll
alloc traffic round 0
alloc traffic round 1
alloc traffic round 2
alloc traffic round 3
Trace/BPT trap: 5
$ dotnet exec bin/Debug/netcoreapp3.1/traffic_csharp.dll
alloc traffic round 0
alloc traffic round 1
Bus error: 10
$ dotnet exec bin/Debug/netcoreapp3.1/traffic_csharp.dll
alloc traffic round 0
alloc traffic round 1
alloc traffic round 2
RestoreState: 1332: thread_set_state(thread) (os/kern) aborted
Abort trap: 6
At the same time, in another console, run:
$ while kill -SIGCHLD <dotnet PID>; do true; done
Similarly, we can reproduce that with the SIGCONT, SIGCHLD, and SIGWINCH signals (https://github.com/dotnet/corefx/blob/v3.1.8/src/System.Diagnostics.Process/src/System/Diagnostics/Process.Unix.cs#L371), as well as with SIGINT and SIGQUIT (https://github.com/dotnet/corefx/blob/v3.1.8/src/System.Console/src/System/Console.cs#L337).
BTW: blocking signals before saving the context solves this problem.
Linked issues: https://github.com/dotnet/runtime/issues/3947, https://github.com/dotnet/coreclr/pull/1610, https://github.com/dotnet/runtime/issues/11906
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 16 (16 by maintainers)
That sounds like it would not be nice perf-wise to call the syscall at every such transition. And keeping them always blocked would be a problem for interop with libraries that internally use signals.
As for the signal approach, I don’t think there is a problem with using non-realtime signals. The properties of realtime signals are that they are queued and never coalesced and that they have priorities based on their number and relative to non-realtime ones when multiple signals arrive at the same time. We don’t need either of these. This approach would have other benefits:
I am currently experimenting with this approach and it looks promising so far. I will definitely test GC suspension performance with this approach too.