runtime: [Arm32] Invalid memory access (possibly JIT issue)
I have been chasing down an issue that occasionally crashes our system on 32-bit ARM machines.
The error is a SIGSEGV or SIGABRT on memory that we are certain that we own.
We are able to reproduce this fairly consistently, but only by throwing a lot of work at the machine, and I don’t have a simple reproduction.
The error always occurs on this line of code: Unsafe.CopyBlockUnaligned()
We have been able to capture this in lldb and have the following information:
(lldb) Process 18813 stopped
* thread dotnet/runtime#3861: tid = 0x49a2, 0x7664d55e, name = 'Raven.Server', stop reason = signal SIGSEGV: address access protected (fault address: 0x520d5000)
frame #0: 0x7664d55e
-> 0x7664d55e: stmdavs r11, {r0, r1, r11, sp, lr}
0x7664d562: stcllt p6, c15, [sp, #-772]!
0x7664d566: .long 0xe92d0000 ; unknown opcode
0x7664d56a: svcge #0x34ff0
The fault address is: 0x520d5000
Looking at smaps, we can confirm that this is indeed an address that we shouldn’t access:
520c5000-520d5000 rw-s 05b90000 08:01 1310760 /mnt/external/TmpDataDir/Databases/zz/Temp/scratch.0000000002.buffers
Size: 64 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 64 kB
Private_Dirty: 0 kB
Referenced: 64 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms
520d5000-520d6000 ---p 00000000 00:00 0
Size: 4 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
However, note that we own the memory just before this bit. We have added additional tracing to the code and we believe that the actual failure happened when we call:
Unsafe.CopyBlockUnaligned(0x520D4FFD,0x4DE6F1D4,2);
The 0x4DE6F1D4 source address is allocated on the stack and is used just before the failure with:
Unsafe.CopyBlockUnaligned(0x520D4FF1,0x4DE6F1D4,8);
Unsafe.CopyBlockUnaligned(0x520D4FF9,0x4DE6F1D4,4);
// dies here
Unsafe.CopyBlockUnaligned(0x520D4FFD,0x4DE6F1D4,2);
We have a stackalloc ulong[1] variable that is used as a buffer to copy to the destination.
We are writing toward the end of the page that we own, but that is expected and should be fine because we aren’t going beyond the boundary of the page: the last 2-byte copy covers 0x520D4FFD and 0x520D4FFE, and the mapping ends at 0x520D5000.
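As a sanity check on that claim, here is a minimal Python sketch of the address arithmetic. The mapping bounds are taken from the smaps output and the destination/size pairs from the traced calls; the helper names (last_byte, stays_in_mapping) are made up for illustration and are not part of any runtime API.

```python
# Bounds of the scratch-buffer mapping, from the smaps entry
# 520c5000-520d5000 (the end address is exclusive).
MAPPING_START = 0x520C5000
MAPPING_END = 0x520D5000  # first byte of the ---p region that follows

# (destination, size) pairs from the three traced CopyBlockUnaligned calls.
COPIES = [(0x520D4FF1, 8), (0x520D4FF9, 4), (0x520D4FFD, 2)]


def last_byte(addr, size):
    """Address of the last byte written by a copy of `size` bytes to `addr`."""
    return addr + size - 1


def stays_in_mapping(addr, size):
    """True if every byte of the copy lands inside the mapping we own."""
    return MAPPING_START <= addr and last_byte(addr, size) < MAPPING_END


if __name__ == "__main__":
    for dest, size in COPIES:
        print(hex(dest), size, hex(last_byte(dest, size)),
              stays_in_mapping(dest, size))
```

All three copies, including the one that dies, end before 0x520D5000, so the requested writes themselves never leave the mapping.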
Here is the disassembly at the time of the crash
(lldb) d
-> 0x7664d55e: stmdavs r11, {r0, r1, r11, sp, lr}
0x7664d562: stcllt p6, c15, [sp, #-772]!
0x7664d566: .long 0xe92d0000 ; unknown opcode
0x7664d56a: svcge #0x34ff0
0x7664d56e: .long 0xf8c3b081 ; unknown opcode
0x7664d572: bgt 0x7828157a
0x7664d576: andeq pc, r4, #-2147483648
0x7664d57a: svceq #0xe8b2
And here are the registers at the crash
(lldb) register read
General Purpose Registers:
r0 = 0x520d4ffd
r1 = 0x4de6f1d4
r2 = 0x00000002
r3 = 0x00000000
r4 = 0x520d4ffd
r5 = 0x4de6f1d4
r6 = 0x00000002
r7 = 0x00000000
r8 = 0x5c48b99c
r9 = 0x5c48b9ac
r10 = 0x4de6fba4
r11 = 0x4de6f1a0
r12 = 0x7664d559
sp = 0x4de6f170
lr = 0x50ea66c5
pc = 0x7664d55e
cpsr = 0x20000030
I’m not an expert on ARM assembly, but it looks like the STM instruction is writing to the address in r11, and while r6 looks like it contains the size, I’m not seeing it actually being used here.
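One thing that may help when reading the lldb output: cpsr = 0x20000030 has bit 5 set, which is the Thumb state bit on ARM, so the core was most likely executing 16-bit Thumb code while lldb rendered the bytes as 32-bit ARM instructions. The sketch below is a rough illustration, not a real disassembler. It only recognizes one Thumb encoding (LDR immediate, T1), and the word 0x680B6803 is an assumption: it is a re-encoding of lldb's ARM-mode rendering at pc, not a value printed in the dump. All function names are hypothetical.

```python
CPSR_T_BIT = 1 << 5  # Thumb state bit in the ARM CPSR


def is_thumb(cpsr):
    """True if the CPSR value indicates Thumb execution state."""
    return bool(cpsr & CPSR_T_BIT)


def split_halfwords(word):
    """Split a little-endian 32-bit word into (lower-address, higher-address) halfwords."""
    return word & 0xFFFF, word >> 16


def decode_thumb_ldr_imm(halfword):
    """Decode 16-bit Thumb LDR (immediate, T1): 01101 imm5 Rn Rt.

    Returns None for any other encoding.
    """
    if halfword >> 11 != 0b01101:
        return None
    imm5 = (halfword >> 6) & 0x1F
    rn = (halfword >> 3) & 0x7
    rt = halfword & 0x7
    return f"ldr r{rt}, [r{rn}, #{imm5 * 4}]"


if __name__ == "__main__":
    print(is_thumb(0x20000030))
    # Assumed word at pc (0x7664d55e), re-encoded from the ARM-mode rendering.
    at_pc, after_pc = split_halfwords(0x680B6803)
    print(decode_thumb_ldr_imm(at_pc))
    print(decode_thumb_ldr_imm(after_pc))
```

Under those assumptions the instruction at pc decodes as a 4-byte load from the address in r0 (the destination, 0x520d4ffd), followed by a 4-byte load from r1 (the source), rather than a store through r11.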
Here is the full disassembly from around the location of the crash:
(lldb) di -s 0x7664d500 -e 0x7664d600
0x7664d500: .long 0xf04f462b ; unknown opcode
0x7664d504: .long 0x94000403 ; unknown opcode
0x7664d508: blx 0x7560968a
0x7664d50c: stmdals r10, {r3, r5, r8, r11, r12, sp, pc}
0x7664d510: .long 0xe8bdb001 ; unknown opcode
0x7664d514: .long 0xb0044ff0 ; unknown opcode
0x7664d518: .long 0x46844770 ; unknown opcode
0x7664d51c: .long 0xe8bdb001 ; unknown opcode
0x7664d520: .long 0xbc0f4ff0 ; unknown opcode
0x7664d524: push {r5, r6, r8, r9, r10, lr}
0x7664d528: stc p15, c4, [sp, #-964]!
0x7664d52c: strlt r0, [r2], #-2824
0x7664d530: stmdage r10, {r0, r7, r12, sp, pc}
0x7664d534: mrc2 p7, #0x5, apsr_nzcv, c10, c11, #0x6
0x7664d538: .long 0xbc02b001 ; unknown opcode
0x7664d53c: bleq 0x76888838
0x7664d540: svchi #0xf1e8bd
0x7664d544: strtvc sp, [r4], r0, asr #20
0x7664d548: strtvc sp, [r4], r4, lsl #22
0x7664d54c: svclt #0x82a00
0x7664d550: stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
0x7664d554: blt 0x7758b060
0x7664d558: svclt #0x82a00
0x7664d55c: stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
0x7664d560: .long 0xf6c1680b ; unknown opcode
0x7664d564: .long 0x0000bd6d ; unknown opcode
0x7664d568: svcmi #0xf0e92d
0x7664d56c: addlt r10, r1, r3, lsl dotnet/runtime#3862
0x7664d570: andle pc, r0, r3, asr #17
0x7664d574: .long 0xf102ca70 ; unknown opcode
0x7664d578: .long 0xe8b20204 ; unknown opcode
0x7664d57c: strmi r0, [r8, r0, lsl dotnet/runtime#3862]
0x7664d580: .long 0xe8bdb001 ; unknown opcode
0x7664d584: strlt r8, [r0, #0xff0]
0x7664d588: .long 0xf8c3466f ; unknown opcode
0x7664d58c: ldrmi sp, [r0, r0]
0x7664d590: andeq r11, r0, r0, lsl #27
0x7664d594: andle r2, r6, r0, lsl #20
0x7664d598: .long 0x466fb580 ; unknown opcode
0x7664d59c: stmdavc r11, {r0, r1, r11, r12, sp, lr}
0x7664d5a0: ldcl p6, c15, [r0, #-772]
0x7664d5a4: stmdami r3, {r7, r8, r10, r11, r12, sp, pc}
0x7664d5a8: stmdahs r0, {r11, sp, lr}
0x7664d5ac: .long 0xf7a4bf18 ; unknown opcode
0x7664d5b0: .long 0x4770bebf ; unknown opcode
0x7664d5b4: strtvc sp, [r4], r0, asr #20
0x7664d5b8: andeq r0, r0, r0
0x7664d5bc: andeq r0, r0, r0
0x7664d5c0: svclt #0x4770
0x7664d5c4: svclt #0xbf00
0x7664d5c8: svclt #0xbf00
0x7664d5cc: svclt #0xbf00
0x7664d5d0: svchi #0x5ff3bf
0x7664d5d4: .long 0xf2406001 ; unknown opcode
0x7664d5d8: .long 0xf2c00301 ; unknown opcode
0x7664d5dc: addsmi r0, r9, #0, #6
0x7664d5e0: .long 0xf641d30a ; unknown opcode
0x7664d5e4: .long 0xf2c74324 ; unknown opcode
0x7664d5e8: bl 0x7671a324
0x7664d5ec: ldmdavc r8, {r4, r7, r8, r9, sp}
0x7664d5f0: svclt #0x1c28ff
0x7664d5f4: .long 0x701820ff ; unknown opcode
0x7664d5f8: andeq r4, r0, r0, ror r7
0x7664d5fc: andeq r0, r0, r0
About this issue
- State: closed
- Created 6 years ago
- Comments: 19 (9 by maintainers)
Yes, based on the register values, that is what it was doing, and I can see that it is a bug in the asm JIT_MemCpy helper. The helper reads from the address to check that it is valid before it jumps to memcpy. However, reading 4 bytes is obviously wrong; it should use a single-byte read instead. Based on the comment in the function code, it seems that there used to be a requirement that this function is called only for 4-byte-aligned addresses, but looking at the Windows version of this helper, the code doesn’t require it. https://github.com/dotnet/coreclr/blob/master/src/vm/arm/crthelpers.S#L44-L58 I’ll create a PR with a fix.
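To make the probe-width point concrete, here is a small Python sketch of which bytes a validity probe touches at each of the three traced destination addresses. It only models address ranges, assuming the guard page starts at 0x520D5000 as shown in smaps; the helper names are hypothetical and this is not the runtime's code.

```python
GUARD_PAGE_START = 0x520D5000  # first byte of the ---p region from smaps

# Destination addresses of the three traced copies.
DESTINATIONS = [0x520D4FF1, 0x520D4FF9, 0x520D4FFD]


def last_probed_byte(addr, width):
    """Address of the last byte touched by a `width`-byte read at `addr`."""
    return addr + width - 1


def probe_crosses_guard(addr, width):
    """True if a `width`-byte read starting at `addr` reaches the guard page."""
    return last_probed_byte(addr, width) >= GUARD_PAGE_START


if __name__ == "__main__":
    for dest in DESTINATIONS:
        for width in (4, 1):
            print(hex(dest), width, hex(last_probed_byte(dest, width)),
                  probe_crosses_guard(dest, width))
```

Only the 4-byte probe at 0x520D4FFD reaches 0x520D5000, which matches the reported fault address; a 1-byte probe at the same address stays inside the owned page, and the two earlier copies are far enough from the page end that even a 4-byte probe is harmless.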
Looking at the JIT source, there is only a single place that invokes JIT_MemCpy, and that is where the cpblk IL instruction is compiled. Both Unsafe.CopyBlock and Unsafe.CopyBlockUnaligned use cpblk, as you can see here: https://github.com/dotnet/corefx/blob/64c6d9fe5409be14bdc3609d73ffb3fea1f35797/src/System.Runtime.CompilerServices.Unsafe/src/System.Runtime.CompilerServices.Unsafe.il#L162-L206
@aviviadi it will be part of the 2.1.8 release as planned. The 2.1 branch should be open for merging the change after 2.1.7 is out.
Great! Thanks.