runtime: [Arm32] Invalid memory access (possibly JIT issue)

I have been chasing down an issue that occasionally crashes our system on 32-bit ARM machines. The error is a SIGSEGV or SIGABRT on memory that we are certain we own.

We can reproduce this on a fairly consistent basis, but only by putting the machine under heavy load, and I don’t have a simple reproduction.

The error always occurs on this line of code: Unsafe.CopyBlockUnaligned()

We have been able to capture this in lldb and have the following information:

(lldb) Process 18813 stopped
* thread dotnet/runtime#3861: tid = 0x49a2, 0x7664d55e, name = 'Raven.Server', stop reason = signal SIGSEGV: address access protected (fault address: 0x520d5000)
    frame #0: 0x7664d55e
->  0x7664d55e: stmdavs r11, {r0, r1, r11, sp, lr}
    0x7664d562: stcllt p6, c15, [sp, #-772]!
    0x7664d566: .long  0xe92d0000                ; unknown opcode
    0x7664d56a: svcge  #0x34ff0

The fault address is: 0x520d5000

Looking at smaps, we can confirm that this is indeed an address that we shouldn’t access:

520c5000-520d5000 rw-s 05b90000 08:01 1310760    /mnt/external/TmpDataDir/Databases/zz/Temp/scratch.0000000002.buffers
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  64 kB
Pss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        64 kB
Private_Dirty:         0 kB
Referenced:           64 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms
520d5000-520d6000 ---p 00000000 00:00 0
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB

However, note that we do own the memory immediately before this address. We have added additional tracing to the code, and we believe the actual failure happens when we call:

Unsafe.CopyBlockUnaligned(0x520D4FFD,0x4DE6F1D4,2);

The source address, 0x4DE6F1D4, is allocated on the stack, and it is used successfully just before the failure:

Unsafe.CopyBlockUnaligned(0x520D4FF1,0x4DE6F1D4,8);
Unsafe.CopyBlockUnaligned(0x520D4FF9,0x4DE6F1D4,4);

// dies here
Unsafe.CopyBlockUnaligned(0x520D4FFD,0x4DE6F1D4,2);

We use a stackalloc ulong[1] variable as the buffer that we copy into the destination.

We are writing near the end of the page that we own. That is expected and should be fine, because none of these copies goes past the page boundary.

Here is the disassembly at the time of the crash:

(lldb) d
->  0x7664d55e: stmdavs r11, {r0, r1, r11, sp, lr}
    0x7664d562: stcllt p6, c15, [sp, #-772]!
    0x7664d566: .long  0xe92d0000                ; unknown opcode
    0x7664d56a: svcge  #0x34ff0
    0x7664d56e: .long  0xf8c3b081                ; unknown opcode
    0x7664d572: bgt    0x7828157a
    0x7664d576: andeq  pc, r4, #-2147483648
    0x7664d57a: svceq  #0xe8b2

And here are the register values at the time of the crash:

(lldb) register read
General Purpose Registers:
        r0 = 0x520d4ffd
        r1 = 0x4de6f1d4
        r2 = 0x00000002
        r3 = 0x00000000
        r4 = 0x520d4ffd
        r5 = 0x4de6f1d4
        r6 = 0x00000002
        r7 = 0x00000000
        r8 = 0x5c48b99c
        r9 = 0x5c48b9ac
       r10 = 0x4de6fba4
       r11 = 0x4de6f1a0
       r12 = 0x7664d559
        sp = 0x4de6f170
        lr = 0x50ea66c5
        pc = 0x7664d55e
      cpsr = 0x20000030

I’m not an expert on ARM assembly, but it looks like the STM instruction is storing registers to the address in r11. And while r6 appears to contain the size, I don’t see it actually being used here.

Here is the full disassembly from around the location of the crash:

(lldb) di -s 0x7664d500 -e 0x7664d600
    0x7664d500: .long  0xf04f462b                ; unknown opcode
    0x7664d504: .long  0x94000403                ; unknown opcode
    0x7664d508: blx    0x7560968a
    0x7664d50c: stmdals r10, {r3, r5, r8, r11, r12, sp, pc}
    0x7664d510: .long  0xe8bdb001                ; unknown opcode
    0x7664d514: .long  0xb0044ff0                ; unknown opcode
    0x7664d518: .long  0x46844770                ; unknown opcode
    0x7664d51c: .long  0xe8bdb001                ; unknown opcode
    0x7664d520: .long  0xbc0f4ff0                ; unknown opcode
    0x7664d524: push   {r5, r6, r8, r9, r10, lr}
    0x7664d528: stc    p15, c4, [sp, #-964]!
    0x7664d52c: strlt  r0, [r2], #-2824
    0x7664d530: stmdage r10, {r0, r7, r12, sp, pc}
    0x7664d534: mrc2   p7, #0x5, apsr_nzcv, c10, c11, #0x6
    0x7664d538: .long  0xbc02b001                ; unknown opcode
    0x7664d53c: bleq   0x76888838
    0x7664d540: svchi  #0xf1e8bd
    0x7664d544: strtvc sp, [r4], r0, asr #20
    0x7664d548: strtvc sp, [r4], r4, lsl #22
    0x7664d54c: svclt  #0x82a00
    0x7664d550: stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
    0x7664d554: blt    0x7758b060
    0x7664d558: svclt  #0x82a00
    0x7664d55c: stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
    0x7664d560: .long  0xf6c1680b                ; unknown opcode
    0x7664d564: .long  0x0000bd6d                ; unknown opcode
    0x7664d568: svcmi  #0xf0e92d
    0x7664d56c: addlt  r10, r1, r3, lsl dotnet/runtime#3862
    0x7664d570: andle  pc, r0, r3, asr #17
    0x7664d574: .long  0xf102ca70                ; unknown opcode
    0x7664d578: .long  0xe8b20204                ; unknown opcode
    0x7664d57c: strmi  r0, [r8, r0, lsl dotnet/runtime#3862]
    0x7664d580: .long  0xe8bdb001                ; unknown opcode
    0x7664d584: strlt  r8, [r0, #0xff0]
    0x7664d588: .long  0xf8c3466f                ; unknown opcode
    0x7664d58c: ldrmi  sp, [r0, r0]
    0x7664d590: andeq  r11, r0, r0, lsl #27
    0x7664d594: andle  r2, r6, r0, lsl #20
    0x7664d598: .long  0x466fb580                ; unknown opcode
    0x7664d59c: stmdavc r11, {r0, r1, r11, r12, sp, lr}
    0x7664d5a0: ldcl   p6, c15, [r0, #-772]
    0x7664d5a4: stmdami r3, {r7, r8, r10, r11, r12, sp, pc}
    0x7664d5a8: stmdahs r0, {r11, sp, lr}
    0x7664d5ac: .long  0xf7a4bf18                ; unknown opcode
    0x7664d5b0: .long  0x4770bebf                ; unknown opcode
    0x7664d5b4: strtvc sp, [r4], r0, asr #20
    0x7664d5b8: andeq  r0, r0, r0
    0x7664d5bc: andeq  r0, r0, r0
    0x7664d5c0: svclt  #0x4770
    0x7664d5c4: svclt  #0xbf00
    0x7664d5c8: svclt  #0xbf00
    0x7664d5cc: svclt  #0xbf00
    0x7664d5d0: svchi  #0x5ff3bf
    0x7664d5d4: .long  0xf2406001                ; unknown opcode
    0x7664d5d8: .long  0xf2c00301                ; unknown opcode
    0x7664d5dc: addsmi r0, r9, #0, #6
    0x7664d5e0: .long  0xf641d30a                ; unknown opcode
    0x7664d5e4: .long  0xf2c74324                ; unknown opcode
    0x7664d5e8: bl     0x7671a324
    0x7664d5ec: ldmdavc r8, {r4, r7, r8, r9, sp}
    0x7664d5f0: svclt  #0x1c28ff
    0x7664d5f4: .long  0x701820ff                ; unknown opcode
    0x7664d5f8: andeq  r4, r0, r0, ror r7
    0x7664d5fc: andeq  r0, r0, r0

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

Yes, based on the register values, that is what it was doing. And I can see it is a bug in the asm JIT_MemCpy helper. It reads from the address to check whether it is valid before it jumps to memcpy. However, reading 4 bytes is obviously wrong; it should use just a byte read instead. Based on the comment in the function code, it seems there used to be a requirement that this function is only called for 4-byte aligned addresses, but looking at the Windows version of this helper, the code doesn’t require it. https://github.com/dotnet/coreclr/blob/master/src/vm/arm/crthelpers.S#L44-L58 I’ll create a PR with a fix.

Looking at the JIT source, there is only a single place that invokes JIT_MemCpy, and that is where the cpblk IL instruction is compiled. Both Unsafe.CopyBlock and Unsafe.CopyBlockUnaligned use cpblk, as you can see here: https://github.com/dotnet/corefx/blob/64c6d9fe5409be14bdc3609d73ffb3fea1f35797/src/System.Runtime.CompilerServices.Unsafe/src/System.Runtime.CompilerServices.Unsafe.il#L162-L206

@aviviadi it will be part of the 2.1.8 release, as planned. The 2.1 branch should be open for merging the change after 2.1.7 is out.

Great! Thanks.