zephyr: QEMU bug with branch delay slots on ARC

(Submitting this to Zephyr just to have a fix to link with the full description from the workaround patch. We probably want to open a toolchain bug for the broader problem?)

As of recent commits, qemu_arc_em is failing in the tests/kernel/mem_protect/syscalls test with:

START - test_syscall_torture
Running syscall torture test with 4 threads on 1 cpu(s)
E: ***** Exception vector: 0x2, cause code: 0x1, parameter 0x0
E: Address 0x800014a4
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x804009f0 (unknown)
E: Halting system

The proximate cause was (hilariously) that the patch count since the last release candidate had reached 100. This caused the version string printed by the boot banner to be one byte longer and exposed the bug. (I actually got a test rig created where two Zephyr binaries that differed ONLY in whether the last byte of a fake banner string was a “x” or a newline would differ in crash behavior).

But as it turns out that’s all just timing interaction. The real problem happens in the heap code, and is a compiler bug. @ruuddw pointed out that the fault (at 0x800014a4) is actually flagging an illegal instruction in a branch delay slot. And indeed, the disassembly below shows that the faulting instruction is a ENTER_S (one of the forbidden instructions), and that the instruction immediately preceding is indeed an unconditional branch!

This is a clear optimizer bug. The generated code can’t be allowed to emit a linker section that ends on an instruction with a branch delay slot, because it can’t control the instruction the linker will place next. And indeed, stuffing a NOP instruction at the end of the function fixes the symptom

8000146c <merge_chunks>:
{
8000146c:       c2e8                    enter_s [r13-r16,blink]
8000146e:       4748                    mov_s   r15,r2
80001470:       4508                    mov_s   r13,r0
80001472:       4030                    mov_s   r16,r1
        chunksz_t newsz = chunk_size(h, lc) + chunk_size(h, rc);
80001474:       0e1e ffcf               bl      -484    ;80001290 <chunk_size>
80001478:       41e1                    mov_s   r1,r15
8000147a:       4608                    mov_s   r14,r0
8000147c:       40a1                    mov_s   r0,r13
8000147e:       0e16 ffcf               bl      -492    ;80001290 <chunk_size>
        set_chunk_size(h, lc, newsz);
80001482:       4102                    mov_s   r1,r16
        chunksz_t newsz = chunk_size(h, lc) + chunk_size(h, rc);
80001484:       661e                    add_s   r14,r14,r0
        set_chunk_size(h, lc, newsz);
80001486:       40a1                    mov_s   r0,r13
80001488:       42c1                    mov_s   r2,r14
8000148a:       0e4a ffcf               bl      -440    ;800012d0 <set_chunk_siz
e>
        return c + chunk_size(h, c);
8000148e:       41e1                    mov_s   r1,r15
80001490:       40a1                    mov_s   r0,r13
80001492:       0e02 ffcf               bl      -512    ;80001290 <chunk_size>
        chunk_set(h, c, LEFT_SIZE, size);
80001496:       43c1                    mov_s   r3,r14
80001498:       6719                    add_s   r1,r15,r0
8000149a:       704c                    mov_s   r2,0
8000149c:       40a1                    mov_s   r0,r13
}
8000149e:       c2c8                    leave_s [r13-r16,blink]
800014a0:       05d1 ffcf               b       -560    ;80001270 <chunk_set>

800014a4 <free_list_add>:
{
800014a4:       c2e8                    enter_s [r13-r16,blink]
        return big_heap_chunks(h->end_chunk);
800014a6:       80e2                    ld_s    r15,[r0,0x8]
800014a8:       4628                    mov_s   r14,r1
800014aa:       4508                    mov_s   r13,r0

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 24 (11 by maintainers)

Commits related to this issue

Most upvoted comments

@stephanosio we’re looking into it. In the meantime we’ll try to submit a temporary work-around with -singlestep passed to QEMU.