zephyr: Intel CAVS: Failure in tests/lib/spsc_pbuf

Describe the bug

On the intel_adsp_cavs25 platform, the tests/lib/spsc_pbuf/ test fails.

Platform: intel_adsp_cavs25
Failing test: tests/lib/spsc_pbuf
Result: No Console Output (Timeout)

To Reproduce

Steps to reproduce the behavior:

1. twister -W --hardware-map /home/ztest/cavs/cavs.map --device-testing -x=CONFIG_BOOT_DELAY=500 -T tests/lib/spsc_pbuf/ --no-skipped-report -vv
2. See error

Logs and console output

START - test_stress
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/kernel/sched.c:1764
aborted _current back from dead
E:  ** FATAL EXCEPTION
E:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
E:  **  PC 0xbe0124ee VADDR (nil)
E:  **  PS 0x60b20
E:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:11 CALLINC:2)
E:  **  A0 0xbe0120b8  SP 0xbe01cf00  A2 0x4  A3 0x9e022b5c
E:  **  A4 0x1  A5 0x60b20  A6 0x1f  A7 0x1
E:  **  A8 0x1  A9 (nil) A10 0x1 A11 0x9e022a58
E:  ** A12 0x9e0224c8 A13 0x21 A14 0xbe01cef0 A15 0x4
E:  ** LBEG 0xbe013071 LEND 0xbe013087 LCOUNT (nil)
E:  ** SAR 0x4
Backtrace:0xBE0124EB:0xBE01CF00 0xBE0120B5:0xBE01CF10 0xBE016FF8:0xBE01CF30 0xBE011773:0xBE01CF70 
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x9e0224c8 (ztress_0)
E: Halting system
0% remaining:4000 ms
0% remaining:2999 ms
0% remaining:1999 ms
0% remaining:999 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms
0% remaining:0 ms

Environment (please complete the following information):

OS: Linux
Toolchain: SDK 14.1
Commit SHA: fa055f743fd870809ed2a22dc41f6fb34a6eea46

About this issue

State: closed
Created 2 years ago
Comments: 24 (3 by maintainers)

Most upvoted comments

Xtensa tests on intel_adsp have been dealing with fixes like this consistently since KERNEL_COHERENCE landed, actually. It’s, heh, just the staff that changed. 😃

But yes, the rules are: when KERNEL_COHERENCE is enabled (which currently happens only on SMP Xtensa platforms, though in principle other cache-incoherent architectures might behave similarly), thread stack memory is cached/incoherent and must be treated as local to the currently executing CPU. The kernel handles the flushing for you when the thread context switches. In general you shouldn’t share stack memory with other threads at all, but it’s possible to do so as long as you pad and align your data to a full cache line (the size varies, so check core-isa.h, though right now all affected devices use 64 bytes) and use the Xtensa cache API (don’t use the HAL, use our code, it’s better) to flush changes and invalidate before use. The kernel also has automatic support that detects obvious mistakes like putting spinlocks or wait queues (i.e. typical IPC primitives) on the stack.

andyross on Aug 31, 2022
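
To illustrate the padding and flush/invalidate rule described above, here is a minimal sketch in plain C11. It assumes a 64-byte cache line (the value the comment gives for currently affected devices). The names shared_msg, cache_flush, cache_invalidate, publish and consume are hypothetical placeholders, not Zephyr or Xtensa APIs; on real hardware the two stubs would be replaced by the platform's cache calls.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte data cache line. Real code should take the
 * value from the core's core-isa.h rather than hard-coding it. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-ins for the platform's flush/invalidate calls.
 * They do nothing here so the sketch builds on a host machine. */
static void cache_flush(void *addr, size_t size) { (void)addr; (void)size; }
static void cache_invalidate(void *addr, size_t size) { (void)addr; (void)size; }

/* Aligning the first member to the cache line raises the alignment of
 * the whole struct, and the compiler pads sizeof() up to a multiple of
 * that alignment, so the object never shares a line with other data. */
struct shared_msg {
	alignas(CACHE_LINE_SIZE) uint32_t value;
	uint32_t seq;
};

static_assert(sizeof(struct shared_msg) % CACHE_LINE_SIZE == 0,
	      "shared_msg must cover whole cache lines");

/* Writer side: update the data, then flush it out of the local cache. */
static void publish(struct shared_msg *m, uint32_t value)
{
	m->value = value;
	m->seq++;
	cache_flush(m, sizeof(*m));
}

/* Reader side: drop any stale cached copy before reading. */
static uint32_t consume(struct shared_msg *m)
{
	cache_invalidate(m, sizeof(*m));
	return m->value;
}
```

The static_assert catches a struct that stops filling whole cache lines at build time, which is cheaper than debugging a stale read on hardware.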

The stack is not made coherent; that’s the whole issue. Note that while the config option is called “kernel_coherence”, it’s actually there to deal with incoherent caches: https://docs.zephyrproject.org/latest/kconfig.html#CONFIG_KERNEL_COHERENCE.

It’s illegal to share data on the stack among the CPUs, because the stack is cached (and the cache is incoherent). The .bss section is placed in uncached memory and thus doesn’t suffer from the incoherence issue.

edersondisouza on Aug 30, 2022

Hmm, I think that this is falling into the cache incoherence trap. And based on some fixes for this kind of issue (like 4796037cf721c6f6416d446e99ec02521542b949), it seems that having static data in the ZTRESS_EXECUTE macro is the way to go. Can you send a PR for that?

edersondisouza on Aug 30, 2022
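
The stack-versus-static distinction in the two comments above can be sketched in plain C. This is an illustration of storage duration only, not the actual change made to ZTRESS_EXECUTE; test_ctx, get_shared_ctx, producer_step and consumer_step are hypothetical names. The claim that static data ends up in uncached memory on this platform comes from the comments above, not from this code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context shared between worker threads of a stress test. */
struct test_ctx {
	int produced;
	int consumed;
};

/* Pattern to avoid with an incoherent cache: declaring
 *     struct test_ctx ctx;
 * as a local variable in the thread that starts the workers and passing
 * &ctx to them. The object then lives on that thread's stack, which is
 * cached and private to the CPU running it. */

/* Preferred pattern: static storage duration. The object is placed in
 * .bss (zero-initialized) instead of on any thread's stack. */
static struct test_ctx *get_shared_ctx(void)
{
	static struct test_ctx ctx;

	return &ctx;
}

/* Hypothetical worker bodies operating on the shared context. */
static void producer_step(struct test_ctx *ctx)
{
	ctx->produced++;
}

static void consumer_step(struct test_ctx *ctx)
{
	if (ctx->consumed < ctx->produced) {
		ctx->consumed++;
	}
}
```

Every caller of get_shared_ctx() receives the same address, so all workers operate on one object regardless of which thread or CPU they run on.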

@smrtos 0cpy means a special mode where data is not copied into the buffer; instead, space within the buffer is allocated first, data is written directly into that space, and then the buffer is committed.

nordic-krch on Aug 11, 2022
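
The allocate, write in place, commit sequence described above can be sketched with a toy buffer. This is not the spsc_pbuf implementation or its API: toy_buf, toy_alloc, toy_commit and toy_produce are hypothetical, and the buffer is linear (no wrap-around, no consumer side) to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy linear buffer; wr counts the bytes committed so far. */
struct toy_buf {
	uint8_t data[64];
	size_t wr;
};

/* Step 1: reserve space. Returns the number of bytes granted (possibly
 * fewer than requested) and points *out at the start of the free space.
 * Nothing becomes visible to a reader yet. */
static size_t toy_alloc(struct toy_buf *b, size_t len, uint8_t **out)
{
	size_t free_space = sizeof(b->data) - b->wr;

	if (len > free_space) {
		len = free_space;
	}
	*out = &b->data[b->wr];
	return len;
}

/* Step 3: commit. Only now do the written bytes count as buffer content. */
static void toy_commit(struct toy_buf *b, size_t len)
{
	b->wr += len;
}

/* Producer that generates n bytes directly inside the buffer, with no
 * intermediate copy. Returns n on success, 0 if there is not enough room. */
static size_t toy_produce(struct toy_buf *b, size_t n)
{
	uint8_t *space;

	if (toy_alloc(b, n, &space) < n) {
		return 0; /* too little space; nothing is committed */
	}
	/* Step 2: write straight into the reserved space. */
	for (size_t i = 0; i < n; i++) {
		space[i] = (uint8_t)i;
	}
	toy_commit(b, n);
	return n;
}
```

Because the commit is a separate step, a producer that finds too little space can simply walk away without leaving partial data in the buffer.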