zephyr: ARM M4 MPU backed userspace livelocks on stack overflow when FPU enabled

Describe the bug When using the MPU-backed userspace feature on the NRF52840 with the FPU enabled, a correctly-constructed stack overflow of the user thread can lead to an unrecoverable livelock where the overflowed thread never successfully aborts.

I am using the userspace feature to sandbox some 3rd-party code that involves a lot of parsing, etc., and want to recover in the case of a crash in userspace. I have implemented a k_sys_fatal_error_handler that returns to the kernel to abort userspace threads that have crashed, and normally (oops, divide-by-zero, etc.) this works fine. However, when the FPU is enabled (CONFIG_FPU=y), some instances of stack overflow instead result in the system looping on hard faults and never aborting the user thread. Note that the FPU doesn’t need to actually be in use for this to happen. I couldn’t reproduce this issue with a kernel thread and CONFIG_HW_STACK_PROTECTION.

Whether or not this occurs depends on how far the overflowing stack frame exceeds the MPU region. Small overflows seem to behave correctly (or are at least recoverable), whereas large overflows lead to the livelock behavior.

The exact behavior also depends on the state of the scheduler. Having a second thread runnable at abort time seems to change the behavior, and exposes another edge case where the system double-faults from the overflowing thread but can recover (there are still overflow amounts where it livelocks in this case).

It’s possible to detect this livelock in k_sys_fatal_error_handler by checking whether k_current_get()->base.thread_state has _THREAD_ABORTING or _THREAD_DEAD set, and halting the system in that case.
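For reference, a minimal sketch of that detection (assuming the internal _THREAD_ABORTING/_THREAD_DEAD state bits from kernel_structs.h and the z_arch_esf_t type used by the Zephyr revision referenced in this issue; header paths and the esf type differ between Zephyr versions):

#include <zephyr/kernel.h>
#include <zephyr/fatal.h>
#include <zephyr/kernel_structs.h> /* internal _THREAD_* state bits */

/* Overrides the weak default fatal error handler. */
void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf)
{
	ARG_UNUSED(esf);

	uint8_t state = k_current_get()->base.thread_state;

	/* If the faulting thread was already being aborted (or is dead) and
	 * we are back in the fatal handler anyway, the abort is not making
	 * progress: treat it as the livelock and stop the whole system.
	 */
	if (state & (_THREAD_ABORTING | _THREAD_DEAD)) {
		k_fatal_halt(reason);
	}

	/* Otherwise return, letting the kernel abort only the offending user
	 * thread, which is the normal sandbox-recovery path described above.
	 */
}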

To Reproduce Clone https://github.com/602p/zephyr-livelock-repro and run on an NRF52840DK. I suspect it would happen on other M4 devices too.

Uncomment #define ENABLE_MITIGATION in main.c to detect and halt on the livelock condition. Uncomment #define MODE_JOIN to make the main thread join on the user thread instead of polling; this changes the exact livelock behaviour. See the char to_try[] = { array for the different failure cases, depending on the amount by which the user thread overflows its stack.

Expected behavior All amounts of overflow should be recoverable or, at minimum, should bugcheck the kernel instead of livelocking.

Impact Mitigable, but still problematic because it changes the security model of using userspace to sandbox code, since sandboxed code could still crash the whole device.

Logs and console output See https://github.com/602p/zephyr-livelock-repro/blob/main/log.txt. The script tries overflowing a user task by increasingly large stack frames. The first few recover normally; the last gets livelocked.

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: This repo’s zephyr, gcc-arm-none-eabi-9-2020-q2-update
  • Zephyr commit f86f6e00251066815b0a7a7ad45047b525b3fab9 (originally identified on Nordic NCS Zephyr)

Cause conjecture Stack overflow is handled at zephyr/arch/arm/core/aarch32/cortex_m/fault.c:325, which resets the stack pointer to a valid address before returning to that task with PendSV set, to execute the context switch away from the user task. Is Zephyr forgetting to also reset some part of the [lazy] FP save/restore state (the Floating Point Context Address Register?), which remains invalid and triggers another MPU fault while trying to return into the user task to swap it out?
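One way to probe this conjecture (a debugging sketch only, not a fix; it assumes the CMSIS FPU register definitions and a hypothetical helper called from the MemManage handler) would be to inspect, and abandon, any pending lazy FP state save:

#include <zephyr/sys/printk.h>
#include <cmsis_core.h> /* CMSIS FPU/SCB definitions; include path varies by Zephyr version */

/* Hypothetical debug helper: check whether the hardware still owes a lazy
 * FP state save to the address held in FPCAR, which could point into the
 * overflowed (MPU-rejected) stack region.
 */
static void check_lazy_fp_state(void)
{
	if (FPU->FPCCR & FPU_FPCCR_LSPACT_Msk) {
		printk("lazy FP save pending, FPCAR=0x%08x\n",
		       (unsigned int)FPU->FPCAR);
		/* Abandon the pending save so a later FP access does not
		 * retry the faulting store.
		 */
		FPU->FPCCR &= ~FPU_FPCCR_LSPACT_Msk;
	}
}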

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 25 (7 by maintainers)

Most upvoted comments

@carlescufi @MaureenHelm I think we should discuss this on the dev review call. Flagging it for review/discussion, and @theotherjimmy and I can attend.

@602p That check actually halts before the call to k_sys_fatal_error_handler(), so we could reasonably call k_panic there as you suggested and the user would only see the panic.

I have run the sample code on the sam_e70_xplained board and can confirm that the problem can also be observed on the SAM E70 SoC (Cortex-M7). The exact behavior, with regard to how likely the problem is to manifest itself,

Whether or not this occurs depends on how far the overflowing stack frame exceeds the MPU region. Small overflows seem to behave correctly (or are at least recoverable), whereas large overflows lead to the livelock behavior.

is different. On the SAM E70 device it is much easier to trigger the invalid behavior. However, the failure mode is exactly the same.

I have analyzed the issue a bit and it does not look good. It seems we have a larger problem in handling the stack overflow condition in user mode on the ARM platform. As the issue’s author has written:

Stack overflow is handled at zephyr/arch/arm/core/aarch32/cortex_m/fault.c:325, which resets the stack pointer to a valid address before returning to that task with PendSV set, to execute the context switch away from the user task.

That’s the line of code and the relevant excerpt from the comments:

	* [...] Therefore,
	* we manually force the stack pointer to the
	* lowest allowed position, inside the thread's
	* stack.
	*
	* Note:
	* [...]
	* The manual adjustment of PSP is safe, as we
	* will not be re-scheduling this thread again
	* for execution; thread stack corruption is a
	* fatal error and a thread that corrupted its
	* stack needs to be aborted.
	*/
__set_PSP(min_stack_ptr);

The stack is reset to the lowest possible stack address, i.e. stack full. That works well unless the stack overflow is triggered while entering the SVCall exception. SVCall has a higher priority than the PendSV exception. Contrary to what the comment suggests, the manual adjustment of PSP is not safe in this case, since after PSP is adjusted and the MemManage exception exits, the processor will continue executing the SVCall handler. Relevant excerpt from the z_arm_svc handler in arch/arm/core/aarch32/swap_helper.S:

SECTION_FUNC(TEXT, z_arm_svc)
    tst lr, #_EXC_RETURN_SPSEL_Msk /* did we come from thread mode ? */
    ite eq  /* if zero (equal), came from handler mode */
        mrseq r0, MSP   /* handler mode, stack frame is on MSP */
        mrsne r0, PSP   /* thread mode, stack frame is on PSP */

    /* Figure out what SVC call number was invoked */

    ldr r1, [r0, #24]   /* grab address of PC from stack frame */
    /* SVC is a two-byte instruction, point to it and read the
     * SVC number (lower byte of SCV instruction)
     */
    ldrb r1, [r1, #-2]

The hard fault seen in the logs is triggered by the last line of the above code. We’ve entered from thread mode, so the routine is using PSP to retrieve the SVC number from the corrupted stack. Again, the stack is corrupted because upon SVCall exception entry the processor tried to save the current state (exception frame) on the PSP (application stack). This failure mode always happens when the stack overflow is caused by SVC exception entry. It was observed when the FPU is enabled because the exception frame with floating-point storage is much larger (26 4-byte words) than the exception frame without floating-point storage (8 4-byte words). Additionally, the exception frame with floating-point storage is mostly zeros. That’s relevant because when executing

    ldrb r1, [r1, #-2]

the value of the r1 register is taken from the stack. If the value of r1 is 0, we get a hard fault. It is a “precise data bus error”, which means that upon exception exit we are going to execute exactly the same line of code, triggering the same hard fault again. Hence the livelock behavior. If the value of r1 - 2 happens to point to a location that the processor can read, then it will retrieve some bogus SVC call number. This will either cause a kernel oops, if the SVC call number is detected as invalid, or we will execute some random service call.
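For reference, the two hardware-stacked frame layouts the analysis refers to look roughly like this (illustrative structs only, not actual Zephyr or CMSIS types):

#include <stdint.h>

/* Basic exception frame pushed by Cortex-M hardware when no FP state is
 * stacked: 8 words (32 bytes).
 */
struct basic_frame {
	uint32_t r0, r1, r2, r3;
	uint32_t r12, lr, pc, xpsr;
};

/* Extended frame pushed with CONFIG_FPU=y when the thread has an FP
 * context: 26 words (104 bytes). For a thread that never touched the FPU
 * most of it is zeros, which, per the analysis above, is why the value
 * read back from the PC slot by z_arm_svc is so often 0.
 */
struct extended_frame {
	uint32_t r0, r1, r2, r3;
	uint32_t r12, lr, pc, xpsr;
	uint32_t s[16];    /* s0..s15 */
	uint32_t fpscr;
	uint32_t reserved; /* keeps the stacked frame 8-byte aligned */
};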

Looking at the sample application code

void stack_overflow(int stride) {
	char arr[stride];
	LOG_DBG("Overflowing, &arr=%p", &arr);
	stack_overflow(stride);
}

The invalid behavior was observed when calling

	LOG_DBG("Overflowing, &arr=%p", &arr);

and not when calling stack_overflow(stride);. Behind the scenes, the LOG_DBG macro executes an SVC instruction.

Since the SVCall exception cannot be executed due to the corrupted stack, the MemManage fault handler needs to clear the SVCALLPENDED bit in the SHCSR register, e.g. by executing:

	if (SCB->SHCSR & SCB_SHCSR_SVCALLPENDED_Msk) {
		SCB->SHCSR &= ~SCB_SHCSR_SVCALLPENDED_Msk;
	}
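A minimal sketch of where such a check could sit, next to the existing PSP adjustment in the stack-overflow handling path quoted earlier (simplified, with a hypothetical helper name and assuming the CMSIS SCB definitions; not the actual Zephyr patch):

/* Hypothetical consolidation of the existing PSP reset and the proposed
 * clearing of the pending SVCall.
 */
static void handle_user_stack_overflow(uint32_t min_stack_ptr)
{
	/* Existing behavior: park PSP at the lowest valid address inside the
	 * thread's stack so the fault handler itself can return.
	 */
	__set_PSP(min_stack_ptr);

	/* Proposed addition: if the overflow happened while the hardware was
	 * stacking the frame for an SVCall, cancel that pending SVCall.
	 * Otherwise z_arm_svc runs next, reads a bogus value from the reset
	 * stack, and produces the repeated hard faults described above.
	 */
	if (SCB->SHCSR & SCB_SHCSR_SVCALLPENDED_Msk) {
		SCB->SHCSR &= ~SCB_SHCSR_SVCALLPENDED_Msk;
	}
}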

When working on the fix we should take into account that:

  • A MemManage fault can be triggered during the exception entry of any other exception, not necessarily SVCall. However, most likely only SVCall is a problem.
  • A SysTick exception can be generated while we are handling the MemManage fault. Again, this is not necessarily a problem, but it has a higher priority than the PendSV exception and will be executed first.