zephyr: problem with CONFIG_STACK_SENTINEL

After pulling the latest Zephyr commit, my firmware crashes quite reliably whenever a thread switch or ISR occurs:

[00:00:05.859,000] <dbg> BATTERY.battery_thread: start battery sensing
[00:00:05.867,000] <err> os: r0/a1:  0x00000002  r1/a2:  0x20004a80  r2/a3:  0xf0f0f0f0
[00:00:05.875,000] <err> os: r3/a4:  0x20004c80 r12/ip:  0x00000020 r14/lr:  0x08019f73
[00:00:05.884,000] <err> os:  xpsr:  0x41000000
[00:00:05.889,000] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
[00:00:05.900,000] <err> os: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x00000000  s[ 7]:  0x00000000
[00:00:05.910,000] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
[00:00:05.920,000] <err> os: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00000000
[00:00:05.931,000] <err> os: fpscr:  0x00000000
[00:00:05.936,000] <err> os: EXC_RETURN: 0x0
[00:00:05.941,000] <err> os: Faulting instruction address (r15/pc): 0x0801a38a
[00:00:05.949,000] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:05.957,000] <err> os: Current thread: 0x20001678 (batteryThread_id)

checking the map file:

.text.z_impl_k_thread_name_set
                0x000000000801a33c       0x24 kernel/libkernel.a(thread.c.obj)
                0x000000000801a33c                z_impl_k_thread_name_set
 .text.z_check_stack_sentinel
                0x000000000801a360       0x30 kernel/libkernel.a(thread.c.obj)
                0x000000000801a360                z_check_stack_sentinel
 .text.schedule_new_thread
                0x000000000801a390       0x1c kernel/libkernel.a(thread.c.obj)

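reveals that the faulting address 0x0801a38a lies inside z_check_stack_sentinel (start 0x0801a360, size 0x30), so the sentinel check itself raised the fatal error.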
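The address lookup can be sketched in plain C as a range check over the three map entries quoted above. This is only an illustration of the arithmetic: `map_sym`, `map_syms` and `sym_for_pc` are hypothetical names, not part of Zephyr or any toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* One map-file entry: symbol name, start address, size in bytes. */
struct map_sym {
	const char *name;
	uint32_t start;
	uint32_t size;
};

/* Entries copied from the map excerpt (hypothetical table, for illustration). */
static const struct map_sym map_syms[] = {
	{ "z_impl_k_thread_name_set", 0x0801a33cu, 0x24u },
	{ "z_check_stack_sentinel",   0x0801a360u, 0x30u },
	{ "schedule_new_thread",      0x0801a390u, 0x1cu },
};

/* Return the symbol whose [start, start + size) range contains pc, or NULL. */
static const char *sym_for_pc(uint32_t pc)
{
	for (size_t i = 0; i < sizeof(map_syms) / sizeof(map_syms[0]); i++) {
		if (pc >= map_syms[i].start &&
		    pc - map_syms[i].start < map_syms[i].size) {
			return map_syms[i].name;
		}
	}
	return NULL;
}
```

Calling `sym_for_pc(0x0801a38au)` with the pc from the fault dump selects the `z_check_stack_sentinel` entry, since 0x0801a38a - 0x0801a360 = 0x2a is smaller than the size 0x30.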

Inside the battery thread an ADC(DMA driven) conversion is triggered.

// adc dma isr handler
ISR_DIRECT_DECLARE(DMA2_Stream0_IRQHandler) {
	ISR_DIRECT_HEADER();
	// Check transfer-complete interrupt
	if (LL_DMA_IsEnabledIT_TC(ADC1_DMA, ADC1_DMA_STREAM) && LL_DMA_IsActiveFlag_TC0(ADC1_DMA)) {
		LL_DMA_ClearFlag_TC0(ADC1_DMA);             // Clear transfer-complete flag
		LL_ADC_Disable(ADC1); // disable adc
		// indicate that we are done
		k_sem_give(&adc1_dma.adcDone);
	}
	ISR_DIRECT_FOOTER(1);
	ISR_DIRECT_PM(); // PM done after servicing interrupt for best latency
	return 1; // We should check if scheduling decision should be made
}

The ISR gives a semaphore that the battery thread is waiting on. My guess is that this mechanism trips the sentinel check.

With CONFIG_STACK_SENTINEL disabled, my firmware works perfectly.
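For context, here is a minimal sketch of the idea behind a stack sentinel, assuming the usual scheme: a magic word sits at the lowest address of a downward-growing stack and is compared against the expected pattern at points such as context switches and interrupt exits. It is a toy model in plain C, not Zephyr's implementation; `toy_stack` and its helpers are hypothetical names. The pattern 0xF0F0F0F0 is the value visible in r2 of the fault dump above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SENTINEL 0xF0F0F0F0u /* same pattern as r2 in the fault dump */
#define TOY_STACK_WORDS 64u  /* arbitrary size for the toy model */

/* Toy stack that grows downward: words[0] is the lowest address. */
struct toy_stack {
	uint32_t words[TOY_STACK_WORDS];
};

/* Clear the stack and plant the sentinel in the lowest word. */
static void toy_stack_init(struct toy_stack *s)
{
	memset(s->words, 0, sizeof(s->words));
	s->words[0] = SENTINEL;
}

/* The check itself: true while the sentinel word is intact. */
static bool toy_stack_ok(const struct toy_stack *s)
{
	return s->words[0] == SENTINEL;
}

/* Simulate a thread using `depth` words, counted from the top of the stack. */
static void toy_stack_use(struct toy_stack *s, size_t depth)
{
	if (depth > TOY_STACK_WORDS) {
		depth = TOY_STACK_WORDS;
	}
	for (size_t i = 0; i < depth; i++) {
		s->words[TOY_STACK_WORDS - 1u - i] = 0xDEADBEEFu;
	}
}
```

In this model the check only fails once the usage reaches the lowest word, and only when the check next runs. A software sentinel therefore reports corruption after the fact, whereas a hardware guard faults on the offending access.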

Does anyone else have the same problem?

My setup: STM32F412 + custom board + custom shield

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (13 by maintainers)

Most upvoted comments

I am closing this issue because it cannot be reproduced. The important lesson learned is that HW_STACK_PROTECTION and STACK_SENTINEL are not necessarily compatible. Thx to @dcpleung and @ioannisg

In theory, stack sentinel should not be used together with HW_STACK_PROTECTION on Cortex-M. It’s not needed. I’ve not checked, though, if these can co-exist smoothly.

Thx @dcpleung for testing. I am currently on parental leave. I will try to recheck at the beginning of April.

I got the nucleo_f429zi board but I have not been able to reproduce it with the following apps and CONFIG_STACK_SENTINEL=y:

  • samples/hello_world
  • tests/kernel/sched/schedule_api
  • tests/kernel/threads/thread_apis
  • tests/kernel/workq/work_queue
  • tests/kernel/fatal/exception

Is this issue reproducible with any sample or test apps?

@StefJar I’d rather relabel this as a question, since for now it looks like a question about STACK_SENTINEL behavior.

@nashif I’ll let you decide how to follow up on this, as it looks like a kernel question (the STM32 code extract is not upstream code).