zephyr: stm32: multi-threading broken after #40173
Describe the bug Following merge of #40173, some multi-threading use cases are not working anymore. This change aimed at introducing a lock on uart tx in order to avoid conflicts between competitive calls to uart_poll and fifo_fill. While it fixed the targeted issue, it introduced a regression in multi-thread use cases.
Regression can be observed with the following samples: samples/kernel/metairq_dispatch/sample.kernel.metairq_dispatch samples/philosophers/sample.kernel.philosopher samples/philosophers/sample.kernel.philosopher.fifos samples/philosophers/sample.kernel.philosopher.lifos samples/philosophers/sample.kernel.philosopher.same_prio samples/philosophers/sample.kernel.philosopher.semaphores samples/philosophers/sample.kernel.philosopher.stacks samples/philosophers/sample.kernel.philosopher.static samples/subsys/portability/cmsis_rtos_v1/philosophers/sample.portability.cmsis_rtos_v1.philosopher and so on …
Logs and console output
Info thread on sample.kernel.philosopher reports the following:
0x08005100 in atomic_cas (new_value=1, old_value=0, target=0x20000020 <uart_stm32_data_0+8>) at /local/mcu/zephyrproject/zephyr/include/sys/atomic_builtin.h:40
40 return __atomic_compare_exchange_n(target, &old_value, new_value,
(gdb) info threads
Id Target Id Frame
* 1 Thread 536871920 (Name: Philosopher 5, prio:-2,useropts:4) 0x08005100 in atomic_cas (new_value=1, old_value=0, target=0x20000020 <uart_stm32_data_0+8>)
at /local/mcu/zephyrproject/zephyr/include/sys/atomic_builtin.h:40
Info : Getting thread 536871744 reg list
2 Thread 536871744 (Name: Philosopher 4, prio:-1,useropts:4) 0x080013c0 in arch_irq_unlock (key=0)
at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871568 reg list
3 Thread 536871568 (Name: Philosopher 3, prio:0,useropts:4) 0x080013c0 in arch_irq_unlock (key=0)
at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871392 reg list
4 Thread 536871392 (Name: Philosopher 2, prio:1,useropts:4) 0x080013c0 in arch_irq_unlock (key=0)
at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871216 reg list
5 Thread 536871216 (Name: Philosopher 1, prio:2,useropts:4) 0x080013c0 in arch_irq_unlock (key=0)
at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871040 reg list
6 Thread 536871040 (Name: Philosopher 0, prio:3,useropts:4) 0x08005118 in LL_USART_IsActiveFlag_TXE (USARTx=0x40013800)
at /local/mcu/zephyrproject/modules/hal/stm32/stm32cube/stm32l4xx/drivers/include/stm32l4xx_ll_usart.h:3161
Info : Getting thread 536872096 reg list
7 Thread 536872096 (Name: idle 00, prio:40,useropts:1) arch_cpu_idle () at /local/mcu/zephyrproject/zephyr/arch/arm/core/aarch32/cpu_idle.S:110
Philosopher 0 (prio:3) is preempted in uart_stm32_poll_out during TX (while waiting for flag TXE: LL_USART_IsActiveFlag_TXE), tx_lock is granted
Philosopher 5 (prio:-2) is running but stuck in uart_stm32_poll_out, waiting for tx_lock to be released, which won’t happen as it is taken by lower prio thread.
-> Deadlock
To Reproduce Run one of the above on any STM32 board
Expected behavior These samples should be passed
Impact Dead-lock situations in multi-threaded contexts
Environment (please complete the following information):
- Toolchain: Zephyr SDK 0.13.1
- Commit SHA: Starting d42cef17b06465393462210a224fdc5cc58b1490
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 23 (15 by maintainers)
I think it is low-level enough that a generic, API-level solution would not work on all drivers, and might require some drivers some significant gymnastics to work correctly.
Since you have mentioned the NRF driver, one significant difference is that NRF driver does
k_msleep()if it cannot hold the TX lock. Maybe it’s something to be implemented in the STM32 driver?The new thread that run after a k_yield is not necessarily a thread that writes a message to the console.