zephyr: stm32: multi-threading broken after #40173

Describe the bug Following merge of #40173, some multi-threading use cases are not working anymore. This change aimed at introducing a lock on uart tx in order to avoid conflicts between competitive calls to uart_poll and fifo_fill. While it fixed the targeted issue, it introduced a regression in multi-thread use cases.

Regression can be observed with the following samples: samples/kernel/metairq_dispatch/sample.kernel.metairq_dispatch samples/philosophers/sample.kernel.philosopher samples/philosophers/sample.kernel.philosopher.fifos samples/philosophers/sample.kernel.philosopher.lifos samples/philosophers/sample.kernel.philosopher.same_prio samples/philosophers/sample.kernel.philosopher.semaphores samples/philosophers/sample.kernel.philosopher.stacks samples/philosophers/sample.kernel.philosopher.static samples/subsys/portability/cmsis_rtos_v1/philosophers/sample.portability.cmsis_rtos_v1.philosopher and so on …

Logs and console output Info thread on sample.kernel.philosopher reports the following:

0x08005100 in atomic_cas (new_value=1, old_value=0, target=0x20000020 <uart_stm32_data_0+8>) at /local/mcu/zephyrproject/zephyr/include/sys/atomic_builtin.h:40
40		return __atomic_compare_exchange_n(target, &old_value, new_value,
(gdb) info threads
  Id   Target Id                                                  Frame 
* 1    Thread 536871920 (Name: Philosopher 5, prio:-2,useropts:4) 0x08005100 in atomic_cas (new_value=1, old_value=0, target=0x20000020 <uart_stm32_data_0+8>)
    at /local/mcu/zephyrproject/zephyr/include/sys/atomic_builtin.h:40
Info : Getting thread 536871744 reg list
  2    Thread 536871744 (Name: Philosopher 4, prio:-1,useropts:4) 0x080013c0 in arch_irq_unlock (key=0)
    at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871568 reg list
  3    Thread 536871568 (Name: Philosopher 3, prio:0,useropts:4)  0x080013c0 in arch_irq_unlock (key=0)
    at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871392 reg list
  4    Thread 536871392 (Name: Philosopher 2, prio:1,useropts:4)  0x080013c0 in arch_irq_unlock (key=0)
    at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871216 reg list
  5    Thread 536871216 (Name: Philosopher 1, prio:2,useropts:4)  0x080013c0 in arch_irq_unlock (key=0)
    at /local/mcu/zephyrproject/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95
Info : Getting thread 536871040 reg list
  6    Thread 536871040 (Name: Philosopher 0, prio:3,useropts:4)  0x08005118 in LL_USART_IsActiveFlag_TXE (USARTx=0x40013800)
    at /local/mcu/zephyrproject/modules/hal/stm32/stm32cube/stm32l4xx/drivers/include/stm32l4xx_ll_usart.h:3161
Info : Getting thread 536872096 reg list
  7    Thread 536872096 (Name: idle 00, prio:40,useropts:1)       arch_cpu_idle () at /local/mcu/zephyrproject/zephyr/arch/arm/core/aarch32/cpu_idle.S:110

Philosopher 0 (prio:3) is preempted in uart_stm32_poll_out during TX (while waiting for flag TXE: LL_USART_IsActiveFlag_TXE), tx_lock is granted Philosopher 5 (prio:-2) is running but stuck in uart_stm32_poll_out, waiting for tx_lock to be released, which won’t happen as it is taken by lower prio thread. -> Deadlock

To Reproduce Run one of the above on any STM32 board

Expected behavior These samples should be passed

Impact Dead-lock situations in multi-threaded contexts

Environment (please complete the following information):

  • Toolchain: Zephyr SDK 0.13.1
  • Commit SHA: Starting d42cef17b06465393462210a224fdc5cc58b1490

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 23 (15 by maintainers)

Most upvoted comments

I think it is low-level enough that a generic, API-level solution would not work on all drivers, and might require some drivers some significant gymnastics to work correctly.

Since you have mentioned the NRF driver, one significant difference is that NRF driver does k_msleep() if it cannot hold the TX lock. Maybe it’s something to be implemented in the STM32 driver?

Yes, this tests was not writed to switch tasks.

@DariusBabrauskas, I had some thinking about this and I wonder if switching task on poll_out would make any sense. When multiple threads are using same console output, switching tasks on poll_out would always have the consequence of mixing the chars at output, and hence application output would be garbled. I can’t think of one application where mixing threads messages (hence useful information) to a garbled output would be useful, even with a performance gain.

The new thread that run after a k_yield is not necessarily a thread that writes a message to the console.