zephyr: posix: pthread: race condition between pthread_create() and pthread_join()

Previous Title: posix_common-derived test fails on some QEMU platforms when CONFIG_TICKLESS_KERNEL=n is set

Describe the bug: A posix_common-derived test fails on some QEMU platforms when CONFIG_TICKLESS_KERNEL=n is added to the configuration.

Additional information that may help in understanding the problem:

  1. Adding a delay inside the test_pthread_descriptor_leak loop makes the test pass.
  2. Enabling tickless mode (CONFIG_TICKLESS_KERNEL=y) also makes the test pass.
  • Is this a regression? If yes, have you been able to “git bisect” it to a specific commit? No.

To Reproduce:

  1. Add CONFIG_TICKLESS_KERNEL=n to prj.conf.
  2. Build and run the test:
west build -b qemu_cortex_r5 -- -DCONFIG_NEWLIB_LIBC=y
west build -t run

Expected behavior: The test reports PASS.

Impact: A potential race condition in pthread handling.

Logs and console output

*** Booting Zephyr OS build zephyr-v3.3.0-1516-g1dce3c3ee2e9 ***
...
===================================================================
START - test_pthread_descriptor_leak

    Assertion failed at WEST_TOPDIR/zephyr/tests/posix/common/src/pthread.c:598: posix_apis_test_pthread_descriptor_leak: (pthread_join(pthread1, &unused) is non-zero)
unable to join thread 31
 FAIL - test_pthread_descriptor_leak in 0.032 seconds
===================================================================
START - test_sleep
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/kernel/sched.c:1797
        aborted _current back from dead
E: r0/a1:  0x00000004  r1/a2:  0x00000705  r2/a3:  0xff00002c
E: r3/a4:  0x000089b9 r12/ip:  0x000242e8 r14/lr:  0x0000b2a1
E:  xpsr:  0x0000013f
E: Faulting instruction address (r15/pc): 0x0000c934
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x20038 (unknown)
E: Halting system
...

Environment (please complete the following information):

  • Commit SHA or Version used: zephyr-v3.3.0-1516-g1dce3c3ee2e9

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (14 by maintainers)

Most upvoted comments

Can you file that as a bug? There really shouldn’t be as I’m reading the code.

See commit message in linked PR. The root cause was a hand-rolled attempt at pthread_join() when it needed to be using the underlying k_thread_join() instead.
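
For readers who want to see the pattern that exposed the race, here is a minimal sketch of a back-to-back create/join loop written against the standard POSIX API. It is not the Zephyr test source: the helper names noop_entry and create_join_loop and the iteration count are assumptions chosen for illustration. Because the thread body returns immediately, the child may already have exited by the time the parent reaches pthread_join(), and that window is where a join implementation that is not tied to the underlying thread's lifetime can go wrong.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread body: returns at once, so the thread may already
 * have terminated when the parent calls pthread_join(). */
static void *noop_entry(void *arg)
{
    return arg;
}

/* Hypothetical helper: create and join `iterations` threads back to back,
 * with no delay between iterations.
 * Returns 0 on success, or the first non-zero pthread error code. */
static int create_join_loop(int iterations)
{
    for (int i = 0; i < iterations; i++) {
        pthread_t th;
        void *unused = NULL;

        int rc = pthread_create(&th, NULL, noop_entry, NULL);
        if (rc != 0) {
            fprintf(stderr, "unable to create thread %d: %s\n", i, strerror(rc));
            return rc;
        }

        /* A correct implementation must succeed here whether or not
         * the child has already finished running. */
        rc = pthread_join(th, &unused);
        if (rc != 0) {
            fprintf(stderr, "unable to join thread %d: %s\n", i, strerror(rc));
            return rc;
        }
    }
    return 0;
}
```

Calling create_join_loop(128) from a test and checking for a zero return value exercises the same create-then-join sequence; on a conforming implementation every iteration should succeed without any added delay.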