zephyr: Various problems with pipes (Not unblocking, Data Access Violation, unblocking wrong thread...)
Describe the bug I’m trying to use a pipe object to send data from a UART IRQ (async) to a thread. Since f86027f this should be possible, but there is some weird behaviour. It could be multiple bugs, but I suspect that these are related, so I’m combining them here. Here is a short description of the bugs (details below):
k_pipe_get(..., K_FOREVER)does not unlock if data is written after the read call is blocking. I worked around this issue by pollingk_pipe_getwith a small timeout. Code to reproduce this error is added below.- Sporadic Data Access Violation because of NULL-Pointer in
dest_listinz_impl_k_pipe_put(), ifk_pipe_put()is called from IRQ. It seems that the list is ok after being filled inpipe_waiter_list_populate(), but by the timepipe_write()is called the list isn’t valid anymore. This can be caught by checkingprevandnextinsys_dlist_remove(). Code to reproduce this error is added below. - Unused pipes that wait for data sometimes unlock with “return value==0” and “bytes_read==buffer_size” although no data was received. I was not able to produce a minimal working example for this error or narrow this down. But this happens with completly unrelated pipes, so I suspect, that the
waiter_listfrom the previous point leads to this problem and unblocks a wrong thread.
The code to reproduce these issues can be seen below. I could only really narrow down the second error. Here the stack trace and log output for the second error.
k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t * esf) (c:\ncs\v2.2.0-rc1\zephyr\kernel\fatal.c:41)
z_fatal_error(unsigned int reason, const z_arch_esf_t * esf) (c:\ncs\v2.2.0-rc1\zephyr\kernel\fatal.c:131)
z_arm_fatal_error(unsigned int reason, const z_arch_esf_t * esf) (c:\ncs\v2.2.0-rc1\zephyr\arch\arm\core\aarch32\fatal.c:63)
z_arm_fault(uint32_t msp, uint32_t psp, uint32_t exc_return, _callee_saved_t * callee_regs) (c:\ncs\v2.2.0-rc1\zephyr\arch\arm\core\aarch32\cortex_m\fault.c:1097)
z_arm_usage_fault() (c:\ncs\v2.2.0-rc1\zephyr\arch\arm\core\aarch32\cortex_m\fault_s.S:102)
<signal handler called> (Unbekannte Quelle:0)
sys_dlist_remove(sys_dnode_t * node) (c:\ncs\v2.2.0-rc1\zephyr\include\zephyr\sys\dlist.h:501)
sys_dlist_get(sys_dlist_t * list) (c:\ncs\v2.2.0-rc1\zephyr\include\zephyr\sys\dlist.h:522)
pipe_write(struct k_pipe * pipe, sys_dlist_t * src_list, sys_dlist_t * dest_list, _Bool * reschedule) (c:\ncs\v2.2.0-rc1\zephyr\kernel\pipes.c:307)
z_impl_k_pipe_put(struct k_pipe * pipe, void * data, size_t bytes_to_write, size_t * bytes_written, size_t min_xfer, k_timeout_t timeout) (c:\ncs\v2.2.0-rc1\zephyr\kernel\pipes.c:420)
k_pipe_put(size_t min_xfer, size_t * bytes_written, size_t bytes_to_write, void * data, struct k_pipe * pipe) (c:\ncs\projects\eblem\build_eblem\zephyr\include\generated\syscalls\kernel.h:1284)
test_write(const char * str) (c:\ncs\projects\eblem\src\main.c:213)
test_write_irq() (c:\ncs\projects\eblem\src\main.c:237)
uart_cb(const struct device * dev, struct uart_event * evt, void * user_data) (c:\ncs\projects\eblem\src\uart\ngp_uart.c:27)
user_callback(const struct device * dev, struct uart_event * evt) (c:\ncs\v2.2.0-rc1\zephyr\drivers\serial\uart_nrfx_uarte.c:849)
notify_uart_rx_rdy(const struct device * dev, size_t len) (c:\ncs\v2.2.0-rc1\zephyr\drivers\serial\uart_nrfx_uarte.c:863)
rx_timeout(struct k_timer * timer) (c:\ncs\v2.2.0-rc1\zephyr\drivers\serial\uart_nrfx_uarte.c:1102)
z_timer_expiration_handler(struct _timeout * t) (c:\ncs\v2.2.0-rc1\zephyr\kernel\timer.c:63)
sys_clock_announce(int32_t ticks) (c:\ncs\v2.2.0-rc1\zephyr\kernel\timeout.c:270)
sys_clock_timeout_handler(int32_t chan, uint64_t expire_time, void * user_data) (c:\ncs\v2.2.0-rc1\zephyr\drivers\timer\nrf_rtc_timer.c:422)
process_channel(int32_t chan) (c:\ncs\v2.2.0-rc1\zephyr\drivers\timer\nrf_rtc_timer.c:494)
rtc_nrf_isr(const void * arg) (c:\ncs\v2.2.0-rc1\zephyr\drivers\timer\nrf_rtc_timer.c:518)
_isr_wrapper() (c:\ncs\v2.2.0-rc1\zephyr\arch\arm\core\aarch32\isr_wrapper.S:259)
<signal handler called> (Unbekannte Quelle:0)
arch_swap(unsigned int key) (c:\ncs\v2.2.0-rc1\zephyr\arch\arm\core\aarch32\swap.c:53)
z_swap_irqlock(unsigned int key) (c:\ncs\v2.2.0-rc1\zephyr\kernel\include\kswap.h:185)
z_swap(struct k_spinlock * lock) (c:\ncs\v2.2.0-rc1\zephyr\kernel\include\kswap.h:196)
z_pend_curr(struct k_spinlock * lock, k_spinlock_key_t key, _wait_q_t * wait_q, k_timeout_t timeout) (c:\ncs\v2.2.0-rc1\zephyr\kernel\sched.c:859)
z_sched_wait(struct k_spinlock * lock, k_spinlock_key_t key, _wait_q_t * wait_q, k_timeout_t timeout, void ** data) (c:\ncs\v2.2.0-rc1\zephyr\kernel\sched.c:1912)
pipe_get_internal(k_spinlock_key_t key, struct k_pipe * pipe, void * data, size_t bytes_to_read, size_t * bytes_read, size_t min_xfer, k_timeout_t timeout) (c:\ncs\v2.2.0-rc1\zephyr\kernel\pipes.c:635)
z_impl_k_pipe_get(struct k_pipe * pipe, void * data, size_t bytes_to_read, size_t * bytes_read, size_t min_xfer, k_timeout_t timeout) (c:\ncs\v2.2.0-rc1\zephyr\kernel\pipes.c:668)
k_pipe_get(size_t min_xfer, size_t * bytes_read, size_t bytes_to_read, void * data, struct k_pipe * pipe) (c:\ncs\projects\eblem\build_eblem\zephyr\include\generated\syscalls\kernel.h:1309)
test_read(char * data, size_t len) (c:\ncs\projects\eblem\src\main.c:226)
test_thread_entry(void * x, void * y, void * z) (c:\ncs\projects\eblem\src\main.c:245)
z_thread_entry(k_thread_entry_t entry, void * p1, void * p2, void * p3) (c:\ncs\v2.2.0-rc1\zephyr\lib\os\thread_entry.c:36)
z_thread_entry(k_thread_entry_t entry, void * p1, void * p2, void * p3) (c:\ncs\v2.2.0-rc1\zephyr\lib\os\thread_entry.c:36)
[Unknown code] (Unbekannte Quelle:0)
*** Booting Zephyr OS build v3.2.99-ncs1-rc1 ***
[00:00:05.693,939] <inf> main: test_write: test_write("Hello World")
[00:00:05.694,305] <inf> main: test_read: Read OK (11 bytes, "Hello World")
[00:00:11.715,454] <inf> main: test_write: test_write("Hello World")
[00:00:11.715,606] <inf> main: test_read: Read OK (11 bytes, "Hello World")
[00:00:11.715,667] <inf> main: test_write: test_write("Hello World")
[00:00:11.715,850] <inf> main: test_write: test_write("Hello World")
[00:00:11.715,911] <inf> main: test_read: Read OK (11 bytes, "Hello World")
[00:00:11.716,094] <inf> main: test_read: Read OK (11 bytes, "Hello World")
....
[00:00:56.939,178] <err> os: mem_manage_fault: ***** MPU FAULT *****
[00:00:56.939,208] <err> os: mem_manage_fault: Data Access Violation
[00:00:56.939,208] <err> os: mem_manage_fault: MMFAR Address: 0x0
[00:00:56.939,239] <err> os: esf_dump: r0/a1: 0x20000bf8 r1/a2: 0x20010208 r2/a3: 0x00000000
[00:00:56.939,239] <err> os: esf_dump: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x0003be4f
[00:00:56.939,270] <err> os: esf_dump: xpsr: 0x21000021
[00:00:56.939,300] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x0004673a
[00:00:56.939,331] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:56.939,331] <err> os: z_fatal_error: Fault during interrupt handling
[00:00:56.939,392] <err> os: z_fatal_error: Current thread: 0x20001f98 (test_thread_tid)
[00:01:01.753,631] <err> os: k_sys_fatal_error_handler: Halting system
The second error can be caught early if following code is added to sys_dlist_remove:
__ASSERT(prev, "prev is null");
__ASSERT(next, "next is null");
The second error can also be worked around if I remove the call to pipe_waiter_list_populate() in z_impl_k_pipe_put(). But I’m pretty sure that this messes with the wakeup of other threads. But it shows, that there is some problem with the waiter_list.
To Reproduce
Following code reproduces error 1 (k_pipe_get does not unblock). It’s just one write thread and one read thread, while the order of read/write is mixed. There are two variants in test_read(). The first is the expected solution that does not work (read does not unblock and thread 2 never ends) and the second variant is my workaround that works but is definitely not great. The log output of both variants is included as a comment.
static void test_thread1_entry(void * x, void * y, void * z);
static void test_thread2_entry(void * x, void * y, void * z);
K_THREAD_DEFINE(test_thread1_tid, 512, test_thread1_entry, NULL, NULL, NULL, 5, 0, 0);
K_THREAD_DEFINE(test_thread2_tid, 512, test_thread2_entry, NULL, NULL, NULL, 5, 0, 0);
K_PIPE_DEFINE(test_pipe, 256, 4);
static void test_write(const char * str) {
LOG_INF("test_write(\"%s\")", str);
size_t bytes_written;
int err = k_pipe_put(&test_pipe, (void *) str, strlen(str), &bytes_written, 1, K_NO_WAIT);
if (err || (strlen(str) != bytes_written)){
LOG_WRN(" test_write failed (%d, %d)", err, (int) bytes_written);
} else {
LOG_INF(" Write OK");
}
}
static void test_read(char * data, size_t len) {
LOG_INF("test_read()");
size_t bytes_read;
int err;
/* Variant 1: Expected behaviour (does not unlock) */
// [00:00:04.264,099] <inf> main: test_write: test_write("Hello World")
// [00:00:04.264,129] <inf> main: test_write: Write OK
// [00:00:05.264,373] <inf> main: test_read: test_read()
// [00:00:05.264,434] <inf> main: test_read: Read OK (11 bytes, "Hello World")
// [00:00:06.264,617] <inf> main: test_read: test_read()
// [00:00:07.264,831] <inf> main: test_write: test_write("asdasdf")
// [00:00:07.264,862] <inf> main: test_write: Write OK
// [00:00:08.264,984] <inf> main: test_thread1_entry: Thread 1 done...
err = k_pipe_get(&test_pipe, data, len, &bytes_read, 1, K_FOREVER);
if (err) {
LOG_WRN(" test_read failed (%d, %d)", err, (int) bytes_read);
} else {
LOG_INF(" Read OK (%d bytes, \"%s\")", (int) bytes_read, data);
}
/* Variant 2: Workaround using polling */
// [00:00:02.977,935] <inf> main: test_write: test_write("Hello World")
// [00:00:02.977,966] <inf> main: test_write: Write OK
// [00:00:03.978,179] <inf> main: test_read: test_read()
// [00:00:03.978,240] <inf> main: test_read: Read OK (11 bytes, "Hello World")
// [00:00:04.978,393] <inf> main: test_read: test_read()
// [00:00:05.178,527] <wrn> main: test_read: test_read failed (-11)
// [00:00:05.378,692] <wrn> main: test_read: test_read failed (-11)
// [00:00:05.578,796] <wrn> main: test_read: test_read failed (-11)
// [00:00:05.778,900] <wrn> main: test_read: test_read failed (-11)
// [00:00:05.978,576] <inf> main: test_write: test_write("asdasdf")
// [00:00:05.978,607] <inf> main: test_write: Write OK
// [00:00:05.979,003] <inf> main: test_read: Read OK (7 bytes, "asdasdf")
// [00:00:06.978,759] <inf> main: test_thread1_entry: Thread 1 done...
// [00:00:07.979,278] <inf> main: test_thread2_entry: Thread 2 done...
// do {
// bytes_read = 0;
// err = k_pipe_get(&test_pipe, data, len, &bytes_read, 1, K_MSEC(200));
// if (err != 0) {
// LOG_WRN(" test_read failed (%d)", (int) err);
// }
// } while (err != 0);
// LOG_INF(" Read OK (%d bytes, \"%s\")", (int) bytes_read, data);
}
static void test_thread1_entry(void * x, void * y, void * z) {
/* First Write, then read -> OK */
test_write("Hello World");
k_sleep(K_MSEC(1000));
k_sleep(K_MSEC(1000));
/* First Read, then write -> Fail */
k_sleep(K_MSEC(1000));
test_write("asdasdf");
k_sleep(K_MSEC(1000));
LOG_INF("Thread 1 done...");
while (1) {k_sleep(K_MSEC(1000));}
}
static void test_thread2_entry(void * x, void * y, void * z) {
char data[100];
/* First Write, then read -> OK */
memset(data, 0, sizeof(data));
k_sleep(K_MSEC(1000));
test_read(data, sizeof(data));
k_sleep(K_MSEC(1000));
/* First Read, then write -> Fail */
memset(data, 0, sizeof(data));
test_read(data, sizeof(data));
k_sleep(K_MSEC(1000));
k_sleep(K_MSEC(1000));
LOG_INF("Thread 2 done...");
while (1) {k_sleep(K_MSEC(1000));}
}
The second bug can be reproduced with following code. The function test_write_irq() has to be called periodically from an IRQ. In my case I use an UART IRQ that calls this function on incoming data. But I guess that this should also fail when using another IRQ.
static void test_thread_entry(void * x, void * y, void * z);
K_THREAD_DEFINE(test_thread_tid, 1024, test_thread_entry, NULL, NULL, NULL, 5, 0, 0);
K_PIPE_DEFINE(test_pipe, 256, 4);
static void test_write(const char * str) {
LOG_INF("test_write(\"%s\")", str);
size_t bytes_written;
int err = k_pipe_put(&test_pipe, (void *) str, strlen(str), &bytes_written, 1, K_NO_WAIT);
if (err || (strlen(str) != bytes_written)) {
LOG_WRN(" test_write failed (%d, %d)", err, (int) bytes_written);
}
}
static void test_read(char * data, size_t len) {
int err;
size_t bytes_read;
do {
bytes_read = 0;
err = k_pipe_get(&test_pipe, data, len, &bytes_read, 1, K_MSEC(1));
if ((err != 0) && (err != -EAGAIN)) {
LOG_WRN(" test_read failed (%d)", (int) err);
}
} while (err != 0);
LOG_INF(" Read OK (%d bytes, \"%s\")", (int) bytes_read, data);
}
/* Function has to be called periodically from IRQ */
void test_write_irq(void) {
test_write("Hello World");
}
static void test_thread_entry(void * x, void * y, void * z) {
char data[100];
while (1) {
memset(data, 0, sizeof(data));
test_read(data, sizeof(data));
}
}
Expected behavior I expect the pipe objects to work according to the documentation and not affect unrelated kernel objects.
Impact With these bugs there is basically no way to use pipes to asynchronously wait for data or use them from IRQs. I could use other kernel objects (e.g. FIFO), but I like the copying of the pipe, to make memory management easier.
Logs and console output Logs are included in the top sections.
Environment (please complete the following information):
- OS: Windows 10
- Toolchain: Nordic nRF Connect SDK v2.2.0-rc1 (Based on Zephyr cd16a83; Release candidate used because it includes f86027f)
- MCU: nRF52840
Additional context None
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 21 (2 by maintainers)
Commits related to this issue
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to peter-mitsis/zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to peter-mitsis/zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to peter-mitsis/zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to peter-mitsis/zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to zephyrproject-rtos/zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to asemjonovs/zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to coreboot/zephyr-cros by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to hardwario/ncs-zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to hardwario/ncs-zephyr by peter-mitsis a year ago
- kernel: pipes: ISRs use a private pipe descriptor Fixes sporadic data access violations that were occuring when pipes were being used from an ISR. The ISR was incorrectly using the pipe descriptor be... — committed to recogni/zephyr by peter-mitsis a year ago
- Backport pipes ISR support fixes (#71) * kernel: Add routine to walk a wait queue Adds a routine to safely walk a specified wait queue and invoke a custom callback function on each waiting thread... — committed to recogni/zephyr by berendo a year ago
I am back from holidays and am continuing to work on this.