zephyr: can: iso-tp: implementation test failed on nucleo_g474re

Describe the bug Running the tests/subsys/canbus/isotp/implementation/ on the nucleo_g474re board, the test fails on

 - FAIL - [isotp_implementation.test_send_receive_net_single_blocks] duration = 1.066 seconds
 - FAIL - [isotp_implementation.test_send_receive_sf] duration = 1.013 seconds
 - FAIL - [isotp_implementation.test_send_receive_single_block] duration = 0.020 seconds

Please also mention any information which could help others to understand the problem you’re facing:

To Reproduce Steps to reproduce the behavior:

  1. west build -p auto -b nucleo_g474re tests/subsys/canbus/isotp/implementation/
  2. west flash
  3. See error

Expected behavior Test PASSED

Impact Ci fails on this target paltform

Logs and console output

*** Booting Zephyr OS build v3.3.0-rc3-1-gbbefcd29a9dd ***
Running TESTSUITE isotp_implementation
===================================================================
START - test_bind_unbind
 PASS - test_bind_unbind in 2.509 seconds
===================================================================
START - test_buffer_allocation
 PASS - test_buffer_allocation in 1.551 seconds
===================================================================
START - test_buffer_allocation_wait
 PASS - test_buffer_allocation_wait in 10.151 seconds
===================================================================
START - test_send_receive_blocks
 PASS - test_send_receive_blocks in 0.454 seconds
===================================================================
START - test_send_receive_net_blocks
 PASS - test_send_receive_net_blocks in 1.019 seconds
===================================================================
START - test_send_receive_net_sf
 PASS - test_send_receive_net_sf in 0.006 seconds
===================================================================
START - test_send_receive_net_single_blocks

E: Reception of next FC has timed out
    Assertion failed at WEST_TOPDIR/zephyr/tests/subsys/canbus/isotp/implementation/src/main.c:289: isotp_implementation_test_send_receive_net_single_)
recv returned -14

    Assertion failed at WEST_TOPDIR/zephyr/tests/subsys/canbus/isotp/implementation/src/main.c:55: send_complete_cb: (error_nr not equal to ISOTP_N_OK)
Sending failed (-2)
 FAIL - test_send_receive_net_single_blocks in 1.066 seconds
===================================================================
START - test_send_receive_sf

    Assertion failed at WEST_TOPDIR/zephyr/tests/subsys/canbus/isotp/implementation/src/main.c:91: get_sf: (ret not equal to 1)
recv returned -14
 FAIL - test_send_receive_sf in 1.013 seconds
===================================================================
START - test_send_receive_single_block
E: No buffer for FF left

    Assertion failed at WEST_TOPDIR/zephyr/tests/subsys/canbus/isotp/implementation/src/main.c:313: isotp_implementation_test_send_receive_single_bloc)
Binding failed (-11)
 FAIL - test_send_receive_single_block in 0.020 seconds
===================================================================
TESTSUITE isotp_implementation failed.

------ TESTSUITE SUMMARY START ------

SUITE FAIL -  66.67% [isotp_implementation]: pass = 6, fail = 3, skip = 0, total = 9 duration = 17.789 seconds
 - PASS - [isotp_implementation.test_bind_unbind] duration = 2.509 seconds
 - PASS - [isotp_implementation.test_buffer_allocation] duration = 1.551 seconds
 - PASS - [isotp_implementation.test_buffer_allocation_wait] duration = 10.151 seconds
 - PASS - [isotp_implementation.test_send_receive_blocks] duration = 0.454 seconds
 - PASS - [isotp_implementation.test_send_receive_net_blocks] duration = 1.019 seconds
 - PASS - [isotp_implementation.test_send_receive_net_sf] duration = 0.006 seconds
 - FAIL - [isotp_implementation.test_send_receive_net_single_blocks] duration = 1.066 seconds
 - FAIL - [isotp_implementation.test_send_receive_sf] duration = 1.013 seconds
 - FAIL - [isotp_implementation.test_send_receive_single_block] duration = 0.020 seconds

------ TESTSUITE SUMMARY END ------

===================================================================
PROJECT EXECUTION FAILED

Environment (please complete the following information):

  • OS: ubuntu
  • Toolchain Zephyr SDK 0.15.0
  • Commit SHA v3.3.0-rc3 (bb8b74502091a90bff08de1b66b560a04b7c28e1)

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 18 (7 by maintainers)

Most upvoted comments

Changing the thread priority fixes the pb

diff --git a/tests/subsys/canbus/isotp/implementation/prj.conf b/tests/subsys/canbus/isotp/implementation/prj.conf
index af7b76b9fc3..4ab0fe51d38 100644
--- a/tests/subsys/canbus/isotp/implementation/prj.conf
+++ b/tests/subsys/canbus/isotp/implementation/prj.conf
@@ -7,3 +7,4 @@ CONFIG_ISOTP_ENABLE_CONTEXT_BUFFERS=y
 CONFIG_ISOTP_RX_BUF_COUNT=2
 CONFIG_ISOTP_RX_BUF_SIZE=56
 CONFIG_ISOTP_RX_SF_FF_BUF_COUNT=2
+CONFIG_ZTEST_THREAD_PRIORITY=0
START - test_send_receive_single_block
E: Reception of next FC has timed out
Assertion failed at WEST_TOPDIR/zephyr/tests/subsys/canbus/isotp/implementation/src/main.c:321: isotp_implementation_test_send_receive_single_block: (ret not equal to send_len)
data should be received at once (ret: -14)
Assertion failed at WEST_TOPDIR/zephyr/tests/subsys/canbus/isotp/implementation/src/main.c:55: send_complete_cb: (error_nr not equal to ISOTP_N_OK)
Sending failed (-2)
FAIL - test_send_receive_single_block in 1.070 seconds

I think this is an issue with the test.

isotp_send is called on the same struct isotp_send_ctx before the last send is complete.

The test thread and the system workqueue have the same priority, so it is a bit arbitrary whether the workqueue cleans up the send context before the next send.

I think it shows up on MCAN because the loopback RX ISR occurs before the TX complete ISR (i.e. different order to can_loopback driver). The work queue yields after completing each work item, so the order is “WQ RX -> test recv + send -> WQ TX complete”

Try adding a “TX complete” semaphore to the test, and/or changing a thread priority (CONFIG_ZTEST_THREAD_PRIORITY=0)

While it is good to know that reverting that allows the test to pass again, it is not yet clear to me why it has that effect. As an aside, that particular addition to z_unpend_thread_no_timeout() was added to guard against a race condition in the pipes implementation.

My initial investigation into the kernel side of this issue shows that all uses of the pended_on field are guarded by the sched_spinlock (which on an UP system such as the nucleo degenerates into a mere interrupt lock). Unless there is some higher priority interrupt at play that is not masked by our standard interrupt locking (which is a scenario that I doubt is coming into play here), I can imagine a couple more potential conditions to search for that could lead to this behaviour. Unfortunately, I am not yet able to convert these ideas from abstract mental pictures into words for a proper description–but I will continue to dig.