zephyr: STM32 with Ethernet crashes when receiving packets early

The STM32 Ethernet driver can receive data before the interface is ready to handle it, leading to a crash.

The problem can typically be reproduced within a couple of tries if the network is filled with traffic. I do that by pinging the IPv4 broadcast address with a 0.001 s interval while the STM32 board is starting.

Different versions have crashed in different ways, but in stock 2.5.0 this is what I get before main() is even called, with asserts enabled:

ASSERTION FAIL [net_if_get_link_addr(iface)->addr != ((void *)0)] @ WEST_TOPDIR/oc_zephyr/subsys/net/ip/net_if.c:3468
[00:00:04.368,000] <err> eth_stm32_hal: Failed to enqueue frame into RX queue: -62
[00:00:04.368,000] <err> eth_stm32_hal: Failed to enqueue frame into RX queue: -62
[00:00:04.368,000] <err> eth_stm32_hal: Failed to enqueue frame into RX queue: -62
[00:00:04.368,000] <err> eth_stm32_hal: Failed to enqueue frame into RX queue: -62
[00:00:04.379,000] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000d8c  r2/a3:  0xffffffbf
[00:00:04.379,000] <err> os: r3/a4:  0x0802f52d r12/ip:  0x00000000 r14/lr:  0x0803317d
[00:00:04.379,000] <err> os:  xpsr:  0x41000000
[00:00:04.379,000] <err> os: Faulting instruction address (r15/pc): 0x0804761c
[00:00:04.379,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:04.379,000] <err> os: Current thread: 0x20004f98 (sysworkq)
[00:00:04.457,000] <err> os: Halting system

The above assertion indicates that the net_if_set_link_addr call could benefit from happening earlier in eth_iface_init() in eth_stm32_hal.c. But it’s still crashing after that:

[00:00:04.418,000] <err> os: ***** USAGE FAULT *****
[00:00:04.418,000] <err> os:   Illegal use of the EPSR
[00:00:04.418,000] <err> os: r0/a1:  0x20004448  r1/a2:  0x20004ec8  r2/a3:  0x20004ec8
[00:00:04.418,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x080462df
[00:00:04.418,000] <err> os:  xpsr:  0x6000000f
[00:00:04.418,000] <err> os: Faulting instruction address (r15/pc): 0x00000000
[00:00:04.418,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:04.418,000] <err> os: Fault during interrupt handling

The above exception appears to be because net_if_up is called (in k_sys_work_q?) before net_if_init has finished executing.

A bit of k_sleep before the first call to net_eth_carrier_on in rx_thread() takes care of that, and another k_sleep after the call takes care of the “Failed to enqueue frame into RX queue” errors. But that’s hardly an elegant solution.

Environment:

Zephyr 2.5.0 (although it’s been happening since at least 2.0)
STM32F429

About this issue

State: closed
Created 3 years ago
Comments: 17 (6 by maintainers)

Commits related to this issue

drivers: ethernet: stm32: enable IRQ at the end of iface init This avoid IRQ to be handle before iface init is finished (especially before iface address is set) Fixes #32771 Signed-off-by: Alexandre... — committed to ABOSTM/zephyr by ABOSTM 3 years ago
drivers: ethernet: stm32: enable IRQ at the end of iface init This avoid IRQ to be handle before iface init is finished (especially before iface address is set) Fixes #32771 Signed-off-by: Alexandre... — committed to zephyrproject-rtos/zephyr by ABOSTM 3 years ago
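The ordering change these commits describe can be illustrated with a small host-side model. This is only a sketch, not the driver code: `struct mock_iface`, `mock_frame_arrives`, `mock_init_irq_first` and `mock_init_irq_last` are hypothetical names invented for illustration, and the model reduces "interface init" to setting a link address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a network interface; not a Zephyr type. */
struct mock_iface {
    const uint8_t *link_addr;  /* NULL until the MAC address is set */
    bool irq_enabled;          /* models the Ethernet RX interrupt */
    unsigned int rx_accepted;  /* frames handled with a valid address */
    unsigned int rx_faults;    /* frames handled before the address was set */
};

static const uint8_t mock_mac[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

/* Models a frame arriving on the wire at an arbitrary moment. */
static void mock_frame_arrives(struct mock_iface *iface)
{
    if (!iface->irq_enabled) {
        return; /* interrupt not enabled yet: the handler never runs */
    }
    if (iface->link_addr == NULL) {
        iface->rx_faults++; /* the situation the failed assertion reports */
        return;
    }
    iface->rx_accepted++;
}

/* Problematic order: interrupt enabled first, address set afterwards. */
static void mock_init_irq_first(struct mock_iface *iface)
{
    iface->irq_enabled = true;
    mock_frame_arrives(iface); /* traffic arriving during init */
    iface->link_addr = mock_mac;
}

/* Order described by the commit: interrupt enabled as the last step. */
static void mock_init_irq_last(struct mock_iface *iface)
{
    mock_frame_arrives(iface); /* traffic arriving during init */
    iface->link_addr = mock_mac;
    iface->irq_enabled = true;
}
```

In this model, a frame that arrives during `mock_init_irq_first` is counted in `rx_faults`, while the same frame during `mock_init_irq_last` is simply never seen by the handler.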

Most upvoted comments

I still get a bunch of <err> eth_stm32_hal: Failed to enqueue frame into RX queue: -62 in the log shortly after start, but they don’t seem to actually be a problem.

The errno 62 is ENETDOWN, which means that from the upper network stack's point of view the network interface is still down and packets cannot be accepted yet. We could probably check for this ENETDOWN value and not print anything in that case, as this is a fairly normal situation.

jukkar on May 14, 2021
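The check suggested in the comment above might look roughly like the following. This is a host-side sketch under assumptions, not the actual driver change: `rx_error_is_loggable` is a hypothetical helper, and it uses the symbolic `ENETDOWN` from the host's `<errno.h>`, whose numeric value may differ from the 62 seen in the log.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a failed RX enqueue deserves an
 * error log entry. `res` is the return value of the enqueue call,
 * zero on success or a negative errno on failure.
 */
static bool rx_error_is_loggable(int res)
{
    if (res >= 0) {
        return false; /* success: nothing to report */
    }
    if (res == -ENETDOWN) {
        return false; /* interface not up yet: expected shortly after start */
    }
    return true; /* any other failure is worth logging */
}
```

A driver could wrap its existing error log call in `if (rx_error_is_loggable(res))`, so that frames dropped while the interface is still down no longer produce error lines.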

@FRASTM: Yes, I have noticed that the reproducer code has stopped working against Zephyr master. It appears that it’s no longer possible to get a device pointer before the device is initialized, or something.

Exactly.

Though I mostly experienced the code trying to jump to a NULL pointer, and the MPU doesn’t protect against that currently (there is an open PR for that, last time I looked).

If this is CORTEX_M_DEBUG_NULL_POINTER_EXCEPTION_DETECTION it has been merged. You can enable it using CONFIG_CORTEX_M_DEBUG_NULL_POINTER_EXCEPTION_DETECTION_MPU=y

erwango on Apr 7, 2021

@erwango I will work on a sample.

ghost on Mar 2, 2021