zephyr: modem_cellular: Receive buffer overrun in ISR causes network stall
Describe the bug
I am using the recently implemented modem_cellular/PPP support to connect to web services (over TCP and MQTT) with a U-blox SARA-R5 modem (921600 baud, with flow control).
When stress-testing the system (MQTT publish with a payload of roughly 300 bytes, as often as possible, not concurrently), the following warning eventually appears. NOTE: the actual message is "Receive buffer overrun"; I added "ISR" to clarify where it happens (the message is emitted from two places). It comes from modem_backend_uart_isr.c.
From this point on, net_l2_ppp keeps retransmitting its messages (send L2) but never receives anything. On the UART lines I can see that the modem does reply: among the PPP gibberish there is a DNS request, followed by the modem's DNS response, but that response never reaches the PPP L2 stack. No inbound packets are seen after this point.
In addition, from this point on the CEREG poll times out every time, and the MQTT stack sometimes reports “Transport read error” from mqtt_rx.c.
Hardware: STM32H743BIT6 (own PCB) with a U-blox SARA-R5. Probably not a regression; I have always seen this problem.
To Reproduce
Cause the UART receive buffer to overrun using TCP: stress-test the system at a high baud rate.
Expected behavior
UART overruns and dropped frames should not cause reception to lock up.
Impact
The cellular modem is unusable.
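One way to keep an overrun from silently corrupting the stream, in the spirit of forcing an RX error in the ISR backend, is to make the overrun visible to the reader. The sketch below is illustrative only: `struct rx_ring`, `rx_ring_put_isr()` and `rx_ring_read()` are hypothetical names, not the API of modem_backend_uart_isr.c, and a real ISR/thread split would also need interrupt locking or atomics around `count` and `overrun`. Once a byte has been dropped, whatever is still buffered no longer forms a continuous stream, so the reader flushes it and returns an error that the framing layer above could use to reset its parser.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 8u /* deliberately tiny for illustration */

struct rx_ring {
    uint8_t buf[RX_RING_SIZE];
    size_t head;  /* next write index (ISR side) */
    size_t tail;  /* next read index (thread side) */
    size_t count; /* bytes currently stored */
    bool overrun; /* set by the ISR when a byte had to be dropped */
};

/* Hypothetical: called from the UART RX ISR for every received byte. */
static void rx_ring_put_isr(struct rx_ring *r, uint8_t byte)
{
    if (r->count == RX_RING_SIZE) {
        r->overrun = true; /* drop the byte, but remember that we did */
        return;
    }
    r->buf[r->head] = byte;
    r->head = (r->head + 1u) % RX_RING_SIZE;
    r->count++;
}

/* Hypothetical: called from thread context.
 * Returns the number of bytes copied, or -EIO once after an overrun. */
static int rx_ring_read(struct rx_ring *r, uint8_t *out, size_t len)
{
    if (r->overrun) {
        /* The stream has a hole in it: discard what is buffered and report
         * an error so the framing layer can reset and resynchronize. */
        r->head = 0;
        r->tail = 0;
        r->count = 0;
        r->overrun = false;
        return -EIO;
    }
    size_t n = 0;
    while (n < len && r->count > 0) {
        out[n++] = r->buf[r->tail];
        r->tail = (r->tail + 1u) % RX_RING_SIZE;
        r->count--;
    }
    return (int)n;
}
```

Flushing on overrun trades a few extra lost bytes for a clean restart point: the dropped data is gone anyway, and the protocols above are left to recover from the loss instead of parsing a stream with a silent gap in it.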
Environment:
- OS: Arch
- Toolchain: Zephyr SDK
- Commit SHA or Version used: 25ad0283769d2165838cbddaf4be446dc816e97f
About this issue
- State: closed
- Created 7 months ago
- Comments: 20 (8 by maintainers)
It worked for me when forcing an RX error in zephyr/subsys/modem/backends/modem_backend_uart_isr.c, as suggested. Attached is a snippet of the log:
I was able to ping, and send CoAP packets past the error.
Thanks!
Ah, that explains it. I found the resync in section 5.1 of this document and believed it was part of the standard; that's why the resync mechanism only works on my BG95… I will look into changing the resync mechanism to simply drop frames until it happens to be back in sync, and see if that works.
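Dropping frames until the receiver happens to be back in sync could look roughly like the sketch below, assuming the resync in question is in the CMUX layer that modem_cellular runs on and that it follows the 3GPP TS 27.010 basic option (frames delimited by 0xF9 flags, no byte stuffing). This is not Zephyr's modem_cmux code: `struct cmux_rx`, the state names and the functions are hypothetical, only single-byte length fields (at most 127 info bytes) are handled, and the FCS is assumed to be the TS 27.010 CRC-8 over the address, control and length bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMUX_FLAG 0xF9u
#define CMUX_MAX_INFO 127u /* single-byte length field: at most 127 info bytes */

enum cmux_state { CMUX_HUNT, CMUX_ADDR, CMUX_CTRL, CMUX_LEN, CMUX_DATA, CMUX_FCS, CMUX_END };

struct cmux_rx {
    enum cmux_state state;
    uint8_t hdr[3]; /* address, control, length */
    uint8_t info[CMUX_MAX_INFO];
    size_t info_len;
    size_t info_pos;
    unsigned frames_ok;
    unsigned frames_dropped;
};

/* Assumed TS 27.010 FCS: reflected CRC-8 (poly 0xE0), init 0xFF, FCS = 0xFF - crc. */
static uint8_t cmux_fcs(const uint8_t *p, size_t n)
{
    uint8_t crc = 0xFF;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (uint8_t)((crc >> 1) ^ 0xE0u) : (uint8_t)(crc >> 1);
        }
    }
    return (uint8_t)(0xFFu - crc);
}

/* Abandon the current frame. A flag byte may already be the next opening flag. */
static bool cmux_rx_drop(struct cmux_rx *rx, uint8_t b)
{
    rx->frames_dropped++;
    rx->state = (b == CMUX_FLAG) ? CMUX_ADDR : CMUX_HUNT;
    return false;
}

/* Feed one received byte; returns true when a complete, valid frame is in rx. */
static bool cmux_rx_byte(struct cmux_rx *rx, uint8_t b)
{
    switch (rx->state) {
    case CMUX_HUNT: /* out of sync: discard everything until a flag */
        if (b == CMUX_FLAG) rx->state = CMUX_ADDR;
        return false;
    case CMUX_ADDR:
        if (b == CMUX_FLAG) return false; /* repeated flags between frames */
        rx->hdr[0] = b;
        rx->state = CMUX_CTRL;
        return false;
    case CMUX_CTRL:
        rx->hdr[1] = b;
        rx->state = CMUX_LEN;
        return false;
    case CMUX_LEN:
        if ((b & 1u) == 0) return cmux_rx_drop(rx, b); /* 2-byte length not handled */
        rx->hdr[2] = b;
        rx->info_len = b >> 1;
        rx->info_pos = 0;
        rx->state = rx->info_len ? CMUX_DATA : CMUX_FCS;
        return false;
    case CMUX_DATA: /* 0xF9 is ordinary data here: basic option has no stuffing */
        rx->info[rx->info_pos++] = b;
        if (rx->info_pos == rx->info_len) rx->state = CMUX_FCS;
        return false;
    case CMUX_FCS:
        if (b != cmux_fcs(rx->hdr, 3)) return cmux_rx_drop(rx, b);
        rx->state = CMUX_END;
        return false;
    case CMUX_END:
        if (b != CMUX_FLAG) return cmux_rx_drop(rx, b);
        rx->frames_ok++;
        rx->state = CMUX_ADDR; /* closing flag may double as the next opening flag */
        return true;
    }
    return false;
}
```

Because flags are only trusted outside a frame, a frame damaged by lost bytes is discarded when its FCS or closing flag fails to match; this may also cost the following frame before the parser is back in sync, which is the "drop frames until in sync" behavior rather than a dedicated resync sequence.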
I’m already looking into it 😃