PX4-Autopilot: [Bug] STM32H7 Serial DMA Mavlink on Main Locks Up
Describe the bug
On current main with the recent DMA changes, I cannot keep a MAVLink connection over a DMA-enabled serial port to a companion computer. It runs for a few minutes, then locks up, and the flight controller needs a reboot. With the DMA changes reverted, it stays connected indefinitely.
It also stays connected indefinitely on a serial port without DMA.
To Reproduce
Run MAVLink on a DMA-enabled serial port on main, with an STM32H7 board connected directly to a companion computer/GCS.
Using an FTDI adapter set to 921600 baud, I can reproduce this on both Telem1 and Telem2.
If I run MAVLink on both Telem2 and Telem1 and one of them locks up, I can move the FTDI to the other port and it runs fine. So it is the specific serial port that is stuck.
Expected behavior
Stays connected.
Screenshot / Media
No response
Flight Log
NA
Software Version
main, after the NuttX ST DMA changes
Flight controller
ARKV6X
Vehicle type
None
How are the different components wired up (including port information)
No response
Additional context
When it's working:
instance #1:
GCS heartbeat valid
mavlink chan: #1
type: GENERIC LINK OR RADIO
flow control: OFF
rates:
tx: 20128.7 B/s
txerr: 0.0 B/s
tx rate mult: 1.000
tx rate max: 46080 B/s
rx: 46.9 B/s
rx loss: 0.0%
Received Messages:
sysid:255, compid:190, Total: 768 (lost: 17)
FTP enabled: YES, TX enabled: YES
mode: Onboard
Forwarding: Off
MAVLink version: 2
transport protocol: serial (/dev/ttyS4 @921600)
ping statistics:
last: 20.21 ms
mean: 21.24 ms
max: 50.09 ms
min: 5.83 ms
dropped packets: 11
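As a rough sanity check on the healthy numbers above, the snippet below compares them with the UART line capacity at 921600 baud. It assumes 8N1 framing (10 bits on the wire per byte), which the report does not state. Under that assumption, `tx rate max` (46080 B/s) is exactly half the line capacity. The observed ~20 kB/s is about 22% of capacity, which suggests the healthy link is nowhere near saturating the UART.

```python
# Rough line-capacity check for the healthy `mavlink status` numbers.
# ASSUMPTION: 8N1 framing (1 start + 8 data + 1 stop = 10 bits per byte).
BAUD = 921_600
BITS_PER_BYTE_8N1 = 10

line_capacity = BAUD / BITS_PER_BYTE_8N1  # bytes/s the UART can carry
tx_rate_max = 46_080                      # "tx rate max" from mavlink status
tx_observed = 20_128.7                    # "tx" while the link is healthy

print(f"line capacity : {line_capacity:.0f} B/s")
print(f"tx rate max   : {tx_rate_max / line_capacity:.0%} of capacity")
print(f"observed tx   : {tx_observed / line_capacity:.1%} of capacity")
```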
When it locks up.
instance #1:
GCS heartbeat valid
mavlink chan: #1
type: GENERIC LINK OR RADIO
flow control: OFF
rates:
tx: 0.0 B/s
txerr: 756.5 B/s
tx rate mult: 0.050
tx rate max: 46080 B/s
rx: 21.0 B/s
rx loss: 0.0%
Received Messages:
sysid:255, compid:190, Total: 2186 (lost: 0)
msgid: 0, Rate: 1.0 Hz, last 0.84s ago
FTP enabled: YES, TX enabled: YES
mode: Onboard
Forwarding: Off
MAVLink version: 2
transport protocol: serial (/dev/ttyS4 @921600)
ping statistics:
last: 28.57 ms
mean: 24.05 ms
max: 65.86 ms
min: 3.57 ms
dropped packets: 0
I'm not sure whether the serial configs in NuttX make a difference.
Here is a state where Telem1 and Telem2 have both locked up, while Telem3 is still working because it has no DMA.
nsh> mavlink status
instance #0:
mavlink chan: #0
type: GENERIC LINK OR RADIO
flow control: OFF
rates:
tx: 0.0 B/s
txerr: 1279.6 B/s
tx rate mult: 0.050
tx rate max: 46080 B/s
rx: 0.0 B/s
rx loss: 4.7%
Received Messages:
sysid:255, compid:190, Total: 179 (lost: 835)
FTP enabled: YES, TX enabled: YES
mode: Onboard
Forwarding: On
MAVLink version: 2
transport protocol: serial (/dev/ttyS6 @921600)
ping statistics:
last: 32.58 ms
mean: 29.08 ms
max: 69.63 ms
min: 4.30 ms
dropped packets: 874
instance #1:
mavlink chan: #1
type: GENERIC LINK OR RADIO
flow control: OFF
rates:
tx: 0.0 B/s
txerr: 1040.8 B/s
tx rate mult: 0.050
tx rate max: 46080 B/s
rx: 0.0 B/s
rx loss: 0.1%
Received Messages:
sysid:255, compid:190, Total: 10110 (lost: 513)
FTP enabled: YES, TX enabled: YES
mode: Onboard
Forwarding: Off
MAVLink version: 2
transport protocol: serial (/dev/ttyS4 @921600)
ping statistics:
last: 8.36 ms
mean: 9.90 ms
max: 50.09 ms
min: 1.21 ms
dropped packets: 284
instance #2:
GCS heartbeat valid
mavlink chan: #2
type: GENERIC LINK OR RADIO
flow control: OFF
rates:
tx: 20158.1 B/s
txerr: 0.0 B/s
tx rate mult: 1.000
tx rate max: 46080 B/s
rx: 47.0 B/s
rx loss: 2.5%
Received Messages:
sysid:255, compid:190, Total: 185 (lost: 461)
msgid: 4, Rate: 1.0 Hz, last 0.23s ago
msgid: 0, Rate: 1.0 Hz, last 0.79s ago
FTP enabled: YES, TX enabled: YES
mode: Onboard
Forwarding: Off
MAVLink version: 2
transport protocol: serial (/dev/ttyS1 @921600)
ping statistics:
last: 28.03 ms
mean: 25.40 ms
max: 49.17 ms
min: 8.92 ms
dropped packets: 8820
instance #3:
GCS heartbeat valid
mavlink chan: #3
type: USB CDC
flow control: ON
rates:
tx: 21047.9 B/s
txerr: 0.0 B/s
tx rate mult: 1.000
tx rate max: 100000 B/s
rx: 47.0 B/s
rx loss: 0.0%
Received Messages:
sysid:255, compid:190, Total: 18413 (lost: 0)
msgid: 126, Rate: 3.2 Hz, last 0.08s ago
msgid: 0, Rate: 1.0 Hz, last 0.90s ago
msgid: 4, Rate: 1.0 Hz, last 0.11s ago
FTP enabled: YES, TX enabled: YES
mode: Config
Forwarding: On
MAVLink version: 2
transport protocol: serial (/dev/ttyACM0 @2000000)
ping statistics:
last: 0.85 ms
mean: 1.09 ms
max: 321.23 ms
min: 0.29 ms
dropped packets: 0
nsh>
About this issue
- State: closed
- Created 6 months ago
- Comments: 33 (33 by maintainers)
Here is what was happening:
[Image: Normal]
[Image: Caused the hang]
I’m soak testing this change on our CI for the next 12h, so we’ll know if it impacts our setup too.
I’ve been running the serial_test for several hours and cannot reproduce this on our Skynode X config (FMUv6x with the same config as yours).
I’m also working on better understanding the maximum throughput, using the serial test with tracing. At first glance, though, DMA utilization seems very high (we use a 3 Mbps UART link to the companion). I’ll investigate further. Trace: orbetto.perf.gz -> ui.perfetto.io