PX4-Autopilot: [Bug] STM32H7 Serial DMA Mavlink on Main Locks Up

Describe the bug

On current main with the recent DMA changes, I cannot keep MAVLink over a DMA-enabled serial port connected to a companion computer. The link runs for a few minutes, then locks up and only recovers after a reboot of the flight controller. With the DMA changes reverted, it stays connected indefinitely.

It also stays connected indefinitely on serial without DMA.

To Reproduce

Run MAVLink on a DMA-enabled serial port on main, with an STM32H7 connected directly to a companion computer/GCS.

Using an FTDI set to 921600 baud, I can reproduce on Telem1 and Telem2.

If I am running MAVLink on both Telem1 and Telem2 and one locks up, I can move the FTDI to the other port and it runs fine. So it's the specific serial port that's stuck.
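To timestamp the lock-up from the companion side, a small watchdog can read the port and report when the flight controller goes silent. The following is a minimal sketch, not part of PX4: the function name `read_until_stall` and the 2 s timeout are assumptions, and it expects a file descriptor for a port that has already been opened and configured (for example to 921600 baud) by other means.

```python
import os
import select
import time


def read_until_stall(fd, stall_timeout_s=2.0, chunk=4096):
    """Read from fd until no bytes arrive for stall_timeout_s seconds.

    Returns (total_bytes_read, seconds_from_start_to_last_received_byte).
    The timeout value is an assumption; MAVLink heartbeats normally arrive
    at about 1 Hz, so a couple of seconds of silence is already suspicious.
    """
    total = 0
    start = time.monotonic()
    last_rx = start
    while True:
        readable, _, _ = select.select([fd], [], [], stall_timeout_s)
        if not readable:
            # Nothing arrived within the timeout: treat this as a stall.
            return total, last_rx - start
        data = os.read(fd, chunk)
        if not data:
            # EOF, e.g. the adapter was unplugged.
            return total, last_rx - start
        total += len(data)
        last_rx = time.monotonic()
```

Calling it with a descriptor from `os.open("/dev/ttyUSB0", os.O_RDONLY | os.O_NOCTTY)` (a hypothetical device path) returns once the stream stops, which gives a rough time-to-failure for each run.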

Expected behavior

Stays connected.

Screenshot / Media

No response

Flight Log

NA

Software Version

main, after the NuttX ST DMA changes

Flight controller

ARKV6X

Vehicle type

None

How are the different components wired up (including port information)

No response

Additional context

When it's working:

instance #1:
    GCS heartbeat valid
    mavlink chan: #1
    type:        GENERIC LINK OR RADIO
    flow control: OFF
    rates:
      tx: 20128.7 B/s
      txerr: 0.0 B/s
      tx rate mult: 1.000
      tx rate max: 46080 B/s
      rx: 46.9 B/s
      rx loss: 0.0%
    Received Messages:
      sysid:255, compid:190, Total: 768 (lost: 17)
    FTP enabled: YES, TX enabled: YES
    mode: Onboard
    Forwarding: Off
    MAVLink version: 2
    transport protocol: serial (/dev/ttyS4 @921600)
    ping statistics:
      last: 20.21 ms
      mean: 21.24 ms
      max: 50.09 ms
      min: 5.83 ms
      dropped packets: 11

When it locks up:

instance #1:
    GCS heartbeat valid
    mavlink chan: #1
    type:        GENERIC LINK OR RADIO
    flow control: OFF
    rates:
      tx: 0.0 B/s
      txerr: 756.5 B/s
      tx rate mult: 0.050
      tx rate max: 46080 B/s
      rx: 21.0 B/s
      rx loss: 0.0%
    Received Messages:
      sysid:255, compid:190, Total: 2186 (lost: 0)
        msgid:    0, Rate:  1.0 Hz, last 0.84s ago
    FTP enabled: YES, TX enabled: YES
    mode: Onboard
    Forwarding: Off
    MAVLink version: 2
    transport protocol: serial (/dev/ttyS4 @921600)
    ping statistics:
      last: 28.57 ms
      mean: 24.05 ms
      max: 65.86 ms
      min: 3.57 ms
      dropped packets: 0

Not sure if the serial configs in NuttX make a difference. [image]

Here is a state where Telem1 and Telem2 have both locked up while Telem3 is still working because it has no DMA.

nsh> mavlink status

instance #0:
    mavlink chan: #0
    type:        GENERIC LINK OR RADIO
    flow control: OFF
    rates:
      tx: 0.0 B/s
      txerr: 1279.6 B/s
      tx rate mult: 0.050
      tx rate max: 46080 B/s
      rx: 0.0 B/s
      rx loss: 4.7%
    Received Messages:
      sysid:255, compid:190, Total: 179 (lost: 835)
    FTP enabled: YES, TX enabled: YES
    mode: Onboard
    Forwarding: On
    MAVLink version: 2
    transport protocol: serial (/dev/ttyS6 @921600)
    ping statistics:
      last: 32.58 ms
      mean: 29.08 ms
      max: 69.63 ms
      min: 4.30 ms
      dropped packets: 874

instance #1:
    mavlink chan: #1
    type:        GENERIC LINK OR RADIO
    flow control: OFF
    rates:
      tx: 0.0 B/s
      txerr: 1040.8 B/s
      tx rate mult: 0.050
      tx rate max: 46080 B/s
      rx: 0.0 B/s
      rx loss: 0.1%
    Received Messages:
      sysid:255, compid:190, Total: 10110 (lost: 513)
    FTP enabled: YES, TX enabled: YES
    mode: Onboard
    Forwarding: Off
    MAVLink version: 2
    transport protocol: serial (/dev/ttyS4 @921600)
    ping statistics:
      last: 8.36 ms
      mean: 9.90 ms
      max: 50.09 ms
      min: 1.21 ms
      dropped packets: 284

instance #2:
    GCS heartbeat valid
    mavlink chan: #2
    type:        GENERIC LINK OR RADIO
    flow control: OFF
    rates:
      tx: 20158.1 B/s
      txerr: 0.0 B/s
      tx rate mult: 1.000
      tx rate max: 46080 B/s
      rx: 47.0 B/s
      rx loss: 2.5%
    Received Messages:
      sysid:255, compid:190, Total: 185 (lost: 461)
        msgid:    4, Rate:  1.0 Hz, last 0.23s ago
        msgid:    0, Rate:  1.0 Hz, last 0.79s ago
    FTP enabled: YES, TX enabled: YES
    mode: Onboard
    Forwarding: Off
    MAVLink version: 2
    transport protocol: serial (/dev/ttyS1 @921600)
    ping statistics:
      last: 28.03 ms
      mean: 25.40 ms
      max: 49.17 ms
      min: 8.92 ms
      dropped packets: 8820

instance #3:
    GCS heartbeat valid
    mavlink chan: #3
    type:        USB CDC
    flow control: ON
    rates:
      tx: 21047.9 B/s
      txerr: 0.0 B/s
      tx rate mult: 1.000
      tx rate max: 100000 B/s
      rx: 47.0 B/s
      rx loss: 0.0%
    Received Messages:
      sysid:255, compid:190, Total: 18413 (lost: 0)
        msgid:  126, Rate:  3.2 Hz, last 0.08s ago
        msgid:    0, Rate:  1.0 Hz, last 0.90s ago
        msgid:    4, Rate:  1.0 Hz, last 0.11s ago
    FTP enabled: YES, TX enabled: YES
    mode: Config
    Forwarding: On
    MAVLink version: 2
    transport protocol: serial (/dev/ttyACM0 @2000000)
    ping statistics:
      last: 0.85 ms
      mean: 1.09 ms
      max: 321.23 ms
      min: 0.29 ms
      dropped packets: 0
nsh>
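In the dumps above, the stuck ports share one signature: `tx` drops to 0.0 B/s while `txerr` is non-zero, whereas healthy instances show the reverse. A rough sketch of a helper that flags such instances in captured `mavlink status` text follows. The function name and the `min_txerr` threshold are assumptions for illustration, and the regular expressions only target the output layout shown in this issue.

```python
import re


def find_stalled_instances(status_text, min_txerr=1.0):
    """Return [(instance_number, device)] where tx is 0 but txerr is not."""
    stalled = []
    # Splitting on the "instance #N:" headers yields
    # [preamble, "0", body0, "1", body1, ...].
    parts = re.split(r"^instance #(\d+):[ \t]*$", status_text, flags=re.M)
    for num, body in zip(parts[1::2], parts[2::2]):
        tx = re.search(r"^\s*tx:\s*([\d.]+) B/s", body, flags=re.M)
        txerr = re.search(r"^\s*txerr:\s*([\d.]+) B/s", body, flags=re.M)
        dev = re.search(r"transport protocol: serial \((\S+) @\d+\)", body)
        if not (tx and txerr):
            continue
        if float(tx.group(1)) == 0.0 and float(txerr.group(1)) >= min_txerr:
            stalled.append((int(num), dev.group(1) if dev else None))
    return stalled


# Abbreviated from the status output above.
SAMPLE = """\
instance #0:
    rates:
      tx: 0.0 B/s
      txerr: 1279.6 B/s
    transport protocol: serial (/dev/ttyS6 @921600)

instance #2:
    rates:
      tx: 20158.1 B/s
      txerr: 0.0 B/s
    transport protocol: serial (/dev/ttyS1 @921600)
"""

print(find_stalled_instances(SAMPLE))
```

Run against the full dump, this would pick out instances #0 and #1 (`/dev/ttyS6` and `/dev/ttyS4`) and skip the non-DMA and USB links.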

About this issue

  • State: closed
  • Created 6 months ago
  • Comments: 33 (33 by maintainers)

Most upvoted comments

Here is what was happening:

Normal: [image: normal]

Caused the hang: [image: hang]

I’m soak testing this change on our CI for the next 12h, so we’ll know if it impacts our setup too.

I’ve been running the serial_test for several hours and cannot reproduce this on our Skynode X config (FMUv6x with the same config as yours).

/dev/ttymxc3: count for this session: rx=579016685, tx=571762328, rx err=6024
/dev/ttymxc3: TIOCGICOUNT: ret=0, rx=579107680, tx=571859587, frame = 0, overrun = 0, parity = 0, brk = 0, buf_overrun = 0
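For comparing soak-test runs, the session summary line can be reduced to a single error ratio. This is only a sketch: the helper name is hypothetical, and it assumes `rx err` counts received bytes that failed the test's data check, which the log itself does not state.

```python
import re


def rx_error_ratio(line):
    """Return rx_err / rx from a serial test session summary line."""
    m = re.search(r"rx=(\d+), tx=(\d+), rx err=(\d+)", line)
    if m is None:
        raise ValueError("not a session summary line")
    rx, _tx, rx_err = (int(g) for g in m.groups())
    return rx_err / rx


line = ("/dev/ttymxc3: count for this session: "
        "rx=579016685, tx=571762328, rx err=6024")
print(f"rx error ratio: {rx_error_ratio(line):.2e}")
```

For the counters above the ratio is on the order of 1e-5, so the link itself looks healthy in that run.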

I’m also working on understanding max throughput better with the serial test via tracing. However, DMA utilization seems very high at first glance (we use a 3 Mbps UART link to the companion). I’ll investigate further. orbetto.perf.gz -> ui.perfetto.io