zephyr: STM32 I2C v2 lockup with invalid data and read/write

Fairly regularly (roughly every 1.5 hours, on a 100 Hz interrupt, with other transactions alongside), I find that the system has locked up: some interrupts are still being serviced (e.g. I2C, CAN), but the application flow has halted entirely.

I’m running f51c8ee739, with an out-of-tree ICM-20948 driver using I2C address 0x68.

On investigation, I have found the following:

  • This always appears to happen for one specific sequence on this device:
    • The sequence should be Wx1, Rx4, Wx1, Rx14 (write 1 byte, read 4, write 1, read 14), all presented as one transaction via i2c_transfer_dt()
  • Transaction 3 / the final write has invalid data
  • Transaction 4 / the final read appears on the bus as a write, with only 13 bytes
  • The memory for the struct i2c_msg and the stacks is confirmed intact (not corrupt / overwritten / overflowed)
    • Both STACK_SENTINEL and STACK_CANARIES are enabled
  • The I2C interrupt is firing constantly
  • Disabling I2C interrupts briefly allows the application to continue, even if the I2C device receives invalid writes.
    • set ((I2C_TypeDef*)0x40005400)->CR1 = 0
    • cont
    • set ((I2C_TypeDef*)0x40005400)->CR1 = 0xf7
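
For reference, the Wx1, Rx4, Wx1, Rx14 sequence from the list above could be laid out roughly as follows. This is a host-compilable sketch, not the out-of-tree driver's actual code: the struct and flag macros are local stand-ins mirroring Zephyr's <zephyr/drivers/i2c.h>, the register addresses REG_FIRST / REG_SECOND and the helper build_sequence() are hypothetical, and the placement of the RESTART / STOP flags is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so this compiles on a host; values mirror Zephyr's i2c.h. */
#define I2C_MSG_WRITE   0u
#define I2C_MSG_READ    (1u << 0)
#define I2C_MSG_STOP    (1u << 1)
#define I2C_MSG_RESTART (1u << 2)

struct i2c_msg {
	uint8_t *buf;   /* source for writes, destination for reads */
	uint32_t len;   /* number of bytes in buf */
	uint8_t flags;  /* I2C_MSG_* flags */
};

/* Hypothetical register addresses; the real ones are not given in this report. */
#define REG_FIRST  0x00u
#define REG_SECOND 0x01u

/* Static storage so every buffer outlives the transfer. */
static uint8_t reg_first = REG_FIRST;
static uint8_t reg_second = REG_SECOND;
static uint8_t rx_first[4];
static uint8_t rx_second[14];

/* Fill msgs[0..3] with Wx1, Rx4, Wx1, Rx14 and return the message count. */
static inline size_t build_sequence(struct i2c_msg *msgs)
{
	assert(msgs != NULL);

	/* Message 1: write one register address byte. */
	msgs[0] = (struct i2c_msg){ &reg_first, 1, I2C_MSG_WRITE };
	/* Message 2: repeated start, read 4 bytes. */
	msgs[1] = (struct i2c_msg){ rx_first, sizeof(rx_first),
				    I2C_MSG_READ | I2C_MSG_RESTART };
	/* Message 3: repeated start, write the second register address byte. */
	msgs[2] = (struct i2c_msg){ &reg_second, 1,
				    I2C_MSG_WRITE | I2C_MSG_RESTART };
	/* Message 4: repeated start, read 14 bytes, then stop. */
	msgs[3] = (struct i2c_msg){ rx_second, sizeof(rx_second),
				    I2C_MSG_READ | I2C_MSG_RESTART | I2C_MSG_STOP };

	return 4;
}
```

In a Zephyr build the stand-ins would be dropped in favour of the real header, and the array passed in one call, e.g. i2c_transfer_dt(&spec, msgs, 4) with spec being the device's struct i2c_dt_spec. With this layout, message 4 is the one whose buffer should only ever be written by the driver, which is what makes the observed bus write of its contents so suspicious.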

“Good” Transaction

image

“Bad” Transaction

The capture is scrolled so the decode is visible on what should be the final read of 14 bytes (only 13 are present). Green is my trigger point (SCL staying low for too long). Red is set high immediately before the call to i2c_transfer_dt() and low immediately after; execution never returns from the call.

In this specific instance, I noticed that the final i2c_msg.buf (which should be a destination buffer that gets populated) was pointing at memory containing: 0x00 (possibly seen in transaction 3), 0x9f, 0x8d, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xc1, 0xcc, 0x01, 0x08, 0xff, 0xff (seen in transaction 4).

image

(gdb) print/x *((I2C_TypeDef*)0x40005400)
$174 = {
  CR1 = 0xf7,
  CR2 = 0xe00d0,
  OAR1 = 0x0,
  OAR2 = 0x0,
  TIMINGR = 0x20b90d1e,
  TIMEOUTR = 0x0,
  ISR = 0x8003,
  ICR = 0x0,
  PECR = 0x0,
  RXDR = 0x0,
  TXDR = 0xff
}
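
To make the dump above easier to read, here is a small host-side sketch that decodes the fields of interest. The bit positions are my reading of the I2C v2 register layout in ST's reference manuals and should be checked against the manual for the specific part; the struct and helper names (cr2_fields, decode_cr2, txis_irq_pending) are hypothetical. Under those assumptions, CR2 = 0xe00d0 decodes to target address 0x68, direction write, NBYTES = 14, and ISR = 0x8003 has BUSY, TXIS and TXE set while CR1 = 0xf7 has TXIE enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed I2C v2 bit positions; verify against the part's reference manual. */
#define CR1_TXIE    (1u << 1)   /* TX interrupt enable */
#define CR2_RD_WRN  (1u << 10)  /* 1 = read, 0 = write */
#define CR2_RELOAD  (1u << 24)
#define CR2_AUTOEND (1u << 25)
#define ISR_TXE     (1u << 0)   /* transmit data register empty */
#define ISR_TXIS    (1u << 1)   /* transmit interrupt status */
#define ISR_BUSY    (1u << 15)  /* bus busy */

/* Decoded view of the CR2 fields relevant to this dump. */
struct cr2_fields {
	uint8_t addr7;   /* SADD[7:1], the 7-bit target address */
	uint8_t nbytes;  /* NBYTES, bits 23:16 */
	bool is_read;    /* RD_WRN */
	bool reload;     /* RELOAD */
	bool autoend;    /* AUTOEND */
};

/* Split a raw CR2 value into its fields. */
static inline void decode_cr2(uint32_t cr2, struct cr2_fields *out)
{
	assert(out != NULL);

	out->addr7 = (uint8_t)((cr2 >> 1) & 0x7fu);
	out->nbytes = (uint8_t)((cr2 >> 16) & 0xffu);
	out->is_read = (cr2 & CR2_RD_WRN) != 0u;
	out->reload = (cr2 & CR2_RELOAD) != 0u;
	out->autoend = (cr2 & CR2_AUTOEND) != 0u;
}

/* True when TXIS is flagged and its interrupt is enabled. */
static inline bool txis_irq_pending(uint32_t cr1, uint32_t isr)
{
	return ((cr1 & CR1_TXIE) != 0u) && ((isr & ISR_TXIS) != 0u);
}
```

If this reading is right, the peripheral is configured for a 14-byte write rather than a read, and it is asking for a transmit byte with the TX interrupt enabled. That would be consistent with both the final read appearing on the bus as a write and the I2C interrupt firing constantly, since TXIS stays set until TXDR is written.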

Expected behavior

I2C operates correctly

Impact

Frequent lockups, and heavily reduced reliability. Near showstopper.

Additional Context

I will be continuing to work on this. Until earlier today, I was suspicious of my code, but in light of some recent discoveries I plan to take a close look at the I2C driver.

About this issue

  • State: open
  • Created 4 months ago
  • Reactions: 1
  • Comments: 23 (19 by maintainers)

Most upvoted comments

I must admit, I missed that - good spot. Unfortunately, even with limit=1, the same issue occurs, and my patch is required for normal operation.

Ok, that was worth trying… and I think this should be fixed anyway. Btw, I’m seeing a lot of I2C drivers using this same value, likely a copy/paste effect, unless I’m missing something. @teburd any opinion on that?

@aescolar - understood, thanks for clarifying.

I’ve got some ideas on how to approach this, but I’d like to discuss with others (and need to get time to spend on it)… the workaround above seems to do the trick for the moment - I’ve had it running for over 50 hours without issue recently.