zephyr: STM32 I2C v2 lockup with invalid data and read/write
Fairly regularly (roughly every 1.5 hours, with a 100 Hz interrupt and other transactions running alongside), I find that the system has locked up: some interrupts are still being serviced (e.g. I2C, CAN), but the application flow has halted entirely.
I’m running f51c8ee739, with an out-of-tree ICM-20948 driver using I2C address 0x68.
On investigation, I have found the following:
- This always appears to happen for a specific sequence and this device
  - The sequence should be Wx1, Rx4, Wx1, Rx14, all presented as one transaction via i2c_transfer_dt()
  - Transaction 3 / the final write has invalid data
  - Transaction 4 / the final read appears on the bus as a write, with only 13x bytes
- The memory for the struct i2c_msg and the stacks are confirmed as intact (not corrupt / overwritten / overflowed)
  - Both STACK_SENTINEL and STACK_CANARIES are enabled
- The I2C interrupt is firing constantly
  - CR1=0xf7, ISR=0x8003
  - entering via stm32_i2c_event_isr()
  - stm32_i2c_event() does nothing
- Disabling I2C interrupts briefly allows the application to continue, even if the I2C device receives invalid writes.
  - set ((I2C_TypeDef*)0x40005400)->CR1 = 0
  - cont
  - set ((I2C_TypeDef*)0x40005400)->CR1 = 0xf7
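To make the "interrupt is firing constantly" observation easier to reason about, here is a minimal sketch that pairs each interrupt enable bit in CR1 with the ISR flags it gates. The bit positions are taken from my reading of the STM32 I2C v2 register description and should be checked against the reference manual for the exact part; the helper names are hypothetical and not part of the Zephyr driver.

```c
#include <assert.h>
#include <stdint.h>

/* I2C_CR1 interrupt enable bits (assumed from the I2C v2 register map). */
#define CR1_TXIE   (1u << 1)
#define CR1_RXIE   (1u << 2)
#define CR1_ADDRIE (1u << 3)
#define CR1_NACKIE (1u << 4)
#define CR1_STOPIE (1u << 5)
#define CR1_TCIE   (1u << 6)
#define CR1_ERRIE  (1u << 7)

/* I2C_ISR status flags (same assumption). */
#define ISR_TXE    (1u << 0)
#define ISR_TXIS   (1u << 1)
#define ISR_RXNE   (1u << 2)
#define ISR_ADDR   (1u << 3)
#define ISR_NACKF  (1u << 4)
#define ISR_STOPF  (1u << 5)
#define ISR_TC     (1u << 6)
#define ISR_TCR    (1u << 7)
#define ISR_ERRORS (0x3Fu << 8) /* BERR, ARLO, OVR, PECERR, TIMEOUT, ALERT */
#define ISR_BUSY   (1u << 15)

/* Hypothetical helper: ISR flags that are set AND have their enable bit set. */
uint32_t pending_enabled_sources(uint32_t cr1, uint32_t isr)
{
    uint32_t mask = 0;

    if (cr1 & CR1_TXIE)   mask |= ISR_TXIS;
    if (cr1 & CR1_RXIE)   mask |= ISR_RXNE;
    if (cr1 & CR1_ADDRIE) mask |= ISR_ADDR;
    if (cr1 & CR1_NACKIE) mask |= ISR_NACKF;
    if (cr1 & CR1_STOPIE) mask |= ISR_STOPF;
    if (cr1 & CR1_TCIE)   mask |= ISR_TC | ISR_TCR;
    if (cr1 & CR1_ERRIE)  mask |= ISR_ERRORS;

    return isr & mask;
}

/* Usage with the values reported above: CR1=0xf7, ISR=0x8003. */
void check_reported_values(void)
{
    /* 0x8003 = BUSY | TXIS | TXE; only TXIS is an enabled interrupt source. */
    assert(pending_enabled_sources(0xf7u, 0x8003u) == ISR_TXIS);
}
```

Under those assumptions, the only enabled source that is pending is TXIS, i.e. the peripheral is asking for a byte to transmit. If the handler returns without writing TXDR, the flag would stay set and the interrupt would re-enter immediately, which would fit the behaviour described, but that is an inference from the register values rather than something confirmed in the driver.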
“Good” Transaction
“Bad” Transaction
Scrolled for decode visibility on what should be the final read, of 14x bytes (only 13x present).
Green is my trigger point (SCL staying low for too long).
Red is set high immediately before the call to i2c_transfer_dt(), and low immediately after - execution never returns.
In this specific instance, I noticed that the final i2c_msg.buf (which should be a destination buffer that gets populated) was pointing at memory containing: 0x00 (possibly seen in transaction 3), 0x9f, 0x8d, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xc1, 0xcc, 0x01, 0x08, 0xff, 0xff (seen in transaction 4).
(gdb) print/x *((I2C_TypeDef*)0x40005400)
$174 = {
CR1 = 0xf7,
CR2 = 0xe00d0,
OAR1 = 0x0,
OAR2 = 0x0,
TIMINGR = 0x20b90d1e,
TIMEOUTR = 0x0,
ISR = 0x8003,
ICR = 0x0,
PECR = 0x0,
RXDR = 0x0,
TXDR = 0xff
}
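The CR2 value in the dump can be split into its fields to see what transfer the peripheral was set up for. This is a rough sketch, assuming the I2C v2 CR2 layout (SADD in bits 9:0, RD_WRN at bit 10, NBYTES in bits 23:16, and so on) and 7-bit addressing; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of I2C_CR2 (field positions assumed from the I2C v2 register map). */
struct cr2_fields {
    uint8_t addr7;   /* 7-bit target address, taken from SADD[7:1] */
    bool    read;    /* RD_WRN (bit 10): true = read, false = write */
    bool    addr10;  /* ADD10 (bit 11) */
    bool    start;   /* START (bit 13) */
    bool    stop;    /* STOP (bit 14) */
    uint8_t nbytes;  /* NBYTES (bits 23:16) */
    bool    reload;  /* RELOAD (bit 24) */
    bool    autoend; /* AUTOEND (bit 25) */
};

/* Hypothetical helper: split a raw CR2 value into its fields. */
struct cr2_fields decode_cr2(uint32_t cr2)
{
    struct cr2_fields f;

    f.addr7   = (uint8_t)((cr2 >> 1) & 0x7Fu);
    f.read    = (cr2 & (1u << 10)) != 0;
    f.addr10  = (cr2 & (1u << 11)) != 0;
    f.start   = (cr2 & (1u << 13)) != 0;
    f.stop    = (cr2 & (1u << 14)) != 0;
    f.nbytes  = (uint8_t)((cr2 >> 16) & 0xFFu);
    f.reload  = (cr2 & (1u << 24)) != 0;
    f.autoend = (cr2 & (1u << 25)) != 0;

    return f;
}

/* Usage with the value from the dump: CR2 = 0xe00d0. */
void check_dumped_cr2(void)
{
    struct cr2_fields f = decode_cr2(0xe00d0u);

    assert(f.addr7 == 0x68); /* matches the ICM-20948 address */
    assert(f.nbytes == 14);  /* length of the intended final read */
    assert(!f.read);         /* but the direction bit says write */
}
```

Read this way, the peripheral appears to be configured for a 14-byte write to 0x68 rather than a 14-byte read, which lines up with the final read showing up on the bus as a write. It does not by itself explain how the direction bit came to be cleared.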
Expected behavior
I2C operates correctly
Impact
Frequent lockups, and heavily reduced reliability. Near showstopper.
Additional Context
I will be continuing to work on this. Until earlier today, I was suspicious of my code, but in light of some recent discoveries I plan to take a close look at the I2C driver.
About this issue
- State: open
- Created 4 months ago
- Reactions: 1
- Comments: 23 (19 by maintainers)
Ok, that was worth trying … and I think this should be fixed anyway. Btw, I’m seeing a lot of I2C drivers using this same value, likely a copy/paste effect, unless I’m missing something. @teburd any opinion on that?
@aescolar - understood, thanks for clarifying.
I’ve got some ideas on how to approach this, but I’d like to discuss with others (and need to get time to spend on it)… the workaround above seems to do the trick for the moment - I’ve had it running for over 50 hours without issue recently.