esp-idf: [TW#17765] I2C crashing - watchdog timeout (master & v3.0 branch)
I have a project that has three I2C slave devices on a single bus (running at 100kHz). For some time I was developing with ESP-IDF 2.1.1 and everything was working pretty well, except for a weird problem where the I2C master would freeze up after a few minutes. I did some research and it looks like this is a problem with the I2C master hardware state machine which has been addressed in more recent commits of ESP-IDF. So to make use of this fix I migrated my project to use master (595688a32ad653d8e6cb1c7682b813f96125853e). I had to make a few changes (remove references to FreeRTOS heap measurement commands, add nvs_flash_init() before initialising WiFi) but then everything seemed to work well. The slave devices are all being polled correctly and everything seems happy.
The project is here: https://github.com/DavidAntliff/esp32-poolmon/tree/ESP-IDF_master
I came back a little while later and the application is crashing over and over with the following console output shortly after boot:
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC : 0x400859f3 PS : 0x00060034 A0 : 0x80084685 A1 : 0x3ffb0590
0x400859f3: xQueueGenericSendFromISR at /Users/david/esp32/esp-idf-master/components/freertos/./queue.c:2037
A2 : 0x00000001 A3 : 0x00000000 A4 : 0x3ffb05b0 A5 : 0x00000002
A6 : 0x3ffbc970 A7 : 0x00060021 A8 : 0x800859f3 A9 : 0x3ffb0570
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x00000002 A13 : 0x3ffbadc0
A14 : 0x00000000 A15 : 0x400849fc SAR : 0x00000012 EXCCAUSE: 0x00000005
0x400849fc: i2c_isr_handler_default at /Users/david/esp32/esp-idf-master/components/driver/./i2c.c:1023
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Backtrace: 0x400859f3:0x3ffb0590 0x40084682:0x3ffb05b0 0x40084a89:0x3ffb05e0 0x40082ba5:0x3ffb0610 0x4000bfed:0x00000000
0x400859f3: xQueueGenericSendFromISR at /Users/david/esp32/esp-idf-master/components/freertos/./queue.c:2037
0x40084682: i2c_master_cmd_begin_static at /Users/david/esp32/esp-idf-master/components/driver/./i2c.c:1023
0x40084a89: i2c_isr_handler_default at /Users/david/esp32/esp-idf-master/components/driver/./i2c.c:1023
0x40082ba5: _xt_lowint1 at /Users/david/esp32/esp-idf-master/components/freertos/./xtensa_vectors.S:1105
Rebooting...
A software or on-board reset does not stop this endless reset behaviour, however removing power for a short period of time does “fix” the issue. It is strange that a brief ESP32 reset does not clear it. (EDIT: but a long reset press does).
About this issue
- State: closed
- Created 6 years ago
- Comments: 78 (59 by maintainers)
Just wanted to let you know that there is a new commit in master from today that seems to fix this issue. I have not tested it yet. 😃
@koobest I just found a way to reset the I2C peripherals that seems to do a power on reset. It clears out all registers, resets the Bus_busy flags, initializes the hardware state back to a Power On condition. I work on the Arduino branch, a patch merge containing just this reset is arduino-esp32 pr 1201.
Chuck.
@DavidAntliff i2c_hw_fsm_reset() is attempting to work around the problem that the FSM will see a glitch on SDA as a multi-master bus capture. Once the FSM sees ‘another’ master transacting on the bus, it enters a Bus_Busy state until it sees a complete transaction with a valid STOP. The basis of this problem is the FSM’s interpretation of START:
Since all of the problems have been encountered by people using the ESP32 in a SINGLE master I2C configuration, the FSM will hang indefinitely waiting for the ‘other’ master to complete its transaction.
The use of additional GPIO pins to act as another I2C master will solve the Bus_Busy problems. The actual TIMEOUT interrupt cascades I haven’t solved.
static esp_err_t i2c_master_clear_bus(i2c_port_t i2c_num) acts like an I2C master to send out a null transaction, except that there is no initial START (it drops SCL before SDA). So, it is another illegal transaction. This code should be changed to something like this:
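The snippet that followed is not preserved here. As a minimal sketch only, and not Chuck's code, the idea of a bus clear that contains a proper START and ends in a valid STOP can be modelled on a host machine. Everything below is an assumption made for illustration: the sim_bus_t type, the set_scl()/set_sda() helpers and the simulated slave are hypothetical stand-ins for open-drain GPIO, not ESP-IDF APIs. Whether a real slave tolerates an empty START/STOP pair is device-dependent.

```c
#include <stdbool.h>

/* Hypothetical host-side model of an open-drain I2C bus (not an ESP-IDF API). */
typedef struct {
    bool scl;             /* SCL level as driven by the master */
    bool sda_master;      /* true = master releases SDA (pull-up wins) */
    int  slave_bits_left; /* simulated slave holds SDA low for this many clocks */
    int  start_seen;      /* SDA falling edges observed while SCL is high */
    int  stop_seen;       /* SDA rising edges observed while SCL is high */
} sim_bus_t;

/* Wired-AND: SDA is high only if neither master nor slave pulls it low. */
static bool sda_level(const sim_bus_t *b) {
    return b->sda_master && b->slave_bits_left == 0;
}

/* Each falling SCL edge lets the simulated slave shift out one pending bit. */
static void set_scl(sim_bus_t *b, bool level) {
    if (b->scl && !level && b->slave_bits_left > 0) {
        b->slave_bits_left--;
    }
    b->scl = level;
}

/* Record START/STOP conditions: SDA edges that occur while SCL is high. */
static void set_sda(sim_bus_t *b, bool level) {
    bool before = sda_level(b);
    b->sda_master = level;
    bool after = sda_level(b);
    if (b->scl && before && !after) { b->start_seen++; }
    if (b->scl && !before && after) { b->stop_seen++; }
}

/* Returns true if the bus ends idle (SCL and SDA both high). */
static bool bus_clear(sim_bus_t *b) {
    set_sda(b, true);                 /* release both lines first */
    set_scl(b, true);
    /* Clock SCL (at most 9 pulses) until the slave lets go of SDA. */
    for (int i = 0; i < 9 && !sda_level(b); i++) {
        set_scl(b, false);
        set_scl(b, true);
    }
    if (!sda_level(b)) {
        return false;                 /* SDA still stuck low: give up */
    }
    set_sda(b, false);                /* START: SDA falls while SCL is high */
    set_scl(b, false);
    set_scl(b, true);
    set_sda(b, true);                 /* STOP: SDA rises while SCL is high */
    return true;
}
```

On real hardware the two setters would drive open-drain pins with suitable delays between edges; the ordering of the edges is the point of the sketch, since SDA only changes while SCL is high for the START and the STOP.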
The history of needing this function traces back to hardware glitches that occur when attaching the GPIO pins to the hardware peripheral. @ESP32DE and I solved these glitches for the Arduino environment proposal to i2c. I don’t know the equivalent pin assignment sequence for IDF. I don’t directly use esp32-IDF.
In my testing I no longer need to execute this function at every boot.
With a quick look through the IDF i2c code, I see a few design ideas I don’t support.
- Calling portYIELD_FROM_ISR(); at every opportunity. portYIELD_FROM_ISR(); should only be called at the end of the ISR. If one of the OS calls returns (HPTaskAwoken == pdTRUE), then at the END of the ISR call portYIELD_FROM_ISR() instead of just exiting the ISR. HPTaskAwoken just means that the FOREGROUND process must yield, not the ISR. So, as soon as you complete your ISR (which should be short, deterministic, and have NO waits), call portYIELD_FROM_ISR(); to task switch to the higher priority FOREGROUND task.
  I think the multiple portYIELD_FROM_ISR() calls throughout the ISR are the cause/basis of this issue. A single interrupt is not completing before the next byte moves. HPTaskAwoken only needs to be acted on ONCE, just before the end of the ISR. I would create a SINGLE HPTaskAwoken variable at the top of the ISR and pass it by reference to all sub-functions. The FreeRTOS ISR functions will not clear it, only SET it. So, if any of the OS functions set it, it will cascade until that last step before the ISR returns. I haven’t studied this ISR. In my version, I have a subfunction that does my exit operations; it is the only place that can call portYIELD_FROM_ISR(). This code is from my ISR state machine. Inside i2cIsrExit() all of my ISR cleanup, foreground notifications, and a potential portYIELD_FROM_ISR() are executed; the return; exits from my ISR back to the interrupted foreground task.
  From my point of view, ISRs must complete. They are atomic. If an ISR can’t complete in a short FINITE timespan, it is coded wrong.
- When I2C_CMD_END is used and the command[] buffer is refilled, the positions of command[] at and beyond where I2C_CMD_END was placed cannot be reused. In my implementation of I2C for Arduino, if I need to use I2C_CMD_END because of data block length, I only use command[15] for the end. The FSM allows each command to move up to 255 bytes, so before I need to use an ‘END’ a Master Write would have to send over 3060 bytes (12 × 255). A single Read is a little more limited because of the last byte NAK. I don’t see any consideration for these ‘rules’.
Chuck.
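The single-yield ISR pattern Chuck describes can be sketched roughly as follows. This is an illustration under stated assumptions, not his code and not the ESP-IDF driver: BaseType_t, pdTRUE and pdFALSE are redefined locally as host-side stand-ins so the sketch builds off-target, fake_yield_from_isr() stands in for portYIELD_FROM_ISR(), fake_send_from_isr() stands in for any FreeRTOS ...FromISR() call, and the handler names are hypothetical. isr_exit_sketch() plays the role of the i2cIsrExit() mentioned above.

```c
#include <stdbool.h>

/* Host-side stand-ins (assumptions); on the ESP32 these come from FreeRTOS. */
typedef int BaseType_t;
#define pdFALSE 0
#define pdTRUE  1

static int yield_calls = 0;   /* counts stand-in yields, for inspection only */

/* Stands in for portYIELD_FROM_ISR(). */
static void fake_yield_from_isr(void) { yield_calls++; }

/* Stands in for a ...FromISR() call: it may SET *woken but never clears it. */
static void fake_send_from_isr(bool wakes_task, BaseType_t *woken) {
    if (wakes_task) { *woken = pdTRUE; }
}

/* Hypothetical sub-handlers: they only pass the shared flag along. */
static void handle_rx(bool wakes_task, BaseType_t *woken) {
    fake_send_from_isr(wakes_task, woken);
}

static void handle_tx(bool wakes_task, BaseType_t *woken) {
    fake_send_from_isr(wakes_task, woken);
}

/* The only place allowed to yield: cleanup first, then at most one yield. */
static void isr_exit_sketch(BaseType_t woken) {
    /* ...clear interrupt flags, notify the foreground task, etc... */
    if (woken == pdTRUE) {
        fake_yield_from_isr();
    }
}

/* ISR body: one flag created at the top, acted on once at the very end. */
static void i2c_isr_sketch(bool rx_wakes, bool tx_wakes) {
    BaseType_t woken = pdFALSE;
    handle_rx(rx_wakes, &woken);
    handle_tx(tx_wakes, &woken);
    isr_exit_sketch(woken);
}
```

Because the flag is only ever set, it does not matter which sub-handler woke a task or how many did: the ISR yields at most once, after all of its work is done.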
@luisonoff thank you for the alert!
I have tested ESP-IDF commit 391c3ff959f9eb1b2975cb0d7b29c0478f3b6a48 with my reproduction project on one of my “DOIT” boards and I can happily report that I am unable to reproduce my issue by rubbing SDA and SCL together rapidly. I spent maybe 4 minutes rapidly mechanically shorting them and did not see a single crash.
Then I reverted back to 2e7613b6560775b27c50eb81e81d5c3ff712b866 (just prior to the “fix”) and verified that the issue can be reproduced. In fact it was extremely easy to reproduce it, many times per minute.
So I can conclude from this that merge 892f3907fa2e074943e865b68f2fda3da600584b appears to resolve the issue, for me at least, on this board.
I’ll try it on my Wemos LoLin32 Lite next, and report back if the results are any different.
EDIT: looks good on the LoLin32 also - no crashes seen. Good work Espressif! Thank you.
Hi Luis, if you have doubts, I suggest you test with 3K3 resistors for 3.3V; 4K7 resistors are for 5V. Did you scope the I2C interface with a logic analyzer?
@Gustavomurta thanks for the advice. Here’s the thing - I know that my I2C bus isn’t perfect, and it would be good if I could condition my signals to avoid an issue, but the problem is that there’s always a risk of errors on the bus due to noise. In the event of a failed I2C transaction, the bus will be in an error state, and that’s fine if the software can detect that and return an error code to the caller. The problem is that the ESP32 I2C peripheral has a bug that causes its internal finite-state-machine (FSM) to lock up if SDA or SCL are electrically affected in certain ways. This is a known issue and acknowledged by Espressif. There is a fix in the 3.0 stream that attempts an FSM reset when there is a transaction timeout and the hardware busy flag is still raised. I see this fix activate sometimes and it seems to work. The issues that I have documented here are related to this, I think, but take it further:
Because 2. happens almost every single time 1. does, I suspect that 1. is related to the FSM failure. It may be a cause, or it may be incidental, I’m not sure. I don’t know enough about the FSM failure to know whether it can cause a flood of interrupts.
So my point is that although there’s a lot I can do to improve the I2C bus in my particular circuit, there’s a real issue with the ESP32 software interaction with the hardware at the moment that is causing I2C failures for multiple people, and Espressif are in the best possible place to investigate this now that there’s a way to reproduce it.
I am using external pull-ups BTW. The issue is also unrelated to bus speed. It happens at 10 kHz almost as often as it happens at 100 kHz.
@panfeng-espressif https://github.com/DavidAntliff/esp-mqtt/
It is a submodule of his other project