esp-idf: Guru Meditation (LoadProhibited) in _Unwind_RaiseException (IDFGH-3388)

Environment

  • Development Kit: none
  • Module or chip used: ESP32-WROVER
  • IDF version (run git describe --tags to find it): v4.0-386-gb0f053d82
  • Build System: CMake, idf.py
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it): 8.2.0, both 2019r2 and 2020r1
  • Operating System: Linux
  • Using an IDE?: No
  • Power Supply: external 3.3V

Problem Description

Sometimes, our ESP32 code randomly panics with a LoadProhibited error on our custom ESP32 board:

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x402d05f0  PS      : 0x00060b30  A0      : 0x80296b66  A1      : 0x3ffebcd0  
A2      : 0x00000007  A3      : 0x3fff7470  A4      : 0x3fff7490  A5      : 0x3ffdb490  
A6      : 0x00000003  A7      : 0x00060023  A8      : 0x802d05f0  A9      : 0x3ffebcb0  
A10     : 0x00000000  A11     : 0x3ffebcd0  A12     : 0x00000001  A13     : 0x00000004  
A14     : 0x00000003  A15     : 0x00000004  SAR     : 0x00000020  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x0006003b  LBEG    : 0x40098a2c  LEND    : 0x40098a48  LCOUNT  : 0x00000000  

ELF file SHA256: fbfbb5f5b950a2d7

Backtrace: 0x402d05ed:0x3ffebcd0 0x40296b63:0x3ffebdd0 0x4010ec05:0x3ffebfd0 0x40101bb6:0x3ffec050 0x402d65cd:0x3ffec0f0 0x4023cacd:0x3ffec120 0x4023cb89:0x3ffec160 0x40093ddd:0x3ffec180

Rebooting...

Running addr2line on the backtrace addresses from the log above yields the following call trace:

0x402d05ed: _Unwind_RaiseException at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libgcc/unwind.inc:140
0x40296b63: __cxa_throw at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
0x4010ec05: hapbridge::set_response_ble_change(unsigned short, unsigned int, blemesh_val_t, std::optional<blemesh_val_t>) at /home/marco/Workspace/our_project/build/../components/proj_homekit/hapbridge_callbacks_notify_ble.cpp:438
0x40101bb6: blemesh_set_report_handler at /home/marco/Workspace/our_project/build/../components/proj_gweventhandlers/blemesh_event_handler.cpp:102 (discriminator 3)
0x402d65cd: handler_execute at /home/marco/Workspace/esp-homekit-sdk/esp-idf/components/esp_event/esp_event.c:147
0x4023cacd: esp_event_loop_run at /home/marco/Workspace/esp-homekit-sdk/esp-idf/components/esp_event/esp_event.c:553 (discriminator 3)
0x4023cb89: esp_event_loop_run_task at /home/marco/Workspace/esp-homekit-sdk/esp-idf/components/esp_event/esp_event.c:115
0x40093ddd: vPortTaskWrapper at /home/marco/Workspace/esp-homekit-sdk/esp-idf/components/freertos/port.c:143

As you can see above, the reported crash site is deep inside GCC's stack unwinding code, and C++ exceptions seem to be involved. We are fairly sure this is not caused by an exception escaping, since there is no sign of std::terminate() or std::abort() being invoked.
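An exception that escapes every handler ends in std::terminate(), whose default behaviour is to call std::abort(). One way to make that case unambiguous in the log is to install a terminate handler that prints a marker before aborting. The sketch below is standard C++17 and not taken from the project; the names log_and_abort and install_terminate_logger are hypothetical. If this marker never shows up before a LoadProhibited panic, an escaped exception can be ruled out with more confidence.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical helper: reports that std::terminate() was reached and, if an
// exception is in flight, prints its what() message, then aborts as usual.
[[noreturn]] void log_and_abort() noexcept {
    std::fputs("std::terminate() called", stderr);
    if (std::exception_ptr ep = std::current_exception()) {
        try {
            std::rethrow_exception(ep);  // rethrow only to inspect the type
        } catch (const std::exception &e) {
            std::fprintf(stderr, ": uncaught exception: %s", e.what());
        } catch (...) {
            std::fputs(": uncaught non-std exception", stderr);
        }
    }
    std::fputs("\n", stderr);
    std::abort();
}

// Hypothetical setup hook: call once early, e.g. at the start of app_main().
void install_terminate_logger() {
    std::set_terminate(&log_and_abort);
}
```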

We have already run into issues similar to this one several times before, and we have never been able to pinpoint the exact reason why they happen. We have had a hard time reproducing this issue reliably, and we have seen it pop up in several parts of our code. We also noticed that shuffling the code around a bit helped reduce (but not eliminate) the issue (e.g. changing the order in which functions are invoked, where exceptions are caught, etc.).

In particular, we noticed that when this issue occurs, the situation is often very similar to the following:

  • the first two frames of every stack trace (_Unwind_RaiseException in unwind.inc and __cxa_throw in eh_throw.cc) are identical or very similar to the ones I posted above:
0x402d05ed: _Unwind_RaiseException at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libgcc/unwind.inc:140
0x40296b63: __cxa_throw at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
  • the third frame refers to the last (or only) statement of a C++ function:
void set_response_ble_change([...]) {
   some_other_function(...); // this is hapbridge_callbacks_notify_ble.cpp:438
}
  • that function is called from inside a try {} catch() block containing a single statement, which is frame 4:
void blemesh_set_report_handler([...]) {
	// ...
	try {
		hapbridge::set_response_ble_change( // this is blemesh_event_handler.cpp:102
			[...]
		);
	} catch (hapbridge::exception &e) {
		ESP_LOGE(...);
	}
}
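The recurring shape described in the list (a throwing call as the last statement of a function, which is itself the only statement inside a try block) can be reduced to a small standalone program. This is a sketch only: hapbridge::exception, some_other_function and the int parameter are hypothetical stand-ins, since the real signatures are elided in the report. On a host compiler it simply catches the exception; it only illustrates the pattern and is not expected to reproduce the Xtensa-specific crash by itself.

```cpp
#include <cstdio>
#include <stdexcept>

namespace hapbridge {
// Hypothetical stand-in for the project's hapbridge::exception type.
struct exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for some_other_function(): throws on negative input.
void some_other_function(int value) {
    if (value < 0) {
        throw exception("negative value");
    }
}

// Mirrors set_response_ble_change(): the throwing call is its last statement.
void set_response_ble_change(int value) {
    some_other_function(value);
}
}  // namespace hapbridge

// Mirrors blemesh_set_report_handler(): a single call wrapped in try/catch.
// Returns true if a hapbridge::exception was caught.
bool blemesh_set_report_handler(int value) {
    try {
        hapbridge::set_response_ble_change(value);
    } catch (hapbridge::exception &e) {
        std::fprintf(stderr, "caught: %s\n", e.what());
        return true;
    }
    return false;
}
```

Calling blemesh_set_report_handler(-1) in a loop on the target is one way to exercise the throw/catch path repeatedly.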

The project is composed of a lot of components written in C++17, and several of them rely on C++ exceptions.

Expected Behavior

The system does not crash, or the crash can be clearly tracked to an underlying cause in our application code.

Actual Behavior

The system crashes, and the generated backtrace is not helpful at resolving the issue.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 21

Most upvoted comments

@mcilloni Thanks for the detailed report! We've been working on this issue and recently developed a workaround which mitigates this error in all our tests. This workaround should land on master soon.

The problem does in fact lie inside the libgcc unwinding code, and it occurs while an exception is being caught. In the code that restores the context of the catch handler so that execution can resume there, registers A4-A7 may get mixed up because of a window-underflow CPU exception. At the point of the crash, the code tries to load from the address stored in register A7, which is 0x60023 in your case and is not valid memory. Hence the LoadProhibited CPU exception.
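The register dump in the report is consistent with this explanation, and the arithmetic can be checked at compile time. In this sketch the register values are copied from the dump; EXCCAUSE 28 is the Xtensa LoadProhibitedCause code. The memory-map bound (nothing mapped below 0x3F400000 on the ESP32) is an assumption based on the ESP32 address map and should be checked against the Technical Reference Manual.

```cpp
#include <cstdint>

// Register values copied from the Guru Meditation dump above.
constexpr std::uint32_t kA7 = 0x00060023;
constexpr std::uint32_t kExcVAddr = 0x0006003b;
constexpr std::uint32_t kExcCause = 0x0000001c;

// Xtensa exception cause 28 is LoadProhibitedCause.
constexpr std::uint32_t kLoadProhibitedCause = 28;

// Assumption: on the ESP32 nothing is mapped below the external flash
// data window that starts at 0x3F400000.
constexpr std::uint32_t kLowestMappedAddr = 0x3F400000;

// Distance between the faulting address and the pointer held in A7.
constexpr std::uint32_t fault_offset() { return kExcVAddr - kA7; }

// True if addr lies below the lowest mapped region (per the assumption above).
constexpr bool below_mapped_memory(std::uint32_t addr) {
    return addr < kLowestMappedAddr;
}

static_assert(kExcCause == kLoadProhibitedCause, "dump reports LoadProhibited");
static_assert(fault_offset() == 0x18, "EXCVADDR is A7 + 0x18");
static_assert(below_mapped_memory(kA7), "A7 does not point into mapped memory");
```

The faulting address sits 0x18 bytes past the value in A7, which fits a load at a small offset from a corrupted A7 rather than a random wild pointer.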

The CPU exception happens here: https://github.com/espressif/gcc/blob/esp-develop/libgcc/unwind.inc#L140 That line is actually this macro: https://github.com/espressif/gcc/blob/esp-develop/libgcc/config/xtensa/unwind-dw2-xtensa.c#L486, which calls the function uw_install_context_1, which installs the context on the stack.