esp-idf: Crash in xEventGroupSetBits after a few hours of running (IDFGH-6707)

Environment

  • Development Kit: none
  • Module or chip used: ESP32-WROOM-32D
  • IDF version (run git describe --tags to find it): v4.3.1-dirty (we updated NimBLE to https://github.com/espressif/esp-nimble/commit/94afe2711752ba08bef93780be1b9e5143a87159; there are no other changes)
  • Build System: idf.py
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it): xtensa-esp32-elf-gcc (crosstool-NG esp-2021r1) 8.4.0
  • Operating System: Linux
  • Using an IDE?: No
  • Power Supply: external 5V

Problem Description

After a few hours of running, a crash occurs due to a null pointer dereference in FreeRTOS code. We have reproduced this on multiple hardware units, multiple times on each unit.

We are currently running the test with the latest esp-idf and are trying to create a minimal project that reproduces the issue.

Our firmware sets event bits every second to update the time shown on a screen. Until the crash, everything works as expected.

Could there be a race condition in which the waiting task has already been unblocked by its timeout when xEventGroupSetBits tries to unblock it?

This looks similar to https://github.com/espressif/esp-idf/issues/242. The commit message for that fix says it makes the race condition less likely to happen. Does this mean the issue can still occur?

Expected Behaviour

No crashes.

Actual Behaviour

See description above.

Steps to reproduce

  1. Start the firmware.
  2. Wait a few hours; the crash occurs.

Code to reproduce this issue

// This is how we set the bits:
// The DISPLAY_INFO bits and change value are provided by a UART peripheral connected to the system
    EventBits_t bits = 0;
    if(change & DISPLAY_INFO_MASK_NEW_INDEX)
    {
        bits |= CD_NEW_INDEX;
    }
    if(change & DISPLAY_INFO_MASK_NEW_TRACK)
    {
        bits |= CD_NEW_TRACK;
    }
    if(change & DISPLAY_INFO_MASK_TIME_CHANGED)
    {
        bits |= CD_NEW_TIME;
    }

    xEventGroupSetBits(xCdEventGroup, bits);


// This is how we are waiting for them in a different task:
    xEventGroupValue = xEventGroupWaitBits( xCdEventGroup,
                                            CD_NEW_INDEX |
                                            CD_NEW_TRACK |
                                            CD_NEW_TIME |
                                            CD_STOP_OPEN_CLOSE_PRESSED |
                                            CD_STATE_CHANGED,
                                            pdTRUE,
                                            pdFALSE,
                                            20 / portTICK_PERIOD_MS);

Debug Logs

backtrace: https://gist.github.com/maBarabas/c0d683adfdaf6c8769a0b310e78c93a8

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (4 by maintainers)

Most upvoted comments

@maBarabas The backport for v4.3 has already been merged internally and is awaiting GitHub sync; the backport for v4.4 is still pending internal review. So I would guess a week or two.