esp-idf: Incorrect semaphore behavior in FreeRTOS under esp-idf (IDFGH-7260)

Environment

Development Kit: WEMOS Lolin D32 (non-espressif tool)
Kit version (for WroverKit/PicoKit/DevKitC): N/A
Module or chip used: ESP32-WROOM32
IDF version (run git describe --tags to find it): v4.3 v4.4.1 v5.0-dev-2443-g047903c612
Build System: idf.py / cmake
Compiler version (run xtensa-esp32-elf-gcc --version to find it): when building on v5.0-dev-2443-g047903c612: xtensa-esp32-elf-gcc.exe (crosstool-NG esp-2021r2-patch3) 8.4.0 when building on v4.4.1: xtensa-esp32-elf-gcc.exe (crosstool-NG esp-2021r2-patch3) 8.4.0 when building on v4.3: xtensa-esp32-elf-gcc.exe (crosstool-NG esp-2020r3) 8.4.0
Operating System: Windows 10 x86_64
(Windows only) environment type: esp-idf PowerShell
Using an IDE?: No
Power Supply: USB

Problem Description

We have identified a problem with the FreeRTOS Semaphore API under esp-idf. When creating two tasks which each attempt to give and take a mutex repeatedly inside a loop, one task (task A) can become blocked permanently during a take operation. Meanwhile, the other task (task B) is able to give and take the mutex inside itself repeatedly for over five seconds, completely starving out the first task of access.

Procedure

The table below outlines the basic process of each of the two tasks in our sample code

Task A	Task B
Take the mutex	Take the mutex
busywait (esp_rom_delay_us)	vTaskDelay
Give the semaphore	busywait (esp_rom_delay_us)
vTaskDelay	Give the semaphore

Notes

The example code provided below should reproduce this problem on esp-idf v4.3, v4.4.1, and v5.0-dev-2443-g047903c612.
Setting the menuconfig option FREERTOS_UNICORE causes the issue to disappear with our test code under all three esp-idf versions tested.
Note that v4.3 is also affected by this problem in the case where the priority of task A is set above the priority of task B

Affected Versions

All three tested esp-idf versions were affected, but not under the same conditions. The matrix below indicates the conditions under which the two different log outputs were generated.

1 means generated log output # 1 was generated (see logs below)
2 means generated log output # 2 was generated (see logs below)

v5.0	v4.4.1	v4.3
2	2	1	Unpinned	Same Priority
1	1	1		A higher priority than B
2	2	2		B higher priority than A
2	2	1	Pinned to same core	Same Priority
1	1	1		A higher priority than B
2	2	2		B higher priority than A
2	2	2	Pinned to different core	Same Priority
2	2	2		A higher priority than B
2	2	2		B higher priority than A
1	1	1	FREERTOS_UNICORE	Same Priority
1	1	1		A higher priority than B
2	2	2		B higher priority than A

Expected Behavior

The intuitive expectation is that a “first-come-first-serve” system should exist for tasks of the same priority, with threads waiting the longest for the mutex being serviced first as the mutex becomes available.

Indeed, in FreeRTOS, semaphores are implemented as queues, which appear to offer this guarantee: Queues can have multiple readers, so it is possible for a single queue to have more than one task blocked on it waiting for data. […] If the blocked tasks have equal priority, then the task that has been waiting for data the longest will be unblocked. Taken from: https://www.freertos.org/fr-content-src/uploads/2018/07/161204_Mastering_the_FreeRTOS_Real_Time_Kernel-A_Hands-On_Tutorial_Guide.pdf

It appears that this guarantee is being violated here: although A has been waiting to receive from the queue for many seconds and hundreds of queue sends and receives, xQueueSemaphoreTake in B continues to be serviced first before the same call in A.

#define xSemaphoreGive( xSemaphore )    xQueueGenericSend( ...
#define xSemaphoreTake( xSemaphore, xBlockTime )    xQueueSemaphoreTake( ...

In our example, we would expect Task A to be able to intermittently acquire the mutex during the test period.

Actual Behavior

After an initial period, Task A blocks forever while Task B takes and gives the semaphore to itself, starving task A entirely until the loop in Task B exits.

Possible Explanations

During our testing with v5.0-dev-2443-g047903c612, we determined that xQueueGenericSend in Task B was indeed identifying that another task of equal priority was waiting to acquire the mutex (or, receive from the queue), and correctly yielding control to the scheduler. The scheduler, however, chose to once again schedule task B to run again, as tasks A and B are both of the same priority. Task B then called xQueueSemaphoreTake, which quickly acquired the mutex / dequeued with no need to block. Task A does run at the next opportunity, when Task B calls vTaskDelay, but it is too late; the mutex has already been taken, and it must block again and wait for another opportunity.

As v4.3 is also affected when the priority of task A is set above the priority of task B, some other unidentified mechanism may be involved in that case.

Steps to reproduce

Create a fresh project with idf.py create-project.
Paste the example code into a .c file.
Flash the image to an ESP32-WROOM32
Monitor the serial console
Compare the output to the attached logs / Observe that following the initial period, B repeatedly prints “Taken” while A doesn’t get the semaphore until the loop in B exits
In the case that the problem cannot be reproduced, the attached binary can also be tested

Code to reproduce this issue

#include "esp_log.h"

#include "freertos/FreeRTOS.h"
#include "freertos/FreeRTOSConfig.h"
#include "freertos/task.h"
#include "freertos/semphr.h"

SemaphoreHandle_t sem;

// print is set to true while Task A is waiting for the mutex.
// Task B will print its "Taken" message only while Task A is
// waiting for the mutex, to avoid cluttering the console output
bool print = false;

static void task_a(void *arg) {
    static const char* TAG = "A";
    for (int i = 0; i < 3; i++) {
        ESP_LOGI(TAG, "Take");
        print = true;
        xSemaphoreTake(sem, portMAX_DELAY);
        print = false;
        ESP_LOGI(TAG, "Taken");
        esp_rom_delay_us(5000);
        ESP_LOGI(TAG, "Give");
        xSemaphoreGive(sem);
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
    vTaskDelete(NULL);
}

static void task_b(void *arg) {
    static const char* TAG = "B";
    for (int i = 0; i < 500; i++) {
        xSemaphoreTake(sem, portMAX_DELAY);
        if (print) ESP_LOGI(TAG, "Taken");
        vTaskDelay(pdMS_TO_TICKS(10));
        esp_rom_delay_us(3546);
        xSemaphoreGive(sem);
    }
    vTaskDelete(NULL);
}

void app_main() {
    sem = xSemaphoreCreateMutex();

// For v4.4.1 and v5.0, PA and PB we have reproduced the issue with PA == PB == 1
// For v4.3, we have reproduced the issue with PA == PB == 1 and PA == 2, PB == 1
#define PA 1
#define PB 1

    // Unpinned (for versions 4.4 and 5.0)
    xTaskCreate(task_b, "taskb", 2048, NULL, tskIDLE_PRIORITY + PB, NULL);
    vTaskDelay(pdMS_TO_TICKS(100));
    xTaskCreate(task_a, "taska", 2048, NULL, tskIDLE_PRIORITY + PA, NULL);

    // Pinned to same core
    // xTaskCreatePinnedToCore(task_b, "taskb", 2048, NULL, tskIDLE_PRIORITY + PB, NULL, 1);
    // vTaskDelay(pdMS_TO_TICKS(100));
    // xTaskCreatePinnedToCore(task_a, "taska", 2048, NULL, tskIDLE_PRIORITY + PA, NULL, 1);

    // Pinned to different core (for version 4.3)
    // xTaskCreatePinnedToCore(task_b, "taskb", 2048, NULL, tskIDLE_PRIORITY + PB, NULL, 0);
    // vTaskDelay(pdMS_TO_TICKS(100));
    // xTaskCreatePinnedToCore(task_a, "taska", 2048, NULL, tskIDLE_PRIORITY + PA, NULL, 1);

    for (;;) {
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

Debug Logs

Note that A successfully takes the semaphore prior to B starting Once B starts A is not able to take the semaphore again until B’s loop has ended and its task has exited. Then A is able to take the semaphore and complete its loop

Output 1 (Expected):

I (512) A: Take
I (522) A: Taken
I (522) A: Give
I (1522) A: Take
I (1522) A: Taken
I (1522) A: Give
I (2522) A: Take
I (2522) A: Taken
I (2522) A: Give

Output 2 (Actual):

I (510) A: Take
I (510) A: Taken
I (510) A: Give
I (1510) A: Take
I (1510) B: Taken
I (1520) B: Taken
... (Continue to loop "B: Taken")
I (5380) B: Taken
I (5390) B: Taken
I (5400) B: Taken
I (5410) A: Taken
I (5410) A: Give
I (6410) A: Take
I (6410) A: Taken
I (6410) A: Give

Other items

good-log.txt bad-log.txt source.zip binaries.zip

About this issue

Original URL
State: open
Created 2 years ago
Comments: 17 (11 by maintainers)

Most upvoted comments

@Dazza0

You said there is no easy way to fix this. But you are aware the bug is real and this has been reported by many people. So what make you think it’s ok to live with this bug?

AxelLin on Sep 22, 2023

@acf-bwl @AxelLin I’ve taken a look at this issue and have been able to recreate this locally on multiple releases. The issue you’re seeing is one scenario of the more general “spinlock acquire race condition” previously described in this comment.

This issue is quite difficult to explain, so I’ll provide a quick, summary here and leave the full explanations below.

This bug essentially involves a lower priority task A on core X stealing a (queue message/mutex/semaphore) from a higher priority blocked task B on core Y. It is a result of some single-core FreeRTOS assumptions no longer being true under SMP.
The good news is that this race condition shouldn’t be an issue in IDF applications, thus why the same queue/mutex/semaphore implementation has been used in IDF for many releases without this race condition causing a noticeable error (see explanation below).
The bad news is that there is not easy way to fix this race condition. Any immediate fixes can either cause more undesirable scheduling behavior, or increased complexity in the queue/semaphore/mutex implementation. I’ve described some possible approaches and (why they won’t work/cost benefit).

Critical Section Race Condition

All FreeRTOS queues/semaphores/mutex read/write functions will internally use a loop. The pseudo code below illustrates the rough behavioral flow of these functions:

BaseType_t xQueueSend/Receive/SemaphoreTake(...)
{
    // Main fucntion loop
    for(;;) {
        // Race condition happens here, before we enter the critical section.
        taskENTER_CRITICAL();
        // Check for empty/full queue/semaphore/mutex
        if (not empty/full) {
            // Read/write here
        else {
            // Flow downwards to enter timeout
        }
        taskEXIT_CRITICAL();

        // Handle timeout stuff
        vTaskSuspendAll();
        prvLockQueue( pxQueue );
        if (not_timed_out) {
            // Haven't timed out yet. Yield (i.e., block) on the queue/sempahore/mutex
            prvUnlockQueue( pxQueue );
            ( void ) xTaskResumeAll();
            portYIELD_WITHIN_API();
        } else {
            prvUnlockQueue( pxQueue );
            ( void ) xTaskResumeAll();
        }
        /* Once we are here, we either:
        - Have timed out, in which case we return just return from this function
        - Have blocked from portYIELD_WITHIN_API(), then been unblocked because a message has been received/emptied. So we...
            - loop back around
            - Enter the critical section
            - Check for a message/empty space
            - Read/write our message
        */
    }
}

Note: In IDF’s SMP FreeRTOS, we replace the queue locks with a critical section, but this doesn’t affect the race condition.

From the pseudo code above, you’ll notice that after a task blocks on a queue/semaphore/mutex, it will remain blocked until it times out, or is unblocked by another task/ISR by reading/writing a message to the queue/semaphore/mutex.

For example, if a “task A” tires to read an empty queue, it will block at the point of the portYIELD_WITHIN_API() call. If another “task B” then writes to the queue, “task A” will unblock and resume execution right after the portYIELD_WITHIN_API(). You’ll notice that “task A” must then…

Loop back around to the top of the loop
Then re-enter a critical section so that the queue can be read safely.

It is at this point where the race condition can occur. If a “task C” of lower priority but running on the other core also attempts to read the queue at the same time, “task C” can beat “task A” in entering that critical section first (in SMP this means “task C” acquires the spinlock first). Thus, the following occurs:

“task C” of lower priority enters the critical section first (i.e., acquires the spinlock first) while "task A’ is still looping back around
“task C” sees that the queue has a message and reads it. Meanwhile, “task A” is still stuck trying to enter the critical section (waiting for the spinlock to be released)
“task C” has read a message so it doesn’t need to block. It exits the critical and simply returns from the function
“task A” is now able to enter the critical section, but now the queue is empty (because “task C” stole the message). “task A” follows the same flow of blocking on an empty queue.

Thus, as a result, under some narrow timing circumstances, it is possible for a lower priority “task C” to “cut in line” of the queue and get a message before a blocked higher priority “task A”.

On single-core, this race condition cannot occur because only one task (which is the highest priority task) can run at any one time. Thus, in single core, if the same “task A” is looping back around after being unblocked:

No lower priority tasks will run because there is only one core, and the scheduler will never scheduler a lower priority task
If a higher priority “task B” preempts “task A” while it is looping back around, and “task B” also reads the same queue, this should be fine (might be intended behavior by FreeRTOS) as “task B” is higher priority (thus is should rightfully get the message first.

Implications on SMP in IDF

This race condition isn’t an issue in most IDF applications for the following reasons:

Narrow Timing Window

This race condition requires a very specific timing window (with one core looping back around just after being unblocked, and the other core contesting the queue at the same time). Thus to get a full starvation scenario requires very careful and repeatable timing (as demonstrated in the example code you have provided).

In the vast majority of queue/semaphore/mutex use cases, a task will receive/take the queue/sempahore/mutex, and then carry out some some sort of action (either process the message, handle the event indicated by the semaphore, or access the resource protected by the mutex). These actions are usually variable time, thus making it unlikely for the race condition timing window to consistently occur.

Furthermore, timing variance introduced by timer interrupts and cache misses further makes it unlikely for a task to be consistently starved.

Priority Inversion Tolerance

Even if this race condition occasionally occurs, the usual result is that a task simply blocks until the next message (while the previous message was stolen by the lower priority task). However IDF components don’t rely on task priority for correct behavior. For example, some single core FreeRTOS applications may relay on the fact that a task is the highest priority task, thus has exclusive access to a particular resource. But this assumption is never made in IDF (because a task on the other core is running concurrently). Thus (sequential order/exclusive access) in IDF components is always guarded by using queues/semaphores/mutexes/spinlocks.

Possible Fixes and their trade-offs

Reserving the next reader

It might be possible to maintain a next_reader_task field in the queue structure somewhere indicating the task that was just unblocked on the queue. This would mean that any contesting task would need to check if there is a next_reader_task and block themselves even if a message is available. For example, “task A” (priority 5, core 0) has just been unblocked to read the queue, “task B” (priority 4, core 1) gets the critical section first but does not read the message as there is next_read_task is set. "task B’ blocks itself thus allowing “task A” to read the message as intended.

This would solve the immediate race condition mentioned above, but introduces further issues:

If next_read_task is reserved for “task_A”, but “task A” gets deleted before it can fully loop back around, we will need to maintain another data structure in every task so that the next_read_task can be cleaned up properly.
What if the next_read_task is reserved for “task A”, and while it is looping back around “task A” gets preempted by a higher priority “task X”, where “task X” continues to run for a long while without yielding to “task A”. Meanwhile, a “task Y” of the same priority as “task A” but on the other core attempts to read the queue which has been reserved for “task A”. In this case…
- Should “task Y” block and leave the message for “task A”, waiting for “task A” to schedule back to read the message (i.e., strict FIFO order)?
- Or should “task Y” read the message immediately given that it would be more efficient as “task A” cannot immediately read the message (i.e., first come first serve model)?

Ensuring only the highest priority tasks run

In this approach, only tasks of the current highest priority would run. For example, if the highest ready priority was 10, then only priority 10 tasks would run. If a core does not have an available task of that priority to run, it will run the idle task. This approach is actually provided in Amazon SMP FreeRTOS (see the configRUN_MULTIPLE_PRIORITIES option) and would restore the single core assumption of the highest priority task running at any one time. However, the down side to this is a severe reduction in processing throughput, as the cores cannot run tasks of different priorities at the same time.

If you have any ideas/considerations/suggestions about this issue, I’m happy to hear them.

Dazza0 on Dec 5, 2022

@Dazza0 Any comment abut this issue? The starvation looks like a fundamental issue.

I tested above code with v5.1-dev-1879-g96b0cb2828 using esp32c3 (which is single core), and confirm the same behavior. i.e. After an initial period, Task A blocks forever while Task B takes and gives the semaphore to itself, starving task A entirely until the loop in Task B exits.

AxelLin on Nov 20, 2022

Implications on SMP in IDF

This race condition isn’t an issue in most IDF applications for the following reasons:

But @acf-bwl and @matt-mjm already prove this issue can happen in a real project, especially with “the same priority” case.

AxelLin on Dec 23, 2022