esp-idf: [v5.1][NimBLE] os_mbuf list is corrupt, causing ble_gattc_indicate_custom to hit infinite loop (IDFGH-10182)

IDF version.

release/v4.4 & release/v5.1

Development Kit.

ESP32-S3 Dev Kit C

What is the expected behavior?

NimBLE sends an indication with hitting Task WDT.

What is the actual behavior?

E (202103) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (202103) task_wdt:  - IDLE (CPU 1)
E (202103) task_wdt: Tasks currently running:
E (202103) task_wdt: CPU 0: IDLE
E (202103) task_wdt: CPU 1: btApiTask
E (202103) task_wdt: Print CPU 0 (current core) backtrace


Backtrace: 0x421B7656:0x3FCBB980 0x4037DC31:0x3FCBB9C0 0x4218407E:0x3FCF61F0 0x42007B1E:0x3FCF6220 0x403944DE:0x3FCF6250
0x421b7656: task_wdt_isr at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/esp_system/task_wdt.c:183 (discriminator 3)

0x4037dc31: _xt_lowint1 at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1114

0x4218407e: cpu_ll_waiti at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/hal/esp32s3/include/hal/cpu_ll.h:185
 (inlined by) esp_pm_impl_waiti at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/esp_pm/pm_impl.c:830

0x42007b1e: esp_vApplicationIdleHook at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/esp_system/freertos_hooks.c:63

0x403944de: prvIdleTask at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/freertos/tasks.c:3997 (discriminator 1)


E (202103) task_wdt: Print CPU 1 backtrace


Backtrace: 0x4038809D:0x3FCBC1C0 0x4037DC31:0x3FCBC1F0 0x42139CD5:0x3FCCBE90 0x42139E50:0x3FCCBEC0 0x4213AA93:0x3FCCBEF0 0x4213AAFE:0x3FCCBF20 0x4213BD9B:0x3FCCBF50 0x42121312:0x3FCCC080 0x421366EB:0x3FCCC0B0 0x421367C7:0x3FCCC0F0 0x42132AED:0x3FCCC120 0x4212BCE2:0x3FCCC160 0x4212B565:0x3FCCC190 0x42128B9E:0x3FCCC1C0 0x4202A9E2:0x3FCCC1F0 0x4202AB26:0x3FCCC230 0x4201F0F6:0x3FCCC270 0x4201F45E:0x3FCCC2B0 0x420AB341:0x3FCCC300 0x420225C7:0x3FCCC330 0x420F9571:0x3FCCC360 0x420A5E1A:0x3FCCC600 0x420242EE:0x3FCCC630 0x420249D9:0x3FCCC670 0x4201E4E5:0x3FCCC6B0
0x4038809d: esp_crosscore_isr at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/esp_system/crosscore_int.c:92

0x4037dc31: _xt_lowint1 at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1114

0x42139cd5: os_memblock_put_from_cb at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/os_mempool.c:267

0x42139e50: os_memblock_put at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/os_mempool.c:327

0x4213aa93: os_mbuf_free at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/os_mbuf.c:300

0x4213aafe: os_mbuf_free_chain at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/os_mbuf.c:320

0x4213bd9b: ble_hci_trans_hs_acl_tx at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/esp-hci/src/esp_nimble_hci.c:141

0x42121312: ble_hs_tx_data at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs.c:716

0x421366eb: ble_hs_hci_acl_tx_now at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci.c:533

0x421367c7: ble_hs_hci_acl_tx at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci.c:591

0x42132aed: ble_l2cap_tx at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_l2cap.c:452

0x4212bce2: ble_att_tx at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_att_cmd.c:75

0x4212b565: ble_att_clt_tx_indicate at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_att_clt.c:941

0x42128b9e: ble_gattc_indicate_custom at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gattc.c:4336

0x4202a9e2: indicate_msg(unsigned short, unsigned short, unsigned char const*, unsigned short, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/bluetooth/pd_bluetooth.cpp:1146

0x4202ab26: pd_bluetooth_indicate_client(unsigned short, pd_ble_chr_t, unsigned char const*, unsigned short, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/bluetooth/pd_bluetooth.cpp:1194

0x4201f0f6: pd_bthttp_socket_sendmsg_ex(pd_bthttp_socket_t*, pd_btsockmsg_delim_t, pd_btsockmsg_delim_t, unsigned char const*, unsigned int, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_bthttp_socket.cpp:112 (discriminator 3)

0x4201f45e: pd_bthttp_socket_resp_send_body_bytes(pd_bthttp_socket_t*, pd_bthttp_resp_t*, unsigned char const*, unsigned int, bool, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_bthttp_socket.cpp:430

0x420ab341: pd_bthttp_resp_send(pd_bthttp_request_t*, unsigned char const*, unsigned int, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_bthttp_txrx.cpp:50

0x420225c7: pd_xtrans_resp_send(pd_xtrans_t*, unsigned char const*, unsigned int, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_xtrans.cpp:319

0x420f9571: pd_api_midi_stone_download_stone(pd_xtrans_t*, char const*, int, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/core/pd_api_midi_stone.cpp:104 (discriminator 4)

0x420a5e1a: pd_api2_midi_stone_download_stone_json(pd_xtrans_t*, cJSON*, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/interface/pd_api2_midi_stone.cpp:27 (discriminator 4)

0x420242ee: pd_xhandler_jsonToDownload(pd_xtrans_t*, pd_xhandler_args_t, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_xhandlers.cpp:357 (discriminator 4)

0x420249d9: pd_xhandler_call_api_generic(pd_xtrans_t*, pd_error_t*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_xhandlers.cpp:454

0x4201e4e5: bthttp_task(void*) at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/jamcorder-firmware/jamcorder_common/jcommon/api/xcoms/pd_bthttp_server.cpp:231

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉

The issue is in my calling code!!!

The issue: ble_gattc_indicate_custom consumes the txom. When BLE_HS_ENOMEM is returned by ble_gattc_indicate_custom, we loop trying to use the same txom even though that txom is no longer valid.

The solution: whenever ble_gattc_indicate_custom we need to call ble_hs_mbuf_from_flat again.

BAD CODE:

    //
    // get txom loop
    //
    do{
        txom = ble_hs_mbuf_from_flat(msg, msgLen);

        if (txom == NULL){

            vTaskDelay(10 / portTICK_PERIOD_MS);

            if (millis() - tStart > 4000){
                pd_err_fail(error, TAG , "timed out (4 secs) waiting for BLE resources - ""B");
                return;
            }
        }
    } while (txom == NULL);
    
    //
    // loop send indication
    //
    int rc = 0;
    do {
        printf("indicate_msgB %p\n", txom);
        rc = ble_gattc_indicate_custom(gapConnHandle, attrHandle, txom);

        if (rc == BLE_HS_ENOMEM){

            vTaskDelay(10 / portTICK_PERIOD_MS);

            if (millis() - tStart > 4000){
                pd_err_fail(error, TAG , "timed out (4 secs) waiting for BLE resources - ""C");
                return;
            }
        }
    }while (rc == BLE_HS_ENOMEM);

Good Code:

    //
    // send indication
    //
    int rc = 0;
    do {
        // get txom
        do {
            txom = ble_hs_mbuf_from_flat(msg, msgLen);

            if (txom == NULL) {

                vTaskDelay(10 / portTICK_PERIOD_MS);

                if (millis() - tStart > 4000){
                    pd_err_fail(error, TAG , "timed out (4 secs) waiting for BLE resources - ""B");
                    return;
                }
            }
        } while (txom == NULL);
        
        // send indication
        rc = ble_gattc_indicate_custom(gapConnHandle, attrHandle, txom);

        if (rc == BLE_HS_ENOMEM) {

            if (millis() - tStart > 4000) {
                pd_err_fail(error, TAG , "timed out (4 secs) waiting for BLE resources - ""C");
                return;
            }

            vTaskDelay(10 / portTICK_PERIOD_MS);
        }
    } while (rc == BLE_HS_ENOMEM);

Proposal: it would be great if nimble could detect this issue, i.e. using submitting a block that was already freed. It is incredibly tricky to debug, and we really ought to detect it so no one has to do this painful debugging process again.