NimBLE-Arduino: Occasional delay (blocking) after calling advertising start

This is annoying me because it’s not happening consistently (and when I turn on debug messages it never happens)!

After setting up my server, I am starting to advertise as follows…

    NimBLEAdvertising *advertising = NimBLEDevice::getAdvertising();
    advertising->setScanResponse(false);
    advertising->start();

This is the last thing I do in the setup() method before loop() takes over. Occasionally, there is an (almost exact) 10000 ms delay before the code in my loop() method starts executing. If it works “correctly”, the advertising start takes less than 10 ms.

Any ideas what might be the “random” cause? Thanks!

About this issue

State: closed
Created 4 years ago
Comments: 29 (17 by maintainers)

Most upvoted comments

Thanks @elguiri I’m happy with it as well. Will commit to master shortly.

h2zero on Sep 7, 2020

Depending on the priority (not documented) of the high priority tasks mentioned, it may be a demotion from the default for BLE tasks (could that be priority 21)?

Just looked this up, and the priority is 24 (configMAX_PRIORITIES - 1).

h2zero on Sep 6, 2020

Fantastic @h2zero

I’ll test it (taking out the ble_gap_adv_stop() fudge on the advertising start)!

elguiri on Sep 6, 2020

Did some experimenting and it was successful!

Dead simple to implement and seems to be working well. Needs serious testing however to verify no bugs have been introduced.

Here is a patch for anyone interested in helping with the testing (also a bit of performance improvement).

Edit: wrong patch (fixed now) hci_patch.txt

h2zero on Sep 6, 2020

Do you think this is a race condition? Is the controller code imperfect and any fix in NimBLE-Arduino is going to be a workaround? Does this warrant a closer look at the controller code?

The problem occurs at this line: after the semaphore is taken, it seems not to get released right away. The release happens in the callback from the controller here. It looks like some sort of delay in the inter-processor communication. This only happens when sending commands to the controller in a tight loop, which is why we don’t see the delay with debugging turned on: it slows things down enough. I wish we could look at the controller code, but Espressif has kept that as a closed-source library.

I have also tested removing the semaphore and looping until esp_vhci_host_check_send_available() returns true, and that resolves the issue as well. Clearly not a great solution, but the maximum number of loops I counted was 15, so not too terrible either.
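For readers following along, the polling approach described above could look roughly like the sketch below. This is not the code from the patch: spinUntilReady and its maxSpins bound are hypothetical names introduced here for illustration, and the readiness check is passed in as a callable so the helper stays plain C++.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper (not part of NimBLE-Arduino or ESP-IDF).
// Polls `ready` until it returns true or `maxSpins` polls have been made.
// Returns the number of failed polls before success, or -1 if it gave up.
int spinUntilReady(const std::function<bool()>& ready, int maxSpins) {
    assert(maxSpins >= 0);  // a negative bound is a caller error
    for (int spins = 0; spins < maxSpins; ++spins) {
        if (ready()) {
            return spins;   // ready: the caller may send the packet now
        }
    }
    return -1;              // never became ready within the bound
}
```

On an ESP32 the callable would be esp_vhci_host_check_send_available, followed by esp_vhci_host_send_packet() once the helper returns a non-negative value. Bounding the loop is an assumption added here so that a stuck controller cannot hang the caller forever; the test described above looped until true.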

I presume you are talking about these IPC functions, so the approach still needs some care avoiding a possible deadlock and managing stack usage.

Yes, those are the calls. I haven’t used them before, but I’ll test them for interest’s sake.

h2zero on Sep 5, 2020

Pushed commit ae3be89 which fixes this issue.

h2zero on Sep 2, 2020

The scan is assumed to have started if m_pScan is pointing at the scan object, but that singleton object gets created just to set the scan parameters, not necessarily to start a scan. It’s quite a mess to assume a restart like that, because neither the duration (-16843010 is random) nor the callback has been set at that point.

Yes, that code is incomplete. Resets are rare occurrences with multiple possible triggers, so I just put in some code to hopefully keep things working if they happen. I will definitely need to revisit the logic in that handler.
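One way to picture the issue raised in the quote is the sketch below, which tracks whether a scan was actually started instead of inferring it from the object's existence. The Scanner class and all of its members are hypothetical stand-ins for illustration, not the NimBLE-Arduino classes or the eventual fix.

```cpp
#include <cstdint>

// Hypothetical stand-in for a scan object (not the NimBLE-Arduino class).
class Scanner {
public:
    // Configuring parameters creates/uses the object but starts nothing.
    void setInterval(uint16_t intervalMs) { m_intervalMs = intervalMs; }

    // Starting records the duration so a later restart can reuse it.
    void start(uint32_t durationSec) {
        m_durationSec = durationSec;
        m_active = true;
        ++m_startCount;
    }

    void stop() { m_active = false; }

    // Called from a (hypothetical) reset handler: restart only if a scan
    // was really running, reusing the duration it was started with.
    bool resumeAfterReset() {
        if (!m_active) {
            return false;  // object exists, but no scan was ever started
        }
        start(m_durationSec);
        return true;
    }

    uint32_t durationSec() const { return m_durationSec; }
    int startCount() const { return m_startCount; }

private:
    uint16_t m_intervalMs = 0;
    uint32_t m_durationSec = 0;
    bool m_active = false;
    int m_startCount = 0;
};
```

With an explicit flag, a scanner that was only configured is left alone after a reset, and a running one is restarted with a known duration rather than an uninitialized value.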

h2zero on Sep 2, 2020