zephyr: Bluetooth: BLE stack notifications blocks host side for too long (`drivers/bluetooth/hci/spi.c` and `hci_spi`)
First, I have no pretention in stating it is a bug, it may be the normal way of working of the BLE stack architecture at this moment. In this case, it could be reclassified as an enhancement request if relevant. I may be expecting too much.
Preliminary context
I am using a classic dual-chip configuration:
- Host-side : custom board (stm32h7, similar to a nucleo-h743zi), using a host-build only
- 3 services (HID, BAS, DIS)
- HID reporting rate @133Hz (7.5ms)
- Controller-side : standard hci_spi sample project flashed on nrf52840 addon board connected through SPI@1MHz
Describe the bug
From the host, using a simple bt_gatt_notify_cb to update a BLE service characteristic (let’s say a 17-bytes long HID report update). I have noticed this call is blocking (OK) for a unexpectedly long period of time (NOK) of approximately 1.5ms degrading others thread “performances” in a non-negligible way.
It’s basically a 1.5ms penalty every 7.5ms that is spent in a busy-wait style fashion starving other actions in the system. More details below on my investigations.
On the other hand, a similar system reporting through USB-HID at the same rate, do not suffer from the same deficiency and a reasonable 0.1 <-> 0.7ms is spent.
- What have you tried to diagnose or workaround this issue?
- Independently of my full system I have been able to reproduce this blocking behavior by setting up a sample test application for the host side, reporting a classic BAS notification (battery level %) periodically (every 50ms down to every 7.5ms (target)) and measuring by software the time spent for a bt_gatt_notify_cb operation.
// Sample test - pseudo code
period = 50ms;
while(1) {
time;
bt_gatt_notify(); // reporting BAS notification
elapsed = now - time;
printf("%d", elapsed); // connected:~750us <--> ~980us, not connected: ~5us
wait( period - elapsed );
}
The host-side prj.conf, I use for the sample test is provided below.
prj.conf.txt
-
I have also tried to pimp my BLE stack configuration using the throughput maximization sample project but with no effect.
-
When debugging, I noticed that the vast majority of time was spent in “waiting” for a rescheduling operation to complete in the implementation of the BLE stack, for some reason.
In a nutshell my call stack (host):
bt_gatt_notify_cb // app waiting here
// GATT
// ATT
// ACL
// KERNEL QUEUE - k_queue_append_list
z_reschedule(&queue->lock, key); <<<< Waiting here for ~80% of the time
// HCI_SPI transfer happening at some point
// ... returns and continues normally up to the next notification
// Transfer OTA happening on controller -> at next connection interval -> OK
// callback notification sent received -> OK
- I have not tried yet but I envisioned moving to a single-chip configuration (Host + Controller using HCI RAM) running on nrf52840 and connecting my current “Host” MCU with a custom protocol over SPI to receive raw notifications.
This way I could have a fully externalized BLE stack not impacting the rest of my system (hopefully) With the big cons of maintaining two FWs instead of a single one…
Do you think it’s a valid way of working around the issue?
- Is this a regression? If yes, have you been able to “git bisect” it to a specific commit? I don’t think it’s a regression, probably the way it has always worked
Expected behavior
From a system architecture point of view, I was expecting the dual-chip configuration to work in tandem:
- Host-side -> serializing and publishing notification very fast over HCI -> achieving a minimal time waiting
- Controller-side -> offloading the full BLE-OTA transfer
- Tuned with sufficiently large HCI events queues according to the report rate/size/types
It’s a pity to have two powerful MCUs (STM32H7 on host + nrf52840 on controller) and having such a poor behavior.
Impact
As stated this “poor notification performances” are very annoying as it considerably degrades the full system performance, and sometimes delays time-critical operations (acquisition threads…).
Environment (please complete the following information):
- Building on Windows 10, connecting my device (BLE) on the same laptop.
- Toolchain
- Zephyr SDK 0.14.1
- Zephyr v3.0 > Similar behavior on Zephyr v3.1
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 29 (14 by maintainers)
Hi, yeah we’ll try with UART. Yeah we were using two 840s, 1 host 1 ctlr. We’ve successfully reproduced the issue on the stm32f303re nucleo board (only fast stm32 I had). Unfortunately it doesn’t have trace pins so we’ll have to make do without a trace probe, the investigation will be a bit longer.
edit: tried with UART, we get a ~170us processing time (which is still high), at 1Mbits. We will try to investigate what’s happening.
Hi again,
As you had a concern on the HCI_SPI driver implementation, I investigated a bit further and realized that adding a delay (randomly placed and tuned at 200us) in the
bt_spi_sendfunction was totally solving my issue.Making the overall
bt_gatt_notifycall duration -> ~15us (instead of 1.5ms as stated in the issue) which is in line with what I expected.It seems that there is some kind of data race in there, what’s interesting is that a simple
k_yieldis not sufficient to solve the issue.@flo-carrere
There is no STM32M4 (STmicro SoC) in nordic chips. I suppose you’re rather referring to the Cortext-M4 CPU core.
In any case >1ms processing is excessive I totally agree. We have tests with real HW downstream (in nRF Connect SDK) that measure all kinds of things, including performance, so that comes as a surprise to me. The performance tests are measuring throughput so it’s possible that we are missing a test case though, I’ll double-check.
I’m definitely suspecting either a misconfiguration or the HCI driver itself, I will try to reproduce it but right now #25917 is my priority.