zephyr: Bluetooth: Host: Extended advertising reports may block the host

Is your enhancement proposal related to a problem? Please describe.

When scanning in a very busy environment (e.g. hundreds of extended advertising reports per second), the host can receive more events than it can handle, which effectively blocks it from doing any other work. The excess reports are queued, so calling an HCI command (e.g. bt_le_scan_stop()) may time out while waiting for its return event.

This is only an issue for extended advertising reports (BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT) as legacy reports (BT_HCI_EVT_LE_ADVERTISING_REPORT) are usually marked as discardable, e.g. by

		if (rx.evt.evt == BT_HCI_EVT_LE_META_EVENT &&
		    (rx.hdr[sizeof(*hdr)] == BT_HCI_EVT_LE_ADVERTISING_REPORT)) {
			BT_DBG("Marking adv report as discardable");
			rx.discardable = true;
		}

Due to the reassembly requirements of extended advertising, we cannot simply discard arbitrary reports, although we could potentially discard reports with the status BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_COMPLETE.
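To illustrate the reassembly constraint, here is a minimal sketch of a check that only allows a report to be discarded when it is complete and is not the final fragment of a chain. The names (ext_adv_filter, ext_adv_may_discard, the EXT_ADV_* macros) are hypothetical and not part of Zephyr. The sketch assumes that the data status sits in bits 5 and 6 of the report's Event_Type field, that fragments of one chain arrive back to back without reports from other advertisers in between, and that the function is called for every report, whether or not it ends up being discarded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data status occupies bits 5-6 of the Event_Type field of each report.
 * These macro names are local to this sketch (hypothetical).
 */
#define EXT_ADV_DATA_STATUS_MASK      0x0060u
#define EXT_ADV_DATA_STATUS_COMPLETE  0x0000u /* complete, or last fragment */
#define EXT_ADV_DATA_STATUS_PARTIAL   0x0020u /* incomplete, more to come */
#define EXT_ADV_DATA_STATUS_TRUNCATED 0x0040u /* incomplete, no more to come */

/* Hypothetical per-scanner state: true while a fragment chain is open. */
struct ext_adv_filter {
	bool reassembling;
};

/* Returns true if this report could be dropped without breaking reassembly.
 * Assumption: must be called for every received report so that the
 * chain state stays in sync.
 */
bool ext_adv_may_discard(struct ext_adv_filter *f, uint16_t evt_type)
{
	uint16_t status = evt_type & EXT_ADV_DATA_STATUS_MASK;
	bool was_reassembling = f->reassembling;

	/* Only a "more data to come" report keeps a chain open. */
	f->reassembling = (status == EXT_ADV_DATA_STATUS_PARTIAL);

	/* A complete report that follows partial ones is the tail of a
	 * chain, so dropping it would corrupt the reassembled data.
	 */
	return status == EXT_ADV_DATA_STATUS_COMPLETE && !was_reassembling;
}
```

With this kind of check, only standalone complete reports are candidates for the discardable flag; anything belonging to a fragment chain is always kept.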

This is basically a performance issue, and thus not marked as a bug.

Describe the solution you’d like

Ideally we’d find some way of discarding extended advertising reports, so that the number of queued events stays small and HCI command events are no longer held up.

Describe alternatives you’ve considered

A small workaround is to discard all advertising reports once bt_le_scan_set_enable(BT_HCI_LE_SCAN_DISABLE) has been called. It is safe to assume that the upper layers do not care about any queued reports after that point, and discarding them may help to clear the queue.
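A minimal sketch of that workaround, assuming a flag that the scan-disable path sets before sending the command and that the RX path checks for every incoming event. The helper names (scan_mark_stopping, rx_should_drop) are hypothetical; the event codes are redefined locally with their HCI values so that the example is self-contained.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* HCI event and LE subevent codes, redefined here for a standalone build. */
#define BT_HCI_EVT_LE_META_EVENT             0x3e
#define BT_HCI_EVT_LE_ADVERTISING_REPORT     0x02
#define BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT 0x0d

/* Hypothetical flag: set once scan disable has been requested. */
static atomic_bool scan_stopping;

/* Call with true just before issuing scan disable, and with false
 * when scanning is enabled again.
 */
void scan_mark_stopping(bool stopping)
{
	atomic_store(&scan_stopping, stopping);
}

/* Returns true if the RX path may drop this event instead of queueing it. */
bool rx_should_drop(uint8_t evt, uint8_t subevt)
{
	if (!atomic_load(&scan_stopping)) {
		return false;
	}

	if (evt != BT_HCI_EVT_LE_META_EVENT) {
		/* Never drop command complete/status or other events. */
		return false;
	}

	return subevt == BT_HCI_EVT_LE_ADVERTISING_REPORT ||
	       subevt == BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT;
}
```

Because scanning is being stopped anyway, dropping fragments mid-chain is harmless here: any partially reassembled data would be thrown away as well.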

Additional context

This was found at UPF and happened often, as extended advertising was used extensively. The issue appeared on the nRF5340.

About this issue

State: open
Created 2 years ago
Comments: 17 (11 by maintainers)

Most upvoted comments

Then I’ll just note down the WIP branch with the in-progress workaround (https://github.com/jori-nordic/zephyr/commits/ext-adv-debug) and close this issue as unreproducible.

jori-nordic on Dec 14, 2022

@Thalley so it’s not really the controller; rather, the message is stuck on the network core because we are stalling the IPC RX endpoint (in the app core). The stall happens because we are doing a blocking wait for event buffers, in a loop.

So I think we can definitely work around it (in the app core/host HCI driver), which I am trying right now. The workaround might be quite involved though; see the discussion I had this morning with herman above.

jori-nordic on Dec 1, 2022
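
The stall described in the comment above can be pictured with a small sketch. This is not the workaround from the WIP branch; it only illustrates the general idea of replacing a blocking wait with a non-blocking allocation, so that the RX endpoint is never held up. The pool type, its size, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define EVT_POOL_SIZE 4 /* hypothetical number of event buffers */

/* Hypothetical stand-in for the host's event buffer pool. */
struct evt_pool {
	size_t in_use;
};

/* Non-blocking allocation: reports failure instead of waiting. */
bool evt_buf_try_alloc(struct evt_pool *pool)
{
	if (pool->in_use >= EVT_POOL_SIZE) {
		return false;
	}
	pool->in_use++;
	return true;
}

/* Return one buffer to the pool once the host has processed the event. */
void evt_buf_free(struct evt_pool *pool)
{
	if (pool->in_use > 0) {
		pool->in_use--;
	}
}

enum rx_action {
	RX_DELIVER, /* buffer obtained, pass the event to the host */
	RX_DROP,    /* no buffer and the event is discardable */
	RX_DEFER,   /* no buffer, retry later without blocking RX */
};

/* Decide what to do with an incoming event without ever blocking. */
enum rx_action rx_handle(struct evt_pool *pool, bool discardable)
{
	if (evt_buf_try_alloc(pool)) {
		return RX_DELIVER;
	}
	return discardable ? RX_DROP : RX_DEFER;
}
```

How a deferred event gets retried (for example from a work item once a buffer is freed) is the involved part and is left out of this sketch.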

@rugeGerritsen This is the host issue I mentioned at the UPF, which occurs when receiving many scan reports.

Thalley on Sep 28, 2022