zephyr: Nordic BLE fails assertion when logging is enabled

Describe the bug

When logging is enabled on a BLE peripheral and data is being sent, and the assertion here fails.

This still happens even when I enable CONFIG_FORCE_NO_ASSERT.

To Reproduce

I don’t have a minimal reproduction yet, and reproducing my exact setup is a bit complicated (honestly I don’t expect anyone to fully replicate this, and I’ll try to provide as much info so people can quickly filter out the unimportant bits). The full background is that this is a split keyboard using the ZMK Firmware, which for 2 halves of a board it uses a BLE connection to send the keypress data across. Here’s the repo for the board. The specific board name is fissure_left (which uses a nrf52840 module). The specific Zephyr version used is f2e12d55a1 in this branch. When I encounter the problem, I build the board fissure_left with logging on (edit: I’m only using RTT logging, but disabling RTT and using the USB UART backend also produces a crash), attach the Jlink probe for the debug log, wait for the connection to happen, and press a key on the peripheral side. (log is provided in the corresponding section)

Expected behavior

No assertion happens, the split keyboard half communicates fine with the central half, but with logging to the probe.

Impact

Major annoyance. Normally logging is not enabled (especially for the peripheral side), but I needed to investigate some issue which leads to spurious resetting. This can certainly be done with backtrace information when the system resets, but having logging will definitely help.

Logs and console output

*** Booting Zephyr OS build 02c77d4cc96a  ***

[00:00:00.319,122] <inf> fs_nvs: 8 Sectors of 4096 bytes
[00:00:00.319,152] <inf> fs_nvs: alloc wra: 0, f68
[00:00:00.319,152] <inf> fs_nvs: data wra: 0, f8
[00:00:00.320,739] <inf> bt_hci_core: No ID address. App must call settings_load()
[00:00:00.320,953] <dbg> zmk: ble_profiles_handle_set: Setting BLE value profiles/0
[00:00:00.321,105] <dbg> zmk: ble_profiles_handle_set: Loaded CB:C9:A8:EA:32:1E (random) address for profile 0
[00:00:00.326,354] <dbg> zmk: zmk_ble_ready: ready? 0
[00:00:00.326,354] <dbg> zmk: update_advertising: advertising from 0 to 2
[00:00:00.331,817] <dbg> zmk: bvd_sample_fetch: ADC raw 2450 ~ 1794 mV => 3588 mV
[00:00:00.331,817] <dbg> zmk: bvd_sample_fetch: Percent: 7
[00:00:00.331,817] <dbg> zmk: zmk_battery_update: Setting BAS GATT battery level to 7.
[00:00:00.332,458] <inf> zmk: Welcome to ZMK!

[00:00:03.467,224] <dbg> zmk: connected: Connected thread: 0x200037e0
[00:00:03.467,346] <dbg> zmk: connected: Connected CB:C9:A8:EA:32:1E (random)
[00:00:03.468,627] <dbg> zmk: update_advertising: advertising from 0 to 0
[00:00:03.468,627] <dbg> zmk: connected: Active profile connected
[00:00:03.469,604] <dbg> zmk: get_selected_endpoint: Only BLE is ready.
[00:00:03.768,341] <dbg> zmk: split_svc_pos_state_ccc: value 1
[00:00:03.768,524] <dbg> zmk: security_changed: Security changed: CB:C9:A8:EA:32:1E (random) level 2
[00:00:09.039,642] <dbg> zmk: kscan_gpio_read: Sending event at 0,19 state on
[00:00:09.039,733] <dbg> zmk: zmk_kscan_process_msgq: Row: 0, col: 19, position: 36, pressed: true
[00:00:09.039,764] <dbg> zmk: split_listener: 
ASSERTION FAIL [next] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:653

[00:00:09.106,262] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000004  r2/a3:  0x00000003
[00:00:09.106,262] <err> os: r3/a4:  0x20003128 r12/ip:  0xdfffebbc r14/lr:  0x0001b691[0m
[00:00:09.106,262] <err> os:  xpsr:  0x41000028
[00:00:09.106,292] <err> os: Faulting instruction address (r15/pc): 0x0001b69c
[00:00:09.106,323] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:09.106,323] <err> os: Fault during interrupt handling

[00:00:09.106,353] <err> os: Current thread: 0x20003bd0 (unknown)
[00:00:09.428,588] <err> os: Halting system

Environment (please complete the following information):

  • OS: Arch (updated in ~1 week)
  • Toolchain: GNUARMEMB
  • Commits used are provided above

Additional context

N/A

As suggested by @Emil-Gydesen-Bose in Discord, cc’ing @cvinayak

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 33 (17 by maintainers)

Most upvoted comments

Yep it does:

[00:00:05.407,745] <dbg> zmk: split_listener: 
[00:00:05.407,836] <err> os: ***** MPU FAULT *****
[00:00:05.407,836] <err> os:   Stacking error (context area might be not valid)
[00:00:05.407,836] <err> os:   Data Access Violation
[00:00:05.407,867] <err> os:   MMFAR Address: 0x20007238
[00:00:05.407,867] <err> os: r0/a1:  0x2000c780  r1/a2:  0x20008000  r2/a3:  0x00000013
[00:00:05.407,897] <err> os: r3/a4:  0x0000c00d r12/ip:  0x00000002 r14/lr:  0x00000000
[00:00:05.407,897] <err> os:  xpsr:  0x61000000
[00:00:05.407,897] <err> os: Faulting instruction address (r15/pc): 0x0002a6b6
[00:00:05.407,958] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:05.407,989] <err> os: Current thread: 0x20001270 (unknown)
[00:00:05.766,113] <err> os: Halting system

Would still need some digging after this point to know what exactly caused the problem, but I think this would be a good start to find the problem. Thanks!

I set

CONFIG_ZMK_SPLIT_BLE_ROLE_CENTRAL=2048
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_ISR_STACK_SIZE=4096

, and I still can’t see main and ISR stack usage, but the exact same assertion still occurs

Okay I merged and tested, the logs seem the same. So I set CONFIG_BT_LOG_LEVEL_DBG=y and the log is here: ble.log

I haven’t done a line by line comparison, but the logs seem identical to me.

I’m currently outside, will make sure to cherry pick and test once I get back

I got the ping, but I’m fairly busy for the last 2 days. I’ll try to see if I can get some testing done tonight or tomorrow night

Yes, I checked and they are all 5