core: After update from 2023.2.5 to 2023.3.6 esphome bluetooth stack takes 25 to retry connecting after error (error not seen)

The problem

As a topic. As we know, since 2023.3 update we had problem with error index array out of range. We have update HA 2023.3.6 and this problem was fixed, but, new problem happends.

Lock FW is newest

Iam using BT-proxy - just say this to know that BT connections and signal strength should be very fine. Hardware with High performance bt dongle, 3m USB cable, away from USB 3.0 ports. Till 2023.x HA everything working very well.

Now problem is, that lock goes offline. Not unavialable, just offline - this says log. Cant connect to it, via HA. BT proxy log, says this, HA logs too. Connection takes over 50 seconds. I must say, that if i using Switchbot app - everything working ok, lock/unlock takes less than 1 sec.

And i have other devices from Switchbot and everyone working ok, without problem, without delay etc.

What version of Home Assistant Core has the issue?

2023.3.6

What was the last working version of Home Assistant Core?

2023.2.x

What type of installation are you running?

Home Assistant OS

Integration causing the issue

SwitchBot

Link to integration documentation on our website

https://www.home-assistant.io/integrations/switchbot

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

C1:89:44:4B:2B:A5

From bbt proxy log:

[13:14:00][I][bluetooth_proxy:250]: [0] [lockmacaddress] Connecting v3 with cache
[13:14:00][D][esp32_ble_tracker:214]: Pausing scan to make connection...
[13:14:00][I][esp32_ble_client:064]: [0] [lockmacaddress] 0x01 Attempting BLE connection
......
[13:14:01][W][esp32_ble_client:134]: [0] [] Connection failed, status=133
.....
.....
[13:14:51][D][esp-idf:000]: W (60434078) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x3e

as You see, from unlock command to completed and lock unlock - 51sec



from HA logs:

Logger: homeassistant.core
Source: components/switchbot/lock.py:54
First occurred: 12:43:35 (1 occurrences)
Last logged: 12:43:35

Error executing service: <ServiceCall lock.unlock (c:01GWC96ENHARJM7KZGYPZCJM69): entity_id=['lock.zamek_switchbot']>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/client.py", line 550, in bluetooth_device_connect
    await event.wait()
  File "/usr/local/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/client.py", line 549, in bluetooth_device_connect
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/esphome/bluetooth/client.py", line 100, in _async_wrap_bluetooth_operation
    return await func(self, *args, **kwargs)
  File "/usr/src/homeassistant/homeassistant/components/esphome/bluetooth/client.py", line 314, in connect
    await self._client.bluetooth_device_connect(
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/client.py", line 575, in bluetooth_device_connect
    raise TimeoutAPIError(
aioesphomeapi.core.TimeoutAPIError: Timeout waiting for connect response while connecting to lockmacaddress after 20.0s, disconnect timed out: True,  after 5.0s

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/bleak_retry_connector/__init__.py", line 342, in establish_connection
    await client.connect(
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 249, in connect
    connected = await super().connect(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/bleak/__init__.py", line 471, in connect
    return await self._backend.connect(**kwargs)
  File "/usr/src/homeassistant/homeassistant/components/esphome/bluetooth/client.py", line 102, in _async_wrap_bluetooth_operation
    raise asyncio.TimeoutError(str(err)) from err
asyncio.exceptions.TimeoutError: Timeout waiting for connect response while connecting to lockmacaddress after 20.0s, disconnect timed out: True,  after 5.0s

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/core.py", line 1826, in catch_exceptions
    await coro_or_task
  File "/usr/src/homeassistant/homeassistant/core.py", line 1845, in _execute_service
    await cast(Callable[[ServiceCall], Awaitable[None]], handler.job.target)(
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 213, in handle_service
    await service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 686, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 961, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 726, in _handle_entity_call
    await result
  File "/usr/src/homeassistant/homeassistant/components/lock/__init__.py", line 106, in _async_unlock
    await entity.async_unlock(**remove_entity_service_fields(service_call))
  File "/usr/src/homeassistant/homeassistant/components/switchbot/lock.py", line 54, in async_unlock
    self._last_run_success = await self._device.unlock()
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/lock.py", line 168, in unlock
    return await self._lock_unlock(
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/lock.py", line 189, in _lock_unlock
    await self._enable_notifications()
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/lock.py", line 255, in _enable_notifications
    result = await self._send_command(COMMAND_ENABLE_NOTIFICATIONS)
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/lock.py", line 299, in _send_command
    result = await self._ensure_encryption_initialized()
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/lock.py", line 314, in _ensure_encryption_initialized
    result = await self._send_command(
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/lock.py", line 297, in _send_command
    return await super()._send_command(key[:2] + "000000" + key[2:], retry)
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/device.py", line 180, in _send_command
    return await self._send_command_locked(key, command)
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/device.py", line 397, in _send_command_locked
    await self._ensure_connected()
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/device.py", line 264, in _ensure_connected
    client: BleakClientWithServiceCache = await establish_connection(
  File "/usr/local/lib/python3.10/site-packages/bleak_retry_connector/__init__.py", line 359, in establish_connection
    _raise_if_needed(name, device.address, exc)
  File "/usr/local/lib/python3.10/site-packages/bleak_retry_connector/__init__.py", line 308, in _raise_if_needed
    raise BleakNotFoundError(msg) from exc
bleak_retry_connector.BleakNotFoundError:  (lockmacaddress) - lockmacaddress: Failed to connect: Timeout waiting for connect response while connecting to lockmacaddress after 20.0s, disconnect timed out: True,  after 5.0s



and



Logger: switchbot.devices.device
Source: /usr/local/lib/python3.10/site-packages/switchbot/devices/device.py:182
First occurred: 12:43:35 (1 occurrences)
Last logged: 12:43:35

(lockmacaddress): device not found, no longer in range, or poor RSSI: -73
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/client.py", line 550, in bluetooth_device_connect
    await event.wait()
  File "/usr/local/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/client.py", line 549, in bluetooth_device_connect
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/esphome/bluetooth/client.py", line 100, in _async_wrap_bluetooth_operation
    return await func(self, *args, **kwargs)
  File "/usr/src/homeassistant/homeassistant/components/esphome/bluetooth/client.py", line 314, in connect
    await self._client.bluetooth_device_connect(
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/client.py", line 575, in bluetooth_device_connect
    raise TimeoutAPIError(
aioesphomeapi.core.TimeoutAPIError: Timeout waiting for connect response while connecting to lockmacaddress after 20.0s, disconnect timed out: True,  after 5.0s

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/bleak_retry_connector/__init__.py", line 342, in establish_connection
    await client.connect(
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 249, in connect
    connected = await super().connect(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/bleak/__init__.py", line 471, in connect
    return await self._backend.connect(**kwargs)
  File "/usr/src/homeassistant/homeassistant/components/esphome/bluetooth/client.py", line 102, in _async_wrap_bluetooth_operation
    raise asyncio.TimeoutError(str(err)) from err
asyncio.exceptions.TimeoutError: Timeout waiting for connect response while connecting to lockmacaddress after 20.0s, disconnect timed out: True,  after 5.0s

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/device.py", line 180, in _send_command
    return await self._send_command_locked(key, command)
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/device.py", line 397, in _send_command_locked
    await self._ensure_connected()
  File "/usr/local/lib/python3.10/site-packages/switchbot/devices/device.py", line 264, in _ensure_connected
    client: BleakClientWithServiceCache = await establish_connection(
  File "/usr/local/lib/python3.10/site-packages/bleak_retry_connector/__init__.py", line 359, in establish_connection
    _raise_if_needed(name, device.address, exc)
  File "/usr/local/lib/python3.10/site-packages/bleak_retry_connector/__init__.py", line 308, in _raise_if_needed
    raise BleakNotFoundError(msg) from exc
bleak_retry_connector.BleakNotFoundError:  (lockmacaddress) - lockmacaddress: Failed to connect: Timeout waiting for connect response while connecting to lockmacaddress after 20.0s, disconnect timed out: True,  after 5.0s

Additional information

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 101 (39 by maintainers)

Most upvoted comments

I also fixed the disconnection race while I reworked that esphome code so it should be a bit more reliable as well

ok, will open new issue. thanks

ok iam for 99% sure that problem solved. Even if first time connection failes, reconnect is right away and finally second time lock opened. Its takes about 2 sec and its very ok.

[21:26:35][I][bluetooth_proxy:254]: [0] [C1:89:44:4B:2B:A5] Connecting v3 without cache
[21:26:35][D][esp32_ble_tracker:214]: Pausing scan to make connection...
[21:26:35][I][esp32_ble_client:064]: [0] [C1:89:44:4B:2B:A5] 0x01 Attempting BLE connection
[21:26:35][D][esp-idf:000]: W (33716892) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x3e
[21:26:35][D][esp-idf:000]: W (33716994) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x3e
[21:26:35][D][esp-idf:000]: W (33717102) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x3e
[21:26:35][D][esp-idf:000]: W (33717204) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x3e
[21:26:35][D][esp-idf:000]: W (33717209) BT_APPL: gattc_conn_cb: if=3 st=0 id=3 rsn=0x3e
[21:26:35][D][esp-idf:000]: W (33717211) BT_APPL: gattc_conn_cb: if=4 st=0 id=4 rsn=0x3e
[21:26:35][D][esp-idf:000]: W (33717212) BT_APPL: gattc_conn_cb: if=5 st=0 id=5 rsn=0x3e
[21:26:35][W][esp32_ble_client:134]: [0] [] Connection failed, status=133
[21:26:35][D][esp32_ble_tracker:246]: Starting scan...
[21:26:36][I][bluetooth_proxy:254]: [0] [C1:89:44:4B:2B:A5] Connecting v3 without cache
[21:26:36][D][esp32_ble_tracker:214]: Pausing scan to make connection...
[21:26:36][I][esp32_ble_client:064]: [0] [C1:89:44:4B:2B:A5] 0x01 Attempting BLE connection
[21:26:36][D][esp-idf:000]: W (33718482) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x3e
[21:26:37][I][esp32_ble_client:196]: [0] [C1:89:44:4B:2B:A5] Connected
[21:26:37][D][esp32_ble_tracker:246]: Starting scan...
[21:26:51][I][esp32_ble_client:081]: [0] [C1:89:44:4B:2B:A5] Disconnecting.
[21:26:51][D][esp-idf:000]: W (33733344) BT_HCI: hci cmd send: disconnect: hdl 0x0, rsn:0x13
[21:26:51][D][esp-idf:000]: W (33733353) BT_APPL: gattc_conn_cb: if=3 st=0 id=3 rsn=0x16
[21:26:51][D][esp-idf:000]: W (33733356) BT_APPL: gattc_conn_cb: if=4 st=0 id=4 rsn=0x16
[21:26:51][D][esp-idf:000]: W (33733359) BT_APPL: gattc_conn_cb: if=5 st=0 id=5 rsn=0x16
[21:26:51][D][esp-idf:000]: W (33733383) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x16

Yes i updated to 2023.5.2 and HA starting ok.

I didn’t realize you were trying on the old version. It will only work on 2023.5.x

I understand correctly that in the bt-proxy log there should no longer be a delay of 25 seconds, but there should be a reconnection right away? Should I pay attention to that?

Yes thats the goal.

It looks like it won’t work because of the conflict with the sub dep. Thats a shame as I would really have liked to get a good test but I’m 99% sure I’ve solved it so go ahead and remove the custom version and let’s see if 2023.5.3 does the trick

Helps to be looking in the right place.

Thanks @KitHubek for all the debugging and logs 👍

@KitHubek I’ve confident the linked PR will fix the issue, but it would be great to get a test

cd /config
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d esphome -p 92741
cat /config/custom_components/esphome/manifest.json

Verify that the version displayed above is “aioesphomeapi==13.7.4” and NOT 3.7.3

Restart HA

Test

Delete /config/custom_components/esphome and restart after testing is completed to restore the core version

https://github.com/home-assistant/core/pull/new/ble_retry_fix

The fix is in this branch.

Github is having problems at the moment so I can’t create a PR to fix it

https://github.com/esphome/aioesphomeapi/pull/391 is definitely the cause of the breakage

# grep -A 2 BluetoothDeviceConnectionResponse home-assistant.log
2023-05-07 13:27:01.625 DEBUG (MainThread) [aioesphomeapi.connection] master_tvcabinet_32 @ 192.168.106.86: Got message of type <class 'api_pb2.BluetoothDeviceConnectionResponse'>: address: 132613642173324
connected: true
mtu: 512

So I suspect what is happening is the connection is initially successful that than we get a follow message that the connection has failed and since https://github.com/esphome/aioesphomeapi/pull/391 now resolves the connect right away we never see the error and we have to wait the full timeout

related PRs during that timeframe

https://github.com/esphome/aioesphomeapi/pull/391

Hey there @ottowinter, @jesserockz, mind taking a look at this issue as it has been labeled with an integration (esphome) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of esphome can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign esphome Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


esphome documentation esphome source (message by IssueLinks)

In the mean time can you turn off logs for bleak and turn on logs for aioesphomeapi and homeassistant.components.esphome and get it to fail to connect again?

It’s good to know that the problem isn’t in the SwitchBot integration and it’s deeper in the Bluetooth stack.

I’ve spent most of the available time digging through the SwitchBot code so I’ve put the effort in the wrong place.

My day job is really busy right now but I will try to find some time this week to dig through the Bluetooth stack (it’s 7 or 8 libraries) and see if I can at least get a better way to debug this

So the problem is likely that the error isn’t being propagated back. 25 seconds is the timeout to try again if we get no response

ok, when avialable 2023.3.7 will update and send logs here. For now stays on 2023.2.5 because its lastes ver. working ok