core: `aioesphomeapi` floods of errors+nodes slow

The problem

Since 2023.2, log is flooded with errors like below. ESPHome nodes go randomly offline, and became slow-responsive.

Things which are certain:

  • Before HA 2023.2 everything was nice and snappy, no errors in the logs
  • ESPHome nodes have up-to date firmware (at the time of writing this 2023.2.4)
  • HA 2023.3 didn’t fix this.

What version of Home Assistant Core has the issue?

2023.3.0

What was the last working version of Home Assistant Core?

2023.1.x

What type of installation are you running?

Home Assistant OS

Integration causing the issue

aioesphomeapi

Link to integration documentation on our website

https://www.home-assistant.io/integrations/esphome/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-03-02 15:40:11.629 WARNING (MainThread) [aioesphomeapi.connection] lampa-wc-fent @ 192.168.20.11: Connection error occurred: 
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 537, in send_message_await_response_complex
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 536, in send_message_await_response_complex
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 320, in _keep_alive_loop
    await self._ping()
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 645, in _ping
    await self.send_message_await_response(PingRequest(), PingResponse)
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 554, in send_message_await_response
    res = await self.send_message_await_response_complex(
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 539, in send_message_await_response_complex
    raise TimeoutAPIError(
aioesphomeapi.core.TimeoutAPIError: Timeout waiting for response for <class 'api_pb2.PingRequest'> after 10.0s
2023-03-02 15:40:12.462 WARNING (MainThread) [aioesphomeapi.connection] lampa-venti-zuhany @ 172.22.215.32: Connection error occurred: 
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 537, in send_message_await_response_complex
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 536, in send_message_await_response_complex
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 320, in _keep_alive_loop
    await self._ping()
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 645, in _ping
    await self.send_message_await_response(PingRequest(), PingResponse)
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 554, in send_message_await_response
    res = await self.send_message_await_response_complex(
  File "/usr/local/lib/python3.10/site-packages/aioesphomeapi/connection.py", line 539, in send_message_await_response_complex
    raise TimeoutAPIError(
aioesphomeapi.core.TimeoutAPIError: Timeout waiting for response for <class 'api_pb2.PingRequest'> after 10.0s

Additional information

Accesing HA user interface through a reverse proxy became also painfully slow - not sure if it has to do with this, but the symptoms appeared in the same time.

https://github.com/home-assistant/core/issues/88535 https://github.com/home-assistant/core/issues/85432#issuecomment-1422633517 https://github.com/home-assistant/core/issues/87158#issuecomment-1436493301

https://discord.com/channels/330944238910963714/1012073425638015037/1061867739863597127

The aioesphomeapi lib has also been optimized reduce latency and improve performance as well on dev (will help all esphome devices not just ones with a Bluetooth proxy). At this point any future improvements are probably going to come from ESP – IDF 5 and the protobuf cpp switch (bdraco — 2023.01.09. 5:42)

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 126 (111 by maintainers)

Commits related to this issue

Most upvoted comments

I made some more tweaks to https://github.com/esphome/aioesphomeapi/pull/395

With the changes, even the device with the poor connection now stays connected to HA.

I’ve published the new version as https://github.com/bdraco/esphome_testing

You can install it manually by copying it into /config/custom_components/esphome

If you have HACS you can do it through HACS as a custom repo under HACS->Integrations->… menu->Custom Repo

After restarting check your log to make sure you are running the custom version 2023-03-06 05:09:01.898 WARNING (SyncWorker_2) [homeassistant.loader] We found a custom integration esphome which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant

No disconnects overnight with https://github.com/esphome/aioesphomeapi/pull/395 for the devices with poor wifi connections

Screenshot 2023-03-05 at 7 23 32 AM

awesome! I see that this will be merged for the next core release 2023.3.2. I’ll make sure to delete the HACS CC, and then update and then restart HA (if that is the right wat to do this).

I’ve restored power_save_mode: LIGHT to some of the previously affected ESP8266 nodes after installing this CC, let’s see…

2023-03-02 18:36:54.877 WARNING (MainThread) [asyncio] Executing <Task finished name=‘Task-93901’ coro=<DataUpdateCoordinator._handle_refresh_interval() done, defined at /usr/src/homeassistant/homeassistant/helpers/update_coordinator.py:179> result=None created at /usr/src/homeassistant/homeassistant/core.py:491> took 0.136 seconds

I opened a PR to make these a bit easier to track as well https://github.com/home-assistant/core/pull/89171

I’m open to test it as a custom component for you.

Enabling power_save_mode: LIGHT causes my devices to disconnect as well.

With https://github.com/esphome/aioesphomeapi/pull/395 I’m not seeing the disconnects anymore

why is it logging a warning if it’s only, at most, 175 ms?

Because I deliberately enabled the debugger.

I’ll let it run and closely observe for a few days, and let you know here. Thank you for your help.

U got mail at koston.org. About the second .svg I’m sure it cought it, the first one fifty/fifty.

Yeah that brought up the usual top-like screen…

Where did this put the record file?

bash-5.1# ./py-spy record --pid 66 --duration 120 --rate 100
py-spy> Sampling process 100 times a second for 120 seconds. Press Control-C to exit.

py-spy> Wrote flamegraph data to '66-2023-03-02T20:24:26+01:00.svg'. Samples: 2547 Errors: 1

Edit: it’s there, but not visible in the smb share… cp 66-2023-03-02T20:24:26+01:00.svg 66.svg revealed it