core: HA is not reconnecting to ESPHome devices

The problem

Over the past few months I have been experiencing an issue where HA would disconnect from an ESPHome device and not try to reconnect. I included below some of the logs I am seeing. This issue is very statistical, I saw it happen once a month and I saw it happen a couple of times a week (even once a day).

Even more strange is the fact that none of these logs are at the time of disconnection (in the example below - the ESPHome device is disconnected from HA since 1PM and these logs are from around 10AM and there are no logs regarding ESPHome from around 1PM)

The ESPHome device itself is connected to the WiFi network and is accessible via the web interface. I am also using an API password and encryption key, a static IP, power save mode NONE and other recommended features enabled that should have fixed this issue. (see YAML example below).

I tried to reset the ESPHome device and update its ESPHome version - but it did not solve the issue (HA is still disconnected). However, when I reload the ESPHome integration - it all works (until it disconnects again). This is what led me to believe that the issue is probably in HA and not in ESPHome.

If there’s anything else I can provide to help debug this issue, please let me know. Thank you in advance!

What version of Home Assistant Core has the issue?

core-2022.2.0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ESPHome

Link to integration documentation on our website

https://www.home-assistant.io/integrations/esphome/

Example YAML snippet

esphome:
  name: "esp2"
  platform: "ESP8266"
  board: "esp01_1m"

logger:

api:
  password: !secret api_password
  encryption:
    key: !secret api_key

ota:
  password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: NONE
  fast_connect: on
  reboot_timeout: 5min
  manual_ip:
    static_ip: "****"
    gateway: !secret wifi_gateway
    subnet: !secret wifi_subnet
    dns1: !secret wifi_dns1
    dns2: !secret wifi_dns2
  ap:
    ssid: !secret fallback_wifi_ssid
    password: !secret fallback_wifi_password
    ap_timeout: 2min

captive_portal:

web_server:
  port: 80
  include_internal: true

sensor:
  - platform: wifi_signal
    name: "WiFi Signal Sensor"
    internal: true
    update_interval: 60s

output:
  - platform: esp8266_pwm
    pin: 12
    frequency: 1000 Hz
    id: pwm_b
  - platform: esp8266_pwm
    pin: 13
    frequency: 1000 Hz
    id: pwm_g
  - platform: esp8266_pwm
    pin: 15
    frequency: 1000 Hz
    id: pwm_r

light:
  - platform: rgb
    name: "Light"
    id: esp2_light
    red: pwm_r
    green: pwm_g
    blue: pwm_b

Anything in the logs that might be useful for us?

2022-02-03 10:18:25 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for esp1 @ ***: Error connecting to ('***', 6053): [Errno 113] Connect call failed ('***', 6053)
2022-02-03 10:19:09 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for esp2 @ ***: Timeout waiting for response for <class 'api_pb2.ConnectRequest'>
2022-02-03 10:24:12 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [548034806976] Disconnected: Did not receive auth message within 10 seconds
2022-02-03 10:24:32 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [548271912416] Disconnected: Did not receive auth message within 10 seconds

Additional information

ESPHome integration Diagnostics

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Thanks for the logs guys/gals!

Using the logs I was able to identify one real problem, when it’s merged perhaps you can check if https://github.com/home-assistant/core/pull/66189 fixes the issue for you.

Specifically (from the logs from @RobinMarsollier) here’s an example.

First the device disconnects, and a reconnect is immediately started

2022-02-08 21:03:42 INFO (MainThread) [aioesphomeapi.reconnect_logic] Disconnected from ESPHome API for bedroom_sensors @ 192.168.0.28
2022-02-08 21:03:42 DEBUG (MainThread) [aioesphomeapi.connection] bedroom_sensors @ 192.168.0.28: Socket closed, stopping read loop
2022-02-08 21:03:42 DEBUG (MainThread) [aioesphomeapi.connection] bedroom_sensors @ 192.168.0.28: Connecting to 192.168.0.28:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.0.28', port=6053)))

That socket is opened successfully and a TCP connection established, as we can see by this later message

2022-02-08 21:03:49 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.0.28: Opened socket for

https://github.com/esphome/aioesphomeapi/blob/v10.8.1/aioesphomeapi/connection.py#L167

Next is initializing the frame helper and performing the handshake, and indeed we can see an (empty) ClientHello handshake frame being transmitted in the next log line

2022-02-08 21:03:49 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 

https://github.com/esphome/aioesphomeapi/blob/v10.8.1/aioesphomeapi/_frame_helper.py#L148

Now the logging here isn’t great (unable to see which Received frame corresponds to which connection), but I think what’s happening is the connection is silently closed (for example by the device dropping off the network at just the right time), and so the read_frame to get the server’s response just hangs forever.

That is a bug and is fixed by https://github.com/home-assistant/core/pull/66189, but I can’t say with certainty if it’s the issue you’re seeing.