core: Permanent error: Client exceeded max pending messages [2]: 512

Home Assistant release with the issue:

HA 0.98.5

Last working Home Assistant release (if known): HA 0.91 (0.92.0?)

Operating environment (Hass.io/Docker/Windows/etc.):

HASS.io/Docker/Raspbian Buster

Component/platform:

homeassistant.components.websocket_api.http.connection

Description of problem: I have to reopen the issue (https://github.com/home-assistant/home-assistant/issues/23938) since it was unreasonably closed (some people have the same problem but in a different environment). To be more exact, I do not use Node-RED and there are no HA automation restarts, but the issue is still there.

Problem-relevant configuration.yaml entries (fill out even if it seems unimportant):

Tons of errors in the error log for no clear reason. I can get more than 15,000 such errors a day.

Traceback (if applicable):

ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.xxxx] Client exceeded max pending messages [2]: 512

Additional information:

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 14
  • Comments: 76 (16 by maintainers)

Most upvoted comments

I found what was causing these errors for me and can reproduce it 100% of the time.

  1. Open and edit a Lovelace card in Firefox (have not tested Chrome yet).
  2. Close the card edit and exit UI edit mode, do not refresh the browser.
  3. Navigate to another view or sidebar menu a few times.
  4. Sooner rather than later Firefox starts throwing up rapidly repeating “Refresh Lovelace” toast notifications in the lower left corner. This may eventually cause the screen to show a blank page with a useless “Refresh Lovelace” button in the centre. This causes thousands of errors in a very short time.

Standard browser refresh clears the error loop.

So I guess I just have to remember to refresh my browser after editing a card.

Possibly a polymer/frontend issue with how HA attempts to refresh Firefox.

I may have a suggestion. I can’t prove it’ll work for anybody, and my idea may still require a tweak. I was getting “Client exceeded max pending messages” because I have too many history graphs. Graphs have to query the database for entity_id, state and last_updated. I’m using MariaDB, so I used a tool, DBeaver, to connect to the HA database and took a peek at the indexes on the homeassistant “states” table. The closest index I found that could be used for history graphs is named “ix_states_entity_id_last_updated”, and it has columns (entity_id, last_updated) only. So a SQL query that retrieves the data for a history graph (state on the y axis, last_updated on the x axis, for a specific entity_id) has to use the index and then read the table by rowid to retrieve the state. The index is efficient and the table access is efficient, but there is extra table disk I/O to get all the data needed for one data point in a history graph. With enough history graphs there will be concurrent SQL queries contending on the table disk I/O portion.

My hunch is that, in my case, this is where the bottleneck is on my hardware, a Raspberry Pi 3B+ with a solid state drive. As an experiment I’ve added a new index, “ix_states_entity_id_last_updated2”, to optimize the query slightly so that it doesn’t have to access the table to get the needed data. The index storage size will go up slightly, but everything the query needs will be in the index, which means less disk I/O.

Here’s the DDL to add the index for MariaDB. If you don’t think the three-column combination can be unique, remove the UNIQUE keyword.

CREATE UNIQUE INDEX
  IF NOT EXISTS ix_states_entity_id_last_updated2
    USING BTREE
    ON states (entity_id, state, last_updated);

Here’s the DDL to drop the index for MariaDB after you’re done with the experiment.

DROP INDEX ix_states_entity_id_last_updated2 ON states;

I’m still watching my logs for the max pending messages error. I haven’t seen it lately, which could be because of my experiment or because HA version 2021.11.5 helped on the solid state drive hardware I use. It’s very likely my new index is ignored completely. In an Oracle database, with enough concurrent SQL queries, the optimizer would have used such an index and reduced contention on the table, since Oracle keeps index data in cache better than table data. I’m not sure whether MariaDB would respond to the index tweak the way an Oracle database would.
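One way to find out whether an index like this is actually picked up is to ask the database for its query plan. The sketch below is only an illustration under loud assumptions: it uses Python's built-in sqlite3 module with an in-memory database, a simplified stand-in for the states table (not the real Home Assistant schema), a simplified history query without a time range or ORDER BY, and a non-unique index. On MariaDB the rough equivalent is to prefix the real query with EXPLAIN and look for "Using index" in the Extra column.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


# Simplified stand-in table (assumption: NOT the real recorder schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states ("
    "state_id INTEGER PRIMARY KEY, entity_id TEXT, state TEXT, last_updated TEXT)"
)
# Index with the same columns as the one named in the comment above.
conn.execute(
    "CREATE INDEX ix_states_entity_id_last_updated "
    "ON states (entity_id, last_updated)"
)

# Simplified history-graph query (assumption: the real query is more complex).
HISTORY_SQL = "SELECT state, last_updated FROM states WHERE entity_id = ?"

plan_before = query_plan(conn, HISTORY_SQL, ("sensor.demo",))

# The experimental three-column index from the comment above.
conn.execute(
    "CREATE INDEX ix_states_entity_id_last_updated2 "
    "ON states (entity_id, state, last_updated)"
)
plan_after = query_plan(conn, HISTORY_SQL, ("sensor.demo",))

print("before:", plan_before)
print("after: ", plan_after)
```

If the second plan mentions a covering index, the query can be answered from the index alone. Whether MariaDB makes the same choice for the real history query has to be checked on the actual database.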

If the experiment is successful, then the index “ix_states_entity_id_last_updated” should be replaced with the new definition. We shouldn’t add another index unless there is a need, since the Raspberry Pi 3B+ has only so much RAM to cache indexes.

The DDL above would then change to the following, assuming MariaDB allows CREATE OR REPLACE to turn a non-unique index into a unique one. If it doesn’t, you would have to drop the index first and then add it.

CREATE OR REPLACE UNIQUE INDEX
  ix_states_entity_id_last_updated
    USING BTREE
    ON states (entity_id, state, last_updated);

Hi @pierre2113, I use ZWave USB in HA integration, however, it wasn’t my ZWave integration after all. I think I have managed to track my issue down to a script which I use to control one of my wall tablets running Fully Kiosk. I am now in the process of working out why it is causing this especially since it is identical to three others doing the same thing.

I think it is related to Chrome tabs that remain open for a long time. More details below:

  • I’ve tried with and without HACS, AppDaemon, custom_components and custom_cards and, except for custom_cards, I didn’t find a direct correlation. For custom_cards there are two different causes: the UI not being correctly reloaded (most common; use CTRL+F5 in Chrome to reload), and duplicated or overlapping cards/integrations for those who now use HACS but previously added them manually (a lot more buggy; double check that all manual customizations are removed before using HACS to add them).
  • I found that one automation was not triggered during the ERROR, so I made an MQTT watchdog timer to see exactly when the problem occurs:
  • Initially I found that (with my config) it happens about twice a day. Hass is not responsive during the error, BUT 3 out of 4 times it restores itself: after some time (varying between 30 sec and 8 min) Hass works as if nothing had happened. So the error often goes unnoticed; it only causes a problem if the user tries to use the UI during the error.
  • But, MOST IMPORTANT, I’ve found that the error happens ONLY if there is a client (phone or PC, Chrome or the Home Assistant Companion app) with the UI open for a long time. Now that I always close every Home Assistant tab after use, the error is gone (12 h without any error so far and counting).
  • It seems, though I’m not sure about this, that it’s related to the PC/phone with the open tab going into sleep mode, or to Wi-Fi/network jumps (as in my case).
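The watchdog itself is not shown in the comment, so the following is only a rough sketch of the bookkeeping side, under stated assumptions: some automation publishes a heartbeat at a fixed interval (10 seconds here, a made-up value), a separate listener records the arrival time of each heartbeat, and the hypothetical helper `find_gaps` then lists the silent periods. No MQTT library is involved; the timestamps are hard-coded sample data.

```python
from datetime import datetime, timedelta


def find_gaps(heartbeats, max_silence):
    """Return (last_seen, next_seen) pairs that are further apart than max_silence."""
    ordered = sorted(heartbeats)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_silence
    ]


# Sample data (assumption): heartbeats every 10 s, with one 90 s outage.
start = datetime(2020, 1, 1, 12, 0, 0)
offsets_in_seconds = [0, 10, 20, 30, 120, 130, 140]
heartbeats = [start + timedelta(seconds=s) for s in offsets_in_seconds]

# Anything longer than three missed heartbeats counts as an outage.
gaps = find_gaps(heartbeats, max_silence=timedelta(seconds=30))

for last_seen, next_seen in gaps:
    print(f"silent from {last_seen} to {next_seen} ({next_seen - last_seen})")
```

Comparing the reported silent periods with the timestamps of the "max pending messages" errors in the log would show whether the two line up.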

Probably Hass and the frontend do not handle the socket disconnection and reconnection correctly. Hass should stop sending data through the websocket after the first pending messages, and meanwhile the frontend should attempt a reconnection (as it does when Hass is restarted but the UI is kept open).
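To make the suggestion concrete, here is a minimal sketch of a per-client bounded send queue that stops accepting messages after the first overflow instead of failing over and over. It is not Home Assistant's actual code: the class `ClientConnection` and its methods are hypothetical, and only the limit of 512 is taken from the error message.

```python
import asyncio

MAX_PENDING_MSG = 512  # limit shown in the error message


class ClientConnection:
    """Hypothetical stand-in for one websocket client (not the real HA class)."""

    def __init__(self, max_pending=MAX_PENDING_MSG):
        self._to_write = asyncio.Queue(maxsize=max_pending)
        self.closed = False
        self.dropped = 0

    def send_message(self, message):
        """Queue a message; return False if the client is no longer served."""
        if self.closed:
            # Already overflowed once: drop silently instead of erroring again.
            self.dropped += 1
            return False
        try:
            self._to_write.put_nowait(message)
        except asyncio.QueueFull:
            # First overflow: mark the connection closed so the client has to
            # reconnect, and stop queueing for it from now on.
            self.closed = True
            self.dropped += 1
            return False
        return True

    def pending(self):
        """Number of messages still waiting to be written to the socket."""
        return self._to_write.qsize()


async def demo():
    # Small limit so the overflow is easy to see; nothing drains the queue,
    # which mimics a client that went to sleep with the tab open.
    conn = ClientConnection(max_pending=3)
    results = [conn.send_message({"id": i}) for i in range(5)]
    return results, conn.pending(), conn.dropped


results, pending, dropped = asyncio.run(demo())
print(results, pending, dropped)
```

In this sketch the overflow is handled once per connection, so a sleeping client would produce a single event rather than thousands of log lines. The frontend would still need to notice the closed socket and reconnect.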

Hope it helps