addons: Mosquitto 5.1.1 is broken.

The problem

Environment

  • Add-on with the issue: Mosquitto broker
  • Add-on release with the issue: 5.1.1
  • Last working add-on release (if known): 5.1
  • Operating environment (OS/Supervised): hassio

Problem-relevant configuration


Traceback/Error logs


Additional information

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 23
  • Comments: 158 (10 by maintainers)

Most upvoted comments

Made quite a bit of progress today, but I’m far from happy with the result at this point. Will continue my journey in the next couple of days.

Reserved my Thursday for looking into this add-on. Assigned myself.

Status update time 🕐

I’ve created quite a few versions of the add-on the past week and ended up with the simplest of them all… (NOTE TO SELF: Nothing beats KISS 😓 )

  • MQTT ✅
  • MQTT over WebSocket ✅
  • MQTT with SSL ✅
  • MQTT over WebSocket with SSL ✅
  • Local users auth ✅
  • Home Assistant users auth ✅
  • Home Assistant integration discovery ✅
  • Supervisor add-on services ✅
  • Migrated everything to S6 overlay ✅
  • Migrated everything to bashio ✅
  • Running on latest Alpine Linux ✅
  • Can handle a metric ton of auth requests in a couple of milliseconds per request ✅
  • Duration testing ✅
  • Anonymous users 🔴 Removed

I’m happy with the result; it seems to perform better. Currently I have Z-Wave JS 2 MQTT, Home Assistant, a local mosquitto_sub subscribed to all topics, and MQTT Explorer connected, plus around 25 ESP chips on ESPHome that all send a lot of status updates each second. Additionally, I started a script that tries to send updates as fast as possible (while disconnecting/connecting between each update).

This is how fast it is now, each line in the screenshot below is a client that publishes a time (with microseconds) as fast as possible while re-connecting between each update:

[Screenshot: output of the publishing clients, one line per client]

I’ll leave this all running overnight to see how it holds and do a final round of reviewing my own code in the morning.

Morning update: Everything still alive and kicking.

@frenck I’ve come up with a slightly less abrasive option for a similar comment. It’s a bit more verbose but feel free to copy and paste it. The goal is to avoid discouraging good faith participation while still adhering to your preferred workflow. While it’s not as efficient to add a buffer for the emotional state of the reader it can be considered a “social optimization” to add a bit more explanation.

"If this isn’t resolved or there’s a new problem please create a new issue for each individual thing so it can be tracked and looked at.

We have a workflow that involves opening a new issue instead of reviving old ones; this greatly helps with our tracking and organizing. Re-opening an issue or combining multiple subjects into a single issue adds complexity to the tracking."

I know OP has left zero information here and it will probably be closed out, but on a surface level I have to agree. Something has gone wrong with this point update.

My whole Home Assistant instance goes down overnight. When I try to access it over the network, I can’t connect.

After a week of troubleshooting I have narrowed it down to my MQTT server. What seems to be happening is that it loses its connection to HA and throws the whole thing off. HA is still running but inaccessible. I have to power cycle to get it up and running again, and when I do, I get errors in the HA logs about “timed out waiting for mid 2” for MQTT, and errors in Mosquitto about being unable to find HA.

I also find that my Zigbee2mqtt devices are found at startup and then immediately lost again (presumably when the timeout occurs), and I have to restart Zigbee2mqtt a couple of times to get it running again.

I wish I could prove that the HA hang is due to Mosquitto, but since it’s inaccessible (but still running) I can’t. What I can say is that in Recorder I can see that the only devices that are unavailable are the Zigbee2mqtt ones, and they drop because of Mosquitto (there are zero errors in the Zigbee2mqtt logs in debug mode).

I’ve restored an old backup of the previous version of Mosquitto to try and combat the issue. I’ll try to remember to report back.

Edit: I think the same issue is #1817 and #1814

Also this one https://github.com/home-assistant/core/issues/45036

I’m happy to report things are looking a lot more responsive now that I’ve upgraded to 6.0.1, the errors in the logs and laggy switch/light responses seem to be gone.

Many thanks @frenck

Confirming same. Very strange, sluggish system recently with hundreds of lines similar to

2021-02-27 07:18:29 ERROR (MainThread) [homeassistant.components.mqtt] Timed out waiting for mid 98

in home-assistant.log. After rolling Mosquitto back to 5.1, no more errors in the log, performance back to normal.
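
To gauge how widespread the error is in a given log, the matching lines can be counted per day. The following is a minimal sketch in Python, assuming the log line format quoted above; the function name count_mid_timeouts and the sample lines are illustrative and not part of Home Assistant.

```python
import re
from collections import Counter

# Matches lines such as:
# 2021-02-27 07:18:29 ERROR (MainThread) [homeassistant.components.mqtt] Timed out waiting for mid 98
TIMEOUT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} ERROR \(\w+\) "
    r"\[homeassistant\.components\.mqtt\] Timed out waiting for mid (\d+)"
)


def count_mid_timeouts(lines):
    """Return a Counter mapping each date to its number of 'Timed out waiting for mid' errors."""
    per_day = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            per_day[match.group(1)] += 1
    return per_day


# Illustrative input; in practice, pass an open file object for home-assistant.log.
sample = [
    "2021-02-27 07:18:29 ERROR (MainThread) [homeassistant.components.mqtt] Timed out waiting for mid 98",
    "2021-02-27 07:18:30 INFO (MainThread) [homeassistant.core] unrelated line",
]
for day, count in sorted(count_mid_timeouts(sample).items()):
    print(day, count)
```

Because the function accepts any iterable of lines, it works the same on a list of strings and on a file opened with open("home-assistant.log").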

For those who want to rollback 5.1:

  • Backup and uninstall mosquitto 5.1.1
  • Fork the mosquitto repository and edit “version”: “5.1.1” to “version”: “5.1” in config.json
  • Add this custom repository in the Supervisor’s add-on store and install

I installed mosquitto 5.1 fresh with this method; hope this helps.
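
The version edit in the forked repository can also be scripted. This is a minimal sketch in Python, assuming config.json is plain JSON with a top-level "version" key as described in the steps; the function name set_addon_version and the path in the usage comment are hypothetical. It only changes the file; whether the Supervisor then accepts the fork is a separate matter, and some later commenters in this thread report that adding their fork did not work.

```python
import json
from pathlib import Path


def set_addon_version(config_path, new_version):
    """Rewrite the top-level "version" field of an add-on config.json.

    Returns the previous version string. The file is re-serialized, so its
    whitespace may differ from the original.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_version = config["version"]
    config["version"] = new_version
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_version


# Hypothetical usage inside a local clone of the fork:
# set_addon_version("mosquitto/config.json", "5.1")
```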

As one person mentioned, enough is enough! Agreed! I decided to stop using all the critical add-ons with “magic” from the HA dev team. Every update is a gamble on whether things will still work. I set up a cluster of three VerneMQ instances with HAProxy as a load balancer.

The issue is that socat (which serves as auth point) is being continuously executed again and again!

/data # ps axu
PID   USER     TIME  COMMAND
    1 root      0:03 /sbin/docker-init -- /init /run.sh
    8 root      0:00 s6-svscan -t0 /var/run/s6/services
   36 root      0:00 foreground  if   /etc/s6/init/init-stage2-redirfd   foreground    if     if      s6-echo      -n      --      [s6-init] making user provided files available at /var/run/s6/etc...          foreground      backtick      -n      S6_RUNTIME_
   37 root      0:00 s6-supervise s6-fdholderd
   48 root      0:00 foreground  s6-setsid  -gq  --  with-contenv  backtick  -D  0  -n  S6_LOGGING   printcontenv   S6_LOGGING    importas  S6_LOGGING  S6_LOGGING  ifelse   s6-test   ${S6_LOGGING}   -eq   2     redirfd   -w   1   /var/run/s6/uncaught-logs-fi
  189 root      0:00 bash /usr/bin/bashio /run.sh
  246 root      6:14 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
  247 root     22:02 mosquitto -c /etc/mosquitto.conf
 5453 root      0:00 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
 5454 root      0:00 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
 5455 root      0:00 bash /usr/bin/bashio /bin/auth_srv.sh
 5459 root      0:00 ps axu
32379 root      0:00 /bin/sh
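
To check whether the number of socat processes keeps growing, the listing can be tallied per executable. The sketch below is in Python and assumes the four-column BusyBox-style layout shown above (PID, USER, TIME, COMMAND); count_commands is an illustrative name. Note that with the fork option socat handles each accepted connection in a child process, so a few short-lived socat entries are expected while authentication requests are in flight.

```python
from collections import Counter


def count_commands(ps_output):
    """Count processes per executable in `ps` output with PID, USER, TIME, COMMAND columns."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header row and any line that does not start with a numeric PID.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        executable = fields[3].split()[0]
        counts[executable] += 1
    return counts


# A few rows copied from the listing above.
SAMPLE = """\
PID   USER     TIME  COMMAND
  246 root      6:14 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
  247 root     22:02 mosquitto -c /etc/mosquitto.conf
 5453 root      0:00 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
 5454 root      0:00 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
 5455 root      0:00 bash /usr/bin/bashio /bin/auth_srv.sh
"""

for name, count in count_commands(SAMPLE).most_common():
    print(name, count)
```

Running the tally a few times in a row and comparing the socat count gives a rough picture of how often the auth script is being spawned.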

Yes working perfectly and fast! Thanks @frenck !

Thank you for the eloquent response! I did end up getting 5.1 installed (forked the addons and downgraded the version). Unfortunately, it didn’t fix the issue I was experiencing. So I’ve taken your advice: backup and reinstall. Thank you!

@DeviousVon You say “download an old version and starting over.” I’m assuming you mean Home Assistant? Note that the Mosquitto broker, the component with the problem we’re discussing here, is an add-on to Home Assistant, not Home Assistant itself. Even if you can download an old version of HA, you’d still need to be able to download and install an older version of the add-on (5.1 or earlier).

Different topic, “starting over.” If you do somehow manage to fix the problem by “downloading an old version,” it’s not necessary to start over. Just make sure you back up everything first. You’ll need access to Home Assistant’s filesystem – I use SMB.

  1. First do a complete snapshot of your current setup. The .tar file will be saved in Home Assistant’s backup folder. Save that file to an external location.
  2. Then for safety I’d recommend also doing a manual backup. Start by making copies of any/all .yaml files and saving those files to an external location, too. Depending on how your system is configured there could be many, but they’ll all be small in size. I have upwards of 50 individual .yaml files, but that’s because I’ve split my config up.
  3. If you have any Z-Wave xml files on your system (zw*.xml), back those up, too.
  4. If you have any Node-RED automations, manually back those up:
    • Open Node-RED
    • In the upper right, click the hamburger menu in the black border
    • Click Export | All flows | Copy to clipboard
    • Paste the clipboard contents to a text file and save to an external location
  5. Manually back up your lovelace config:
    • From any page in Home Assistant click the three vertical dots in the upper right and click Edit Dashboard
    • Click the same three dot menu again and then click Raw configuration editor
    • Press CTRL-A to select everything, then CTRL-C to copy it to the clipboard
    • Paste the clipboard contents to a text file and save to an external location
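
The file-copy part of the manual backup (steps 2 and 3) can be scripted. This is a minimal sketch in Python, assuming the configuration folder is reachable as a normal directory (for example over an SMB mount); the function name backup_config_files and both paths in the usage comment are hypothetical. It does not replace the snapshot, the Node-RED export, or the Lovelace export.

```python
import shutil
from pathlib import Path


def backup_config_files(config_dir, dest_dir, patterns=("*.yaml", "zw*.xml")):
    """Copy files matching the patterns from config_dir to dest_dir, keeping subfolders.

    Returns the list of copied paths, relative to config_dir.
    """
    src_root = Path(config_dir).resolve()
    dst_root = Path(dest_dir).resolve()
    # Refuse to back up into the source tree, which would re-copy old backups.
    if dst_root == src_root or src_root in dst_root.parents:
        raise ValueError("dest_dir must be outside config_dir")

    copied = []
    for pattern in patterns:
        for source in sorted(src_root.rglob(pattern)):
            if not source.is_file():
                continue
            relative = source.relative_to(src_root)
            target = dst_root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            copied.append(relative)
    return copied


# Hypothetical usage with an SMB-mounted config share and an external drive:
# backup_config_files("/mnt/ha-config", "/mnt/external/ha-manual-backup")
```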

Sounds like a lot of trouble, but it’s really not. Takes 5-10 minutes.

So you’ll have the .tar file as well as individual .yaml (etc.) files once you’re done. After re-installing Home Assistant, first try copying the large .tar file to the same backup folder where you found it and doing a restore in Home Assistant: Supervisor | Snapshots. If anything fails or you otherwise have problems, you’ll have the manually-backed-up files to turn to. Just more or less reverse the steps you used to back up everything.

It’s not necessary to start over. You can easily go from brand-new install to fully restored config in well under an hour.

Enough is enough. Either take 5.1.1 down or fix it! At the very least provide a back out or workaround method that doesn’t require going back to a snapshot.

----- Original Message -----

From: “poudenes”, Saturday, March 6, 2021 7:50:31 AM. Subject: Re: [home-assistant/addons] Mosquitto 5.1.1 is broken. (#1887)

This is my suggestion to anyone who is using Mosquitto inside of their HA instance: set up an MQTT broker outside of HA and link it into HA manually using YAML and the IP address.

https://www.home-assistant.io/integrations/mqtt/

3 Days I changed everything from inside to outside. I’m using an RPi3 for the MQTT broker (Mosquitto); that is too much power, so I can move to an RPi Zero I guess. And everything is blazing fast again!!! Even faster than before the brick!


Same here; only a rollback from an old snapshot restored the MQTT function.

It would be a nice feature to have a version dropdown for add-ons so a rollback would be easier for everyone.

Thank you @frenck - great job!

Yes it is working again, well done!

After a couple hours on 6.0.1, I’m happy to report zero slow down issues. Very quick response.

I did some quick testing and my initial feeling is that it works as it should now. The MQTT client connects really fast and gets all its topics at the same time.

Motion and button responses are also way faster than before, the same as with the deprecated MQTT client from the community add-ons.

There are also no reconnect issues in home assistant with MQTT.

Great work @frenck

I have noticed MQTT being slow on both a Raspberry Pi 3B+ and a VirtualBox virtual machine (same snapshot). Something must be wrong with MQTT: between the flick of a switch in the dashboard and the actual toggle of the lights there are sometimes 2 to 5 seconds, but after the first time it is instantaneous. I don’t know if the other errors in my log are relevant, but I’ll leave them in anyway.

2021-05-04 09:32:15 WARNING (MainThread) [homeassistant.components.mqtt] No ACK from MQTT server in 10 seconds (mid: 47)
2021-05-04 09:32:15 WARNING (MainThread) [homeassistant.components.mqtt] No ACK from MQTT server in 10 seconds (mid: 48)
2021-05-04 09:32:15 WARNING (MainThread) [homeassistant.components.mqtt] No ACK from MQTT server in 10 seconds (mid: 49)
2021-05-04 09:32:15 WARNING (MainThread) [homeassistant.components.mqtt] No ACK from MQTT server in 10 seconds (mid: 50)
2021-05-04 09:32:15 WARNING (MainThread) [homeassistant.components.mqtt] No ACK from MQTT server in 10 seconds (mid: 51)
2021-05-04 09:32:15 WARNING (MainThread) [homeassistant.components.mqtt] No ACK from MQTT server in 10 seconds (mid: 52)
2021-05-04 09:32:16 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
    resp = await self._request_handler(request)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_app.py", line 499, in _handle
    resp = await handler(request)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_middlewares.py", line 119, in impl
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/security_filter.py", line 56, in security_filter_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/request_context.py", line 18, in request_context_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/ban.py", line 74, in ban_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/auth.py", line 135, in auth_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/view.py", line 131, in handle
    result = await result
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/http.py", line 43, in get
    return await WebSocketHandler(request.app["hass"], request).async_handle()
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/http.py", line 142, in async_handle
    await wsock.prepare(request)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_ws.py", line 135, in prepare
    payload_writer = await super().prepare(request)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_response.py", line 378, in prepare
    return await self._start(request)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_response.py", line 386, in _start
    await self._write_headers()
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_response.py", line 458, in _write_headers
    await writer.write_headers(status_line, self._headers)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/http_writer.py", line 119, in write_headers
    self._write(buf)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport
2021-05-04 09:35:54 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 09:35:54 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 09:40:47 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 09:40:48 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 09:50:51 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 09:50:51 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 10:29:39 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 10:29:40 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 10:55:18 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 10:55:18 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 11:12:32 WARNING (MainThread) [homeassistant.components.automation.new_code] New Code: Already running
2021-05-04 11:32:46 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 11:32:46 WARNING (MainThread) [homeassistant.components.mqtt.mixins] JSON result was not a dictionary
2021-05-04 11:47:10 WARNING (MainThread) [homeassistant.components.http.ban] Login attempt or request with invalid authentication from vodafone.broadband (192.168.0.1). (Mozilla/5.0 (iPhone; CPU iPhone OS 14_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Home Assistant/2021.4.1 (io.robbie.HomeAssistant; build:2021.115; iOS 14.5.0) Mobile/HomeAssistant, like Safari)

You’re welcome. When you do a Snapshot restore in HA, you have the option to disable portions of what’s to be restored from the Snapshot. This is how a person could revert to 5.1. After installing the buggy 5.1.1, make a backup of everything (for safety), and then use an earlier Snapshot to restore only the 5.1 version Mosquitto broker. That process won’t work if you don’t already have a Snapshot with 5.1 on it, of course, but it’s good to know it’s there for the next time.

Unfortunately I don’t have the ability to roll back as all my snapshots were partial (that won’t happen again). I’m unable to add any new devices (Tasmota) even though it looks good in the console. Frustrating. Really not sure what to do about it. I’ve considered installing mosquitto on a different machine, but I’m using HA authentication. I’ve not been able to fix it with the uninstall reinstall thing others are saying works, it just goes back to not working. I’m willing to work with anyone that can address it, I’m unfortunately not a developer… At this point I’m considering downloading an old version and starting over (this is a LAST resort as I’ve put a lot of time into HA). It sucks having to stand up and walk ALL the way across the room to turn lights on and off.

I also had to revert to 5.1 to have a functional setup (otherwise I had crazy delays each time I switched on a light). This seems like an important issue, and I am afraid there are way more people affected by the 5.1.1 update than you might think. Unfortunately, I don’t see this issue getting a lot of attention. Is there anything we can do to clarify or reproduce the issue? How can we help get this solved?

There are several options mentioned in this thread.

docker pull homeassistant/aarch64-addon-mosquitto:5.1

@hung2kgithub has a solution with their own repository somewhere in the middle of this thread.

Please just check the whole thread if you need a solution for your own situation. Also remember to back up, and maybe disable auto-update on production systems. Awareness of such points mostly has to be learned the hard way.

For those who want to rollback 5.1:

  • Backup and uninstall mosquitto 5.1.1
  • Fork the mosquitto repository and edit “version”: “5.1.1” to “version”: “5.1” in config.json
  • Add this custom repository in the Supervisor’s add-on store and install

I installed mosquitto 5.1 fresh with this method; hope this helps.

Hey, I tried this but it didn’t allow me to add the fork as a custom repo.

I’ve been struggling with this issue for days since upgrading to 5.1.1; all my Z-Wave devices are unavailable. I think I’ve tried every solution mentioned here to get it working, without success. Unfortunately, I have no backup old enough to get me back to version 5.1. I’ve tried to fork the repository, but then adding it does not work at all. I’m stuck. Countless hours lost trying to solve this issue… There are too many automations in the house, so this is a huge problem. As the problem has been around for weeks, I suppose it will not be solved by an upgrade. If someone can point out how to get back to 5.1, that would be great. Thanks in advance.

I know it is a long shot, but do you have authentication enabled for the MQTT broker? I had problems with brokers with authentication disabled. I have no clue why, but connecting took a very long time. BTW, I am tracking this issue because on 5.1.1 I have higher CPU usage, but that’s all: no delays, and MQTT Explorer connects within 2 seconds. My broker was upgraded from 5.1 to 5.1.1 when it was released; I never tried to reinstall it.

I am noticing slightly higher latency with this new version as well. A typical work flow of mine is turning on a zigbee light from a zigbee remote. What is weird is if I haven’t turned a light on/off for some time, when I first press the on or off button on the remote, there is a noticeable (~1s) delay before the light responds. But, having just used the remote, if I then press a button a few seconds later, the response is almost instantaneous (as was always the case with 5.1).

Not sure if there is something with the new version that causes the MQTT server to go to sleep, resulting in that first action being slower. Or, if there is a different timeout on the connection between zigbee2mqtt and the broker, resulting in zigbee2mqtt having to reconnect when the first action is submitted in some time.

I can confirm that, following cogneato’s suggestion in Discord, deleting the 5.1 add-on (after copying the config) and re-installing the add-on (now the new 5.1.1) with the copied config gets everything running smoothly. No errors in the log, and all topics are live.

So, don’t update, but re-install, which gets you the new version (and rewrites mosquitto.db, which seems to be the issue). This is essentially what @christoph-luebbe said here: https://github.com/home-assistant/addons/issues/1887#issuecomment-802234693

Sigh… Rolled back to 5.1, and still can’t get my Sonoff Zigbee bridge to work 😦

Probably the “only” is missing, as @chumbazoid is certainly not the only one: I’m also using HA Supervised on Debian 10, but have no time to experiment with removing-reinstalling MQTT (and also use a few topics to store information with retained messages), I’m fine with 5.1 for now.

@chumbazoid I believe you are the only person on debian+supervised to report being affected by this. (Although most people unhelpfully don’t say)

Guys, it’s been 2 weeks since the issue was identified and it was confirmed that v5.1 works properly. Why hasn’t v5.1.1 been rolled back yet? I assume more and more people are being affected by this issue.

No snapshot here. Is there any way to use command line to downgrade to 5.1?

ha addons install ???

thx.

@JoJa1101 How did you roll back?

I made a snapshot on my HASS Testsystem (was still running on Mosquitto 5.1) and did the partial rollback on my Prod System.