core: Broadlink Component - Handling Of Communication Errors
The problem
It appears that the Broadlink component will not process commands after the remote device is marked as unavailable.
Environment
- Home Assistant Core release with the issue: 0.112.0
- Last working Home Assistant Core release (if known): Unknown.
- Operating environment (OS/Container/Supervised/Core): Supervised
- Integration causing this issue: homeassistant.components.broadlink
- Link to integration documentation on our website: https://www.home-assistant.io/integrations/broadlink/
Problem-relevant configuration.yaml
remote:
  - platform: broadlink
    host: 192.168.2.144
    mac: blah
    type: rm4_mini
    name: Servers Broadlink
sensor:
  - platform: broadlink
    host: 192.168.2.144
    mac: blah
    type: rm4_mini
    name: Servers Broadlink
    scan_interval: 60
    monitored_conditions:
      - temperature
      - humidity
Traceback/Error logs
cat home-assistant.log | grep broad
2020-07-01 21:19:34 INFO (SyncWorker_36) [homeassistant.loader] Loaded broadlink from homeassistant.components.broadlink
2020-07-01 21:19:34 INFO (MainThread) [homeassistant.components.remote] Setting up remote.broadlink
2020-07-01 21:19:37 INFO (MainThread) [homeassistant.components.sensor] Setting up sensor.broadlink
2020-07-01 22:01:21 WARNING (MainThread) [homeassistant.components.broadlink.device] Disconnected from device at 192.168.2.144: Control key is expired
2020-07-01 22:02:17 WARNING (MainThread) [homeassistant.components.broadlink.device] Connected to device at 192.168.2.144
2020-07-01 23:31:47 WARNING (MainThread) [homeassistant.components.broadlink.device] Disconnected from device at 192.168.2.144: Control key is expired
2020-07-01 23:32:39 WARNING (MainThread) [homeassistant.components.broadlink.device] Connected to device at 192.168.2.144
2020-07-02 01:02:10 WARNING (MainThread) [homeassistant.components.broadlink.device] Disconnected from device at 192.168.2.144: Control key is expired
2020-07-02 01:02:10 ERROR (MainThread) [homeassistant.components.broadlink.remote] Failed to send 'fan only/server ac': The device is offline
2020-07-02 02:32:25 WARNING (MainThread) [homeassistant.components.broadlink.device] Disconnected from device at 192.168.2.144: Control key is expired
2020-07-02 02:33:20 WARNING (MainThread) [homeassistant.components.broadlink.device] Connected to device at 192.168.2.144
2020-07-02 05:23:55 WARNING (MainThread) [homeassistant.components.broadlink.device] Disconnected from device at 192.168.2.144: Control key is expired
2020-07-02 05:24:51 WARNING (MainThread) [homeassistant.components.broadlink.device] Connected to device at 192.168.2.144
2020-07-02 09:10:22 WARNING (MainThread) [homeassistant.components.broadlink.device] Disconnected from device at 192.168.2.144: Control key is expired
2020-07-02 09:11:15 WARNING (MainThread) [homeassistant.components.broadlink.device] Connected to device at 192.168.2.144
Additional information
- Note that the sensor module still operates every 60 seconds, which suggests the device was accessible after 2020-07-02 01:02:10 (as can be seen by the WARNING logs, which is a known issue with minimal impact).
- Restarting the device has no effect.
- Restarting HA seems to be the only mitigation.
- HA reports the device was unavailable since 2020-07-02 01:02:10, which suggests to me that being unavailable was the cause. Manually instructing HA to send commands has no impact when the device is unavailable.
- Checking the logs of the Wifi Access Point, the broadlink disconnects and reconnects 20 times an hour. This appears to be normal (roaming from one access point to another).
- This is reproducible on demand:
  1. Send command to verify connectivity.
  2. Disconnect Broadlink device from power.
  3. Send command (which will fail).
  4. Reconnect and allow Broadlink device to power up.
  5. Send command and note that the command is never attempted (note that no communication error is logged and the device is marked as unavailable).
  6. Restart HA and note that sending commands is now possible.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 167 (72 by maintainers)
I cracked it.
The heartbeat message that comes in via the cloud is message type 0x01. The RM3 doesn’t actually care if this comes from the cloud, via WiFi unicast, broadcast, or whatever. So, as long as you send a packet type 0x01 at least once every 3 minutes, even via broadcast on the WiFi, the devices will think they’re connected to the cloud and stop rebooting. It doesn’t even care about the packet checksum.
So, pending better integration of this into python-broadlink and/or HA, the quick fix is sticking this into your /etc/crontab:
This broadcasts the heartbeat message every minute (substitute 192.168.7.255 with the broadcast address of your IoT network, of course). I think there is an additional timeout on top of the 3 minute cloud timeout, so I’m currently checking to see if we can afford to send it less often. (Edit: nope, needs to be every ~3 minutes; 4 minutes after stopping sending the packets both of my RM3s rebooted on exactly the same second.)
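The same heartbeat can be sketched in Python for anyone who would rather not depend on nc. This is a minimal sketch, assuming the 48-byte layout given by the perl one-liner elsewhere in this thread (38 zero bytes, a single 0x01 byte, 9 zero bytes) and UDP port 80 as used in the nc attempts. `build_heartbeat` and `send_heartbeat` are illustrative names, not python-broadlink functions.

```python
import socket

HEARTBEAT_LEN = 48          # 38 zero bytes + 0x01 + 9 zero bytes
HEARTBEAT_TYPE_OFFSET = 38  # position of the 0x01 message-type byte


def build_heartbeat() -> bytes:
    """Return an all-zero packet with message type 0x01 set."""
    packet = bytearray(HEARTBEAT_LEN)
    packet[HEARTBEAT_TYPE_OFFSET] = 0x01
    return bytes(packet)


def send_heartbeat(broadcast_addr: str = "192.168.7.255", port: int = 80) -> None:
    """Broadcast one heartbeat over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting requires SO_BROADCAST to be enabled on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_heartbeat(), (broadcast_addr, port))
```

Calling `send_heartbeat()` once a minute (from cron, a systemd timer, or a loop with `time.sleep(60)`) would match the interval described above; substitute your own broadcast address.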
Filed mjg59/python-broadlink#458 for adding support to python-broadlink.
(as for @felipediel unless he provides a packet log to prove no reconnects/DHCP request, or otherwise show something else he’s doing to trigger the device keepalive code, I’m just going to assume his devices are either successfully hitting the cloud, or rebooting like everyone else’s, and he just isn’t aware).
FWIW, I’m seeing reconnects every 3 minutes here too, on two RM3 minis. I suspected the “I can’t talk to the cloud so I’ll restart” cause too… I’m now trying things to see if I can convince it to give up.
So far:
At this point I’m going to have to let them talk to their cloud service to see what they actually want, but it’s clear that none of the obvious blocking solutions are working here.
For the time being I’m limited to remote access to the Raspberry Pi running Home Assistant. I intend to try the code once I’m able to, but in the meantime I have tried running the nc-command and running your code in a custom HA addon, but none of the solutions seem to work. The broadlink is still disconnecting every hour.
I noticed that busybox nc does not include the -b flag, but I’ve tried this without any success:
echo -ne '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0' | nc -uv 192.168.1.80 80 -w 1
I also made these three files, put them into the addons/ directory and started the custom addon, but that also didn’t seem to work. As far as I can tell this is the same as running your code above:
config.json:
Dockerfile:
main.py:
Ok, things make more sense now. I actually had patched python-broadlink months back to fix the discovery issue with the incorrect binding, which as @litinoveweedle said is not a VLAN issue but just a general multiple interfaces issue (I was actually only using discovery on my laptop during configuration with the BroadlinkProv AP method, so I didn’t have to use the Broadlink app, and my laptop doesn’t use VLANs). The devices themselves only have one interface so this kind of problem does not apply.
@felipediel my devices recover quickly (2-3s) during reboots; I’m actually on a somewhat old HA/integration version so I’m not saying the current version can’t work around it with retries and timeouts. But of course it’s not ideal that the devices do this, which is why I wanted to find a way to solve the root cause. To me it seemed that you were saying that your devices weren’t rebooting at all and that this somehow had to do with VLANs.
Even with retries and timeouts and such, I expect there will be impossible to fix race conditions (for example, if you send it a command and it acknowledges it, but then reboots and cuts off the IR transmission; not sure if this specific one can happen but this kind of edge case is quite likely), so it is much preferable to stop them from rebooting.
2 seconds every 3 minutes is about 1% packet loss; you seem to be getting about 0.25% packet loss, so maybe your devices recover faster than mine and this is why the problem affects you less than other people. Of course for some people the devices will recover more slowly due to their router/AP/DHCP server being slower, or the WiFi being more congested, or even maybe something like the Broadlinks scanning WiFi channels in order so that people on channel 1 get faster reconnects than people on channel 11 😃 (in fact, it pretty much needs to spend at least 100-200ms per channel during a channel scan to catch a beacon, so for 11 channels that’s… 1-2 seconds!) Edit: and yeah, I’m on channel 11.
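The arithmetic behind those estimates can be checked quickly. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and the inputs are the figures quoted in the comment above.

```python
def downtime_fraction(outage_s: float, period_s: float) -> float:
    """Fraction of time the device is unreachable."""
    return outage_s / period_s


def channel_scan_time(channels: int, dwell_s: float) -> float:
    """Total time to scan all channels at a fixed dwell time per channel."""
    return channels * dwell_s


# 2 s of outage in every 3-minute watchdog cycle
print(f"{downtime_fraction(2, 3 * 60):.1%}")   # about 1.1%

# Outage per cycle implied by 0.25% loss over the same 3-minute cycle
print(f"{0.0025 * 3 * 60:.2f} s")              # about 0.45 s

# 11 channels at 100-200 ms each
print(f"{channel_scan_time(11, 0.100):.1f} s to {channel_scan_time(11, 0.200):.1f} s")
```

So a 0.25% loss rate would correspond to roughly half a second of downtime per cycle, well under the 1-2 s a full 11-channel scan would take.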
There is a misunderstanding here. I don’t have access to the logs. I don’t know if my devices are reconnecting every 3 minutes. I am using a simple ISP router with poor interface. What I said is: I blocked outbound connections from these devices with a drop silent rule (the only option I have) and I don’t miss any updates in Home Assistant. They are completely isolated and everything works fine. I can also ping them rock-solid for a long time:
I am not saying the VLAN is the main problem. The root of the problem is definitely the successive reconnections when the device cannot reach the cloud. What I was trying to understand here is why my devices recover quickly and yours don’t.
@litinoveweedle I recently fixed an issue involving Broadlink devices and VLANs. It was related to socket.bind(). This is why I am insisting on this. Perhaps we are binding the socket to the wrong interface when we try to reach an offline device. But if you tested and the VLAN is not the problem, great, thank you. I believe in you. Now we are cooperating again. Did you check the logs in Home Assistant? This is what I wanted you to test.
I know that testing can be boring at times. I usually do my own tests, but when I don’t have the same tech as the person I’m helping, I have to ask. I don’t like when I ask for a test and people write a thousand line text saying that I’m stupid and the test won’t work. It is not meant to work, we are just gathering information. I usually do the logical thinking after the tests. So I think that’s why we started off bad. I had a stressful week, so I’m sorry if I was rude at some point. I am just tired.
@marcan I was starting to find you annoying, but this is really great news. I think I underestimated you. Thank you, you did a great job and earned my respect. I am adding this to the list of things I will bring to Home Assistant 🚀
I’m following your #36914 pr with excitement. 😹
Is there currently any logic to recheck devices? In this case 9 hours passed without any re-attempts (commands to the device were executing every 10 minutes).
Sorry to add more to the thread, but to add to what was already discussed, it appears the newest rm3 minis on the market right now will constantly reset internally to try and connect to the cloud when they are being blocked by a firewall. Thus you see these messages - only my new rm3 mini does this. The other (2) older models don’t behave like this.
2021-05-12 16:09:30 ERROR (MainThread) [homeassistant.components.broadlink.updater] Error fetching Marks Office - IR Remote #2 (RM mini 3 at 192.168.0.126) data: [WinError 10054] An existing connection was forcibly closed by the remote host
2021-05-12 16:27:33 ERROR (MainThread) [homeassistant.components.broadlink.updater] Error fetching Marks Office - IR Remote #2 (RM mini 3 at 192.168.0.126) data: [WinError 10054] An existing connection was forcibly closed by the remote host
2021-05-12 16:45:36 ERROR (MainThread) [homeassistant.components.broadlink.updater] Error fetching Marks Office - IR Remote #2 (RM mini 3 at 192.168.0.126) data: [WinError 10054] An existing connection was forcibly closed by the remote host
I’ll see if allowing it to the cloud stops the errors, but then I’m concerned it won’t work with HA (which was my original issue). It seems the new rm3 minis suffer from this particular issue.
An update for the support of the last versions of Apple tvOS took something like 5-6 months to end up in an official Home Assistant release, this is why I asked… It will probably be the same for your new feature, let’s just be patient about it.
Regarding the workaround you’re talking about, I do indeed have this configuration with 2 routers. Router A already had UPnP disabled, while router B has it enabled (for Home Assistant to gather the network throughput data), and router B denies all internet access to the Broadlink RM Mini 3 (which I would rather not change). The problem is still there; the heartbeat seems to me the best workaround, without modifying the Broadlink firmware and removing the rebooting process.
As it is not a too painful problem, I’ll just wait for your code to be added to the main branch, I’d prefer avoiding tweaking my Home Assistant install too much, in case of a possible future reinstall. But thank you very much for your quick answer!
Felipe, I’ll try the update once it becomes available. It may just be that I did something wrong. Let me know when it’s part of the main code, I’ll update and try again.
Yes, I can confirm that this issue persists with AND without UPnP. I’ve even tried multiple RM Pro+ units I have lying around and they act the same. No other device has any noticeable issue on the network. It’s in my parents’ home, so I’m limited by the ISP’s router with barely any settings, and for the time being it’s unfortunately hard to devote time to debugging the issue further.
I don’t have UPnP enabled anywhere 😃
(non-bold config tree entries are empty/nonexistent, and by default that means off for these services).
I did some grepping and didn’t see any mention of FastCon in the RM3 firmware, so I get the feeling it isn’t supported in these devices.
I’ve looked through the firmware codepath for the watchdog, and I didn’t notice any branch that could stop it from triggering other than the aforementioned keepalive packet (which is how I found it).
The main network handling thread (including packet rx, reconnections, and the watchdog) is a big infinite while loop that ends like this:
The only thing that sets the app network status to 8 is receiving the keepalive packet, and that gets cleared by a different branch of the code earlier after another timeout.
Just to restate this somewhat, it’s a watchdog timer. The purpose is to reboot the whole device if something goes wrong and the cloud connection is down for too long. This makes perfect sense for people who want to use that feature, and I’m sure fixed a bunch of problems those users had. Treating icmp-admin-prohibited as “go away” wouldn’t be the most reliable way of fixing this for local users like us, because that’s still a protocol feature that can be accidentally triggered due to misconfiguration/etc, even transiently. IMO what they should’ve done is have an outright persistent configuration setting (e.g. as part of the Wi-Fi association packet) that just turns off the cloud stuff - not just the watchdog, but the connection attempts too, to save power and Wi-Fi traffic. But then again we all know Broadlink isn’t interested in people not using their app/cloud stuff… so we’re on our own here.
(Not arguing with you, just giving my opinion on how I’d handle this; it won’t happen anyway so it’s kind of moot 😃 ).
I hate to hit on this point again, but to the Broadlink, there is a single network here. My Broadlink devices are attached to an SSID that has a single IP subnet on it. As far as the devices are concerned it is not different from any other dumb router with a single SSID. The fact that there are VLANs on the wire behind the AP, or that other SSIDs are being broadcast too, is not something the device can differentiate. It is not different than two neighbors with disconnected, unrelated SSIDs.
What the Broadlink devices do is simple. They reboot. The same function gets called that also gets called when there is a fault or some other unrecoverable condition. It’s the same thing as unplugging them and plugging them back in. I’m staring at the decompiled firmware here 😃
What do you mean by “created another network”? Created an isolated, standalone Wi-Fi network with a different SSID? Or bridged together two subnets into the same L2 network via Ethernet? In the latter case, did you have one or two DHCP servers?
If you have two IP subnets on the same network with separate DHCP servers, then of course devices are going to fight over which one they get an IP from, which is going to cause failures and timeouts. This is natural, it’s not a device bug, that would just be a broken network setup.
If you mean creating a separate isolated SSID that the device has no knowledge of and ought to ignore, then the only reason that would slow things down is due to radio congestion. But then we’re back to this having nothing to do with binding or VLANs, it’s just a general “too much stuff on the air makes Wi-Fi slow” issue, which applies regardless of whether it’s the same person running two SSIDs, or just your neighbors. I mean, I run 2 SSIDs on 2.4GHz myself, but obviously my neighbors have some networks too 😃
This is surprising. I don’t think they would share info via any back channel, and I would think they only support a single set of Wi-Fi configs, but maybe they can have multiple? In that case it would make sense that after a reboot, they would end up associating to a network at random, of all the configured ones.
I’ve never connected my units to a network that isn’t the single IoT one they are supposed to be on, so this doesn’t apply to me…
Edit: did you use SmartConfig to set up the new WiFi network, or AP mode? SmartConfig is a broadcast based solution, so it is possible that devices that you didn’t intend to configure picked up the network details too…
Certainly, this is possible. It could also be an unrelated issue though (not the devices rebooting, but the bad DNS somehow causing something else to make HA not be able to talk to the devices - bad DNS tends to have many weird effects). We’d need packet logs to be sure of what happened…
Really, this is why I keep hammering on packet logs - because it’s very easy to end up drawing strange conclusions from just doing trial and error experiments and observing whether the devices are stable or not. But once you have a packet log, you know exactly what is going on.
There’s actually an advantage to doing this. This way the watchdog timer benefits us too. In other words, if the devices crash or the Wi-Fi AP does something weird or connectivity breaks for some reason, the keep-alives will cease to arrive and the device will reboot… which is exactly what you want. We will be taking advantage of the watchdog mechanism to increase reliability and automatic failure recovery in HA setups too.
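To make the watchdog behaviour concrete, here is a toy model in Python. It is not the firmware’s code: the class, its method names, and the 180-second timeout are assumptions chosen to match the roughly 3 minute window described in this thread.

```python
class Watchdog:
    """Toy model of a keep-alive watchdog (hypothetical, not firmware code)."""

    def __init__(self, timeout_s: float = 180.0, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_fed = now

    def feed(self, now: float) -> None:
        """Record that a keep-alive packet arrived at time `now`."""
        self.last_fed = now

    def should_reboot(self, now: float) -> bool:
        """True once more than `timeout_s` has passed since the last keep-alive."""
        return now - self.last_fed > self.timeout_s


wd = Watchdog()
# Keep-alives arrive every 60 s for 10 minutes, then stop.
for t in range(0, 601, 60):
    wd.feed(t)

print(wd.should_reboot(700))  # keep-alives only just stopped: no reboot yet
print(wd.should_reboot(800))  # more than 180 s of silence: reboot
```

As long as Home Assistant (or a cron job) keeps feeding it, the device stays up; if the sender or the network path dies, the silence itself triggers the reboot.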
Small update. To my surprise I found out that I can change the IP/DNS now from the Home Assistant UI. I’ve tried changing my DNS from my ISP to 1.1.1.1 and for some strange reason it seems to be way more stable:
I’ve tried upgrading to fw 57 using the Broadlink app and now it disconnects every hour at 38 minutes instead, but it’s only unavailable for 1 minute instead of 5-6. Still annoying, but I didn’t get any time to debug this further.
@felipediel I actually know about that bug, as it was affecting me as well. But it was not about VLANs (i.e. multiple dedicated LAN segments = L2 level of ISO/OSI model), but it was about wrongly selected IP of local HA interface for outgoing packets (i.e. multiple IP interfaces on the HA and/or multiple IP subnets on the same interface on HA = L3 level of ISO/OSI model)
I know this previous issue was solved in version 0.15.0 of the python_broadlink library, since from that version on discovery of Broadlink devices works OK. But this problem is completely different, although I know it could be confusing.
Since I applied the fix with keep-alive packets discovered by @marcan, my Broadlink devices are no longer disconnecting/reconnecting, and therefore there are no more error messages in the HA log.
So to summarize this issue:
Therefore I would like to ask you to accept proposal in mjg59/python-broadlink#458 - add into the library code to periodically generate keep-alive packets with payload as suggested. This would prevent Broadlink devices with no internet access from disconnecting/reconnecting/rebooting or whatever they do. 😃 Thank you.
@litinoveweedle if echo is troublesome for you, you can try perl:
perl -e 'print "\0" x 38 . "\1" . "\0" x 9' | nc <...>
You’re getting a literal \x01 in the packet, so the echo part is also different for you. Maybe you’re using a different shell? It’s supposed to be a 0x01 byte. There’s also a \n at the end and -ne at the beginning, so it looks like your version of echo doesn’t support the required options. I think the bash built-in echo should work, maybe this is a dash thing?
@felipediel “Disabling” my VLAN isn’t going to do anything because, as multiple people have told you several times, VLANs are completely transparent to WiFi devices and they can’t know nor care whether VLANs are in use or not.
Having one access point with VLANs connected to a host is literally equivalent in every way to having two separate access points with no VLANs connected to two network cards on the host. The WiFi devices cannot tell the difference. That’s how VLANs work. That’s the whole point of VLANs.
There is literally no way, shape, or form, for the Broadlink devices to know they are on a VLAN. They transmit and receive exactly the same packets. Every single bit. The same IP addresses. The same broadcast addresses. The same MAC addresses. VLANs make no difference. The only sides that are aware of VLANs are the wired devices that are VLAN-aware and used on tagged networks (which in my case includes all my switches, my AP, and my server).
If I could push a button and “disable” VLANs I would do it just to end this silly argument and prove that it doesn’t matter, but VLANs are a core part of how my home network works, and I can’t magically “disable” them. It’s not possible to do what I do without VLANs without literally sticking 5 ethernet cards into my server and having 5 times as many switches.
Yes, and I highly doubt Broadlink cares about users running Home Assistant and blocking their cloud service, so I’m not holding my breath that complaining to support will get us anywhere.
But what we’re trying to do here is find a solution that works today. You claim it works for you on v57. But instead of helping us by providing a packet log to show exactly what is necessary to make it work on v57, you are telling us the problem is “VLANs” without understanding how VLANs work. We can’t wave a magic wand and figure out what you’re doing to make it work. When something works in case A and not in case B then we need to understand what is different in both cases. You saw that everyone else happens to be using VLANs and wrongly concluded that they have anything to do with this. I am certain that you are wrong about VLANs, for the reasons I explained above, and which you can confirm if you study how VLANs work, how VLAN tags work, what a VLAN really does on the wire, and the fact that VLANs on the air over WiFi aren’t a thing that exists. So what I am asking of you now is, since we’re back to square 1 and we don’t know what works and what doesn’t, to help us by providing a packet log of your broadlink device, from cold startup through ~6 minutes, to show that it indeed doesn’t reboot, and figure out what data was exchanged that made it not do that.
@marcan I am not telling you to complain about VLANs. You can workaround the problem by disabling your VLAN if you want.
I am asking you to ask them to do this:
This is a simple and universal solution.
@Silvenga People deserve credit for their work, and that is completely tangential to being told they are wrong when they are. I’m sure @felipediel has put a bunch of time into this integration, but he isn’t being helpful right now by claiming the problem is something that makes no sense whatsoever.
That said, I’ve had improving the broadlink integration in my TODO list for a while now, in particular to specify a device-agnostic IR blasting mechanism to enable integration with complex-protocol/state-dump IR devices (e.g. aircons and my ceiling lights which work the same way), but having this kind of experience with the developers makes me lean towards just keeping it to myself rather than contributing…
@felipediel Look, I don’t know what to say any more. VLANs don’t have broadcast addresses. 802.1q VLANs are a way of putting multiple Ethernet networks into one physical cable. That is all they are. That is why they are called Virtual LANs. The only thing a VLAN does is make one cable behave like several separate cables. VLANs do not go over WiFi. Broadlink doesn’t care about VLANs. Support doesn’t care about VLANs. VLANs can’t cause broadcast address confusion. Just, please, read up on the subject and drop the idea that we need to complain to Broadlink support about some broadcast address issue related to VLANs.
The problem we have is the devices reset every 3-5m when they can’t hit the cloud. You claim yours does not. Please provide a packet dump if you are certain it is not doing that for you.
Thanks @Silvenga! I will create an options flow to configure polling in the future, so it will be easier for users to make adjustments without the need for a restart. After that, we can discuss what are the best values for each device and then we define better defaults.
@felipediel I think you’ve fixed this issue effectively. I don’t have issues, with at least my setup. So thanks a lot!
Maybe we should close this issue, and open a separate issue to gather more info on if the poll interval/method should be configured/changed?
I see this as the standard “hardware is inherently unreliable” problem. I don’t think we need to argue over if it’s happening or why it’s happening. It’s going to happen as a function of being wifi based hardware.
We really need to figure out the scope of the problem, what it impacts, and figure out solutions.
@felipediel has spent a lot of effort and his time on this code, plus many weeks, going back and forth in reviews. I feel this thread has shifted to an argument, felipediel deserves respect at the very least, if not gratitude.
You claim it works for you, yet it doesn’t for us. We’ve already tried everything you suggested to make it work. The next step in figuring this out is for you to give us a known-good reference. That means a packet capture.
Now you’re just being unhelpful, and deliberately ignorant. We’re telling you that’s not how VLANs work. You can look it up if you want.
There is no wrong interface. The device sees a single network. The device has one interface. The device does not have any idea what a VLAN is or what VLAN it’s on, because all it sees is a single 802.11 WiFi network and it is the access point’s job to deal with whatever is on the Ethernet wire behind it, be it plain Ethernet or 802.1q VLANs or an L2 tunnel over IP or anything else you might want to come up with. As far as the device is concerned it is on one network with one IP subnet and there is no confusion possible.
Asking support about VLANs isn’t going to go anywhere, because VLANs are completely irrelevant to these devices. You have latched on to the idea that us using VLANs is the problem without understanding how VLANs work, and all you’re doing now is derailing the conversation.
If you want to help us, please provide a full packet capture of everything your v57 device does on the network, so we can find out what to do to get it to stop rebooting itself after a cloud service timeout.
I already solved the issue. You just need to give their support team a link to this conversation. It’s not that I’m stupid, I just don’t have access to their firmware to fix it for you, got it?
Now I am 100% sure this is the problem.
I respect your job, but we are never too old to learn something new.
This is the expected behavior, but we are talking about a bug. They are binding the socket to the wrong interface.
If we’re playing the “simplest explanation” game… since apparently all of us are having this issue except @felipediel, my Occam’s Razor diagnosis is that his firewall might not be set up properly and he is, in fact, letting them talk to the broadlink cloud 😃
It’s clear these things really want to talk to the Internet; @felipediel if you truly believe it works fine for you and they don’t reconnect, then what we need to move forward is a complete packet log of a broadlink on wifi, from startup through 5-6 minutes, to see what it is that you’re doing that the rest of us aren’t that convinces it to not drop off. I’ve already tried everything I could think of (and have been looking at tcpdump as I did to prove I was doing what I thought I was).
Barring that, there’s two things to be done here:
@felipediel he’s right, please stop making stuff up about VLANs. VLANs behave the same as any normal isolated Ethernet network. The only correlation here is that there is a big overlap between the kind of geek paranoid enough to firewall IoT devices from the Internet and the kind of geek who happens to know about VLANs, and they are an obvious solution to this problem. Yes, I use VLANs too, and I am absolutely confident they have nothing whatsoever to do with this problem. Networking is one of my jobs, I know what I’m doing here.
As far as the devices involved are concerned, the Broadlink devices and one Ethernet (sub)interface on my Home Assistant server (which also handles DHCP/DNS/routing duties for this segment) are on the same isolated network segment, and the fact that VLANs are involved is completely irrelevant to them.
There was definitely a smiley missing in my original reply, so please know there were no bad intentions on my side. 😃 And I am definitely happy for any help, especially since, as documented, I am not the only one affected by this. Thank you for all… 😉
No problem.
Regarding your info, I recently had to temporarily disable the firewall rule blocking my “smart devices” LAN subnet from the internet. It is possible that the RMs were pushed a FW upgrade. 😦
It is definitely not WiFi: the signal is OK, and other devices connected to the same virtual AP are not disconnecting. ONLY the Broadlink devices are disconnecting, and they do it exactly every 5 min. Hardly a coincidence. All my devices have static DHCP leases. There is also information from another user about this behavior when Broadlink devices cannot connect to the cloud.
Anyway, it seems that the RMs work OK, except for these 3-5 sec disconnect periods exactly every 5 min. If it were possible to set a higher timeout on the Broadlink integration in HA, I would rather wait for the device to reconnect when my command falls into this disconnected window than allow these black beans to connect to some # cloud.
I will make the updates optional, so users who are having problems can disable them.
Just to shed some light on this issue: all RMs constantly disconnect from and reconnect to Wi-Fi IF they don’t have a connection to the internet. It seems to be some quirk of their watchdog implementation. Since many users, for obvious reasons, don’t allow smart home components used locally by HA to communicate with the internet, this, together with the newly introduced feature, is very annoying and seems to affect many users.
Looking at my Mikrotik WLAN log, these reconnects are rather short, about 3-5 sec. It surely depends on the AP, but I would like to propose an improvement to the device state polling schedule.
I think that’s a great idea. For myself at least, having any recovery attempt would be the bees knees. If say, I updated my access points or switch firmware (automated, say at 2am, takes maybe 5 minutes), I should not need to restart HA to reconnect to devices. I would also find a command to mark the device as available as a good mitigation (that way it can be automated).
In this case, the network error looks to be during the time the broadlink device was roaming between access points, so connectivity would be restored within seconds. I haven’t found any other possible issues, the sensor was still responding after all, so I don’t think a substantial network error occurred.
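The kind of recovery attempt being asked for here could look roughly like the following. This is a minimal sketch under stated assumptions, not how the Home Assistant integration is implemented: `DeviceAvailability`, `probe`, and `send` are hypothetical names, and a failed command is modelled as an `OSError`.

```python
from typing import Callable


class DeviceAvailability:
    """Hypothetical wrapper that re-probes an unavailable device before giving up."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self.probe = probe        # returns True if the device answers
        self.available = True

    def send(self, command: Callable[[], None]) -> bool:
        """Run `command`; if the device was marked unavailable, probe it first."""
        if not self.available:
            # Instead of silently dropping the command, check whether the
            # device has come back since the last failure.
            self.available = self.probe()
            if not self.available:
                return False
        try:
            command()
        except OSError:
            # Communication failed: mark unavailable, but later calls will
            # still re-probe rather than require a restart.
            self.available = False
            return False
        return True
```

With something along these lines, the reproduction steps in the issue would end differently: the command sent after the device powers back up triggers a probe, the device is marked available again, and the command goes through without restarting HA.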