core: Philips Hue Motion SML001 device becomes unavailable with ZHA.

The problem

Note! Please keep this issue on topic and concerning only Hue SML001 sensors becoming unavailable in ZHA.

This has been split off from https://github.com/home-assistant/core/issues/86231#issuecomment-1454922708 at the request of @puddly.

Around 2 weeks ago I switched from a Philips Hue hub based setup (using the integration) to a ZHA one using the Sonoff-E dongle.

Previously the Hue hub based system worked great. I’ve had the system running for multiple years, and for 1 year in my current house, and I had no known issues with devices dropping off the network. In that time I didn’t know about or attempt to address issues with Zigbee interference, and the hub was in a very sub-optimal location.

I have 38 Hue bulbs / lightstrips / Play bars on this system with 14 SML001 motion sensors, 6 SML003 motion sensors and a handful of smart buttons. The house is 300 m² across 2 floors with brick & concrete construction (typical European construction).

Since moving to ZHA I’m dealing with (almost?) all SML001 devices disconnecting and becoming unavailable. This often coincides with a change in motion state. The device will then be stuck in that state until I press the repair button and re-add the device in ZHA. I’m noticing it affect all SML001 devices, irrespective of location in the house.

The SML003 devices have no similar issues, I don’t think I’ve noticed any times when an SML003 becomes unavailable, even when used in the same room as unstable SML001 devices.

The dongle is on a 2m USB2 shielded cable about 1m from floor height and connected to the USB2 port. HA OS is running on a Pi4 and is up to date.

So far I’ve tried the following things to fix the issue:

  • moving the co-ordinator
    • from a similar location to where the happy Hue hub was to one more in line with the usual recommendations (higher up, away from sources of EMF, more central to the network, fewer solid walls between the coordinator and the network), but the position is still not optimal.
  • changing to channel 20 (this is what the Hue hub was previously running on)
    • I did this by making a backup in ZHA, modifying the json and migrating the coordinator to the backup, after restarting ZHA it shows the updated channel and I then manually reset all of the devices and re-pair. To reset the bulbs I re-pair to Hue hub using the serial number and then delete them from the Hue hub to put them back into a pairing state, so a lot of work!
  • Changing to channel 25
    • Used the same method as above
    • I chose 25 after inspecting the Wifi channel usage with a Wifi explorer app and noticing that all of my 2.4 GHz traffic was on Wifi channels 1-7 (I have an Eero Pro 6 mesh network, which doesn’t allow changing the Wifi channels but usually uses channel 1 for 2.4 GHz; the mesh network consists of 5 access points).
    • I also only have 2 close neighbours, so local Wifi traffic is minimal; usually I barely see any Wifi networks available to connect to other than mine.
  • Shutting down the network when all devices are connected to force Zigbee to heal
    • This didn’t help; I saw drop-offs within 10 mins of bringing the network back up.
  • Changing the batteries in all of the devices.
    • I read that Hue doesn’t actually support rechargeable batteries, and I realised that my stable SML003 devices also had the stock batteries (since they were more recent purchases), so to test the hypothesis and eliminate batteries being the problem I purchased 30 Energizer max plus batteries and replaced all SML001 batteries. No changes.
  • Interference from baby monitors
    • I have 2 baby monitors in the house, I turned both off for a while, but still saw dropouts during this time.

I’m now at a loss about how to proceed. I’ve gone from a rock-solid Hue setup (I only migrated away to optimise speed of automations by having everything on one device with fewer intermediaries) to an un-usable ZHA setup - all of my lighting is automatic based on these motion sensors.

@TheJulianJes

You requested the following info in the thread I branched this issue from:

sw_build_id = 6.1.1.27575

current_file_version = 1107323831

I also have the following logs from when a sensor dropped off. The following are snippets from what seem to be key events, but I also attached the full log showing events just before and after the device became unavailable.

The device which went offline is 0x9d47


2023-03-05 10:52:38.443 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received sendUnicast: [<EmberStatus.SUCCESS: 0>, 200]

2023-03-05 10:52:38.444 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received messageSentHandler: [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 50546, EmberApsFrame(profileId=260, clusterId=768, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=200), 73, <EmberStatus.DELIVERY_FAILED: 102>, b'']

2023-03-05 10:52:38.444 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 50546, EmberApsFrame(profileId=260, clusterId=768, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=200), 73, <EmberStatus.DELIVERY_FAILED: 102>, b'']

2023-03-05 10:52:38.446 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received changeSourceRouteHandler: [0x470b, 0x009d, <Bool.false: 0>]

2023-03-05 10:52:38.446 DEBUG (MainThread) [bellows.zigbee.application] Received changeSourceRouteHandler frame with [0x470b, 0x009d, <Bool.false: 0>]

2023-03-05 10:52:38.447 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'74f8b1a9d42abcf5c480ad7e'

2023-03-05 10:52:38.447 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8070787e'

2023-03-05 10:52:38.450 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received incomingRouteErrorHandler: [<EmberStatus.SOURCE_ROUTE_FAILURE: 169>, 0x9d47]

2023-03-05 10:52:38.450 DEBUG (MainThread) [bellows.zigbee.application] Received incomingRouteErrorHandler frame with [<EmberStatus.SOURCE_ROUTE_FAILURE: 169>, 0x9d47]

2023-03-05 10:52:38.450 DEBUG (MainThread) [bellows.zigbee.application] Processing route error: status=EmberStatus.SOURCE_ROUTE_FAILURE, nwk=0x9d47  

Then, about half a second later, this happens:


2023-03-05 10:52:38.935 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received trustCenterJoinHandler: [0x9d47, 00:17:88:01:08:67:2d:c4, <EmberDeviceUpdate.STANDARD_SECURITY_UNSECURED_REJOIN: 3>, <EmberJoinDecision.DENY_JOIN: 2>, 0xddcd]

2023-03-05 10:52:38.935 DEBUG (MainThread) [bellows.zigbee.application] Received trustCenterJoinHandler frame with [0x9d47, 00:17:88:01:08:67:2d:c4, <EmberDeviceUpdate.STANDARD_SECURITY_UNSECURED_REJOIN: 3>, <EmberJoinDecision.DENY_JOIN: 2>, 0xddcd]

2023-03-05 10:52:38.974 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'47ffb1a90d2ad86f19888c2dabdd85495c82269fadd4bc7e'

2023-03-05 10:52:38.975 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8520dd7e'

2023-03-05 10:52:38.977 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received incomingRouteRecordHandler: [0xddcd, 00:17:88:01:08:c6:1c:40, 192, -52, [0x4034]]

2023-03-05 10:52:38.978 DEBUG (MainThread) [bellows.zigbee.application] Received incomingRouteRecordHandler frame with [0xddcd, 00:17:88:01:08:c6:1c:40, 192, -52, [0x4034]]

2023-03-05 10:52:38.978 DEBUG (MainThread) [bellows.zigbee.application] Processing route record request: (0xddcd, 00:17:88:01:08:c6:1c:40, 192, -52, [0x4034])

2023-03-05 10:52:39.029 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'57ffb1a9702a522f9db92d2dabdd85499e4dea7660317e'

2023-03-05 10:52:39.029 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8610be7e'

2023-03-05 10:52:39.039 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received trustCenterJoinHandler: [0x9d47, 00:17:88:01:08:67:2d:c4, <EmberDeviceUpdate.DEVICE_LEFT: 2>, <EmberJoinDecision.NO_ACTION: 3>, 0xddcd]

2023-03-05 10:52:39.040 DEBUG (MainThread) [bellows.zigbee.application] Received trustCenterJoinHandler frame with [0x9d47, 00:17:88:01:08:67:2d:c4, <EmberDeviceUpdate.DEVICE_LEFT: 2>, <EmberJoinDecision.NO_ACTION: 3>, 0xddcd]

2023-03-05 10:52:39.040 INFO (MainThread) [zigpy.application] Device 0x9d47 (00:17:88:01:08:67:2d:c4) left the network

2023-03-05 10:52:39.040 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x9D47](SML001): Update device availability -  device available: True - new availability: False - changed: True

2023-03-05 10:52:39.040 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x9D47](SML001): Device availability changed and device became unavailable

Later in the logs (about a second later) the device joins again and gets kicked off again; you can find those entries in the attached log file.

I also attached the Logbook entry for when the device went unavailable.
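To pull these events out of a long debug log, a short Python script can help. This is a minimal sketch, assuming the bellows debug-line format shown above stays the same; the script name, the regular expression and the helper names are hypothetical. Per the EZSP `trustCenterJoinHandler` definition, the five fields are the device's NWK address, its IEEE address, the update type, the trust center's decision and the NWK address of the parent the device joined through, so the last field (0xddcd here) should show which router the sensor tried to rejoin via.

```python
import re
import sys
from dataclasses import dataclass

# Matches the "Received trustCenterJoinHandler frame with [...]" lines written by
# bellows.zigbee.application (assumed format, copied from the log excerpt above).
TC_JOIN_RE = re.compile(
    r"(?P<ts>\S+ \S+) .*Received trustCenterJoinHandler frame with "
    r"\[(?P<nwk>0x[0-9a-f]{4}), (?P<ieee>[0-9a-f:]{23}), "
    r"<EmberDeviceUpdate\.(?P<update>\w+): \d+>, "
    r"<EmberJoinDecision\.(?P<decision>\w+): \d+>, "
    r"(?P<parent>0x[0-9a-f]{4})\]"
)


@dataclass
class TrustCenterEvent:
    timestamp: str
    nwk: str       # short address of the joining/leaving device
    ieee: str      # IEEE address of the device
    update: str    # EmberDeviceUpdate name, e.g. STANDARD_SECURITY_UNSECURED_REJOIN
    decision: str  # EmberJoinDecision name, e.g. DENY_JOIN
    parent: str    # short address of the router the device went through


def parse_tc_events(lines):
    """Return one TrustCenterEvent per matching log line, in log order."""
    events = []
    for line in lines:
        m = TC_JOIN_RE.match(line)
        if m:
            events.append(TrustCenterEvent(m["ts"], m["nwk"], m["ieee"],
                                           m["update"], m["decision"], m["parent"]))
    return events


def denied_unsecured_rejoins(events):
    """Keep only unsecured rejoins that the coordinator refused."""
    return [e for e in events
            if e.update == "STANDARD_SECURITY_UNSECURED_REJOIN"
            and e.decision == "DENY_JOIN"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_denied_rejoins.py sml001-unavailable.log
    with open(sys.argv[1], encoding="utf-8") as log:
        for ev in denied_unsecured_rejoins(parse_tc_events(log)):
            print(f"{ev.timestamp} {ev.ieee} (nwk {ev.nwk}) "
                  f"unsecured rejoin denied, parent {ev.parent}")
```

Run against a saved log, it would list each denied unsecured rejoin together with the router the sensor tried to use. The `ezsp.protocol` duplicate of each line is deliberately not matched, so every event is counted once.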

Note! Please keep this issue on topic and concerning only Hue SML001 sensors becoming unavailable in ZHA.

What version of Home Assistant Core has the issue?

core-2023.3.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

sml001-unavailable.log

Screenshot 2023-03-07 at 20 14 40

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

About this issue

  • State: open
  • Created a year ago
  • Reactions: 20
  • Comments: 147 (56 by maintainers)

Most upvoted comments

This is exactly as it is written / formatted in my configuration.yaml file. And if you look at my configuration, yes, it has the Philips OTA update URL included. Read everything.

zha:
  zigpy_config:
    ezsp_policies:
      TRUST_CENTER_POLICY: 0x0002  # ALLOW_UNSECURED_REJOINS
    ota:
      otau_directory: /config/zigpy_ota
      ikea_provider: true
      inovelli_provider: true
      ledvance_provider: true
      salus_provider: true
      sonoff_provider: true
      thirdreality_provider: true
      ikea_update_url: http://fw.test.ota.homesmart.ikea.net/feed/version_info.json

Unrelated response to above messages

I know @puddly asked not to report +1, but I personally feel a support ticket like this would benefit from escalation if it happens to a lot of people and not just a select few, and this is the only way to let it be known.

There is nobody else to escalate it to, which is why I asked not to +1 unless you’re adding new information. I’m well aware of this issue and am testing various things to try to mitigate it, but this process takes time. If you have Zigbee packet captures of the device disconnecting, debug logs, etc. please contribute them. Otherwise, you’re repeatedly pinging nearly 30 people and are just adding noise.

I changed my Z2m (not ZHA but I think they have the same backing lib)

ZHA and Z2M share no code or libraries whatsoever: they’re written in different languages and use independent implementations to communicate with their adapters.

to Always Allow Joining mode

This is a major security hole, in addition to also “stealing” every new Zigbee device in the area. This isn’t a good idea.


Something causes these sensors to roam around. Normal Zigbee sensors do not do this unless they detect their parent router has disappeared or is losing packets. Anecdotally, my motion sensor hops between parent routers almost daily by scanning and then re-joining the network (looking at its parent router in the network visualization, mine is currently ignoring the bulb three feet away from it and today has decided to become the child of a light 30 feet away and through two walls). I’m able to replicate this behavior with a fully-updated Hue hub v2, Hue bulbs, and this sensor, so I’m fairly sure it’s a firmware bug in this old generation of Hue devices.

These sensors should not be rejoining the network like this and it’s probable that some sort of minor network hiccups trigger this problem (e.g. environmental noise at the wrong time). Some people experience it constantly, others do not.

Very rarely (as in, once every two weeks for me), this motion sensor tries to re-join the network like before, but as a brand new device, without re-using the network key it already has. If the join is denied, the device gives up and will need to have its battery re-inserted to try again. This is when it stops working and becomes unavailable.

Allowing any random device to re-join your network without the network key is a security hole, so it is blocked by the firmware of most Zigbee 3.0 coordinators. That being said, you can intentionally weaken your network’s security by adding the following to your /config/configuration.yaml file (this only works for EZSP coordinators such as the SkyConnect, Yellow, Sonoff ZBDongle-E, etc.):

Before copy/pasting this configuration, please read the comment!

zha:
  zigpy_config:
    ezsp_policies:
      # WARNING: this effectively opens a permanent backdoor into your Zigbee network!!!
      # While it won't allow an attacker to passively capture your network key, it will
      # allow them to "rejoin" the network and the coordinator will blindly send it just the same.
      TRUST_CENTER_POLICY: 0x0002  # ALLOW_UNSECURED_REJOINS

It won’t stop the sensor from deciding to hop around, but for me this allows the device to successfully re-join the network when it decides to do so in an insecure manner.
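The effect of the setting can be modelled in a few lines. This is a toy sketch, not bellows or EmberZNet code: it assumes only what the config comment states, that 0x0002 is the ALLOW_UNSECURED_REJOINS bit of the TRUST_CENTER_POLICY bitmask, and it ignores the other bits the real policy has. The function name is hypothetical.

```python
# Bit quoted in the config comment above: 0x0002 = ALLOW_UNSECURED_REJOINS.
# The real TRUST_CENTER_POLICY bitmask has further bits; they are ignored here.
ALLOW_UNSECURED_REJOINS = 0x0002


def unsecured_rejoin_decision(trust_center_policy: int) -> str:
    """Toy model of how the coordinator answers an unsecured rejoin such as the
    STANDARD_SECURITY_UNSECURED_REJOIN seen in the SML001 log."""
    if trust_center_policy & ALLOW_UNSECURED_REJOINS:
        return "ALLOW"
    # Bit clear: the rejoin is refused, matching the DENY_JOIN in the log.
    return "DENY_JOIN"
```

Treating the value as a bitmask rather than a plain enum is why `0x0002` is written in hex: any other bits set alongside it would leave the unsecured-rejoin behaviour unchanged in this model.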

I can’t believe this issue has been dragging on for a year now. Enough people have piled on whose symptoms/situations consistently point toward ZHA, not the various coordinators/their firmware.

Here’s my story:

Before: 2 Zigbee Networks:

  • Hue Hub v2 (ch15) with 5 SML001 (indoor) (also had 2 SML002 (outdoor) for a brief while) + 2 LTA001 White-Ambience bulbs + old/original Hue Bloom
  • ZHA/Conbee2 (ch25) with 2 SML002, 2 Tradfri Motion Sensors, 2 Aqara Door/Window sensors, 2 Aqara double-wall-switches (with neutral = router devices) + 2 Tradfri Routers

I’m an electronics engineer & 50+yo life-long geek. I’ve lost count of how many articles & youtube videos I’ve watched over the last couple of years about Zigbee. I’ve paid careful attention to Zigbee vs Wifi issues, zigbee router coverage. All 4 of my wired wifi APs are on ch1/6/11 (the 2 sharing a channel are 2 floors apart with thick brick/concrete, trust me they don’t see each other), and nothing else consistently nearby that’s above -90dB is on overlapping channels either.

The SML001 Motion Sensors (indoor) were of course pretty solid on the Hue Hub for many years.

The SML002 Motion Sensors (outdoor) were solid on the Hue Hub too, but knowing they were virtually identical to the SML001 (same firmware version too) and not currently mission-critical, I wanted to use them on the ZHA/Conbee2 network as a test for a future merge of these two zigbee networks. But they frequently went Unavailable, needing to be re-paired; this would happen at highly random intervals - after a day or two, to many weeks. Back then I didn’t even know you could re-pair via a specific router device (i.e. the nearest one).

Silly me thought the SkyConnect might fix the problem with the SML002’s on the Conbee2. It didn’t.

After: 1 unified ZHA/SkyConnect with all the above devices, ch25, Zigbee-only firmware.

I didn’t migrate; I deleted all the original Zigbee devices, changed Coordinator from Conbee2 to SkyConnect, then re-paired everything. Both Coordinators were on 1-2 meter USB extension cables. I added the router devices first, then the battery devices the next day, then a ritual sacrifice of first-born child.

Also added a Philips Hue Ensis pendant lamp & 2 more Aqara door/window, total 25 devices, 9 of them routers.

Result: Well, I’m here, aren’t I… All 5 SML001 and 2 SML002 are a “random” basket-case of Unavailableness, as to which ones will remain reliable and which will go Unavailable, and after how long (hours to many weeks). Infrakkinfuriating.

All other Hue mains-powered/router devices are fine, reliable.

All my other non-Hue devices are fine & reliable (with the exception of one of the Ikea/Tradfri Motion Sensors false-triggering several times per day).

Having recently discovered the ‘Add a device via this device’ ability, and used it to connect the Unavailable SML001/2 to nearby, or even not-so-nearby routers, Hue-branded routers and not, that doesn’t seem to have any influence on the problem. I’ll re-pair a SML001 to a nearby Hue bulb or the Ensis, and the next day I look at the map and it’s now on some other non-Hue router 🤷🏻‍♂️. And yeah, I’m finding these SML001s connecting to some of the worst / least sensible routers available, but again, not consistently.

I’ll try the trust-centre-policy hack out of desperation/curiosity/sunk-cost-fallacy, but not keen on the security implications.

Observations about this thread: Having spent the evening reading all of it (instead of some of you troopers who’ve slogged through it bit-by-bit over the course of a year), I understand @lougreenwood flipping the table and going with what works: Z2M (& unsubscribing from anything more to do with this thread).

It’s not Coordinators/radios. People with a range of Coordinators on ZHA that have worked fine for years have stumbled in here with this new symptom: ZHA seems to be the common factor (obviously exacerbated by something common to the SML001/2).

The SML001/2 are very common and very well regarded, so I hope this issue can get some more attention from people with bigger propeller-hats than me.

I documented the security hole that’s opened by enabling insecure rejoins:

# WARNING: this effectively opens a permanent backdoor into your Zigbee network!!!
# While it won't allow an attacker to passively capture your network key, it will
# allow them to "rejoin" the network and the coordinator will blindly send it just the same.

It won’t allow any new passive exploits (e.g. your network key is still broadcast every time a new device joins if you don’t use an Install Code) but it will allow an active attacker to read your network key. I would not enable this option if this is a concern.

@MattWestb

I’m sorry, I don’t mean to sound rude (but I do mean to sound frustrated! 😉), but you need to read & understand what I write because you’re saying things that directly contradict my real-world experiences and asserting that you are correct and my lived experience is wrong.

@lougreenwood Sonoff-E: you need to update to the latest firmware if you want child handling to work OK; otherwise all direct children are handled as Zigbee HA 1.x devices, and many Zigbee 3 devices can’t be paired directly to the coordinator. More info on the Zigbee 3 handling is here https://github.com/Koenkk/zigbee2mqtt/issues/13478.

No I don’t need to upgrade. Sonoff-E worked mostly fine in Z2M and in terms of day-to-day experience it worked completely fine, no dropouts, nothing - I likely would have continued using it if not for buying a Sonoff-P a few weeks before.

I think the best option with many Hue devices is to use them with the native hub, which handles all of its own devices well but doesn’t care about other strange devices like Xiaomi / Aqara that cause other problems.

I think use Z2M and don’t use crap hardware from Xiaomi / Aqara. Anyone who is asking for help in these threads implicitly doesn’t want to use the Hue hub, exactly because they’re trying to use ZHA.

I think if you’d like to go Thread / Matter, the Hue bridge is the best option: you can use it with all your Hue Zigbee devices and with Matter once they have everything in place, and you don’t need to buy everything new from the beginning.

Again, if someone (i.e ME, said I was going to stop using Zigbee!) is using ZHA they implicitly don’t want to use Hue hub, so why recommend Hue hub?! Also Hue doesn’t even support Thread, so why compare Hue hub as a viable solution to Thread when trying to solve the problem of having a vendor agnostic IoT network.

Anyways, I can’t be bothered to reply anymore, I’m tired of getting spammed in my email with stories of users reporting the same issue after changing co-ordinator which leads to ZHA shitting the bed and then seeing the same old, tired reply of “parents, Hue bad, Zigbee 3, Z2M doesn’t work, firmware, Hue hub, source routing” - I understand that you’re trying to help and I genuinely appreciate your sentiment and goals, but I’m out.

So I’m going to turn notifications off and leave the thread unlocked so the pile of anecdotal evidence of ZHA being the issue in this situation can pile up for all to see.

👋

@Saxtus You’re running an old Z-Stack firmware (20210708). I’d backup and upgrade to the latest firmware on your coordinator: https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_3.x.0/bin

I unfortunately have been unable to replicate this problem. However, I think I may see what the issue could be.

This sensor strongly prefers to join (and re-join) the network through extremely poor parent routers. Right now, it is connected through the worst possible router, on the completely opposite end of my home. No router is more distant:

image

I suspect this may explain why only some SML001 sensors are buggy, while others seem to work fine.

Can you check what parent router your sensors are using? There is currently a contrast bug with the Visualization feature when using the dark theme but with the light theme you can type in the model name (or your customized device name) and see the highlighted node in the graph:

image

To force the device to use a better parent router, can you try to remove the device from your network:

image

Then, move the device to where it will end up sitting (if it isn’t already there), permit joins only through the physically closest router, and reset the device? The default behavior is to allow joins through any router, but you can also selectively do this through a single one:

image

This will ensure the device picks this specific router as its parent (until it decides to jump to another one).
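Whether a sensor keeps the parent it was paired through can be tracked by noting the parent shown in the network visualization at regular intervals. A minimal sketch, assuming those observations are recorded by hand as (date, parent name) pairs; the data layout, the device names and the function name are hypothetical, not a ZHA API.

```python
from datetime import date


def parent_changes(snapshots):
    """Given (when, parent) observations in any order, return a
    (when, old_parent, new_parent) tuple for every observed parent change."""
    ordered = sorted(snapshots, key=lambda snap: snap[0])
    changes = []
    # Compare each observation with the one immediately before it.
    for (_, previous), (when, current) in zip(ordered, ordered[1:]):
        if current != previous:
            changes.append((when, previous, current))
    return changes


# Hypothetical observations of one SML001 after pairing it via "Hallway bulb".
observations = [
    (date(2023, 3, 1), "Hallway bulb"),
    (date(2023, 3, 2), "Hallway bulb"),
    (date(2023, 3, 3), "Garage light"),
]
for when, old, new in parent_changes(observations):
    print(f"{when}: parent changed from {old} to {new}")
```

Only consecutive observations are compared, so a hop away and back between two checks would go unnoticed; checking more often narrows that gap.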

@techydude

Enough people have piled on whose symptoms/situations consistently point toward ZHA, not the various coordinators/their firmware.

ZHA/Z2M do not do anything on that level. They can both experience the same issues. I’ll go into why you normally don’t see this with Z2M further below. In this case, the issue is a combination of the “coordinator firmware” defaults and the bad firmware Hue has on these first-gen motion sensors (and dimmers). It’s not a ZHA issue.

I’ll try the trust-centre-policy hack out of desperation/curiosity/sunk-cost-fallacy, but not keen on the security implications.

It’s only somewhat of a hack if you consider that “everyone on Z2M” uses it already (without knowing).

Z-Stack firmware allows unsecured rejoins by default. Since basically everyone uses a TI dongle with Z2M, all Z2M instances have that set by default, even if they don’t need it. You can even use a TI coordinator with ZHA, and since the Z-Stack firmware automatically allows unsecured rejoins, your sensors work without issues. And by the way, I think the Hue Bridge also allows unsecured rejoins by default.

EZSP dongles like SkyConnect (with ZHA) are “more secure” though, as they do not allow unsecured rejoins by default. You have to use the config option that puddly mentioned.

The SML001/2 are very common and very well regarded

Not really? SML003 and SML004 are fine, but SML001/SML002 have been causing issues for ages. There are multiple threads in Z2M repos regarding these issues (for first-gen Hue sensors and dimmers). As an example, see this 115 message thread. Z-Stack firmware was even changed multiple times to attempt to work around the issues caused by these bad sensors.

I hope this issue can get some more attention from people with bigger propeller-hats than me.

To clarify again, there’s no issue with ZHA. The firmware on the first-gen motion sensors has multiple bugs (jumping parents, joining via worst possible parent, rejoining with well-known key, …). EZSP coordinators with ZHA need this config option to work with these sensors, as ZHA/EZSP use more secure options by default compared to TI coordinators (with Z2M).

What does the OTA cluster’s current_file_version number look like for the older ones? For the newer ones, 6.1.1.27575 corresponds to 0x42006bb7 (1107323831). Also, try out the config I posted above: #89311 (comment)
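The correspondence between the two numbers can be checked with a few lines of Python. In this one example the lower 16 bits of current_file_version equal the last component of sw_build_id (27575); that this split holds for every Hue firmware, and what the upper 16 bits (0x4200) encode, are assumptions this thread does not establish. The function names are hypothetical.

```python
def split_file_version(file_version: int) -> tuple[str, int, int]:
    """Return the 32-bit OTA file version as hex, plus its upper and lower 16 bits."""
    return f"0x{file_version:08x}", file_version >> 16, file_version & 0xFFFF


def build_number(sw_build_id: str) -> int:
    """Last dotted component of a Basic-cluster sw_build_id, e.g. '6.1.1.27575'."""
    return int(sw_build_id.rsplit(".", 1)[1])


hex_form, upper, lower = split_file_version(1107323831)
print(hex_form, hex(upper), lower)
print(lower == build_number("6.1.1.27575"))
```

For the values quoted above this prints `0x42006bb7 0x4200 27575` followed by `True`.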

I’ve implemented the config mentioned above. My sensor (version nr 6.1.1.27575) has been stable for a week! Before I had to reset it daily.

And… still stable after two weeks? Very curious!

it has been stable ever since!

I’m running 3 SML001’s in my network. 2 of them having the “unavailable” issue and are on sw_build_id = 6.1.1.27575, but the one that is working nonstop is sw_build_id = 6.1.0.18912. Now if I only could find firmware 6.1.0.18912 and try to downgrade to that version.

I changed my Z2M (not ZHA but I think they have the same backing lib) to Always Allow Joining mode … if ZHA has the same option perhaps it’s worth looking into. I read somewhere that the Hue devices drop off, try to re-join and get denied if auto=allow-anybody-to-join isn’t on.

Let’s keep the “+1” replies to a minimum unless you’re adding new information or have debug logs/packet captures. You can react to the original issue.

I’m the same. hue motion sensors don’t stay connected, all other Zigbee devices are fine.

On Sat, 13 May 2023 at 7:23 am, zackslash @.***> wrote:

Same issue here using a SkyConnect. Consistent disconnects on the SML001; all other devices on the network are Hue bulbs and have no issue.


I’m running my “Billy EZSP” on the stable version 6.7.8.0 build 373, and my test systems normally use 6.10.3.0, which is also stable. For edge testing I’m running RCP with 7.2.2.0 with OTBR; the latter is not 100% stable but works OK. If you run something other than 6.7.8.0 or the stable candidate 6.10.3.0 you will get problems, but many users do that. As far as I know, ZHA hasn’t made any large changes to how it starts and forms the network, so the problem should be the coordinator firmware or the network / devices.

When a Zigbee coordinator forms a network, it sets the working mode and security policies, and then the mesh network more or less lives its own life and shouldn’t cause any problems. But by adding one bad device (like an OSRAM plug) you can kill the network completely if the device behaves badly. As the mesh network lives its own life, we can only configure how devices report and their bindings; everything else is done by the devices and the network.

“Device seen” is triggered when the system detects some communication from the device, like attribute reports or MAC polling of its parent (if the parent is the coordinator), so the device is alive 😃

In Zigbee, end devices don’t send debug reports to the coordinator about what they are doing (on commercial devices; if you build your own you can enable debugging). So we can only see what the device has done from information in the mesh network, and it’s not easy to understand what is happening. With different Zigbee stack implementations, all having bugs and some taking shortcuts or not following the spec, we are in the jungle and can’t see what’s happening. How routers and end devices should do everything is defined in the Zigbee standard, to keep the mesh network working OK and self-healing when router problems and interference degrade it.

ZHA was missing a quirk for Xiaomi / Aqara devices that makes them leave; it’s a very bad thing Xiaomi has done in a Zigbee 3 certified device that makes it only work with their own Zigbee gateway. deCONZ and TI coordinators (as used by Z2M) handle it in the firmware, which ZHA also benefits from, and for EZSP it is handled in the radio library, so it works OK.

Yes, Zigbee 3 is backward compatible, but you lose some features, and non-Zigbee 3 devices don’t work as well because they can’t use end device timeout and poll control with their parent. You need to read the Base Device Behavior Specification, how security works and the differences between versions to understand why some things go very badly when they don’t work (e.g. Z2M has disabled nearly all child handling, and the Danfoss Zigbee 3 TRV goes into deep sleep or leaves the network). I can recommend diving into these docs https://github.com/SiliconLabs/gecko_sdk/releases/download/v4.2.2/documentation.zip and then reading more online if you need more info.

What you were missing is the working mode of the Hue bridge V2: as you can see in https://github.com/home-assistant/core/issues/89311#issuecomment-1497873069, it works as a ZCB (Zigbee Control Bridge) and doesn’t have a coordinator, so the bridge doesn’t have 0x0000 as its short address. It’s a heritage from ZLL, which is 100% supported in ZB3.

I have no problems with my 2 old motion sensors, except that they jump around, and I have been running EZSP for years since I ditched RaspBee because it was too buggy for my production network. So if you’d like them to work well, use a CC2531 with HA 1.x firmware or the original Hue V2 bridge.

Running the network in real Zigbee 3 mode is very important for good performance and compatibility with new sleepy end devices, but it’s tricky not to break some very badly behaving devices, no names named (Xiaomi), which nonetheless work great in ZHA if you do everything right and select devices with care.

There are some key things that are important when building a network: good routers as a backbone, and forcing end devices to use them instead of the coordinator (on EZSP this is done in config). If not all routers play nice, disable source routing, since otherwise it can make more of a mess of the network. That looks to be the best way I’ve found to get a 150+ device network working well in ZHA.

The original parent was the physically closest bulb. It’s been a few days since that capture and I just re-ran another one: now, the motion sensor is connected to the most distant bulb.

If it’s broken on a Hue-only setup, there’s nothing we can do in ZHA (or Z2M) to correct this behavior. I think it’s just a firmware bug and up to Philips to fix it (which I doubt they will for an old device like this).

I had a read through this thread and I take note of the issues being hinted at firmware/bugs/features of the Hue devices. I will stay on track and say my SML001 and RWL021 devices have the same issue consistently.
The thing is, I used ZHA with a Conbee II before swapping to a SkyConnect, and previously had used deCONZ with the Conbee II. I never once had the issue with the Conbee II with deCONZ. I had very small issues with the Conbee II with ZHA, but they were so rare as to be non-issues. I have only had major issues with the Hue SML001, SML003, RWL021 and RWL022 since I moved to the SkyConnect, and the issues are identical to what @lougreenwood has reported. I have not yet tried Zigbee2MQTT as I do not actually have a test setup, and this is production.
As a note, the Hue lights (some newer with BLE, some older without) have never gone offline. It is only the battery devices.

Mainly commenting so I can follow this thread, but thought I would share my experience also.

My 2 motion sensors have not left the network for over 4 years in ZHA with an EZSP coordinator, and only one has had a new battery in that time, even though there are many bad routers for it to jump to if it wanted to.

FWIW, after 2 weeks with my experiment on Zigbee2MQTT, I do occasionally see in HA that devices go unavailable, but it seems that Z2M is somehow able to recover.

Aside from the lights / sensor in one room sometimes being slow to respond, everything has been mostly stable and my 14 SML001 are behaving as I expect. I sometimes see the red light when some detect motion.

My problems were fixed when I updated my coordinator’s firmware to the latest version, removed my devices and added them back.

@nashy008 this issue is specifically tracking problems with the SML001 and the few other battery-operated Hue devices of the same generation. If you’re experiencing issues controlling devices, this points to general network issues likely caused by interference.

FWIW, I’ve noticed trouble with my RWL021 Philips remote recently (not sure when it started, maybe 2 weeks ago or a bit more). Most of the time it would just not work, only showing the red LED when I pressed a button, but occasionally it did work. I didn’t bother looking into it, since I control the lights differently most of the time anyway.

Today I investigated and realized that it had picked a router on the other side of my flat! There are devices much closer, and those are also Philips. I then reset the remote with a long press on the back (without removing it from ZHA) and added it explicitly via a nearby device (a Philips Aurelle, using the three-dot menu in the device info and “Add devices via this device”). Now the remote works perfectly again. After a while the visualization also updated, and I can see that it is indeed connected through that router. I’ll report back in case it drops to another, bad device again.

I use source_routing: true.

Thanks @MattWestb, I don’t mean to seem argumentative or pedantic, but if that’s the case and Hue bridge V2 is still ZLL, how is it possible to pair Z3 devices?

But in any case, Z2M continues to be stable - I finally have working lights and sensors again after 3 weeks of darkness (literal darkness 😉 - all of my lighting is automated, so broken sensors destroyed everything 😱).

I have a Sonoff Dongle Plus (P version) arriving today, but I’m not sure if I’ll bother testing it since the E now seems happy and stable in Z2M.

So from all of the debugging I’ve tried, I infer that something in the ZHA stack is causing the issues, because when I switch to Z2M with the same stick, the same devices, the same coordinator position etc., everything works 💯 (so far; let’s give it a week before drawing any conclusions… but I never got more than an hour of stability from ZHA with these sensors, so I’m feeling optimistic…).

I have 2 SML001s, and both are using 3rd-gen IKEA devices as routers, but ones on the other side of the apartment, while there are many better ones nearby 😦. Firmware: 0x42006bb7 = SW 6.1.1.27575.

They normally work well, but there are periods when they jump around and pick very bad routers as their parent.

I’ve had an SML001 (firmware 0x420049e0, sw_build_id = 6.1.0.18912) running for about three hours now joined through an IKEA router and it is working fine for me. I’ll update if I can get it to get kicked off the network.
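The two firmware pairs quoted above (0x42006bb7 = 6.1.1.27575 and 0x420049e0 = 6.1.0.18912) suggest that the low 16 bits of the reported file version equal the last component of `sw_build_id`. A quick check of that inference, which rests on just these two data points and is not a documented Philips format:

```python
def hue_build_number(file_version: int) -> int:
    # Assumption inferred from two samples in this thread: the low 16 bits
    # of the firmware file version match the build number in sw_build_id.
    return file_version & 0xFFFF


for fw, sw in [(0x42006BB7, "6.1.1.27575"), (0x420049E0, "6.1.0.18912")]:
    build = hue_build_number(fw)
    print(f"{fw:#010x} -> build {build} (sw_build_id {sw})")
```

The upper bits (0x4200 in both samples) do not distinguish 6.1.0 from 6.1.1, so only the build number can be read off this way.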


@lougreenwood can you post a startup log for your coordinator? I’m interested in the lines that look like this (at the beginning of the log):

2023-03-08 15:34:22.864 DEBUG (MainThread) [bellows.ezsp.protocol] Send command version: (4,)
2023-03-08 15:34:22.867 DEBUG (MainThread) [bellows.uart] Sending: b'004221a850ed2c7e'
2023-03-08 15:34:22.870 DEBUG (MainThread) [bellows.uart] Data frame: b'0142a1a85e2805c0999c7e'
2023-03-08 15:34:22.870 DEBUG (MainThread) [bellows.uart] Sending: b'8160597e'
2023-03-08 15:34:22.871 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received version: [10, 2, 29200]
2023-03-08 15:34:22.874 DEBUG (MainThread) [bellows.ezsp] Switching to EZSP protocol version 10
2023-03-08 15:34:22.876 DEBUG (MainThread) [bellows.ezsp.protocol] Send command version: (10,)
2023-03-08 15:34:22.878 DEBUG (MainThread) [bellows.uart] Sending: b'7d314221a9542a1fac9d7e'
2023-03-08 15:34:22.880 DEBUG (MainThread) [bellows.uart] Data frame: b'1242a1a9542a1fb049e63e477e'
2023-03-08 15:34:22.880 DEBUG (MainThread) [bellows.uart] Sending: b'82503a7e'
2023-03-08 15:34:22.880 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received version: [10, 2, 29200]
2023-03-08 15:34:22.882 DEBUG (MainThread) [bellows.ezsp] EZSP Stack Type: 2, Stack Version: 7210, Protocol version: 10
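In the log above, the raw version frame `[10, 2, 29200]` and the summary line `Stack Version: 7210` carry the same value: 29200 is 0x7210. A small sketch for pulling these numbers out of such a line, assuming (as the summary line suggests) that the stack version packs one version digit per hex nibble, so 0x7210 reads as 7.2.1.0. The regex only targets the line format shown here, not a stable bellows log format.

```python
import re


def decode_stack_version(raw: int) -> str:
    """Assumes one version digit per hex nibble, e.g. 0x7210 -> '7.2.1.0'."""
    return ".".join(str((raw >> shift) & 0xF) for shift in (12, 8, 4, 0))


LOG_LINE = (
    "2023-03-08 15:34:22.880 DEBUG (MainThread) [bellows.ezsp.protocol] "
    "Application frame received version: [10, 2, 29200]"
)

m = re.search(r"version: \[(\d+), (\d+), (\d+)\]", LOG_LINE)
protocol, stack_type, stack_version = (int(g) for g in m.groups())
print(protocol, stack_type, f"{stack_version:x}", decode_stack_version(stack_version))
```

This prints the protocol version (10), the stack type (2), the hex form matching the log (`7210`) and the dotted form.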

I think there is some bug with parsing. This:

changeSourceRouteHandler frame with [0x340b, 0x0040, <Bool.false: 0>]

Should be:

incomingNetworkStatusHandler frame with [EmberStackError.ROUTE_ERROR_SOURCE_ROUTE_FAILURE, 0x4034]

So there is no source routing going on, it’s just a command that changed names between firmware versions. Neither command is actually handled so it doesn’t make a difference but the parsing is being done incorrectly for whatever version of EZSP your Sonoff stick is reporting it supports.
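The two readings are consistent with the same bytes being decoded under two different layouts. A minimal reconstruction, assuming little-endian encoding, that the mis-parsed frame was two 16-bit node IDs plus a one-byte boolean (as the logged values suggest), and that the intended frame is a one-byte error code followed by a 16-bit node ID. The value 0x0B is assumed for `ROUTE_ERROR_SOURCE_ROUTE_FAILURE` because it is the Zigbee NWK status code for a source route failure.

```python
import struct

# Re-encode the values from the mis-parsed log line:
# two uint16 node IDs and a bool, little-endian (assumed layout).
raw = struct.pack("<HH?", 0x340B, 0x0040, False)
print(raw.hex())

# Read the same bytes with the expected layout:
# error code (uint8) followed by the target node ID (uint16).
error_code, target = struct.unpack_from("<BH", raw)
print(f"errorCode={error_code:#04x} target={target:#06x}")
```

The first three bytes (`0b 34 40`) give error code 0x0b and target 0x4034, which matches the corrected reading; the trailing bytes are what the wrong layout consumed as the rest of the second node ID and the boolean.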

Thanks @MattWestb I’ll check this 👍.

But FWIW, every device is a Hue one, it’s a 100% Hue network.

One user who was running Z2M with a TI coordinator and 100+ devices switched to ZHA and SC and had some problems, and the last thing that got it working well was disabling source routing. Try putting source_routing: false in your (Z)HA config and restarting HA.

If some devices do not handle the route request / response correctly, the system cannot find the requested device, you get SOURCE_ROUTE_FAILURE, and commands are lost in the network.
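A deliberately simplified model of the failure described above: with source routing, the sender fixes the whole relay path up front, so a single relay that no longer forwards breaks delivery even if an alternative path exists. The device names and the all-or-nothing "responsive" set are hypothetical; real Zigbee routing also involves route discovery, retries and link costs.

```python
def deliver_via_source_route(route, responsive):
    """Toy model: walk a fixed relay path; fail at the first hop that does not forward."""
    for hop in route:
        if hop not in responsive:
            return f"SOURCE_ROUTE_FAILURE at {hop}"
    return "delivered"


# Hypothetical network: bulb_kitchen has stopped forwarding correctly.
responsive = {"bulb_hall", "bulb_landing", "motion_sensor"}

stale_route = ["bulb_hall", "bulb_kitchen", "motion_sensor"]
alternative = ["bulb_hall", "bulb_landing", "motion_sensor"]

print(deliver_via_source_route(stale_route, responsive))
print(deliver_via_source_route(alternative, responsive))
```

The first call fails at the misbehaving relay while the second succeeds, which is the intuition behind trying `source_routing: false`: hop-by-hop routing lets each router choose its own next hop instead of following a precomputed path through a bad device.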