openhab-core: Bad implementations of ThingHandler.dispose() that are blocking

Note: I am posting this issue under openhab-core rather openhab-addons because it relates either a) to an underlying shutdown issue in the core, or b) to a non compliance of some binding developers to the OH coding rules.

Background: The Velux binding suffers from a particularly sensitive communication issue on shutdown: If the TCP socket between OH and the Velux KLF hub is ‘hard killed’ the hub goes into a zombie state; and it requires a specific and orderly shutdown sequence to prevent this.

Symptoms: I have been doing some tests, and found out the following…

From the time that the user issues a console command to stop the openHAB service, the OH framework takes a variable time of between 0 and 8 seconds before the Velux binding dispose() method is called.
From the time that Velux dispose() method is called, the actual Velux binding code takes a fairly stable time of 3.5 seconds for it to physically shut down the KLF comms.

I guess that our problem is the variable 0 to 8 seconds delay in step 1. above: – If the delay gets too long then the OH framework will already be trying to tear down its underlying TCP socket system (hard close) before the poor Velux dispose() method has had a chance to finish its soft close.

Obviously the OH framework calls dispose() on all its Things, UI, and other services, in no particular time sequence. Sometimes Velux dispose() gets called early in the sequence and everything is good; but sometimes Thing/service dispose() methods are called before Velux dispose(), and then there can be problems for us.

Particularly problematic are those Things which (against the openHAB coding rules) are blocking for a long time in their dispose() method. e.g. the ZWave binding dispose() blocks for more 5 seconds; the framework gives a warning after 5 seconds, but it probably blocks longer than that, so it is likely to be guilty of a large proportion, if not all, of the ‘problem period’ of 8 seconds…

Solutions: I see two potential solutions…

Force the developers of all bindings which are guilty of breaking the openHAB coding rules, by blocking in their ThingHandler.dispose() methods, to fix their code.
Implement a layer in OH core to de-serialize its calls to ThingHandler.dispose()…

https://community.openhab.org/t/velux-new-openhab2-binding-feedback-welcome/32926/1311?u=andrewfg

About this issue

Original URL
State: closed
Created 3 years ago
Comments: 30 (24 by maintainers)

Most upvoted comments

I think we should find what causes the dispose methods of some bindings to take so long.

Yes, that should indeed be avoided - dispose() is expected to return quickly, so we should take care of this during reviews.

I don’t know if running the dispose code in a scheduler is a good solution.

I’d think this should be fine. We have the scheduler in the ThingHandlers, so submitting a (final) task to it is the obvious thing to do here.

The framework does report a log.warn if a binding takes more than 5000ms

Yeah, that’s our safety guard that we’d build in. Besides the warning, it will actually also make sure that dispose is further executed, while already proceeding with the next handlers, so that a single binding cannot fully bog down the system.

kaikreuzer on Aug 31, 2021

The Velux binding is the only binding I’m aware of that defers a dispose. We had a discusson on that, when the PR was reviewed. The only reason we need this quirk is the crappy Velux firmware. And they seem not willing to fix that. I don’t know if it’s reasonable to raise a change in the core due to this issue. Although, I’m able to relate…

I saw that you already added a workaround to the zwave binding. I think that’s the way to go, fix the bindings which take an unneccessary long time to dispose.

fwolter on Aug 23, 2021