ot-br-posix: "Operation refused for security reasons" when updating the same SRP host twice in a row

Describe the bug https://github.com/openthread/ot-br-posix/pull/1008 was supposed to solve an issue with duplicate SRP updates but it doesn’t help in my case. In fact, it can make things even worse in some scenarios.

Suppose that the following SRP updates are sent within a short period of time:

UPDATE1:
    EXISTING_HOST
        SERVICE1
UPDATE2:
    EXISTING_HOST
        SERVICE1

Since PublishServiceHandler and PublishHostHandler simply decrement the number of outstanding operations for a matching SRP update and the handlers can be called synchronously when calling PublishHost or PublishService, UPDATE1 will be considered completed after processing UPDATE2.EXISTING_HOST even if UPDATE1.EXISTING_HOST.SERVICE1 is still pending. Moreover, too early removal of UPDATE1 will invalidate a pointer to UPDATE2 and UPDATE2.EXISTING_HOST.SERVICE1 will be put in a wrong place in the memory. As a result, the SRP client will never receive a result of UPDATE2.

To Reproduce

Build and run OTBR Docker container using commit 685a1e0f483febc26127c8c2f2bd8f922b682a06.
Form a Thread network using default settings.
Build and flash Matter Lock onto a nRF52840 DK.
Press Button 3 on the DK to join the formed Thread network.
Reboot the DK. After each reboot the firmware adds a new SRP service and due to the issue described above, an SRP update may fail.

Expected behavior SRP updates succeed no matter how frequently they are sent.

Console/log output To be added.

Additional context I prepared a fix which is available here: https://github.com/Damian-Nordic/ot-br-posix/commit/08cb66612477eb5ceb0574b01e7069e855261807, so I can create a PR, but since it reverts https://github.com/openthread/ot-br-posix/pull/1008 I would like to first discuss if that’s a good direction.

About this issue

Original URL
State: closed
Created 3 years ago
Comments: 15 (15 by maintainers)

Most upvoted comments

@simonlingoogle @wgtdkp I think that’s a good idea. I modified my change to also run all the handlers asynchronously and in the same thread, and tested that I couldn’t break the SRP by re-publishing the same services in a loop. I posted a PR with the change. Let me know if that’s what you meant.

Btw, I wonder if it would make sense to implement some kind of recovery mechanism in OTBR - currently when an update is somehow not handled correctly (which I admit should never happen, but things happen 😉) it never gets removed from the list of outstanding updates and if the client sends a new update, the old one is updated first which leaves the new update unfinished (and the cycle repeats). I believe the server should give up an invalid update at some point in case it can’t be completed properly. But that’s just an idea.

Damian-Nordic on Sep 20, 2021