App: HIGH: [API Reliability] Prevent simultaneous calls to `GetMissingOnyxMessages` using a `deferredUpdates` queue
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Background:
An onyxUpdate is content we receive through https responses or Pusher that will update the state the local data. They will only bring a smaller piece of the information that was updated, instead of constantly doing full fetches to update everything.
To guarantee that the data is in a correct state, we need those updates to be applied in order. So every update that will modify local data will have previousUpdateID and lastUpdateID. For every new update we receive, we check the received previousUpdateID with the local lastUpdateID.
If they are the same, that means that we can apply that update and we’ll have a correct state of the data. If they’re not, then we need to fetch the missing updates (using GetMissingOnyxMessages with an updateIDFrom with the local lastUpdateID and updateIDTo with the previousUpdateID that was received from the server.
Once we have those updates applied, we can then apply the received update.
We have onyxUpdates being applied both through http responses and Pusher (calls here and here). When we call saveUpdateInformation, we we’re triggering this callback function that will pause the SequentialQueue (so we don’t receive more onyxUpdates until this is finished), fetch the missing updates, apply the pending updates, and then unpause the SequentialQueue. But we can’t do that with onyxUpdates that comes from pusher.
Problem:
When we receive onyxUpdates, we’ll trigger one GetMissingOnyxMessages per update out of order that comes by.
Scenario:
- Alice posts to #big-room to create updateID=10
- Bob posts to #small-room two times, to create updateID=11 and 12
- Cathy is part of both rooms, and has a local lastUpdateID=9
- Cathy receives 11 – sends GetMissingOnyxUpdates. This call might takes a few seconds.
- Cathy receives 12 before (4) is finished. So the client sends GetMissingOnyxUpdates again
- Once (4) finishes and it’s applied locally, the response for GetMissingOnyxUpdates on (5) will be discarded.
In this scenario, we did 2 GetMissingOnyxMessages, when 1 would be enough. Please note that this is not limited to 2 calls today: if we had received 10 messages, it would do 10 API calls.
In this scenario, we did 3 GetMissingOnyxMessages.
Possible solution:
What I’m proposing is that we implement 3 changes.
- Only execute one GetMissingOnyxMessages call at a time;
- Implement a deferredUpdates queue that will hold all updates that can’t be applied immediately that will arrive while the call on (1) is being executed on an ordered manner. Then apply them if the pre-requisites are filled OR do another GetMissingOnyxMessages if we still find gaps on them
- Change the way we deal with received updates. If we can apply it immediately, we should trigger a check on the deferredUpdates queue to confirm if any of the held updates can be applied after we apply the received one.
By doing those, the expected behavior for the same scenario is:
- Alice posts to #big-room to create updateID=10
- Bob posts to #small-room three times, to create updateID=11, 12 and 13
- Cathy is part of both rooms, and has a local lastUpdateID=9
- Cathy receives 11 – sends GetMissingOnyxUpdates. This call takes a few seconds.
- Cathy receives 12 and 13 before (4) is finished. So the client stores those updates in the deferredUpdates queue. It’s important to note that these updates can arrive unordered.
- Once (4) finishes and it’s applied locally, we get all updates stored in deferredUpdates and apply them, in order. If we detect any gaps in the received updates, we need to go back to step (4)
In this scenario, we only did 1 GetMissingOnyxUpdates.
About this issue
- Original URL
- State: open
- Created 3 months ago
- Comments: 27 (18 by maintainers)
That’s fake news, we’ve reverted that PR and we’re testing it again.
OK, this was reverted. It caused issue https://github.com/Expensify/App/issues/39650, but also caused crashes in the app as you can see here: https://expensify.slack.com/archives/C049HHMV9SM/p1712264151436019
@chrispader can you investigate those please?
@danieldoglas Could you assign me and @hungvu193 here and add the Bug label so BZ team can help take care of eventual payment? Thanks!
Sorry bout that @wildan-m !