amplify-js: DataStore does not attempt to re-establish subscriptions when a subscription timeout or handshake error occurs
Describe the bug
Subscriptions with PubSub in DataStore can be very fragile and do not attempt to re-establish themselves when an internet disconnection happens over a short period of time. Network drops are common, so DataStore’s subscriptions need the resilience to recover from such a scenario.
After the merge of PR #6366, DataStore only re-establishes the subscriptions in the case of Connection closed and Timeout disconnect. However, if the handshake fails or the client does not receive start_ack from the server, DataStore does nothing. Do note that, on React Native, NetInfo is unable to detect that a transient network drop has occurred. Hence, it will tell DataStore that the client has been online throughout, despite the network drop.
I have highlighted in red the failure point from which DataStore subscriptions do not recover.

In purple: currently the client only learns that the websocket has broken off after 5 minutes, when it has not received a KA (keep-alive) from the server. In my opinion, 5 minutes is too long for PubSub to detect that a Timeout disconnect has happened. We would lose all the data that changed during this period, which simply doesn’t cut it for our project. Perhaps we can lower this to a minute and a half?
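To make the timeout discussion concrete, here is a minimal sketch of a keep-alive watchdog with a configurable timeout. It is not the Amplify implementation: the class name `KeepAliveWatchdog`, its methods, and the injected clock are all hypothetical, and the 90-second value is simply the figure suggested above.

```typescript
// Hypothetical sketch, NOT the PubSub provider code.
// Tracks when the last keep-alive arrived and reports a timeout
// once the configured interval has elapsed without one.
class KeepAliveWatchdog {
  private readonly timeoutMs: number;
  private readonly now: () => number;
  private lastKeepAliveAt: number;

  constructor(timeoutMs: number, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now; // injectable clock, so the logic can be tested without waiting
    this.lastKeepAliveAt = this.now();
  }

  // Call whenever a keep-alive message arrives from the server.
  onKeepAlive(): void {
    this.lastKeepAliveAt = this.now();
  }

  // True once no keep-alive has been seen for longer than timeoutMs.
  isTimedOut(): boolean {
    return this.now() - this.lastKeepAliveAt > this.timeoutMs;
  }
}

// Assumed value: 90 seconds instead of the 5-minute default discussed above.
const watchdog = new KeepAliveWatchdog(90 * 1000);
watchdog.onKeepAlive();
console.log(watchdog.isTimedOut()); // false right after a keep-alive
```

A shorter timeout detects a dead socket sooner, at the cost of more false positives on slow networks, which is presumably part of why the default is conservative.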
To Reproduce
Steps to reproduce the behavior:
- iPad > Developer settings > Network Link Conditioner > enable a bad network profile.
- Leave the app open for a few hours and check back every hour to see whether the subscription is still alive.
This can be tricky to reproduce, so patience is needed.
Expected behavior
DataStore subscriptions should recover, or at least re-establish themselves, when a handshake error or subscription timeout has occurred.
I’ve made some changes to force DataStore to re-establish the subscriptions when a subscription timeout has occurred. However, if we constantly re-establish the subscriptions on the same websocket over and over within a short period, the client would easily hit the Max subscriptions error. I don’t know what else I can do apart from this.
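One way to keep forced re-subscription from hammering the same websocket is to cap how often a restart is allowed. The sketch below is only an illustration of that idea; `RestartThrottle`, its limits, and the injected clock are hypothetical and not part of DataStore or PubSub.

```typescript
// Hypothetical sketch: allow at most `maxRestarts` re-subscription attempts
// per sliding window of `windowMs` milliseconds.
class RestartThrottle {
  private readonly maxRestarts: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private attempts: number[] = [];

  constructor(
    maxRestarts: number,
    windowMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the attempt if a restart is allowed right now.
  tryAcquire(): boolean {
    const t = this.now();
    // Forget attempts that have fallen out of the sliding window.
    this.attempts = this.attempts.filter((at) => t - at < this.windowMs);
    if (this.attempts.length >= this.maxRestarts) {
      return false;
    }
    this.attempts.push(t);
    return true;
  }
}

// Assumed limits for illustration: 2 restarts per minute.
const throttle = new RestartThrottle(2, 60 * 1000);
if (throttle.tryAcquire()) {
  // Re-establish the subscriptions here.
}
```

A throttle like this does not fix the underlying problem; it only bounds how quickly repeated restarts can pile up new subscriptions.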
What is Configured?
Environment
System:
OS: macOS 10.15.5
CPU: (4) x64 Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Memory: 179.66 MB / 8.00 GB
Shell: 5.7.1 - /bin/zsh
Binaries:
Node: 14.3.0 - /usr/local/bin/node
Yarn: 1.22.4 - /usr/local/bin/yarn
npm: 6.14.4 - /usr/local/bin/npm
Watchman: 4.9.0 - /usr/local/bin/watchman
Browsers:
Chrome: 84.0.4147.105
Safari: 13.1.1
npmGlobalPackages:
@aws-amplify/cli: 4.22.0
ios-deploy: 1.10.0
npm: 6.14.4
react-native-cli: 2.0.1
Smartphone (please complete the following information):
- Device: iPad
- OS: iOS 13
- React Native
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 34 (20 by maintainers)
Hey, sorry everyone, I’m closing this issue as it has become cluttered and I could not provide clear reproduction steps, despite describing its flaws in words.
I have given up on DataStore, and I cannot wholeheartedly vouch for using it in production, given the number of critical bugs it still has and an architecture that I can’t reason about.
You should file a new issue if you are still facing the same problem I had in the past.
We are still thinking through this one, as it also relates to https://github.com/aws-amplify/amplify-js/issues/7036. To summarize, it appears that there are actually 2 separate (but related) issues here: one is the subscription timeout, with its default value of 5 minutes, and the other is how the handshake error is handled.
For the subscription timeout, this could potentially be some configuration added in the future to override the default value of 5 minutes: https://github.com/aws-amplify/amplify-js/blob/2a53beff4ffbeeb9857dd8c144fcf950ba09f7e8/packages/pubsub/src/Providers/AWSAppSyncRealTimeProvider.ts#L143 There are certain intricacies as to why this was used, but we will treat this as a feature request for now and make sure it is accounted for with any future updates around this behavior since this is the same across all other platforms and potentially involves AppSync changes as well.
For the handshake error, this is also a tough one to reason about because replicating the issue reliably has still proven to be difficult without changing the code, as you mentioned. However, it seems like the main issue here is dealing with network connectivity problems when NetInfo is unable to detect transient issues, and thus doesn’t send an offline event. As I noted earlier, DataStore/PubSub does actually retry the connection multiple times until the max delay of the jittered retry is met, but apparently sometimes that length of time is not enough. You and I did previously discuss adding more time to the max delay, which could probably solve most cases, but it’s likely that the issue would still pop up at some point in the future. I’m also not a fan of having DS retry indefinitely, for the reason you provided as well.
With that said, one potential solution to the handshake error would be to emit a Hub event so that the developer can listen and retry (DataStore.stop() & DataStore.start()) based on their own app business logic.
We are still thinking through these solutions as they relate to other issues, so I will keep you posted as things progress.
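As a rough sketch of what the Hub-based approach could look like from the application side: the listener below restarts a store when it sees a handshake-error event. The event name `subscriptionHandshakeError` is hypothetical (the thread only proposes emitting such an event), and the store and capsule shapes are reduced to small local interfaces so that the example is self-contained.

```typescript
// Minimal local shapes; in a real app these roles would be played by
// DataStore and the Hub capsule from aws-amplify.
interface RestartableStore {
  stop(): Promise<void>;
  start(): Promise<void>;
}

interface HubCapsuleLike {
  payload: { event: string; data?: unknown };
}

// HYPOTHETICAL event name; no such event is confirmed to exist.
const HANDSHAKE_ERROR_EVENT = "subscriptionHandshakeError";

function createRestartListener(store: RestartableStore) {
  let restarting = false; // guards against overlapping restarts

  // Resolves to true if a restart was performed for this event.
  return async function onHubEvent(capsule: HubCapsuleLike): Promise<boolean> {
    if (capsule.payload.event !== HANDSHAKE_ERROR_EVENT || restarting) {
      return false;
    }
    restarting = true;
    try {
      await store.stop();
      await store.start();
      return true;
    } finally {
      restarting = false;
    }
  };
}
```

In an application this would presumably be wired up with something like `Hub.listen("datastore", createRestartListener(DataStore))`, with any app-specific conditions (for example, a limit on restarts) added inside the listener.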
Just wanted to reach out and see if there is any update on this. Thanks!
I am still experiencing this due to my wifi connection intermittently dropping and reconnecting. Data does not seem to be syncing to the cloud either, but it does when I have an uninterrupted connection.
no bad bot
It’s been almost 5 months, erm, what can we expect here?
I basically spent my entire day trying to prove there’s indeed a problem (or a gap) with the underlying mechanism.
What I did was change AWSAppSyncRealTimeProvider.ts#L634-L653 to force it to throw a reject error (simulating Connection handshake error). I encountered 2 tiny bugs after I did that, hence I’ve submitted a PR for it: #7225.
Here’s a video that shows DataStore does not recover from it: https://streamable.com/f0zfq7
My observation: DataStore will not perform a retry.
Calling DataStore.start() multiple times doesn’t help (correction: this is wrong; as DS is not initialized, calling it a second time will reattempt, but it would not help if DS is already initialized), and no mutations will go through.
How does this translate to a real-world scenario? When an intermittent connection drop happens on a device for at least 2 seconds while the websocket is being established, all consecutive jitteredExponentialRetry() attempts will fail. When this happens, DataStore receives an error (Connection handshake error) from the subscription, but it will not perform a DataStore ‘restart’, so to speak.
You might wonder why datastoreConnectivity.status() did not inform DS that the device went offline. Technically the device did go offline, but only for 1-2 seconds (a connection drop). NetInfo fails to catch that.
I really hope this information is useful in expediting the fix.
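To illustrate why a capped jittered exponential retry can run out during a short outage, here is a small sketch that computes the delay schedule. It is not the Amplify `jitteredExponentialRetry()` code; the function names and the constants used in the example (100 ms base, 5-second cap) are assumptions chosen for illustration.

```typescript
// Illustrative sketch only; names and constants are assumptions.
// Returns the delays that would be scheduled before the retry gives up.
function backoffDelays(
  baseMs: number,
  maxDelayMs: number,
  jitterMs: number,
  random: () => number = Math.random,
): number[] {
  if (baseMs <= 0) {
    throw new Error("baseMs must be positive");
  }
  const delays: number[] = [];
  for (let attempt = 1; ; attempt++) {
    // Exponential growth plus a random jitter in [0, jitterMs).
    const delay = 2 ** attempt * baseMs + jitterMs * random();
    if (delay > maxDelayMs) {
      break; // give up: no further retries are scheduled
    }
    delays.push(delay);
  }
  return delays;
}

// Sum of all scheduled delays, i.e. the waiting time before giving up.
function totalRetryBudgetMs(delays: number[]): number {
  return delays.reduce((sum, d) => sum + d, 0);
}

const schedule = backoffDelays(100, 5000, 0, () => 0);
console.log(schedule, totalRetryBudgetMs(schedule));
```

With a 100 ms base, a 5000 ms cap, and no jitter, the schedule is 200, 400, 800, 1600, and 3200 ms, about 6.2 seconds of waiting in total. Under those assumed values, a network that stays unusable for only a few seconds while the handshake is in progress can consume every attempt, after which nothing triggers another try.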