apollo-client: Queries hang and are never sent to the network
Intended outcome:
Queries should be dispatched to the network.
Actual outcome:
Queries are not dispatched to the network; they just hang.
While using `apollo-angular@3.0.1` with `@apollo/client@3.6.x`, it appears that some queries start hanging after a while.
I’m also using graphql-code-generator to generate an apollo-angular data service, so usage looks like this:
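```typescript
import { map } from 'rxjs/operators';

// `someQuery` is the apollo-angular service generated by graphql-code-generator
this.someQuery
  .fetch({
    myVariable: 1,
  })
  .pipe(map((res) => res.data.someQuery))
  .subscribe((foo) => console.log(foo));
```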
After a while (within 1-2 minutes), the `subscribe` callback is no longer called. The same happens with RxJS's `await firstValueFrom(...)`: the promise never resolves.
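When a query hangs like this, wrapping the promise in a timeout makes the failure visible instead of silent. A minimal sketch, assuming plain Promises; `withTimeout` is a hypothetical helper, not part of apollo-angular or `@apollo/client`:

```typescript
// Hypothetical diagnostic helper: races a promise against a timer so that a
// hung query surfaces as a rejection instead of waiting forever.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} did not settle within ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins, so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

For example, `await withTimeout(firstValueFrom(this.someQuery.fetch({ myVariable: 1 })), 10_000, 'someQuery')` would reject after 10 seconds if the query hangs. This only surfaces the hang; it does not cancel or fix the stuck request.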
Downgrading to 3.5.10 (rather than 3.6.2, as first reported) seems to fix the problem.
How to reproduce the issue:
Unfortunately I have no repro, but hopefully it will be easy to pinpoint the cause by comparing 3.6.0 with 3.5.10 (rather than 3.6.2 with 3.6.3, as first suggested).
Versions
System:
OS: macOS 12.3.1
Binaries:
Node: 16.14.2 - ~/.nvm/versions/node/v16.14.2/bin/node
Yarn: 3.2.0-git.20220329.hash-0764215 - ~/.nvm/versions/node/v16.14.2/bin/yarn
Browsers:
Chrome: 101.0.4951.54
Safari: 15.4
npmPackages:
@apollo/client: 3.6.3 => 3.6.3
apollo-angular: ^3.0.1 => 3.0.1
apollo-link-scalars: ^4.0.1 => 4.0.1
About this issue
- State: closed
- Created 2 years ago
- Reactions: 5
- Comments: 45 (32 by maintainers)
Commits related to this issue
- Add basic regression test for issues #7608 and #9690. — committed to apollographql/apollo-client by benjamn 2 years ago
- Guarantee `Concast` cleanup without `Observable cancelled prematurely` rejection (#9701) Should fix/improve issues #7608 and #9690, and possibly others. — committed to apollographql/apollo-client by benjamn 2 years ago
- Implement concast.beforeNext as replacement for concast.cleanup. Doing this right would potentially resolve issue #9690. — committed to apollographql/apollo-client by benjamn 2 years ago
- Backport PR #9793 from apollographql/issue-9773-unbreak-BatchHttpLink PR #9793 was first released in v3.7.0-beta.3 for testing, and now (in this PR) will be backported to the `main` branch, to be rel... — committed to apollographql/apollo-client by benjamn 2 years ago
@jdmoody I appreciate all the details, and while I share your uncertainty about how many different issues we’re talking about here, I think there must be something wrong with `BatchHttpLink`, probably a bug that’s been there for a while but was only revealed by other changes recently, so that’s what I’m currently looking into.
Ran some e2e tests on 3.7.0-beta.3 and they are all green. Thanks for the update @benjamn
I am also having this issue.
Some queries hang and are never dispatched with `3.6.*`
- but only when using `BatchHttpLink`
- the issue does not occur when using `HttpLink`
Just tried `3.7.0-alpha.5`: no joy.
The last version I can get to work with my `BatchHttpLink` setup is `3.5.10`.
I don’t have time to put together a repro (it also seems intermittent), but I can answer questions if it helps…
I also have hanging queries as described in this issue. Some details:
- Upgrading from `3.2.2` to `3.6.5` (other issues prevent me from upgrading to versions between 3.2 and 3.6)
- `3.7.0-alpha.4`
- `BatchHttpLink`, and the issue goes away when I replace `BatchHttpLink` with Apollo Client’s `HttpLink`
- `useQuery` call made that hangs (I can go into more detail about how I measured this if helpful).
This has been especially hard to debug because I haven’t been able to reproduce it in a dev environment. I’m only able to reproduce it when deploying my app to a production-like environment. Then, when I try to attach a debugger to the box, I’m no longer able to reproduce it 😵
Afaict, this is the only issue blocking me from upgrading to v3.6, which is the only thing blocking me from using certain React 18 features.
It’s also unclear to me whether all the behaviors described by folks in this issue are indeed the same issue. If it would be helpful for me to create a separate issue, or if there’s any other info I can provide, please let me know 🙏🏻
Alright, I believe this regression stems originally from PR #9248, which made the batch link capable of cancelling in-flight batched operations when the underlying observable is terminated.
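To make the mechanism concrete, here is a minimal sketch of cancel-on-unsubscribe in a queue-and-flush batcher. `SimpleBatcher`, `enqueue`, and `flush` are hypothetical names, and this is not the `BatchHttpLink` source; it only shows that a cancelled operation is dropped from the batch, so anything still waiting on it would never receive a result.

```typescript
// Simplified queue-and-flush batcher with cancel-on-unsubscribe.
// Hypothetical model for illustration; this is NOT the BatchHttpLink source.
type ResultCallback = (result: string) => void;

interface QueuedOperation {
  name: string;
  onResult: ResultCallback;
}

class SimpleBatcher {
  private queue: QueuedOperation[] = [];
  // Record of what was actually "sent", one array per batch.
  readonly sentBatches: string[][] = [];

  // Queue an operation; the returned function plays the role of unsubscribing.
  enqueue(name: string, onResult: ResultCallback): () => void {
    const op: QueuedOperation = { name, onResult };
    this.queue.push(op);
    return () => {
      // Cancel: drop the operation if it has not been flushed yet.
      this.queue = this.queue.filter((queued) => queued !== op);
    };
  }

  // Send everything still queued as one batch and deliver per-operation results.
  flush(): void {
    const batch = this.queue;
    this.queue = [];
    if (batch.length === 0) return;
    this.sentBatches.push(batch.map((op) => op.name));
    batch.forEach((op) => op.onResult(`result of ${op.name}`));
  }
}
```

In this model, cancelling `B` before `flush()` means only `A` is sent. A hang would arise only if some other consumer (for example, a deduplicated caller) were still waiting on the cancelled operation.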
This theory about the source of the regression is consistent with @andrew-hu368’s comment about the old `apollo-link-batch-http` version of `BatchHttpLink` still working (😂), since that version does not contain PR #9248, which was released more recently (first in v3.6.0-beta.4, and then officially in v3.6.0).
While I am open to making the changes from PR #9248 more purely opt-in, I believe my PR #9793 fixes the problem without completely abandoning #9248.
@andreialecu @jdmoody @bentron2000 @doflo-dfa @nikhilgupta16 @vieira @andrew-hu368 (and anyone else I missed) please try running `npm i @apollo/client@beta` to get version 3.7.0-beta.3, which includes #9793.
If that doesn’t work, please try `npm i @apollo/client@3.7.0-beta.2` (note: 2, not 3) to see if the full reversion of PR #9248 (described in #9793) makes any difference for you. If you see any differences in behavior between these two versions, please describe them here in detail. I’m hoping the simpler changes in `@apollo/client@3.7.0-beta.3` are enough to fix the regression without completely undoing #9248.
I am not sure if it is related. We’ve recently upgraded from Apollo v2 to Apollo v3 (latest version). If I import the old `BatchHttpLink` from `apollo-link-batch-http`, the queries run successfully, while the new version doesn’t send requests. We downgraded to 3.5.10 and it seems to work as expected.
Hello, I have tried 3.6.6, which potentially resolves a possible cause of this issue (#9718), but after some testing the issue is still present.
Like everyone else, we are also using `BatchHttpLink`, and the last version that works without this issue for us is 3.5.10.
With deduplication disabled, the initial query may still get stuck randomly, but subsequent ones go through even if they use the same variables.
They’re still leaking and getting stuck here in `inFlightLinkObservables`. However, because some queries are still getting stuck, this causes all sorts of issues.
Once the bug starts happening, any subsequent queries with the exact same variables get stuck.
They’re never removed, and none of the promises/observables for that specific query ever resolve.
I have one particular query where I can reproduce it very easily (because it is issued very frequently), but I have seen it hang for other queries as well.
To clarify: those 41 queries are for the same operation and same variables.
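The deduplication behavior described above can be modeled roughly as follows. This is a simplified sketch, not Apollo Client's `QueryManager`: `DedupingClient` and `respond` are hypothetical, and requests are keyed by operation name plus JSON-serialized variables purely for illustration. If the in-flight entry is never removed, every later identical query joins it and waits forever; with deduplication off, each call issues its own request.

```typescript
// Simplified model of query deduplication via an in-flight map.
// Hypothetical illustration; this is NOT Apollo Client's QueryManager.
type Listener = (data: string) => void;

class DedupingClient {
  // Plays the role of inFlightLinkObservables: key -> listeners awaiting one request.
  readonly inFlight = new Map<string, Listener[]>();
  networkRequests = 0;

  constructor(private readonly deduplicate = true) {}

  // Returns a `respond` function when a network request is issued, or
  // undefined when the call was attached to a request already in flight.
  query(operation: string, variables: object, listener: Listener): ((data: string) => void) | undefined {
    const key = `${operation}:${JSON.stringify(variables)}`;
    const pending = this.deduplicate ? this.inFlight.get(key) : undefined;
    if (pending) {
      pending.push(listener); // deduplicated: no new network request
      return undefined;
    }
    this.networkRequests++;
    const listeners: Listener[] = [listener];
    if (this.deduplicate) this.inFlight.set(key, listeners);
    return (data: string) => {
      // Cleanup: remove the in-flight entry, then notify every waiting listener.
      // If this never runs, the entry leaks and later identical queries hang.
      if (this.inFlight.get(key) === listeners) this.inFlight.delete(key);
      listeners.forEach((l) => l(data));
    };
  }
}
```

In this model, a first request whose response never arrives leaves one map entry behind, and 40 more identical queries simply pile up on it without any further network traffic.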
@benjamn unfortunately the bug in the OP still exists in `3.7.0-alpha.3`.
Notice how `observable` contains something, is deduplicated, and is not issued. Also notice how the Concast got to 41 (stuck) observers.
As mentioned previously, the issue started between `3.5.10` and `3.6.0`, but there are too many commits for me to pinpoint one easily. So if you have any clue which one might have touched anything in this area, please let me know so I can try reverting it.
Alternatively, if you have any suggestions for where to set breakpoints and what to inspect, that would also be great.
Thanks for all the details @andreialecu! I think you’re on the right track, and the `Observable cancelled prematurely` error is a long-standing, hard-to-pin-down issue, so it would be great to finally fix that as well. 🤞
I can do some digging/testing today with this information. I’ll report back when I have news.
So it appears that a callback is added via `concast.cleanup()` that is supposed to remove the observable from the `inFlightLinkObservables` map.
I think it’s not being called, most likely due to a race condition or teardown situation similar to the one causing the “Observable cancelled prematurely” error.
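The suspected failure mode can be illustrated with a simplified multicast source whose cleanup callbacks run only on normal completion. `MiniConcast` and its `cleanupOnTeardown` flag are hypothetical, and this is not the real `Concast` class: when the last observer unsubscribes before the response arrives, the callback that would delete the map entry never runs. Setting `cleanupOnTeardown` models guaranteeing cleanup on teardown as well, the direction suggested by the commit titled “Guarantee `Concast` cleanup without `Observable cancelled prematurely` rejection (#9701)”.

```typescript
// Simplified multicast source with cleanup callbacks.
// Hypothetical model for illustration; this is NOT Apollo Client's Concast.
type Observer = { complete: () => void };

class MiniConcast {
  private readonly observers = new Set<Observer>();
  private cleanups: Array<() => void> = [];
  private done = false;

  // cleanupOnTeardown = false: callbacks run only on normal completion (the leak).
  // cleanupOnTeardown = true: callbacks also run when the last observer leaves.
  constructor(private readonly cleanupOnTeardown: boolean) {}

  // Register a callback, e.g. one that deletes an entry from an in-flight map.
  cleanup(callback: () => void): void {
    this.cleanups.push(callback);
  }

  // Returns an unsubscribe function.
  subscribe(observer: Observer): () => void {
    this.observers.add(observer);
    return () => {
      this.observers.delete(observer);
      // The last observer left before the source finished: teardown.
      if (this.observers.size === 0 && !this.done) {
        this.done = true;
        if (this.cleanupOnTeardown) this.runCleanups();
      }
    };
  }

  // Called when the underlying request finishes normally; ignored after teardown.
  complete(): void {
    if (this.done) return;
    this.done = true;
    this.observers.forEach((observer) => observer.complete());
    this.runCleanups();
  }

  private runCleanups(): void {
    const callbacks = this.cleanups;
    this.cleanups = [];
    callbacks.forEach((callback) => callback());
  }
}
```

With `cleanupOnTeardown = false`, unsubscribing before completion leaves the in-flight entry in place permanently, which matches the symptom of queries that are "never removed".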
The problem seems to be related to `Concast` not cleaning up properly.
Possibly relevant: https://github.com/apollographql/apollo-client/blob/da3355ce794e105ad7f2652595fc33527a8a461b/src/core/QueryManager.ts#L991-L996
https://github.com/apollographql/apollo-client/blob/da3355ce794e105ad7f2652595fc33527a8a461b/src/core/QueryManager.ts#L1159-L1165
I can reproduce it consistently, if it would help to set up a screen sharing session at some point feel free to DM me on Twitter.
Update: in `QueryManager`, `getObservableFromLink` seems to have the stuck query in `inFlightLinkObservables_1`, and because of deduplication it is not issued any more. I think that’s what’s causing the issue.
As for why it gets stuck there, I will need to investigate further.
Actually the plot thickens.
I’ve changed some things so that those observables don’t resubscribe. This prevented the “Observable cancelled prematurely” from happening, but the queries still hang. 🤔
I have an update.
It appears this also happens on 3.6.2 (I deployed it to production earlier), so it is not a recent regression in 3.6.3.
It seems to be related to peak times somehow (at least, a lot of subscription chatter).
I have reverted to 3.4.17 for now, where everything seems fine. I haven’t checked 3.5.x yet.
Since this happens in apollo-angular, I assume it’s an issue in the core and not the React part.
@benjamn somehow I’m not able to reproduce this earlier today, so I’m very confused.
Yesterday it was reproducible every single time in production, staging and development - but it was occurring during peak times.
We use graphql subscriptions and there’s a lot of chatter over them, that’s the only thing that could be relevant. One of the subscriptions then triggers a query on a certain condition.
I’ll try to reproduce it during the next peak.
The problems cleared up for all of our customers after we initially deployed a downgrade to 3.4.17, as per my comment in https://github.com/apollographql/apollo-client/issues/9456#issuecomment-1119942293
I then tried upgrading up to 3.6.2 and couldn’t reproduce it. On 3.6.3 it was reproducible again.
Peak times were fading by that point, so the problem may not lie exactly between 3.6.2 and 3.6.3 but could be earlier. I’ll continue monitoring this over the next few days.