query: Potential race condition exists between an observer subscribing to a query and when that query's response comes back

Describe the bug

I am seeing a bug in my application when using mocked network requests (i.e. query fetch requests resolve immediately), resulting from a race condition between these network requests resolving and a component’s observer subscribing to the query’s changes. Unfortunately, I can not reproduce it outside of our proprietary codebase, so simply demonstrating the theoretical possibility will need to suffice.

Consider two components, A and B, both of which call useQuery with the same queryKey (thus sharing the same underlying Query instance).

  1. A calls useQuery. This creates a new Query instance and the query begins fetching. A receives isLoading: true from the hook.
  2. B calls useQuery, sharing the same Query instance as A, and therefore not executing a new fetch. B receives isLoading: true from the hook.
  3. The useEffect hooks for A execute, resulting in the query observer for A calling addObserver on its query (via useSyncExternalStore), thus subscribing to query state notifications.
  4. The network request resolves, updating the Query state and notifying A of changes.
  5. The useEffect hooks for B execute, so now B is subscribed to query state notifications, but it does not get notified of the query state updates from the previous step. This is the race condition.
  6. A re-renders with the updated query state, B does not.

This race condition exists because React Query does not control how the JS event loop will schedule the useEffect calls vs. the resolution of the network request, since both are happening asynchronously. Of course, the moment any real network latency is added to the network call, the execution of the useEffect hooks first is effectively guaranteed. Furthermore, while I can reproduce this dependably in Chrome, I am not able to do so in Firefox, which hints at the role that the JS engine is playing in regards to how the work is being scheduled on the event loop.

Our codebase currently has nearly 1700 testcafe tests, and this edge case bug appeared on only a few, all of which were in the same test suite, thus indicating just how much of an edge case it is.

FWIW, component B in our case is a child of component A, and is rendered as a Next.js dynamic component. I don’t see any reason to believe that is relevant beyond it having some random effect on the scheduling of work on the event loop, but in theory so could anything else.

Solution I’d be happy to open a PR to patch this if desired. The solution is fairly simple: the observer needs to subscribe to query notifications immediately, rather than waiting until the useEffect hooks get executed. Since the handleStoreChange function is not available from useSyncExternalStore until the useEffect hooks are called, however, this initial subscription would just set a flag to indicate that handleStoreChange needs to be called once it is available. Such a flag could be implemented as an instance property on the observer or as a ref inside useBaseQuery, but regardless it would look something like this:

const [observer] = React.useState(
  () => {
    const observer = new Observer<TQueryFnData, TError, TData, TQueryData, TQueryKey>(
      queryClient,
      defaultedOptions,
    )
    observer.subscribe(() => {
      observer.needsToFlushStoreChange = true
    })
    return observer
  }
)

useSyncExternalStore(
  React.useCallback(
    (onStoreChange) => {
      if (isRestoring) return () => undefined;
      if (observer.needsToFlushStoreChange) {
        observer.needsToFlushStoreChange = false;
        onStoreChange();
      }

      observer.listeners = []; // Remove `needsToFlushStoreChange` subscription
      return observer.subscribe(notifyManager.batchCalls(onStoreChange));
    },
    [observer, isRestoring]
  ),
  () => observer.getCurrentResult(),
  () => observer.getCurrentResult()
);

Your minimal, reproducible example

N/A

Steps to reproduce

As described above, I am not able to create a standalone reproduction.

Expected behavior

I expect all components that call useQuery to be subscribed to that query in time to immediately receive state updates.

How often does this bug happen?

Almost always

Screenshots or Videos

No response

Platform

  • OS: OSX 13.3.1, and whatever version of Alpine Linux our CI uses
  • Chrome 113.0.5672.126

Tanstack Query adapter

react-query

TanStack Query version

4.29.7

TypeScript version

4.3.2

Additional context

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 21 (11 by maintainers)

Most upvoted comments

I have a failing test and a fix; brace yourselves