nodejs-firestore: query.stream() times out with Error: 14 UNAVAILABLE after 60 sec

Environment details

  • OS: macOS 10.13.6
  • Node.js version: v12.16.1
  • npm version: 6.13.4
  • @google-cloud/firestore version: 3.6.0 through firebase-admin 8.9.2

Problem

When using query.stream() to stream documents from a large collection (100k+ docs), the stream stalls at 60 seconds without emitting any kind of error; it just hangs. After enabling setLogFunction(console.error), the log output shows:

Firestore (3.5.0) 2020-03-10T19:00:34.189Z riRrx [Query._stream]: Query failed with stream error: Error: 14 UNAVAILABLE: The datastore operation timed out, or the data was temporarily unavailable.
    at callErrorFromStatus (node_modules/@grpc/grpc-js/build/src/call.js:30:26)
    at Http2CallStream.<anonymous> (node_modules/@grpc/grpc-js/build/src/call.js:79:34)
    at Http2CallStream.emit (events.js:323:22)
    at node_modules/@grpc/grpc-js/build/src/call-stream.js:100:22
    at processTicksAndRejections (internal/process/task_queues.js:79:11) {
  code: 14,
  details: 'The datastore operation timed out, or the data was temporarily unavailable.',
  metadata: Metadata {
    internalRepr: Map { 'content-disposition' => [Array] },
    options: {}
  }
}
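
For reference, a minimal sketch of the pattern that triggers this (collection name is a placeholder; credentials are assumed to come from the environment):

    const admin = require('firebase-admin');
    const { setLogFunction } = require('@google-cloud/firestore');

    admin.initializeApp(); // assumes GOOGLE_APPLICATION_CREDENTIALS is set
    setLogFunction(console.error); // surfaces the internal logs shown above

    admin
      .firestore()
      .collection('myLargeCollection') // placeholder; 100k+ documents
      .stream()
      .on('data', (doc) => { /* process each QueryDocumentSnapshot */ })
      .on('error', (err) => console.error('stream error:', err)) // never fires
      .on('end', () => console.log('done')); // never fires; the stream stalls at ~60s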

Expected outcome

That the stream would return all of my results, or alternatively emit an error event on the stream when it fails.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (8 by maintainers)

Most upvoted comments

@schmidt-sebastian Ok, that is unfortunate. It pretty much renders the query.stream() method useless to us, and I believe to most people, since streaming is mostly relevant for large result sets, which take some time to fetch. Also, why isn't the stream's error event emitted when this failure occurs? I'd call that a bug regardless of the backend timeout limitation. Even the stream's end event isn't emitted; my stream ended up in a stalled state.

@klon Thanks for filing this issue.

Unfortunately, 60 seconds is the maximum timeout we currently support on the backend. To work around this limitation, we suggest that you use the startAt()/startAfter()/endAt()/endBefore() APIs to limit the result set of a single query: https://firebase.google.com/docs/firestore/query-data/query-cursors
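
For example, a rough sketch of that workaround, paging by document ID with startAfter() and limit() (collection name and page size are illustrative):

    async function fetchAll(db, processDoc) {
      const pageSize = 1000; // illustrative; keep each page well under the 60s deadline
      let last = null;
      for (;;) {
        let query = db
          .collection('myLargeCollection') // illustrative collection
          .orderBy('__name__') // order by document ID for a stable cursor
          .limit(pageSize);
        if (last) query = query.startAfter(last);
        const snapshot = await query.get();
        snapshot.docs.forEach(processDoc);
        if (snapshot.size < pageSize) break; // last page reached
        last = snapshot.docs[snapshot.docs.length - 1]; // cursor for the next page
      }
    }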

Alternatively, the onSnapshot API can potentially return larger result sets, as the maximum timeout for the underlying onSnapshot RPCs is one hour.
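
A minimal sketch of that alternative, detaching the listener once the first (complete) snapshot arrives:

    const unsubscribe = db.collection('myLargeCollection').onSnapshot(
      (snapshot) => {
        snapshot.docs.forEach(processDoc); // the first event carries the full result set
        unsubscribe(); // one-shot: detach after the initial snapshot
      },
      (err) => console.error('onSnapshot error:', err)
    );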

Thanks for flagging this, @klon. There seem to be several different things going on at once in this issue, and IIUC the only outstanding one today is the error above, "The requested snapshot version is too old". I'm opening a new issue to look into that.

My one piece of good news is that there is a lot of chatter on the backend side about finally lifting the 60s deadline limitation. That doesn't solve the problem that we should be raising an error here, though. I will give this another spin next week. @merlinnot If you think there is anything special in your code, it would help me if you provided a code snippet that reproduces this problem on your end.

As for returning partial results from the Watch stream: I still think this is not a good idea in almost all use cases, as the backend does not provide any guarantees about what it sends us before it confirms that it has sent a complete snapshot. If you really want to give this a spin, you can try commenting out this.current in https://github.com/googleapis/nodejs-firestore/blob/master/dev/src/watch.ts#L459 and let us know if this helps.

To add to this, what I'm actually doing is using query.stream() to populate my in-memory cache faster (partial results are very useful in my use case), and then query.onSnapshot() on the same collection to keep the cache up to date.
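
Roughly this shape (names are illustrative):

    const cache = new Map();
    const query = db.collection('items'); // illustrative collection

    query
      .stream()
      .on('data', (doc) => cache.set(doc.id, doc.data())) // fast initial fill
      .on('end', () => {
        // keep the cache fresh via the watch stream
        query.onSnapshot((snapshot) => {
          for (const change of snapshot.docChanges()) {
            if (change.type === 'removed') cache.delete(change.doc.id);
            else cache.set(change.doc.id, change.doc.data());
          }
        });
      });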

It would be best if I could stream the results of query.onSnapshot() straight away, instead of waiting for the initial (full) snapshot. Is this something that the backend API enables us to do? If so, would you welcome a PR adding this functionality after we align on the interface?

I will look at this and see if we are somehow dropping the error.

As for getting this documented: yes, we should do that (it is not documented right now). We are, however, also working on lifting this limitation.