grpc-node: `@grpc/grpc-js` throws 'Received RST_STREAM with code 0' with retry enabled

Problem description

@grpc/grpc-js throws 'Received RST_STREAM with code 0' when retries are enabled.

Reproduction steps

  • Start an example HelloWorld Golang gRPC server in Kubernetes with the Istio sidecar enabled
  • To reduce connection resets (reset reason: connection termination), set max_concurrent_streams: 256
  • Make many simultaneous calls from Node.js with large responses (roughly 6 KB per response, 1000 requests); see the client sketch after this list
  • The client fails with Error: 13 INTERNAL: Received RST_STREAM with code 0
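
A minimal sketch of the client side of the reproduction, assuming the helloworld.proto from the grpc-node examples and a placeholder in-cluster target address (this is an illustration, not the exact script used):

// repro-client.js -- sketch of firing many concurrent unary calls at once
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

// Assumes the helloworld.proto from the grpc-node examples is available locally.
const packageDefinition = protoLoader.loadSync('helloworld.proto', {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true,
});
const helloProto = grpc.loadPackageDefinition(packageDefinition).helloworld;

// The target goes through the Istio/Envoy sidecar in the setup described above.
const client = new helloProto.Greeter(
  'greeter-server.default.svc.cluster.local:50051',
  grpc.credentials.createInsecure()
);

function sayHello(i) {
  return new Promise((resolve, reject) => {
    client.sayHello({ name: `request-${i}` }, (err, response) => {
      if (err) reject(err);
      else resolve(response);
    });
  });
}

async function main() {
  // 1000 calls started at the same time; with retries enabled they all open
  // their HTTP/2 streams before any of them complete.
  const results = await Promise.allSettled(
    Array.from({ length: 1000 }, (_, i) => sayHello(i))
  );
  const failed = results.filter((r) => r.status === 'rejected');
  console.log(`failed: ${failed.length} of ${results.length}`);
  if (failed.length > 0) console.error(failed[0].reason);
}

main();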

Environment

  • Kubernetes with the Istio Envoy sidecar enabled (thanks to the OrbStack Kubernetes environment, I can debug this error locally in VS Code)
    • Node.js -> client sidecar (Envoy) -> server
    • Node.js -> server sidecar (Envoy) -> server
    • Node.js -> client sidecar (Envoy) -> server sidecar (Envoy) -> server
  • Node version: v16.18.1
  • Node installation method: Docker image
  • If applicable, compiler version: N/A
  • Package name and version: @grpc/grpc-js version > 1.7.3

Additional context

Logs with GRPC_TRACE=all GRPC_VERBOSITY=DEBUG

D 2023-09-09T23:58:35.160Z | load_balancing_call | [1513] Received metadata
D 2023-09-09T23:58:35.160Z | retrying_call | [1512] Received metadata from child [1513]
D 2023-09-09T23:58:35.160Z | retrying_call | [1512] Committing call [1513] at index 0
D 2023-09-09T23:58:35.160Z | resolving_call | [256] Received metadata
D 2023-09-09T23:58:35.160Z | subchannel_call | [3256] HTTP/2 stream closed with code 0
D 2023-09-09T23:58:35.160Z | subchannel_call | [3256] ended with status: code=13 details="Received RST_STREAM with code 0"
D 2023-09-09T23:58:35.160Z | load_balancing_call | [1513] Received status
D 2023-09-09T23:58:35.160Z | load_balancing_call | [1513] ended with status: code=13 details="Received RST_STREAM with code 0"
D 2023-09-09T23:58:35.160Z | retrying_call | [1512] Received status from child [1513]
D 2023-09-09T23:58:35.160Z | retrying_call | [1512] state=COMMITTED handling status with progress PROCESSED from child [1513] in state ACTIVE
D 2023-09-09T23:58:35.160Z | retrying_call | [1512] ended with status: code=13 details="Received RST_STREAM with code 0"
D 2023-09-09T23:58:35.160Z | resolving_call | [256] Received status
D 2023-09-09T23:58:35.160Z | resolving_call | [256] ended with status: code=13 details="Received RST_STREAM with code 0"

Error: 13 INTERNAL: Received RST_STREAM with code 0
    at callErrorFromStatus (/[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/client.js:160:32)
    at ServiceClientImpl.<anonymous> (/[redacted]/grpc-node/examples/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /[redacted]/grpc-node/examples/helloworld/dynamic_codegen/client.js:73:14
    at new Promise (<anonymous>)
    at main (/[redacted]/grpc-node/examples/helloworld/dynamic_codegen/client.js:72:21)
    at Object.<anonymous> (/[redacted]/grpc-node/examples/helloworld/dynamic_codegen/client.js:102:1)
    at Module._compile (node:internal/modules/cjs/loader:1155:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1209:10)
    at Module.load (node:internal/modules/cjs/loader:1033:32)
    at Function.Module._load (node:internal/modules/cjs/loader:868:12) {
  code: 13,
  details: 'Received RST_STREAM with code 0',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}

Workarounds I can apply

  • Disable retries by setting the channel option grpc.enable_retries to 0 (sketch below)
  • Downgrade to version 1.7.3
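
A sketch of the first workaround: pass the grpc.enable_retries channel option when constructing the client (helloProto and the target address are placeholders, as in the client sketch above):

const grpc = require('@grpc/grpc-js');

// Turning off the client-side retry machinery restores the pre-1.8 call path
// and avoids this error, at the cost of losing transparent retries.
const client = new helloProto.Greeter(
  'greeter-server.default.svc.cluster.local:50051',
  grpc.credentials.createInsecure(),
  { 'grpc.enable_retries': 0 }
);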

It seems we can just ignore this error, and it doesn't affect the results. RST_STREAM with code 0 maps to NO_ERROR in the HTTP/2 RFC error codes.
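
If ignoring really is acceptable at the application level, one option (until the client or proxy behavior changes) is to retry only this specific status. This is a hedged sketch, not something the library provides; sayHello is the promisified unary call from the client sketch above:

const grpc = require('@grpc/grpc-js');

// Retry only the "Received RST_STREAM with code 0" failure a limited number of
// times; any other error is rethrown unchanged.
async function callIgnoringRstStream0(fn, attempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isRstStream0 =
        err.code === grpc.status.INTERNAL &&
        typeof err.details === 'string' &&
        err.details.includes('Received RST_STREAM with code 0');
      if (!isRstStream0 || attempt >= attempts) throw err;
    }
  }
}

// Usage: const response = await callIgnoringRstStream0(() => sayHello(42));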

There are some similar cases in other projects.

So can we just ignore this NO_ERROR in the Node.js SDK? Thanks in advance.

About this issue

  • Original URL
  • State: open
  • Created 10 months ago
  • Reactions: 1
  • Comments: 27 (13 by maintainers)

Most upvoted comments

We’ve been facing this error in our production environment for the past 3 days and it’s occurred roughly 15k times: Error: 13 INTERNAL: Received RST_STREAM with code 1

The error is triggered when executing the following code:

const writeResult = await admin
      .firestore()
      .collection(FirestoreCollections.Users)
      .doc(userID)
      .update(fieldsToUpdate); // This line throws the error

We’ve attempted several methods, but none have resolved the problem:

  • Switched from the update method to set, and to set with {merge: true}, but this didn't work.
  • Created a new cloud function dedicated solely to this method. Surprisingly, it didn't work either. However, it's noteworthy that we have several other cloud functions that update user data using the same method, and they work as expected. What's even more perplexing is that this issue arises in about 60% of our HTTP cloud function calls, and it seems to occur randomly; we haven't been able to identify a consistent pattern.

Interestingly, everything operates flawlessly in our development project. The only difference is that the development project has a smaller User collection.

@murgatroid99 Are there any updates on this issue? We are using Node.js v16.20.0 and the firebase-admin library.

The error this bug is about means that the client did not receive trailers and that it received an RST_STREAM with code 0 (no error).

Any suggestion on how 1.9.x could lead to these random issues with Envoy? It may help to understand in which scenario this causes the issue.

I presented a theory in a previous comment https://github.com/grpc/grpc-node/issues/2569#issuecomment-1883544814. The problem could also be related to https://github.com/envoyproxy/envoy/issues/30149 as mentioned in https://github.com/grpc/grpc-node/issues/2569#issuecomment-1899554032. But in general, I don’t know why Envoy does what it does; the right people to ask about that are the Envoy maintainers.

@maylorsan Code 1 is different from code 0. Please file a separate issue.

I am facing the same problem here. Is there any update?

OK, I see what happened: having retries enabled changed the sequencing of some low-level operations, which caused a different outcome with your specific test setup. With retries disabled, each HTTP/2 stream starts and ends in a single synchronous operation, while with retries enabled, starting and ending the stream are separated into two asynchronous operations. The result is that with retries disabled, each stream starts and ends one at a time, while with retries enabled, all of the streams start, and then all of the streams end. This means that in the “retries disabled” case, the remote end can finish processing the first stream before the last stream even starts, while in the “retries enabled” case, the remote end is guaranteed to see all of the streams open at once. So, the primary difference between the observable outcomes in these two tests is that the “retries enabled” test triggers the “max concurrent streams” limit, and the “retries disabled” test does not. If you instead forced the client to reach the “max concurrent streams” limit by opening streaming requests without closing them, you would see the same error in both cases.
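
As an illustration of that last point, a hypothetical sketch that reaches the "max concurrent streams" limit directly by opening client-streaming calls and never ending them (recordData is a made-up client-streaming method, not part of the HelloWorld example used above):

// Hold more streams open than the server's max_concurrent_streams (256) so the
// limit is hit regardless of whether retries are enabled on the client.
const openCalls = [];
for (let i = 0; i < 300; i++) {
  const call = streamingClient.recordData((err, summary) => {
    if (err) console.error(`call ${i}:`, err.message);
  });
  // Intentionally never call call.end(), so each stream stays open and keeps
  // counting against the server's concurrent stream limit.
  openCalls.push(call);
}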

The primary problem here, still, is that when the “max concurrent streams” limit is reached, Istio closes the stream in a way that is not valid within the gRPC protocol. Istio is what needs to be changed here. There may be a configuration option to change how it responds in this situation, but if not, this is a bug in Istio.