grpc-node: Calls trigger ERR_HTTP2_SESSION_ERROR with node 20.10.0
Problem description
When running multiple gRPC requests asynchronously with the same gRPC client, gRPC client at some point starts cancelling thoe calls with the following error message: Received RST_STREAM with code 2 triggered by internal client error: Session closed with error code 2
I’ve enabled the following ENV variables GRPC_TRACE=all
and GRPC_VERBOSITY=DEBUG
and these are the logs for one of the calls that received the RST_STREAM error.
D 2023-11-28T11:52:30.785Z | subchannel_call | [5567] write() called with message of length 433
D 2023-11-28T11:52:30.785Z | subchannel_call | [5567] sending data chunk of length 433
D 2023-11-28T11:52:31.297Z | subchannel_call | [5567] end() called
D 2023-11-28T11:52:31.297Z | subchannel_call | [5567] calling end() on HTTP/2 stream
D 2023-11-28T11:52:35.382Z | subchannel_call | [5567] Node error event: message=Session closed with error code 2 code=ERR_HTTP2_SESSION_ERROR errno=Unknown system error undefined syscall=undefined
D 2023-11-28T11:52:35.393Z | subchannel_call | [5567] HTTP/2 stream closed with code 2
D 2023-11-28T11:52:35.393Z | subchannel_call | [5567] ended with status: code=13 details="Received RST_STREAM with code 2 triggered by internal client error: Session closed with error code 2"
Right before the calls start triggering the ERR_HTTP2_SESSION_ERROR
, i found the following a log explaining the client got closed by GOAWAY with code 2. 0.0.0.0:3000 connection closed by GOAWAY with code 2
.
D 2023-11-28T11:52:35.354Z | resolving_call | [6070] Received status
D 2023-11-28T11:52:35.354Z | resolving_call | [6070] ended with status: code=0 details="OK"
D 2023-11-28T11:52:35.354Z | subchannel_call | [6202] HTTP/2 stream closed with code 0
D 2023-11-28T11:52:35.354Z | subchannel_call | [6424] Received server trailers:
grpc-status: 0
grpc-message: OK
D 2023-11-28T11:52:35.354Z | subchannel_call | [6424] received status code 0 from server
D 2023-11-28T11:52:35.354Z | subchannel_call | [6424] received status details string "OK" from server
D 2023-11-28T11:52:35.354Z | subchannel_call | [6424] close http2 stream with code 0
D 2023-11-28T11:52:35.354Z | subchannel_call | [6424] ended with status: code=0 details="OK"
D 2023-11-28T11:52:35.354Z | load_balancing_call | [6338] Received status
D 2023-11-28T11:52:35.354Z | load_balancing_call | [6338] ended with status: code=0 details="OK"
D 2023-11-28T11:52:35.354Z | retrying_call | [6337] Received status from child [6338]
D 2023-11-28T11:52:35.354Z | retrying_call | [6337] state=COMMITTED handling status with progress PROCESSED from child [6338] in state ACTIVE
D 2023-11-28T11:52:35.354Z | retrying_call | [6337] ended with status: code=0 details="OK"
D 2023-11-28T11:52:35.354Z | resolving_call | [6247] Received status
D 2023-11-28T11:52:35.354Z | resolving_call | [6247] ended with status: code=0 details="OK"
D 2023-11-28T11:52:35.354Z | subchannel_call | [6424] HTTP/2 stream closed with code 0
D 2023-11-28T11:52:35.355Z | transport | (12) 0.0.0.0:3000 connection closed by GOAWAY with code 2
D 2023-11-28T11:52:35.355Z | subchannel | (11) 0.0.0.0:3000 READY -> IDLE
D 2023-11-28T11:52:35.355Z | subchannel_refcount | (11) 0.0.0.0:3000 refcount 2 -> 1
D 2023-11-28T11:52:35.355Z | pick_first | READY -> IDLE
D 2023-11-28T11:52:35.355Z | resolving_load_balancer | dns:0.0.0.0:3000 READY -> IDLE
D 2023-11-28T11:52:35.355Z | connectivity_state | (1) dns:0.0.0.0:3000 READY -> IDLE
You can find the full logs in the attached file: grpcLogs.txt
This error happened after upgrading our NodeJS project to version 20.10.0. This problem is only replicable when there are a ‘‘large’’ (2361 calls from the logs) amount of gRPC calls triggered. When there is a reduced amout of calls, I couldn’t replicate the error.
Moreover, the ERR_HTTP2_SESSION_ERROR
is not replicable with lower versions Node versions ( issue didn’t appear when runnin on node 20.5.1 ).
Reproduction steps
Use Nodejs 20.10.0 and install grpc/grpc-js 1.9.12. Run multiple gRPC calls with the same client asynchornously. Issue is also replicable with other older versions of grpc/grpc-js such as 1.8.14 .
Environment
- OS name, version and architecture: Arch Linux amd64
- Node version 20.10.0/20.9.0
- Node installation method: nvm
- Package name and version: 1.9.12
About this issue
- Original URL
- State: open
- Created 7 months ago
- Reactions: 4
- Comments: 23 (14 by maintainers)
I am facing exactly the same error in my application that involves a high degree of concurrency. In my case, I see exactly the same error/log output with Node version 18.19.0 and Node 20.10.0. It works if I fix the Node version at 18.16.0 - I did not try the versions between 18.16.0 and 18.19.0, but the root cause of the problem must be in a code change between any of those versions I suppose.
I have investigated the traffic with Wireshark and the server sends a HTTP2 GOAWAY with Code 2. The @grpc/grpc-js client then throws the mentioned exception and in my case does not retry the request. I have tried @grpc/grpc-js versions 1.7.3. and 1.9.13, the behavior is the same. Behavior is the same on Windows 11, Ubuntu 20.04, and Ubuntu 22.04. It does not matter if the client manages the channel on its own or if I override the channel for all concurrent requests to the same one.
My logs are identical to the one of @AlbertLlebaria. Please let me know when there is something else that I can provide to further investigate the issue. Thanks a lot! 😃
I honestly forgot I had that in there.
destroyHttp2Stream
does need to callhttp2Stream.close
sometimes, to deal with cancellations. The condition should probably be that it makes that call if the client has not received trailers or aclose
event from the server.I am also facing the same issue in my application and mostly during peak usage. I am getting the same exception which causing context cancellation for other upstream services. I am on node version 20.10.0 and using grpc-js version
1.9.13
.Please let me know if I can provide any more context or help in investigation of the issue. Thank you.