grpc-node: Getting 'Error: 13 INTERNAL: No message received' when sending/receiving JSON messages that are 60-80 KB
Problem description
When sending JSON strings that are 60-80 KB long, we see an intermittent ‘Error: 13 INTERNAL: No message received’ on the machine running the gRPC client. Based on our investigation, this happens because the gRPC client receives the OK status before it receives the message. When we added the setTimeout call below, the issue stopped happening.
Below is the exact error message:
Error: Error: 13 INTERNAL: No message received
    at callErrorFromStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\call.js:31:19)
    at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client.js:180:80)
    at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client-interceptors.js:360:141)
    at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client-interceptors.js:323:181)
    at C:\grpc-test\node_modules\@grpc\grpc-js\build\src\resolving-call.js:99:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client.js:160:32)
    at ServiceClientImpl.testMethod (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\make-client.js:105:19)
    at C:\grpc-test\dist\client.js:25:16
    at step (C:\grpc-test\node_modules\tslib\tslib.js:195:27)
    at Object.next (C:\grpc-test\node_modules\tslib\tslib.js:176:57)
    at C:\grpc-test\node_modules\tslib\tslib.js:169:75
    at new Promise (<anonymous>)
    at Object.__awaiter (C:\grpc-test\node_modules\tslib\tslib.js:165:16)
    at C:\grpc-test\dist\client.js:22:60
    at Layer.handle [as handle_request] (C:\grpc-test\node_modules\express\lib\router\layer.js:95:5) {
  code: 13,
  details: 'No message received',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}
Reproduction steps
We set up a gRPC client and a gRPC server on two different servers and sent a JSON string of about 7k lines over the network. With this setup, the issue above was reproducible.
Environment
- OS name: Microsoft Windows Server 2019 Standard
- Node version: 20.11.1
- Package: @grpc/grpc-js@1.10.4
Additional context
The issue started happening when we migrated to grpc-js from the deprecated grpc package.
About this issue
- State: open
- Created 3 months ago
- Comments: 17 (8 by maintainers)
I don’t yet have the setup to reproduce this. However, I did realize that we have an existing test that involves sending a large payload (~300 KB), so I’d like to understand why that didn’t already catch this error.
The deprecated grpc package didn’t use the Node networking stack. If this is a bug in Node, that difference would be expected.
This is how HTTP/2 works. Data frames have a maximum size, and gRPC messages that are larger than that size are divided into multiple data frames.
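As a rough illustration of that chunking, the TypeScript sketch below frames a payload the way gRPC does on the wire (a 1-byte compressed flag plus a 4-byte big-endian length, followed by the message bytes) and splits the result into pieces no larger than the HTTP/2 default SETTINGS_MAX_FRAME_SIZE of 16,384 bytes. The function names are hypothetical, and the sketch assumes the peer has not negotiated a larger frame size; a real HTTP/2 stack may pick different frame boundaries, for example because of flow control.

```typescript
// Assumption: the HTTP/2 default maximum DATA frame payload (16,384 bytes)
// is in effect, i.e. the peer has not advertised a larger value.
const DEFAULT_MAX_FRAME_SIZE = 16384;

// Hypothetical helper: prepend the gRPC length prefix to a serialized message.
// Byte 0 is the compressed flag (0 = uncompressed); bytes 1-4 hold the
// message length as a big-endian unsigned 32-bit integer.
function frameGrpcMessage(payload: Uint8Array): Uint8Array {
  const framed = new Uint8Array(5 + payload.length);
  const view = new DataView(framed.buffer);
  view.setUint8(0, 0);
  view.setUint32(1, payload.length, false);
  framed.set(payload, 5);
  return framed;
}

// Hypothetical helper: cut the framed message into chunks that each fit
// into a single DATA frame of at most maxFrameSize bytes.
function splitIntoDataFrames(
  message: Uint8Array,
  maxFrameSize: number = DEFAULT_MAX_FRAME_SIZE
): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset < message.length; offset += maxFrameSize) {
    frames.push(message.subarray(offset, offset + maxFrameSize));
  }
  return frames;
}

// A 70 KB payload, in the size range reported in this issue.
const payload = new Uint8Array(70 * 1024).fill(0x61);
const frames = splitIntoDataFrames(frameGrpcMessage(payload));
console.log(frames.length, frames.map((f) => f.length));
```

Under these assumptions, a message in the 60-80 KB range always spans several DATA frames, so the receiver has to reassemble it before the message is complete.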
I have no idea what this is referring to. Both libraries use the same protocol (gRPC over HTTP/2), which includes the same chunking rules. I have never even heard of the RSL protocol.
No. All of the operations involved in a gRPC stream are deterministic and sequential (to the extent that we care about), so it should be possible to get it right consistently without fudging the timing. There is no reason to believe that any particular timeout will consistently prevent this error, and 150ms will substantially harm performance in every other case.
OK, while I’m figuring out how to reproduce that myself, can you run your test server with the environment variables GRPC_TRACE=all, GRPC_VERBOSITY=DEBUG, and NODE_DEBUG=http2 and share the output from a run that reproduces this error?
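Since the reporter is on Windows, where the shell syntax for setting environment variables differs from Unix shells, one possible way to apply these variables is a small launcher script. This is only a sketch: withTraceEnv and launchWithTracing are hypothetical names, and the server entry point passed to the launcher is an assumption about the project layout.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// The three variables requested above.
const TRACE_ENV: Record<string, string> = {
  GRPC_TRACE: "all",
  GRPC_VERBOSITY: "DEBUG",
  NODE_DEBUG: "http2",
};

// Hypothetical helper: copy an environment and add the tracing variables.
function withTraceEnv(
  base: Record<string, string | undefined>
): Record<string, string | undefined> {
  return { ...base, ...TRACE_ENV };
}

// Hypothetical helper: start a Node script with tracing enabled.
// stdio "inherit" sends the child's stdout and stderr to this process,
// so both can be redirected to a file when running the launcher.
function launchWithTracing(script: string): ChildProcess {
  return spawn(process.execPath, [script], {
    env: withTraceEnv(process.env),
    stdio: "inherit",
  });
}

// Example (the path is an assumption, adjust to the real server entry point):
// launchWithTracing("dist/server.js");
```

Redirecting both stdout and stderr of the launcher to a file should capture the full trace for sharing in the thread.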