grpc-node: Getting 'Error: 13 INTERNAL: No message received' when sending/receiving JSON messages that are 60-80 KB

Problem description

When sending JSON strings that are 60-80 KB long, we see an intermittent ‘Error: 13 INTERNAL: No message received’ on the server running the gRPC client. Based on our investigation, this happens because the gRPC client receives the OK status before it receives the message. When we added the setTimeout call shown below, the issue stopped happening.

[image: screenshot of the added setTimeout call]

Below is the exact error message:

Error: Error: 13 INTERNAL: No message received
    at callErrorFromStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\call.js:31:19)
    at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client.js:180:80)
    at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client-interceptors.js:360:141)
    at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client-interceptors.js:323:181)
    at C:\grpc-test\node_modules\@grpc\grpc-js\build\src\resolving-call.js:99:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client.js:160:32)
    at ServiceClientImpl.testMethod (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\make-client.js:105:19)
    at C:\grpc-test\dist\client.js:25:16
    at step (C:\grpc-test\node_modules\tslib\tslib.js:195:27)
    at Object.next (C:\grpc-test\node_modules\tslib\tslib.js:176:57)
    at C:\grpc-test\node_modules\tslib\tslib.js:169:75
    at new Promise (<anonymous>)
    at Object.__awaiter (C:\grpc-test\node_modules\tslib\tslib.js:165:16)
    at C:\grpc-test\dist\client.js:22:60
    at Layer.handle [as handle_request] (C:\grpc-test\node_modules\express\lib\router\layer.js:95:5) {
  code: 13,
  details: 'No message received',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}

Reproduction steps

We set up a gRPC client and a gRPC server on two different servers and sent a JSON string of roughly 7k lines over the network. With this setup, the issue above was reproducible.
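For anyone trying to reproduce this without the original data, a payload of similar size can be generated synthetically. The sketch below is only an assumption about what such a payload might look like: the function name `makeJsonPayload`, the field names, and the 70,000-byte target are hypothetical and do not come from the report.

```javascript
// Hypothetical helper: build a pretty-printed JSON string of at least `minBytes` bytes.
// The item shape is invented for illustration; the real payload is not shown in the report.
function makeJsonPayload(minBytes) {
  const items = [];
  let json = JSON.stringify({ items }, null, 2);
  while (Buffer.byteLength(json, "utf8") < minBytes) {
    // Grow in batches so the string is not re-serialized for every single item.
    for (let i = 0; i < 100; i++) {
      const id = items.length;
      items.push({ id, name: `item-${id}`, active: true });
    }
    json = JSON.stringify({ items }, null, 2);
  }
  return json;
}

// Target the middle of the reported 60-80 KB range.
const reproPayload = makeJsonPayload(70000);
console.log(Buffer.byteLength(reproPayload, "utf8"), "bytes,", reproPayload.split("\n").length, "lines");
```

The resulting string can be placed in a string field of the request or response message of whatever test service is used for the reproduction.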

Environment

  • OS name: Microsoft Windows Server 2019 Standard
  • Node version: 20.11.1
  • Package: @grpc/grpc-js@1.10.4

Additional context

The issue started happening when we migrated to grpc-js from the deprecated grpc package.

About this issue

  • State: open
  • Created 3 months ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Btw were you able to reproduce?

I don’t yet have the setup to reproduce this. However, I did realize that we have an existing test that involves sending a large payload (~300KB), so I’d like to understand why that didn’t already catch this error:

  1. Our test uses TLS. Can you still reproduce the error when using TLS?
  2. Our test runs on Linux. Have you encountered this error on any platform other than Windows?

Ok this wasn’t occurring with the deprecated grpc pkg

That package didn’t use the Node networking stack. If this is a bug in Node, that difference would be expected.

Also, we noticed that grpc-js sends response in multiple chunks over TCP

This is how HTTP/2 works. Data frames have a maximum size, and gRPC messages that are larger than that size are divided into multiple data frames.
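The chunking described above can be illustrated with a small sketch. It assumes the default HTTP/2 maximum frame size of 16,384 bytes and the 5-byte gRPC message prefix (a 1-byte compressed flag followed by a 4-byte big-endian length). The function names are hypothetical, and real frame boundaries also depend on the peer's settings and on flow control, so this is an approximation rather than what grpc-js does internally.

```javascript
// Default HTTP/2 SETTINGS_MAX_FRAME_SIZE; a peer may advertise a larger value.
const DEFAULT_MAX_FRAME_SIZE = 16384;

// Prefix a serialized message with the 5-byte gRPC header:
// 1-byte compressed flag + 4-byte big-endian message length.
function frameGrpcMessage(payload) {
  const header = Buffer.alloc(5);
  header.writeUInt8(0, 0); // 0 = not compressed
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

// Split the framed message into DATA frame payloads of at most maxFrameSize bytes.
function splitIntoDataFrames(framed, maxFrameSize = DEFAULT_MAX_FRAME_SIZE) {
  const frames = [];
  for (let offset = 0; offset < framed.length; offset += maxFrameSize) {
    frames.push(framed.subarray(offset, offset + maxFrameSize));
  }
  return frames;
}

// Stand-in for a ~70 KB serialized message.
const samplePayload = Buffer.alloc(70000, "a");
const dataFrames = splitIntoDataFrames(frameGrpcMessage(samplePayload));
console.log(dataFrames.length, dataFrames.map((f) => f.length));
// prints: 5 [ 16384, 16384, 16384, 16384, 4469 ]
```

Under these assumptions a 60-80 KB message always spans several DATA frames, so seeing multiple chunks on the wire is expected and is not by itself a sign of a bug.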

The legacy grpc sends the response in a single chunk over RSL protocol

I have no idea what this is referring to. Both libraries use the same protocol (gRPC over HTTP/2), which includes the same chunking rules. I have never even heard of the RSL protocol.

Also, do you think the 150ms timeout is a viable solution?

No. All of the operations involved in a gRPC stream are deterministic and sequential (to the extent that we care about), so it should be possible to get it right consistently without fudging the timing. There is no reason to believe that any particular timeout will consistently prevent this error, and 150ms will substantially harm performance in every other case.

OK, while I’m figuring out how to reproduce that myself, can you run your test server with the environment variables GRPC_TRACE=all, GRPC_VERBOSITY=DEBUG, and NODE_DEBUG=http2 and share the output from a run that reproduces this error?
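Since the reported environment is Windows, where the syntax for setting environment variables differs between shells, one shell-independent option is a small Node launcher that starts the server as a child process with the three variables set. This is only a sketch: `buildDebugEnv`, `runWithGrpcTracing`, and the script path `dist/server.js` are hypothetical names, not part of grpc-js. Setting the variables on a child process also avoids any question of whether they are read before the libraries load.

```javascript
const { spawn } = require("node:child_process");

// Layer the three requested variables over an existing environment.
function buildDebugEnv(baseEnv = process.env) {
  return {
    ...baseEnv,
    GRPC_TRACE: "all",
    GRPC_VERBOSITY: "DEBUG",
    NODE_DEBUG: "http2",
  };
}

// Start a Node script with tracing enabled; the child's output is
// forwarded to this process's terminal via stdio: "inherit".
function runWithGrpcTracing(scriptPath) {
  return spawn(process.execPath, [scriptPath], {
    env: buildDebugEnv(),
    stdio: "inherit",
  });
}

// Example (hypothetical path to the compiled server entry point):
// runWithGrpcTracing("dist/server.js");
```

Redirecting the launcher's output to a file makes it easier to attach the trace from a failing run to the issue.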