grpc-node: ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}]

Problem description

We experience intermittent connection reset errors migrating to grpc-js version “1.3.2”. We are creating the Grpc client instance every time when message is processed. On the client side we see the stack trace ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}] and no log messages on the server side although GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG on server side as well. We have attached the debug logs seen on the client side. This issue is seen in our production k8s environment and it is intermittent.

Environment

  • Node version 16.9.0
  • Package name and version @grpc/grpc-js: 1.3.2 and @grpc/proto-loader: 0.6.2

Additional context

Client Logging:

2021-09-19T16:53:33.253Z | call_stream | [79] write() called with message of length 34
2021-09-19T16:53:33.255Z | call_stream | [79] end() called
2021-09-19T16:53:33.258Z | call_stream | [79] deferring writing data chunk of length 39
2021-09-19T16:53:33.263Z | dns_resolver | Resolved addresses for target dns:x-des:50051: [x.x.x.x:50051]
2021-09-19T16:53:33.264Z | pick_first | Connect to address list x.x.x.x:50051
2021-09-19T16:53:33.264Z | subchannel_refcount | x.x.x.x:50051 refcount 4 -> 5
2021-09-19T16:53:33.264Z | pick_first | Pick subchannel with address x.x.x.x:50051
2021-09-19T16:53:33.264Z | pick_first | IDLE -> READY
2021-09-19T16:53:33.264Z | resolving_load_balancer | dns:x-des:50051 CONNECTING -> READY
2021-09-19T16:53:33.264Z | channel | callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
2021-09-19T16:53:33.265Z | connectivity_state | dns:x-des:50051 CONNECTING -> READY
2021-09-19T16:53:33.265Z | subchannel_refcount | x.x.x.x:50051 refcount 5 -> 6
2021-09-19T16:53:33.265Z | subchannel_refcount | x.x.x.x:50051 refcount 6 -> 5
2021-09-19T16:53:33.265Z | channel | Pick result: COMPLETE subchannel: x.x.x.x:50051 status: undefined undefined
2021-09-19T16:53:33.266Z | call_stream | Starting stream on subchannel x.x.x.x:50051 with headers
                grpc-timeout: 99946m
                grpc-accept-encoding: identity,deflate,gzip
                accept-encoding: identity
                :authority: x-des:50051
                user-agent: grpc-node-js/1.3.7
                content-type: application/grpc
                :method: POST
                :path: /io.xingular.device.Service/Read
                te: trailers

2021-09-19T16:53:33.266Z | call_stream | [79] attachHttp2Stream from subchannel x.x.x.x:50051
2021-09-19T16:53:33.266Z | subchannel_refcount | x.x.x.x:50051 callRefcount 0 -> 1
2021-09-19T16:53:33.266Z | call_stream | [79] sending data chunk of length 39 (deferred)
2021-09-19T16:53:33.266Z | call_stream | [79] calling end() on HTTP/2 stream
2021-09-19T16:53:33.273Z | call_stream | [79] Node error event: message=read ECONNRESET code=ECONNRESET errno=Unknown system error -104 syscall=read
2021-09-19T16:53:33.273Z | subchannel | x.x.x.x:50051 connection closed with error read ECONNRESET
2021-09-19T16:53:33.274Z | subchannel | x.x.x.x:50051 connection closed
2021-09-19T16:53:33.274Z | subchannel | x.x.x.x:50051 READY -> IDLE
2021-09-19T16:53:33.274Z | subchannel_refcount | x.x.x.x:50051 refcount 5 -> 4
2021-09-19T16:53:33.274Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.275Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | subchannel_refcount | x.x.x.x:50051 refcount 4 -> 3
2021-09-19T16:53:33.275Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.275Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | subchannel_refcount | x.x.x.x:50051 refcount 3 -> 2
2021-09-19T16:53:33.276Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.276Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.276Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.276Z | subchannel_refcount | x.x.x.x:50051 refcount 2 -> 1
2021-09-19T16:53:33.276Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.276Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.276Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.277Z | call_stream | [79] HTTP/2 stream closed with code 2
2021-09-19T16:53:33.277Z | call_stream | [79] ended with status: code=14 details="read ECONNRESET"
2021-09-19T16:53:33.278Z | subchannel_refcount | x.x.x.x:50051 callRefcount 1 -> 0
^[[31merror^[[39m: 2021-09-19T16:53:33.278Z: Error serving unary request 14 UNAVAILABLE: read ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}]

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 11
  • Comments: 32 (8 by maintainers)

Most upvoted comments

This works for us:

const channelOptions: ChannelOptions = {
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

✌️

I set the url parameters to localhost:3000 And it works for me

export const grpcClientOptions: ClientOptions = {
  transport: Transport.GRPC,
  options: {
    package: 'hero',
    protoPath: join(__dirname, './hero/hero.proto'),
    url: 'localhost:3000',
  },
}

image

Still no answer? We are also having this issue, its quite frustrating because we cant replicate it

To new commenters, see my first response on this issue:

Sometimes connections get dropped for reasons outside of the library’s control. Just retry the call when that happens. The client will reconnect when you do.

ECONNRESET is a relatively low-level network error. It’s unlikely that it’s caused by the framework.

You might want to try this version of the retry interceptor, which was actually tested. Just remove the code that references the registry, because that’s there for the test.

all i can see in this library is "grpc.initial_reconnect_backoff_ms" under channelOptions so my assumption would be it does retries by default, however it does not seem like this is the case

Connections are separate things for requests. This is one of the options that controls how the channel re-establishes connections after they are dropped. The @grpc/grpc-js library does not currently have any built-in support for doing requests automatically, so you will need to do that yourself. For example, you could use an interceptor, as described in the interceptor spec’s examples section.