grpc-node: Error: No connection established

Problem description

Since google-gax pinned the grpc-js version to 0.6.4 (https://github.com/googleapis/gax-nodejs/commit/628db9e84c61017dfaba22ac7c2215e2c79c9745), I have been seeing the following issue:

Error: No connection established
    at Http2CallStream.call.on (/usr/src/node_modules/@google-cloud/bigtable/node_modules/@grpc/grpc-js/build/src/call.js:68:41)
    at Http2CallStream.emit (events.js:203:15)
    at Http2CallStream.EventEmitter.emit (domain.js:448:20)
    at process.nextTick (/usr/src/node_modules/@google-cloud/bigtable/node_modules/@grpc/grpc-js/build/src/call-stream.js:75:22)
    at process._tickCallback (internal/process/next_tick.js:61:11)

Reproduction steps

The bug only appears after about an hour of runtime. We are using grpc-js through @google-cloud/bigtable, and our Bigtable instance is autoscaling.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 46
  • Comments: 94 (30 by maintainers)

Most upvoted comments

@alexander-fenster I tried v0.6.5 and I still have the issue: lots of Error: No connection established after an hour of runtime.

@murgatroid99 our services run fine for about ten minutes and then start dropping connections, at which point we see deadline-exceeded errors. Once this happens, no requests succeed with either @google-cloud/pubsub or @google-cloud/datastore.

@alexander-fenster we are seeing this with services running in GKE containers.

Same issue when using @google-cloud/datastore, which depends on google-gax.

I’m hoping this issue hasn’t resurfaced, but I encountered the same error in a project that had been idle for some time. I recycled the pod, so next time I should be able to tell whether the timeout is the cause; it impacted all our instances in a test environment.

The error:

err:
    message: '14 UNAVAILABLE: No connection established'
    name: Error
    stack: >
        Error: 14 UNAVAILABLE: No connection established
            at Object.callErrorFromStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/call.ts:81:24)
            at Object.onReceiveStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/client.ts:334:36)
            at Object.onReceiveStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/client-interceptors.ts:434:34)
            at Object.onReceiveStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/client-interceptors.ts:397:48)
            at Http2CallStream.outputStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/call-stream.ts:230:22)
            at Http2CallStream.maybeOutputStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/call-stream.ts:280:14)
            at Http2CallStream.endCall (/app/packages/animus/node_modules/@grpc/grpc-js/src/call-stream.ts:263:12)
            at Http2CallStream.cancelWithStatus (/app/packages/animus/node_modules/@grpc/grpc-js/src/call-stream.ts:597:10)
            at ChannelImplementation.tryPick (/app/packages/animus/node_modules/@grpc/grpc-js/src/channel.ts:387:22)
            at ChannelImplementation._startCallStream (/app/packages/animus/node_modules/@grpc/grpc-js/src/channel.ts:433:10)
            at Http2CallStream.start (/app/packages/animus/node_modules/@grpc/grpc-js/src/call-stream.ts:573:18)
            at BaseUnaryInterceptingCall.start (/app/packages/animus/node_modules/@grpc/grpc-js/src/client-interceptors.ts:374:15)
            at BaseUnaryInterceptingCall.start (/app/packages/animus/node_modules/@grpc/grpc-js/src/client-interceptors.ts:437:11)
            at ServiceClientImpl.makeUnaryRequest (/app/packages/animus/node_modules/@grpc/grpc-js/src/client.ts:315:10)
            at ServiceClientImpl.<anonymous> (/app/packages/animus/node_modules/@grpc/grpc-js/src/make-client.ts:174:15)
            at args (/app/packages/animus/node_modules/dialogflow/src/v2beta1/sessions_client.js:175:35)
            at /app/packages/animus/node_modules/google-gax/src/normalCalls/timeout.ts:54:13
            at OngoingCallPromise.call (/app/packages/animus/node_modules/google-gax/src/call.ts:82:23)
            at NormalApiCaller.call (/app/packages/animus/node_modules/google-gax/src/normalCalls/normalApiCaller.ts:46:15)
            at funcPromise.then.then (/app/packages/animus/node_modules/google-gax/src/createApiCall.ts:103:26)
            at Object.dynatraceOnServiceExecutionIndicator [as doInvoke] (/opt/dynatrace/oneagent/agent/bin/1.195.54.20200529-113801/any/nodejs/nodejsagent.js:1803:20)
            at Object.a.safeInvoke (/opt/dynatrace/oneagent/agent/bin/1.195.54.20200529-113801/any/nodejs/nodejsagent.js:1854:29)
            at /opt/dynatrace/oneagent/agent/bin/1.195.54.20200529-113801/any/nodejs/nodejsagent.js:7079:25
            at process._tickCallback (internal/process/next_tick.js:68:7)
    code: 14

We install dependencies from the package-lock.json, which has @grpc/grpc-js locked in at 1.0.4.

{
  // deps...
  "@grpc/grpc-js": {
    "version": "1.0.4",
    "resolved": "https://registry.npmjs.org/@grpc/grpc-js/-/grpc-js-1.0.4.tgz",
    "integrity": "sha512-Qawt6HUrEmljQMPWnLnIXpcjelmtIAydi3M9awiG02WWJ1CmIvFEx4IOC1EsWUWUlabOGksRbpfvoIeZKFTNXw==",
    "requires": {
      "google-auth-library": "^6.0.0",
      "semver": "^6.2.0"
    },
    "dependencies": { /** deps... */ }
  }
}

I’ll open another issue if this has resurfaced, and I’ll be back with more details if we run into it again.

Same here with Datastore and Pub/Sub; 0.6.5 doesn’t fix the issue.

I have now published grpc-js version 0.6.9 with some changes that appear to further improve this situation in certain cases.

Same error using firebase-admin with Firestore: “No connection established”.

We downgraded @grpc/grpc-js to 0.5.4, along with the other related Google libraries, to versions that don’t depend on @grpc/grpc-js@0.6.x. The problem seems to be gone. It may be related to #1062 and #1061.

@murgatroid99 Sorry for the late reply. Pods have been stable, version 0.6.18 has fixed my issues. Thanks again for the quick reply and the fix.

I had a rogue reference in my package-lock.json

After nuking that and starting from a clean slate, I was able to get the correct grpc-js version. For those interested, it was under google-gax as a required dependency:

"@grpc/grpc-js": "0.6.9"

I have since had three scheduled calls go off with no failures. Fingers crossed this is the one, thanks for the timely responses from all and the community support!
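The clean-slate procedure described above can be sketched as shell commands (a sketch; run from your project root):

```shell
# Remove the stale lockfile and installed modules, then reinstall from scratch
rm -rf node_modules package-lock.json
npm install

# Verify which @grpc/grpc-js version(s) actually got resolved
npm ls @grpc/grpc-js
```

The final `npm ls` step is what reveals a rogue transitive pin like the one described above, since it prints every copy of the package in the dependency tree along with who requires it.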

I have published grpc-js version 0.6.8 that contains more potential fixes for these errors.

I have published grpc-js version 0.6.7 that should hopefully resolve this issue for the people who are still experiencing it. If you can update to it please try it out.

We have seen these errors start to occur around one hour after a fresh pod creation. Once they started, all subsequent calls failed. Pinning the lib to the previous working version (0.5.2) resolved the issue. (Using Google Firestore.)

Would it be reasonable to suggest pinning @grpc/grpc-js versions, just to avoid a large impact when these sensitive dependencies run into issues?
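One way to do this (a sketch, assuming Yarn; the version shown is only an example) is a resolutions entry in package.json, which forces every transitive copy of @grpc/grpc-js to a single known-good release:

```json
{
  "resolutions": {
    "@grpc/grpc-js": "0.6.9"
  }
}
```

Yarn honors this field natively; with plain npm of that era, a helper such as npm-force-resolutions is needed to apply it to the lockfile.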

@murgatroid99 still testing 0.6.8 on my side. I haven’t encountered any errors yet, but I will leave it in production for a day to be sure. Would it be possible not to pin @grpc/grpc-js@0.6.7 in @google-cloud/common-grpc? It caused me to push the 0.6.7 version by mistake. Thanks guys.

I use node-pubsub; here is the output of npm ls @grpc/grpc-js:

└─┬ @google-cloud/pubsub@1.1.1
  ├── @grpc/grpc-js@0.6.7 
  └─┬ google-gax@1.7.1
    └── @grpc/grpc-js@0.6.7  deduped

After a server restart everything works fine, but after some time, calls to publish stop sending anything. Latest debug info in the logs:

Oct 10 07:05:03.554pm info app web.1 | subchannel | 172.217.13.74:443 READY -> IDLE
Oct 10 07:05:03.555pm info app web.1 | dns_resolver | Resolution update requested for target pubsub.googleapis.com:443
Oct 10 07:05:03.555pm info app web.1 | connectivity_state | pubsub.googleapis.com:443 READY -> CONNECTING
Oct 10 07:05:03.555pm info app web.1 | resolving_load_balancer | pubsub.googleapis.com:443 READY -> CONNECTING
Oct 10 07:05:03.555pm info app web.1 | pick_first | READY -> CONNECTING
Oct 10 07:05:03.555pm info app web.1 | pick_first | Start connecting to subchannel with address 172.217.13.74:443
Oct 10 07:05:03.555pm info app web.1 | pick_first | Connect to address list 172.217.13.74:443
Oct 10 07:05:03.557pm info app web.1 | subchannel | 172.217.13.74:443 IDLE -> CONNECTING
Oct 10 07:05:03.558pm info app web.1 | subchannel | 172.217.13.74:443 CONNECTING -> TRANSIENT_FAILURE
Oct 10 07:05:03.571pm info app web.1 | dns_resolver | Resolved addresses for target pubsub.googleapis.com:443: 172.217.13.74:443
Oct 10 07:05:03.572pm info app web.1 | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> TRANSIENT_FAILURE
Oct 10 07:05:03.572pm info app web.1 | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> TRANSIENT_FAILURE
Oct 10 07:05:03.572pm info app web.1 | pick_first | CONNECTING -> TRANSIENT_FAILURE
Oct 10 07:05:03.572pm info app web.1 | pick_first | Connect to address list 172.217.13.74:443
Oct 10 07:05:04.558pm info app web.1 | subchannel | 172.217.13.74:443 TRANSIENT_FAILURE -> IDLE

And:

Oct 10 07:47:29.334pm info app web.1 10/10/2019, 7:47:29 PM - error: Error on add file to encoding queue: message=Retry total timeout exceeded before any response was received, stack=Error: Retry total timeout exceeded before any response was received
Oct 10 07:47:29.334pm info app web.1 at repeat (/app/node_modules/google-gax/build/src/normalCalls/retries.js:80:31)
Oct 10 07:47:29.334pm info app web.1 at Timeout.setTimeout [as _onTimeout] (/app/node_modules/google-gax/build/src/normalCalls/retries.js:113:25)
Oct 10 07:47:29.334pm info app web.1 at ontimeout (timers.js:436:11)
Oct 10 07:47:29.334pm info app web.1 at tryOnTimeout (timers.js:300:5)
Oct 10 07:47:29.334pm info app web.1 at listOnTimeout (timers.js:263:5)
Oct 10 07:47:29.334pm info app web.1 at Timer.processTimers (timers.js:223:10), code=4

And after some time (messages are still not sent to Pub/Sub):

Oct 10 07:56:40.051pm info app web.1 | resolving_load_balancer | pubsub.googleapis.com:443 READY -> CONNECTING
Oct 10 07:56:40.051pm info app web.1 | pick_first | READY -> CONNECTING
Oct 10 07:56:40.051pm info app web.1 | pick_first | Start connecting to subchannel with address 172.217.13.74:443
Oct 10 07:56:40.051pm info app web.1 | pick_first | Connect to address list 172.217.13.74:443
Oct 10 07:56:40.051pm info app web.1 | subchannel | 172.217.13.74:443 READY -> IDLE
Oct 10 07:56:40.052pm info app web.1 | subchannel | 172.217.13.74:443 IDLE -> CONNECTING
Oct 10 07:56:40.052pm info app web.1 | dns_resolver | Resolution update requested for target pubsub.googleapis.com:443
Oct 10 07:56:40.052pm info app web.1 | connectivity_state | pubsub.googleapis.com:443 READY -> CONNECTING
Oct 10 07:56:40.054pm info app web.1 | subchannel | 172.217.13.74:443 CONNECTING -> TRANSIENT_FAILURE
Oct 10 07:56:40.062pm info app web.1 | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> CONNECTING
Oct 10 07:56:40.062pm info app web.1 | pick_first | CONNECTING -> CONNECTING
Oct 10 07:56:40.062pm info app web.1 | pick_first | Start connecting to subchannel with address 172.217.7.170:443
Oct 10 07:56:40.062pm info app web.1 | pick_first | Connect to address list 172.217.7.170:443
Oct 10 07:56:40.062pm info app web.1 | dns_resolver | Resolved addresses for target pubsub.googleapis.com:443: 172.217.7.170:443
Oct 10 07:56:40.063pm info app web.1 | subchannel | 172.217.7.170:443 IDLE -> CONNECTING
Oct 10 07:56:40.063pm info app web.1 | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> CONNECTING
Oct 10 07:56:40.079pm info app web.1 | subchannel | 172.217.7.170:443 CONNECTING -> READY
Oct 10 07:56:40.080pm info app web.1 | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> READY
Oct 10 07:56:40.080pm info app web.1 | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> READY
Oct 10 07:56:40.080pm info app web.1 | pick_first | CONNECTING -> READY
Oct 10 07:56:40.080pm info app web.1 | pick_first | Pick subchannel with address 172.217.7.170:443
Oct 10 07:56:41.052pm info app web.1 | subchannel | 172.217.13.74:443 TRANSIENT_FAILURE -> IDLE
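Trace logs like the ones above come from grpc-js’s built-in tracing, which can be enabled with environment variables (GRPC_VERBOSITY and GRPC_TRACE are part of grpc-js; app.js is a placeholder for your entry point):

```shell
# Print DEBUG-level output from the channel-related tracers seen in the logs above
GRPC_VERBOSITY=DEBUG \
GRPC_TRACE=subchannel,pick_first,dns_resolver,connectivity_state,resolving_load_balancer \
node app.js
```

GRPC_TRACE takes a comma-separated list of tracer names (or `all`), so the output can be narrowed to just the subsystems being debugged.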

I have published 0.6.6 with another possible fix for this problem. If anyone is willing to try it out and report the result, that would be helpful.