nodejs-pubsub: DEADLINE_EXCEEDED makes application not receiving messages at all
Environment details
- Node.js version: v12.7.0
- npm version: 6.10.0
- @google-cloud/pubsub version: "^1.0.0"
Error:
insertId: "gnr3q1fz7eerd"
jsonPayload: {
  level: "error"
  message: "unhandledRejection"
  originalError: {
    ackIds: [1]
    code: 4
    details: "Deadline exceeded"
  }
}
After receiving this error, the app does not receive messages anymore and we have to exit the application to recreate the kubernetes pod.
Any help would be appreciated!
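For illustration only (this is not code from the original report): the restart-on-error behaviour described above could be sketched roughly as follows. gRPC status code 4 is DEADLINE_EXCEEDED, which matches the `code: 4` in the log entry. The names `isDeadlineExceeded` and `installFailFast` are hypothetical, and exiting the process only helps if something like Kubernetes restarts it.

```javascript
// gRPC status code 4 is DEADLINE_EXCEEDED (the "code: 4" in the log above).
const DEADLINE_EXCEEDED = 4;

// Hypothetical helper: true when a rejection reason looks like a gRPC
// DEADLINE_EXCEEDED error.
function isDeadlineExceeded(err) {
  return Boolean(err) && err.code === DEADLINE_EXCEEDED;
}

// Hypothetical helper: registers an unhandledRejection listener that exits
// the process on DEADLINE_EXCEEDED so the orchestrator can recreate the pod.
// `exit` and `log` are injectable so the handler can be exercised safely.
function installFailFast(exit = code => process.exit(code), log = console.error) {
  const handler = reason => {
    if (isDeadlineExceeded(reason)) {
      log('Pub/Sub deadline exceeded, exiting so the pod is restarted:', reason.details);
      exit(1);
    }
  };
  process.on('unhandledRejection', handler);
  return handler;
}
```

Calling `installFailFast()` once at startup is enough. This is a blunt stopgap, not a fix for the underlying problem.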
About this issue
- State: closed
- Created 5 years ago
- Reactions: 12
- Comments: 99 (29 by maintainers)
@pwrkpop @npomfret @xoraingroup, rather than rolling back to 0.29.1 of @google-cloud/pubsub, I recommend one of the following workarounds:
- Using grpc, rather than @grpc/grpc-js: alongside @google-cloud/pubsub, add the dependency grpc (this is the old gRPC transport layer). ☝️ This same approach can be used for other libraries that use gRPC, e.g.,
- Using the workaround recommended by @Redgwell.
We potentially have a reproduction of the issue described in this thread (thanks @Redgwell for pointing us in the right direction), and will hopefully have a fix out soon that makes either of these workarounds unnecessary.
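A minimal sketch of the first workaround, assuming (as discussed in this thread) that the PubSub constructor accepts a `grpc` option and that the `grpc` package has been added as a dependency. The helper names `loadNativeGrpc` and `buildClientOptions` and the project ID are hypothetical.

```javascript
// Hypothetical helper: try to load the native `grpc` package. Returns
// undefined when it is not installed, so the client keeps its default
// transport.
function loadNativeGrpc(requireFn = require) {
  try {
    return requireFn('grpc');
  } catch (err) {
    return undefined;
  }
}

// Hypothetical helper: build the options object for `new PubSub(...)`.
// The `grpc` key is the assumed constructor option from this thread.
function buildClientOptions(projectId, grpcImpl) {
  const options = {projectId};
  if (grpcImpl) {
    options.grpc = grpcImpl;
  }
  return options;
}

// Usage, with both packages installed ('my-project' is a placeholder):
// const {PubSub} = require('@google-cloud/pubsub');
// const pubsub = new PubSub(buildClientOptions('my-project', loadNativeGrpc()));
```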
I can confirm this: after upgrading to PubSub ^1.0.0, all our services stop sending Pub/Sub messages after the error occurs.
The full stacktrace is
Can I suggest raising the priority on this issue?
We used to see these error messages, we now see these errors in all our projects that use it:
This comment makes no sense from an optics standpoint, and I totally agree with @MatthieuLemoine: release a proper fix for this, or one with this ‘workaround’ built in. Asking customers to change their production code with some speculative ‘fix’ is irresponsible.
What happens when this is actually fixed and this ends up causing more problems later?
The updated documentation does not even mention what kinds of workloads would be better suited to native or not, which makes the mere suggestion of using it even more confusing and potentially bug-prone.
I just wanted to give an update before the weekend: we do have a version of @grpc/grpc-js (0.6.9) that all signs are indicating is stable:
The reason I was holding off on this update was that we were doing more stress testing on the system that we had managed to reproduce this issue on.
If anyone is still bumping into issues on 0.6.9, please:
- open a new issue on PubSub, so that we can debug your issue in isolation (just in case there’s more than one thing being debugged in this thread).
- run your environment with the following environment variables set:
So far, with debug information, @murgatroid99 has been able to address issues almost immediately.
If you do not want to share your logs publicly (understandably), you can open an issue through our issue tracker, and also email me (bencoe [at] google.com, so that I can make sure it’s escalated immediately): https://issuetracker.google.com/savedsearches/559741
Now, if folks start using @grpc/grpc-js@0.6.9, and it becomes apparent that it is not in fact stable, I will take steps to move us back to grpc immediately in @google-cloud/pubsub (until such time that we are confident).
I’d like to 👍 this going back to a P1. Providing a workaround isn’t an acceptable response. Indeed, we implemented the workaround and rolled it out to Prod, only to see that while it successfully mitigated the lost pub/sub connection, it also introduces a memory leak that requires periodic restarts of our k8s pods regardless.
Nope. That didn’t work. 😭 still getting this.
Can’t find anything on creating PubSub with options including grpc? I use projectId, and tried to add grpc, but TypeScript rejects this.
https://googleapis.dev/nodejs/pubsub/1.1.1/PubSub.html
@bcoe
I did a quick google and landed here. Locally, I was able to replicate the same error message and behaviour (that is, no longer receiving messages after the error) by:
To make this a little faster to replicate, I set the streamingOptions timeout value in the pubsub options object to have a shorter streaming connection timeout:
One thing that might be useful for fixing this issue - if you look at my workaround, inside the initSubscriber function, I create a new PubSub() object. I found that if I didn’t do this inside the initSubscriber function, and instead did it just once when the app starts, the application just kept getting deadline exceeded messages over and over again, even once I’d reconnected to the internet.
That makes me think there is some state being set in the PubSub object that puts it in a faulted state until it’s recreated. Sorry, I don’t have the time right now to dig into that assertion any further.
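The workaround code itself is not preserved in this thread, so the following is only a rough sketch of the idea described above: create a fresh client inside `initSubscriber` every time the subscription errors, rather than reusing one created at startup. The `createClient` factory, the `options` object and the retry delay are assumptions made for illustration. With the real library the factory would be something like `() => new PubSub()`.

```javascript
// Sketch: (re)subscribe with a brand-new client after every error.
// `createClient` is a hypothetical factory, e.g. () => new PubSub().
// `options.schedule` defaults to setTimeout and is injectable for testing.
function initSubscriber(createClient, subscriptionName, onMessage, options = {}) {
  const {retryDelayMs = 5000, schedule = setTimeout} = options;

  // A fresh client on every call: reusing the old one reportedly kept
  // producing "Deadline exceeded" even after connectivity returned.
  const client = createClient();
  const subscription = client.subscription(subscriptionName);

  subscription.on('message', onMessage);
  subscription.once('error', err => {
    console.error('Subscription error, re-creating client:', err.code, err.details);
    subscription.removeListener('message', onMessage);
    // Swallow any later errors from the abandoned subscription.
    subscription.on('error', () => {});
    // Best-effort close of the old stream; failures are ignored.
    Promise.resolve()
      .then(() => subscription.close())
      .catch(() => {});
    schedule(
      () => initSubscriber(createClient, subscriptionName, onMessage, options),
      retryDelayMs
    );
  });
  return subscription;
}

// Usage with the real library (placeholder subscription name):
// const {PubSub} = require('@google-cloud/pubsub');
// initSubscriber(() => new PubSub(), 'my-subscription', message => {
//   console.log('received', message.id);
//   message.ack();
// });
```

Whether this avoids the memory growth other commenters report with the workaround is not something this sketch addresses.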
@google-cloud/pubsub@1.1.1 definitely not working for me. I regularly see
{"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}}
and then my entire app stops working. The subscribers stop processing messages.
Is there an agreed-upon workaround in any of the above?
[EDIT]. I’m also seeing this error from time to time
{"ackIds":["KlgRTgQhIT4wPkVTRFAGFixdRkhRNxkIaFEOT14jPzUgKEUSASBuFSFCXhliaFxcdQdQC00geTQnYltFVQhCUnRfcysvV1tbdAVRDR56e2Z0aF8XCSr75KDd7KSXWUZgTbTgwcVHXbKv4JoiZh49WxJLLD5-MDxFQV5AEkw7CURJUytDCw"],"code":4,"details":"Deadline exceeded","metadata":{"internalRepr":{},"options":{}}}
Don’t know if it’s related though.
Errors started happening at 2019-10-09 17:35:28.252 BST (Wed Oct 09 16:35:28 UTC).
Hello, we have been hitting the same problem here since 10/02. We tried upgrading to 0.32.1, and even to 1.1.0. Didn’t solve a thing. We are running in App Engine, so when one of the instances starts hitting the error, it snowballs and errors flow like crazy until the instance gets killed and another instance starts. Then, errors stop flowing for a bit.
None of our services using pubsub are working anymore either. We are using version 1.1.0. Getting this:
And this:
We have to restart our services every 10 minutes because of that.
It also seems to be storing more and more to disk, as disk usage goes up over time.
Hey @bcoe, we are seeing this for the first time after upgrading @google-cloud/pubsub to ^1.0.0. Any workaround to recreate the subscription after this error?
@npomfret I think this “@google-cloud/pubsub”: “0.29.1”, has done magic. I have had the same issue. Only 0.29.1 is the working version.
Okay, I grabbed the latest as you said - here’s the pubsub dependency tree from npm ls:
I tried my test loop again - with this code:
So with that running, I pushed some messages, saw that the app was handling them, then disconnected wifi and waited for the deadline exceeded error. I reconnected wifi, pushed some more messages and waited for about 2 minutes. The app didn’t process the messages. After restarting the app, it immediately processed the messages.
So unfortunately, it doesn’t seem that this has fixed things, at least for my test loop of disconnecting the internet connection.
@Redgwell, could I bother you to test with @grpc/grpc-js@0.6.8? If you delete your node_modules and package-lock.json, and run npm install @google-cloud/pubsub, you should get it now (you can run npm ls post-install to confirm).
We believe we’ve potentially addressed the bug requiring you to resubscribe.
@gae123 there’s a chance we have a reproduction. We’re working towards testing on a few @grpc/grpc-js versions, so that we can better isolate where the breakage occurs. Will keep you updated.