nodejs-pubsub: All acks are expired after 30 min of activity with v0.23.0+
Environment details
- Hardware: on-prem 8-core compute node
- OS: RHEL 6.9
- Node.js version: 8.11.3
- npm version: 6.4.0
- @google-cloud/pubsub version: 0.23.0-0.24.1
Steps to reproduce
- Publish a moderate number of messages to a topic (~4k in our case)
- Use a single subscriber that processes each message within 1-2 min, e.g. with the following code:
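  const subscription = pubSub.subscription(SUBSCRIPTION_NAME, {
    flowControl: {
      maxMessages: 2,
      allowExcessMessages: false,
    },
  });
  subscription.on("message", (message) => {
    // Simulate slow processing: ack after a random delay of up to 2 min.
    // Calling ack() on the message object keeps its `this` binding intact.
    setTimeout(() => message.ack(), 60000 * 2 * Math.random());
  });
  subscription.on("error", console.error);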
- After ~30 min of churning through messages, Stackdriver shows that:
  - num_undelivered_messages stops going down
  - pull_ack_message_operation_count marks every message ack as expired
  - streaming_pull_message_operation_count shows that the 10-min pull interval stops.
As a net result, we can’t seem to ack any more messages and have to restart the subscriber.
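For anyone trying to reproduce this, the first step (publishing a few thousand messages) could be scripted roughly as below. This is only a sketch: `seedMessages` is a hypothetical helper, and `publish` is assumed to be any function that takes a Buffer and returns a Promise, for example a thin wrapper around whatever publish call the installed @google-cloud/pubsub version exposes.

```javascript
// Hypothetical helper, not part of @google-cloud/pubsub.
// `publish` is an assumption: any (Buffer) => Promise function.
async function seedMessages(publish, count, concurrency = 50) {
  let next = 0;       // index of the next message to publish
  let published = 0;  // number of publishes that resolved

  // Each worker pulls the next index until all messages are claimed,
  // so at most `concurrency` publishes are in flight at once.
  async function worker() {
    while (next < count) {
      const seq = next++;
      await publish(Buffer.from(JSON.stringify({ seq })));
      published++;
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, count) }, worker);
  await Promise.all(workers);
  return published;
}
```

Calling `seedMessages(publish, 4000)` would roughly match the ~4k messages mentioned above; the concurrency cap is arbitrary.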
We varied settings like flowControl.maxMessages (1-4) and ackDeadline (10, 300, 7200, 720000) and that didn’t seem to have an effect.
Downgrading to 0.22.2 seems to resolve this issue. We still get occasional expired acks, but not in 100% of cases.
Thanks!
About this issue
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 44 (15 by maintainers)
Sure, thanks!
@sduskis I feel like this is more of a band-aid than anything. From what I understand, the solution here is shutting down streaming connections more frequently than the default, is that correct? With long processing of messages as is the case here, the solution might just be to use the underlying pull API instead of the client library. I’m hesitant to expand the API in this way unless we think it is a good thing to do and do it across all languages.
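For reference, a rough sketch of what using the underlying pull API directly might look like for slow handlers. It assumes `client` behaves like the generated `v1.SubscriberClient` from @google-cloud/pubsub, whose `pull`, `acknowledge`, and `modifyAckDeadline` calls resolve to an array with the response first. `processOnce`, `handler`, and the option names are made up for illustration, and this is not what the client library does internally.

```javascript
// Sketch only. `client` is assumed to look like v1.SubscriberClient;
// `processOnce`, `handler`, and the option names are hypothetical.
async function processOnce(client, subscription, handler, opts = {}) {
  const maxMessages = opts.maxMessages || 2;
  const ackDeadlineSeconds = opts.ackDeadlineSeconds || 60;
  const extendEveryMs = opts.extendEveryMs || 30000;

  // One unary pull request instead of a long-lived streaming pull.
  const [response] = await client.pull({ subscription, maxMessages });
  const received = response.receivedMessages || [];

  await Promise.all(received.map(async ({ ackId, message }) => {
    // Periodically push the ack deadline out while the handler runs.
    const timer = setInterval(() => {
      client
        .modifyAckDeadline({ subscription, ackIds: [ackId], ackDeadlineSeconds })
        .catch(console.error);
    }, extendEveryMs);
    try {
      await handler(message);
      await client.acknowledge({ subscription, ackIds: [ackId] });
    } finally {
      // Stop extending whether the handler succeeded or threw; an unacked
      // message is left for redelivery once its deadline passes.
      clearInterval(timer);
    }
  }));

  return received.length;
}
```

A subscriber would call `processOnce` in a loop, passing the full `projects/.../subscriptions/...` path as `subscription`. The trade-off is that lease extension and flow control become the caller's responsibility.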
@dinvlad @rossj With PR #556 from @callmehiphop (not yet released) and the option below, the subscriber works fine with slow ack processing. The charts below show a minor hiccup from around 2:30 to 2:35, about 1 hour into the run, but it recovered from that.
Options to use:
@callmehiphop I tried with @grpc/grpc-js and it seems to have the same issue. All acks gave an expired result after ~40 min. In addition, after 60 min it seems that every modAck attempt failed with a message like:
Failed to "modifyAckDeadline" for X message(s). Reason: Getting metadata from plugin failed with error: New streams cannot be created after receiving a GOAWAY
I don’t remember ever seeing these errors with straight grpc. I’ll try the polling PR next.
@rossj nope, actually we only read from the stream pool; all acks and modacks are sent as plain requests. It sounds more like the grpc channel dies without giving any sort of indication as to why.