google-cloud-ruby: Google Pub/Sub 14:Deadline Exceeded and Google::Cloud::UnavailableError
Environment: GKE 1.10.2. Library version: 0.31.1
Gemfile.lock section:
    google-cloud-core (1.2.2)
      google-cloud-env (~> 1.0)
    google-cloud-env (1.0.2)
      faraday (~> 0.11)
    google-cloud-pubsub (0.31.1)
      concurrent-ruby (~> 1.0)
      google-cloud-core (~> 1.2)
      google-gax (~> 1.0)
      grpc-google-iam-v1 (~> 0.6.9)
    google-cloud-storage (1.12.0)
      digest-crc (~> 0.4)
      google-api-client (~> 0.19.0)
      google-cloud-core (~> 1.2)
      googleauth (~> 0.6.2)
    google-gax (1.3.0)
      google-protobuf (~> 3.2)
      googleapis-common-protos (>= 1.3.5, < 2.0)
      googleauth (~> 0.6.2)
      grpc (>= 1.7.2, < 2.0)
      rly (~> 0.2.3)
    google-protobuf (3.6.1)
    googleapis-common-protos (1.3.7)
      google-protobuf (~> 3.0)
      googleapis-common-protos-types (~> 1.0)
      grpc (~> 1.0)
    googleapis-common-protos-types (1.0.2)
      google-protobuf (~> 3.0)
    growl (1.0.3)
    grpc (1.14.1)
      google-protobuf (~> 3.1)
      googleapis-common-protos-types (~> 1.0.0)
    grpc-google-iam-v1 (0.6.9)
      googleapis-common-protos (>= 1.3.1, < 2.0)
      grpc (~> 1.0)
We are seeing a “bad state” that our pods get into. So far, this has only affected publishing. When a pod gets into a bad state, publishing messages produces frequent errors that look like the stack trace below.
We have discovered that about 10 minutes before a pod goes into this bad state, we see two log messages like this:
#<Thread:0x00007f26bdb43a08 run> terminated with exception (report_on_exception is true):
Header value '�|&' has invalid characters (ArgumentError)
I’m not yet sure what on our end would be causing that ArgumentError.
I think this might be coming from https://github.com/grpc/grpc/blob/master/src/ruby/ext/grpc/rb_call.c#L436?
The first exception is raised 10 minutes later. Sidekiq reports that the job failed after a little over 10 minutes. From that point forward, the pod continues to have the same issue. I can’t tell whether every call is affected, but the number of exceptions raised is at least correlated with throughput.
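To help narrow down where the ArgumentError comes from, something like the following could be used to check metadata values before they reach gRPC. This is only a rough sketch: `invalid_header_values` is a hypothetical helper, not part of grpc or google-cloud-pubsub, and it assumes that non-binary gRPC metadata values may contain only printable ASCII (bytes 0x20 to 0x7E), while keys ending in `-bin` may carry arbitrary bytes.

```ruby
# Hypothetical diagnostic helper; not part of grpc or google-cloud-pubsub.
# Assumption: non-binary gRPC metadata values may contain only printable
# ASCII (0x20..0x7E); keys ending in "-bin" are treated as binary and skipped.
def invalid_header_values(metadata)
  metadata.reject do |key, value|
    key.to_s.end_with?("-bin") ||
      value.to_s.bytes.all? { |byte| byte.between?(0x20, 0x7E) }
  end
end

# Example: only the entry with a non-printable byte is reported.
suspect = invalid_header_values(
  "x-ok"       => "abc",
  "x-bad"      => "\xFF|&".b,
  "x-data-bin" => "\x00\x01".b
)
suspect.each_key { |key| warn "suspicious metadata value for #{key}" }
```

Logging the offending keys this way might show whether the bad value is something we set ourselves or something produced further down the stack.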
About this issue
- State: closed
- Created 6 years ago
- Comments: 18 (10 by maintainers)
@blowmage we have not seen this error since upgrading to 0.32.0 about a week ago. We’re still on publish (not publish_async). Throughput is same / increased. I think that 0.32.0 did fix this.
If it’s helpful, here are the upgrades that happened during that gem upgrade:
Nothing else changed.
From our perspective, feel free to close.