google-cloud-ruby: Google Pub/Sub 14:Deadline Exceeded and Google::Cloud::UnavailableError

Environment: GKE 1.10.2. Library version: 0.31.1

Gemfile.lock section:

    google-cloud-core (1.2.2)
      google-cloud-env (~> 1.0)
    google-cloud-env (1.0.2)
      faraday (~> 0.11)
    google-cloud-pubsub (0.31.1)
      concurrent-ruby (~> 1.0)
      google-cloud-core (~> 1.2)
      google-gax (~> 1.0)
      grpc-google-iam-v1 (~> 0.6.9)
    google-cloud-storage (1.12.0)
      digest-crc (~> 0.4)
      google-api-client (~> 0.19.0)
      google-cloud-core (~> 1.2)
      googleauth (~> 0.6.2)
    google-gax (1.3.0)
      google-protobuf (~> 3.2)
      googleapis-common-protos (>= 1.3.5, < 2.0)
      googleauth (~> 0.6.2)
      grpc (>= 1.7.2, < 2.0)
      rly (~> 0.2.3)
    google-protobuf (3.6.1)
    googleapis-common-protos (1.3.7)
      google-protobuf (~> 3.0)
      googleapis-common-protos-types (~> 1.0)
      grpc (~> 1.0)
    googleapis-common-protos-types (1.0.2)
      google-protobuf (~> 3.0)
    growl (1.0.3)
    grpc (1.14.1)
      google-protobuf (~> 3.1)
      googleapis-common-protos-types (~> 1.0.0)
    grpc-google-iam-v1 (0.6.9)
      googleapis-common-protos (>= 1.3.1, < 2.0)
      grpc (~> 1.0)

Our pods occasionally get into a “bad state.” So far, this has only affected publishing. Once a pod is in this state, publishing messages frequently fails with errors like the stack trace below.

We have discovered that about 10 minutes before a pod goes into this bad state, we see two log messages like this:

    #<Thread:0x00007f26bdb43a08 run> terminated with exception (report_on_exception is true):
    Header value '�|&' has invalid characters (ArgumentError)

I’m not yet sure what on our end would be causing that ArgumentError.

I think this might be coming from https://github.com/grpc/grpc/blob/master/src/ruby/ext/grpc/rb_call.c#L436?
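One way to narrow down where the bad value comes from might be to inspect outgoing metadata before it reaches gRPC. Below is a minimal sketch, not something taken from grpc or google-cloud-pubsub. `suspicious_header_values` is a hypothetical helper, and it assumes the rule from the gRPC wire protocol that non-binary header values must consist of printable ASCII (bytes 0x20 through 0x7E), while keys ending in `-bin` may carry arbitrary bytes. The actual check in `rb_call.c` may differ in detail.

```ruby
# Hypothetical helper for debugging; not part of grpc or google-cloud-pubsub.
# Assumption: a non-binary header value is legal only if every byte is
# printable ASCII (0x20..0x7E). Keys ending in "-bin" are skipped because
# binary headers may contain arbitrary bytes.
PRINTABLE_ASCII = (0x20..0x7E).freeze

# Returns a hash of { key => [inspected offending values] } for every
# metadata entry that contains bytes outside the printable ASCII range.
def suspicious_header_values(metadata)
  metadata.each_with_object({}) do |(key, value), bad|
    next if key.to_s.end_with?("-bin")

    # A metadata value may be a single string or an array of strings.
    Array(value).each do |v|
      next if v.to_s.bytes.all? { |b| PRINTABLE_ASCII.cover?(b) }

      (bad[key] ||= []) << v.to_s.inspect
    end
  end
end
```

Calling it with something like `suspicious_header_values("authorization" => token)` just before a publish, and logging any non-empty result, could show which header (if any) is being corrupted and what bytes it contains.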

Ten minutes later, the first exception is raised. Sidekiq reports that the job failed after a little over 10 minutes. From that point forward, the pod continues to have the same issue. I can’t tell whether every call fails, but the number of exceptions raised is at least correlated with throughput.

stacktrace.txt

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@blowmage we have not seen this error since upgrading to 0.32.0 about a week ago. We’re still on publish (not publish_async). Throughput is the same or higher.

I think that 0.32.0 did fix this.

If it’s helpful, here are the version changes that came with that gem upgrade:

  • google-cloud-pubsub 0.31.1 -> 0.32.0
  • google-gax 1.0 -> 1.3
  • googleauth 0.6.4 -> 0.6.5
  • signet 0.8.1 -> 0.9.0

Nothing else changed.

From our perspective, feel free to close.