google-cloud-python: Frequent gRPC StatusCode.UNAVAILABLE errors
Using the current codebase from the master branch (e1fbb6b) with gRPC, we sometimes (approximately 0.5% of requests) see the following exception:
AbortionError(code=StatusCode.UNAVAILABLE, details="{"created":"@1478255129.468798425","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478255129.468756939","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}"))
Retrying seems to always succeed.
Should application code have to care about this kind of error and retry? Or is this a bug in google-cloud-pubsub code?
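For anyone hitting this in the meantime, the application-side workaround amounts to wrapping the call in a retry helper. This is only a sketch: `retry_transient`, its parameters, and the commented predicate are illustrative names I made up, and the predicate assumes the post-beta `grpc.RpcError`/`grpc.StatusCode` API rather than the `AbortionError` shown in the traceback below.

```python
import time

def retry_transient(call, is_retryable, max_attempts=3, base_delay=0.1):
    """Invoke call(); retry on exceptions accepted by is_retryable.

    The delay doubles after each failed attempt; once max_attempts is
    exhausted (or the error is not retryable) the exception propagates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            time.sleep(base_delay * (2 ** (attempt - 1)))

# With grpcio >= 1.0, a suitable predicate might look like:
#   def is_unavailable(exc):
#       return (isinstance(exc, grpc.RpcError)
#               and exc.code() == grpc.StatusCode.UNAVAILABLE)
```

A publish call would then be wrapped as something like `retry_transient(lambda: pubsub_client.Publish(publish_request, timeout), is_unavailable)`, with `is_unavailable` being the hypothetical predicate above.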
Package versions installed:
gapic-google-logging-v2==0.10.1
gapic-google-pubsub-v1==0.10.1
google-api-python-client==1.5.4
google-cloud==0.20.0
google-cloud-bigquery==0.20.0
google-cloud-bigtable==0.20.0
google-cloud-core==0.20.0
google-cloud-datastore==0.20.1
google-cloud-dns==0.20.0
google-cloud-error-reporting==0.20.0
google-cloud-language==0.20.0
google-cloud-logging==0.20.0
google-cloud-monitoring==0.20.0
google-cloud-pubsub==0.20.0
google-cloud-resource-manager==0.20.0
google-cloud-storage==0.20.0
google-cloud-translate==0.20.0
google-cloud-vision==0.20.0
google-gax==0.14.1
googleapis-common-protos==1.3.5
grpc-google-iam-v1==0.10.1
grpc-google-logging-v2==0.10.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.0.0
Note: Everything google-cloud* comes from git master.
This is on Python 2.7.3.
Traceback:
File "ospdatasubmit/pubsub.py", line 308, in _flush
publish_response = self.pubsub_client.Publish(publish_request, self._publish_timeout)
File "grpc/beta/_client_adaptations.py", line 305, in __call__
self._request_serializer, self._response_deserializer)
File "grpc/beta/_client_adaptations.py", line 203, in _blocking_unary_unary
raise _abortion_error(rpc_error_call)
About this issue
- State: closed
- Created 8 years ago
- Comments: 61 (29 by maintainers)
Hi, I'm getting this error on a Pub/Sub consumer. I managed to put together a "not so pretty" workaround.
I'm using a policy that replicates the deadline_exceeded handling in google.cloud.pubsub_v1.subscriber.policy.thread.Policy.on_exception.
In the receive-message function I have code like this:
The problem is that when the resource is truly UNAVAILABLE, we won't be aware of it.
UPDATE: As noted here by @makrusak and here by @rclough, this hack causes high CPU usage, leaving your consumer practically useless (available only intermittently). So it basically trades one problem for another: your consumer does not die, but you will have to restart the worker that runs it often.
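To make the hack above concrete, here is roughly what the override looks like. The real base class is google.cloud.pubsub_v1.subscriber.policy.thread.Policy; the stand-in below skips it so the sketch is self-contained, and the status-code handling is my reconstruction of the workaround, not code from the library.

```python
UNAVAILABLE = 14  # grpc_status seen in the error details above

class RetryingPolicy(object):
    # Stand-in for pubsub_v1.subscriber.policy.thread.Policy;
    # only the on_exception hook is sketched here.
    def on_exception(self, exception):
        # Mimic the library's deadline_exceeded handling: returning None
        # tells the consumer to keep going, re-raising kills it.
        code = getattr(exception, "code", lambda: None)()
        if code == UNAVAILABLE:
            return None  # swallow the error and keep consuming
        raise exception
```

As the UPDATE notes, swallowing UNAVAILABLE unconditionally means a genuinely unavailable resource goes unnoticed, so this trades correctness for liveness.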
I really think my problem is related to this: we have a Node.js client connecting to a Python server using gRPC, and we frequently receive this:
Sometimes, the same request on the same server works without any problem.
Upgraded my stack to google-cloud-pubsub==0.22.0. The error is still present; the traceback/error message is slightly different. Here's a fresh one:
Some package versions:
Timestamps are in UTC if some Googler wants to look on the other side. Let me know if there's something I can add to my logs to aid in debugging.
In most cases, an immediate retry will fix the problem. Sometimes we have to retry 2 or 3 times (we give up after 3 attempts and drop the message).
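The retry-then-drop policy described above can be sketched as follows; `publish_or_drop` and its `publish` callable are illustrative names, not part of any client library.

```python
import logging

def publish_or_drop(publish, message, attempts=3):
    """Call publish(message) up to `attempts` times; drop it on final failure.

    Dropping means the message is lost, which is the trade-off
    described in the comment above.
    """
    for i in range(attempts):
        try:
            return publish(message)
        except Exception:
            logging.exception("publish attempt %d/%d failed", i + 1, attempts)
    logging.error("giving up after %d attempts; dropping message", attempts)
    return None
```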
I can confirm the intermittent errors when working with the Bigtable API.
Python 3.5.2, google-cloud==0.23.0
Also, the sample has since been updated to use the google-auth package, which should also fix that issue.
I think with all the work that @dhermes did on pubsub this should be resolved. I’m going to go ahead and close this, but if it’s still reproducible with the latest version we can re-open.
@barroca I had the same problem. In my case, if my Node.js service runs for a while without any requests, this error occurs, and making the request again gets a normal response.
Really need help
{ Error: {"created":"@1490271131.819044969","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1490271131.819031343","description":"Socket closed","fd":16,"file":"../src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:172.16.250.137:8980"}]}
    at /usr/local/wongnai/node_modules/grpc/src/node/src/client.js:434:17
  cause: { Error: {"created":"@1490271131.819044969","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1490271131.819031343","description":"Socket closed","fd":16,"file":"../src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:172.16.250.137:8980"}]}
    at /usr/local/wongnai/node_modules/grpc/src/node/src/client.js:434:17
    code: 14, metadata: Metadata { _internal_repr: {} } },
  isOperational: true, code: 14, metadata: Metadata { _internal_repr: {} } }

Also seeing this issue. Retrying within our own code seems to work around it. We also only retry a max of 3 times; usually the second try fixes it.
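For the idle-connection case described above ("Socket closed" after a quiet period), one mitigation that sometimes helps is enabling client-side keepalive pings so the channel notices and re-establishes dead connections. The option names below are real gRPC channel arguments, but the numeric values are illustrative only, and the commented-out channel line assumes the server address from the log above.

```python
# Channel arguments enabling client-side keepalive pings; the numeric
# values here are illustrative, not recommendations.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 30000),           # ping the server every 30 s
    ("grpc.keepalive_timeout_ms", 10000),        # fail a ping after 10 s
    ("grpc.keepalive_permit_without_calls", 1),  # ping even when idle
]

# With grpcio installed, the channel would be created along these lines:
# import grpc
# channel = grpc.insecure_channel("172.16.250.137:8980",
#                                 options=KEEPALIVE_OPTIONS)
```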
We were on 0.18 and just upgraded to 0.23. We run Python 3.6.