google-cloud-python: Frequent gRPC StatusCode.UNAVAILABLE errors

Using the current codebase from master branch (e1fbb6b), with GRPC, we sometimes (0.5% of requests, approximately) see the following exception:

 AbortionError(code=StatusCode.UNAVAILABLE, details="{"created":"@1478255129.468798425","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478255129.468756939","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}"))

Retrying this seem to always succeed.

Should application code have to care about this kind of error and retry? Or is this a bug in google-cloud-pubsub code?

Package versions installed:

gapic-google-logging-v2==0.10.1
gapic-google-pubsub-v1==0.10.1
google-api-python-client==1.5.4
google-cloud==0.20.0
google-cloud-bigquery==0.20.0
google-cloud-bigtable==0.20.0
google-cloud-core==0.20.0
google-cloud-datastore==0.20.1
google-cloud-dns==0.20.0
google-cloud-error-reporting==0.20.0
google-cloud-language==0.20.0
google-cloud-logging==0.20.0
google-cloud-monitoring==0.20.0
google-cloud-pubsub==0.20.0
google-cloud-resource-manager==0.20.0
google-cloud-storage==0.20.0
google-cloud-translate==0.20.0
google-cloud-vision==0.20.0
google-gax==0.14.1
googleapis-common-protos==1.3.5
grpc-google-iam-v1==0.10.1
grpc-google-logging-v2==0.10.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.0.0

Note: Everything google-cloud* comes from git master.

This is on Python 2.7.3

Traceback:

  File "ospdatasubmit/pubsub.py", line 308, in _flush
    publish_response = self.pubsub_client.Publish(publish_request, self._publish_timeout)
  File "grpc/beta/_client_adaptations.py", line 305, in __call__
    self._request_serializer, self._response_deserializer)
  File "grpc/beta/_client_adaptations.py", line 203, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 61 (29 by maintainers)

Most upvoted comments

Hi, i’m getting this error on PubSub consumer. I manage to get a “not so pretty” workaround.

using a policy like this that replicates code for deadline_exceeded on google.cloud.pubsub_v1.subscriber.policy.thread.Policy.on_exception.

from google.cloud.pubsub_v1.subscriber.policy.thread import Policy
import grpc

class UnavailableHackPolicy(Policy):
    def on_exception(self, exception):
        """
        There is issue on grpc channel that launch an UNAVAILABLE exception now and then. Until
        that issue is fixed we need to protect our consumer thread from broke.
        https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2683
        """
        unavailable = grpc.StatusCode.UNAVAILABLE
        if getattr(exception, 'code', lambda: None)() in [unavailable]:
            print("¡OrbitalHack! - {}".format(exception))
            return
        return super(UnavailableHackPolicy, self).on_exception(exception)

On receive message function i have a code like

subscriber = pubsub.SubscriberClient(policy_class=UnavailableHackPolicy)
subscription_path = subscriber.subscription_path(project, subscription_name)
subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)

Problem is that when the resource it is trully UNAVAILABLE we will be not aware.

UPDATE: As noted here by @makrusak and here by @rclough. This hack cause high CPU usage leaving your consumer practically useless (available intermittently). So basically this changes one problem for another, your consumer does not die, but you will have to restart the worker that executes it often.

I really think my problem is related to this, we have a node.js connecting to a python server using gRpc and we frequently receive this:

 Error: {"created":"@1489090385.752311821","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1489090385.752305292","description":"TCP Read failed","file":"../src/core/lib/iomgr/tcp_uv.c","file_line":170}]} { Error: {"created":"@1489090385.752311821","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1489090385.752305292","description":"TCP Read failed","file":"../src/core/lib/iomgr/tcp_uv.c","file_line":170}]}
    at /app/node_modules/grpc/src/node/src/client.js:434:17
  cause: 
   { Error: {"created":"@1489090385.752311821","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1489090385.752305292","description":"TCP Read failed","file":"../src/core/lib/iomgr/tcp_uv.c","file_line":170}]}
       at /app/node_modules/grpc/src/node/src/client.js:434:17 code: 14, metadata: Metadata { _internal_repr: {} } },
  isOperational: true,
  code: 14,
  metadata: Metadata { _internal_repr: {} } }

Sometimes, the same request on the same server works without any problem.

Upgraded my stack to google-cloud-pubsub==0.22.0. Error is still present, traceback/error message is slightly different. Here’s a fresh one:

ERROR 2017-02-22 08:16:31,484
Traceback (most recent call last):
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/ospgcptools/pubsub/__init__.py", line 327, in flush
    self.pubsub_topic.publish(data)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/cloud/pubsub/topic.py", line 253, in publish
    message_ids = api.topic_publish(self.full_name, [message_data])
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/cloud/pubsub/_gax.py", line 173, in topic_publish
    options=options)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/cloud/gapic/pubsub/v1/publisher_client.py", line 290, in publish
    return self._publish(request, options)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 442, in inner
    return api_caller(api_call, this_settings, request)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 70, in inner
    return a_func(request, **kwargs)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 395, in inner
    gax.errors.create_error('RPC failed', cause=exception))
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 391, in inner
    return a_func(*args, **kwargs)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/retry.py", line 67, in inner
    return a_func(*updated_args, **kwargs)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/grpc/_channel.py", line 511, in __call__
    return _end_unary_response_blocking(state, False, deadline)
  File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/grpc/_channel.py", line 459, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1487751391.483882744","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1487751391.483832140","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":166,"referenced_errors":[{"created":"@1487751391.483828339","description":"Socket closed","fd":67,"file":"src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv6:[2a00:1450:400e:807::200a]:443"}]}]})>)

Some package versions:

pip freeze|egrep 'grpc|pubsub|google-cloud-core|grep protobuf'
gapic-google-cloud-pubsub-v1==0.14.1
google-cloud-core==0.22.1
google-cloud-pubsub==0.22.0
grpc-google-cloud-pubsub-v1==0.14.0
grpc-google-iam-v1==0.11.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.1.0

Timestamp in UTC if some googler wants to look on the other side. Let me know if there’s something I can add to my logs to aid in debugging.

In most cases, an immediate retry will fix the problem. Sometimes we have to retry 2 or 3 times (we give up after 3 times and drop the message).

I confirm the intermittent errors when working with bigtable API

image

python 3.5.2 google-cloud==0.23.0

(storm) ➜  storm git:(develop) ✗ vi requirements/base.txt
(storm) ➜  storm git:(develop) ✗ pip freeze G google             
21:gapic-google-cloud-datastore-v1==0.15.3
22:gapic-google-cloud-error-reporting-v1beta1==0.15.3
23:gapic-google-cloud-logging-v2==0.91.3
24:gapic-google-cloud-pubsub-v1==0.15.3
25:gapic-google-cloud-spanner-admin-database-v1==0.15.3
26:gapic-google-cloud-spanner-admin-instance-v1==0.15.3
27:gapic-google-cloud-spanner-v1==0.15.3
28:gapic-google-cloud-speech-v1beta1==0.15.3
29:gapic-google-cloud-vision-v1==0.90.3
31:google-auth==0.10.0
32:google-auth-httplib2==0.0.2
33:google-cloud==0.23.0
34:google-cloud-bigquery==0.23.0
35:google-cloud-bigtable==0.23.1
36:google-cloud-core==0.23.1
37:google-cloud-datastore==0.23.0
38:google-cloud-dns==0.23.0
39:google-cloud-error-reporting==0.23.2
40:google-cloud-language==0.23.1
41:google-cloud-logging==0.23.1
42:google-cloud-monitoring==0.23.0
43:google-cloud-pubsub==0.23.0
44:google-cloud-resource-manager==0.23.0
45:google-cloud-runtimeconfig==0.23.0
46:google-cloud-spanner==0.23.1
47:google-cloud-speech==0.23.0
48:google-cloud-storage==0.23.1
49:google-cloud-translate==0.23.0
50:google-cloud-vision==0.23.3
51:google-gax==0.15.8
52:googleapis-common-protos==1.5.2
55:grpc-google-iam-v1==0.11.1
80:proto-google-cloud-datastore-v1==0.90.3
81:proto-google-cloud-error-reporting-v1beta1==0.15.3
82:proto-google-cloud-logging-v2==0.91.3
83:proto-google-cloud-pubsub-v1==0.15.3
84:proto-google-cloud-spanner-admin-database-v1==0.15.3
85:proto-google-cloud-spanner-admin-instance-v1==0.15.3
86:proto-google-cloud-spanner-v1==0.15.3
87:proto-google-cloud-speech-v1beta1==0.15.3
88:proto-google-cloud-vision-v1==0.90.3
(storm) ➜  storm git:(develop) ✗ pip freeze G grpc  
55:grpc-google-iam-v1==0.11.1
56:grpcio==1.2.1
(storm) ➜  storm git:(develop) ✗ 

Also - the sample has since been updated to use the google-auth package, which should also fix that issue.

I think with all the work that @dhermes did on pubsub this should be resolved. I’m going to go ahead and close this, but if it’s still reproducible with the latest version we can re-open.

@barroca I had the same problem. In my case, If my Node.js run for a while without any request, this error will occur and making request again will get normal response.

Really need help

{ Error: {"created":"@1490271131.819044969","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1490271131.819031343","description":"Socket closed","fd":16,"file":"../src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:172.16.250.137:8980"}]} 2017-03-23T12:12:11.842630309Z at /usr/local/wongnai/node_modules/grpc/src/node/src/client.js:434:17 2017-03-23T12:12:11.842638153Z cause: 2017-03-23T12:12:11.842643446Z { Error: {"created":"@1490271131.819044969","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1490271131.819031343","description":"Socket closed","fd":16,"file":"../src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:172.16.250.137:8980"}]} 2017-03-23T12:12:11.842652712Z at /usr/local/wongnai/node_modules/grpc/src/node/src/client.js:434:17 code: 14, metadata: Metadata { _internal_repr: {} } }, isOperational: true, code: 14, metadata: Metadata { _internal_repr: {} } }

Also seeing this issue. Retrying within in our own code seems to workaround the issue. we also only retry a max of 3 times. Usually second try fixes it.

We were on 0.18 and just upped to 0.23. We run python 3.6

$ pip freeze|egrep 'grpc|pubsub|google-cloud-core|grep protobuf'
gapic-google-cloud-pubsub-v1==0.15.0
google-cloud-core==0.23.1
google-cloud-pubsub==0.23.0
grpc-google-cloud-logging-v2==0.90.0
grpc-google-iam-v1==0.11.1
grpcio==1.1.3
proto-google-cloud-pubsub-v1==0.15.1
Traceback (most recent call last):
  File "/opt/app/psutils.py", line 129, in _pub_topic
    return topic.publish(bytes(json.dumps(data), 'utf-8'))
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/cloud/pubsub/topic.py", line 255, in publish
    message_ids = api.topic_publish(self.full_name, [message_data])
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/cloud/pubsub/_gax.py", line 174, in topic_publish
    options=options)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/cloud/gapic/pubsub/v1/publisher_client.py", line 320, in publish
    return self._publish(request, options)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 419, in inner
    return api_caller(api_call, this_settings, request)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 67, in inner
    return a_func(request, **kwargs)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 372, in inner
    gax.errors.create_error('RPC failed', cause=exception))
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/future/utils/__init__.py", line 419, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 368, in inner
    return a_func(*args, **kwargs)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/retry.py", line 68, in inner
    return a_func(*updated_args, **kwargs)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/grpc/_channel.py", line 507, in __call__
    return _end_unary_response_blocking(state, False, deadline)
  File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
google.gax.errors.GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1488959268.808794184","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1488959268.808696874","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":166,"referenced_errors":[{"created":"@1488959268.808693234","description":"Socket closed","fd":55,"file":"src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:173.194.74.95:443"}]}]})>)