cloud-custodian: Make aws connection timeout configurable

I'm getting the error below, but it looks like the boto connection timeout isn't configurable. Ideas welcome; I can probably contribute a PR.

2019-12-17 17:37:02,204: custodian.output:ERROR Error while executing policy
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 263, in send
    chunked=self._chunked(request.headers),
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 376, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ec2.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/c7n/policy.py", line 288, in run
    resources = self.policy.resource_manager.resources()
  File "/usr/local/lib/python3.7/site-packages/c7n/resources/ec2.py", line 93, in resources
    return super(EC2, self).resources(query=query)
  File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 453, in resources
    resources = self.source.resources(query)
  File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 230, in resources
    return self.query.filter(self.manager, **query)
  File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 90, in filter
    getattr(resource_manager, 'retry', None)) or []
  File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 69, in _invoke_client_enum
    data = results.build_full_result()
  File "/usr/local/lib/python3.7/site-packages/botocore/paginate.py", line 449, in build_full_result
    for response in self:
  File "/usr/local/lib/python3.7/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 679, in _make_request
    return self.retry(self._method, **current_kwargs)
  File "/usr/local/lib/python3.7/site-packages/c7n/utils.py", line 390, in _retry
    return func(*args, **kw)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 272, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 563, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 582, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 231, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 244, in _send
    return self.http_session.send(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 289, in send
    raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 18 (2 by maintainers)

Most upvoted comments

I was digging into this: a read connection timeout is subject to boto's default retry behavior. The issue is that the current default is the legacy retry mode, with a maximum of five attempts. Switching to the standard or adaptive mode and raising the maximum number of retry attempts should address a wide range of these issues.
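A minimal sketch of selecting a retry mode per client, assuming botocore 1.15 or later (where `retries={"mode": ...}` is accepted by `botocore.config.Config`). The helper names `retries_option` and `make_config` are hypothetical, not part of boto or Custodian. Note that inside this dict `max_attempts` counts retries after the initial request, per the boto3 retries guide.

```python
from typing import Any, Dict

# Retry modes described in the boto3 retries guide.
RETRY_MODES = ("legacy", "standard", "adaptive")


def retries_option(mode: str = "standard", max_attempts: int = 10) -> Dict[str, Any]:
    """Build the dict passed as ``retries=`` to botocore.config.Config (hypothetical helper).

    Inside this dict, ``max_attempts`` is the number of retries made *after*
    the initial request, unlike the config-file setting of the same name.
    """
    if mode not in RETRY_MODES:
        raise ValueError(f"unknown retry mode {mode!r}; expected one of {RETRY_MODES}")
    if max_attempts < 0:
        raise ValueError("max_attempts must be >= 0")
    return {"mode": mode, "max_attempts": max_attempts}


def make_config(mode: str = "standard", max_attempts: int = 10):
    """Return a botocore Config using the requested retry mode (hypothetical helper)."""
    # Imported lazily so retries_option() works even without botocore installed.
    from botocore.config import Config

    return Config(retries=retries_option(mode, max_attempts))
```

Usage would look like `boto3.client("ec2", config=make_config("adaptive", 10))`; the adaptive mode additionally applies client-side rate limiting on top of the standard retry rules.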

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#defining-a-retry-configuration-in-your-aws-configuration-file
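The linked section configures retries in `~/.aws/config` through the `retry_mode` and `max_attempts` keys (there, `max_attempts` includes the initial request); botocore also reads the `AWS_RETRY_MODE` and `AWS_MAX_ATTEMPTS` environment variables. The sketch below only renders such a section as text with `configparser`; the function name and the `custodian` profile name are hypothetical.

```python
import configparser
import io


def render_retry_profile(profile: str = "default", mode: str = "standard", max_attempts: int = 10) -> str:
    """Render an ~/.aws/config section that sets the retry mode and total attempt limit."""
    # The default profile is "[default]"; named profiles use "[profile NAME]" in ~/.aws/config.
    section = "default" if profile == "default" else f"profile {profile}"
    parser = configparser.ConfigParser()
    parser[section] = {"retry_mode": mode, "max_attempts": str(max_attempts)}
    buf = io.StringIO()
    parser.write(buf)  # writes "key = value" lines under the section header
    return buf.getvalue()
```

For example, `render_retry_profile("custodian", "adaptive", 10)` yields a `[profile custodian]` section that could be appended to `~/.aws/config` and selected with `AWS_PROFILE=custodian`.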

I'll look at exposing this via environment variables for Custodian. There's some duplication in the retry logic, where Custodian wraps some of the built-in behavior. Digging through the new botocore adaptive mode, it actually looks quite nice, and it's perhaps worthwhile to remove some of Custodian's retry wrapping and use the built-in capabilities instead.
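One way such environment variables might be mapped onto botocore client options is sketched below. The variable names `C7N_READ_TIMEOUT`, `C7N_CONNECT_TIMEOUT`, `C7N_RETRY_MODE` and `C7N_MAX_ATTEMPTS` are hypothetical, not existing Custodian settings; the resulting dict is meant for `botocore.config.Config(**kwargs)`, whose `read_timeout`, `connect_timeout` and `retries` parameters are real.

```python
import os
from typing import Any, Dict, Mapping, Optional


def config_kwargs_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect botocore Config keyword arguments from hypothetical C7N_* variables.

    Keys for unset variables are omitted, leaving those options to botocore.
    """
    env = os.environ if environ is None else environ
    kwargs: Dict[str, Any] = {}
    if "C7N_READ_TIMEOUT" in env:
        kwargs["read_timeout"] = float(env["C7N_READ_TIMEOUT"])
    if "C7N_CONNECT_TIMEOUT" in env:
        kwargs["connect_timeout"] = float(env["C7N_CONNECT_TIMEOUT"])
    retries: Dict[str, Any] = {}
    if "C7N_RETRY_MODE" in env:
        retries["mode"] = env["C7N_RETRY_MODE"]
    if "C7N_MAX_ATTEMPTS" in env:
        # In the Config retries dict this counts retries after the first request.
        retries["max_attempts"] = int(env["C7N_MAX_ATTEMPTS"])
    if retries:
        kwargs["retries"] = retries
    return kwargs
```

Keeping the parsing separate from client creation makes it easy to test without AWS access and to apply the same options to every client a policy run creates.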

I commented on the upstream issue; as noted there, I think the boto SDK should fix this upstream.

@sean-zou, can you clarify what network conditions cause such repeated timeouts? I haven't seen this reproducibly outside of bad network conditions, very long network paths, or broken intermediaries. Is it a particular inter-region connection over a slow network? Is there a proxy? What's an example policy that exhibits it? Also, what OS distribution and version are you using? There are tweaks that can be made at a lower level in the OS configuration, e.g. adjusting the IPv4 TCP keepalive timeouts.
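To illustrate what the keepalive knobs control, here is a sketch that enables TCP keepalive on a single socket with Python's `socket` module. These options are the per-socket counterparts of Linux's system-wide `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` sysctls. It does not change botocore's own connections; newer botocore releases expose a `tcp_keepalive` client option for that, which is worth checking against your installed version. The timer values are arbitrary examples.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive for one socket and tune its timers where the OS allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Timer options are platform-specific, so set only those this Python build exposes.
    if hasattr(socket, "TCP_KEEPIDLE"):  # seconds of idleness before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):  # seconds between unanswered probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):  # unanswered probes before the connection is dropped
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print(bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
```

Keepalive mainly helps when a NAT gateway, proxy or firewall silently drops idle connections; it does not make a slow API response arrive faster.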

Hey John, thank you for replying.

I did try writing the following to ~/.boto:

[Boto]
debug = 2
read_timeout = 120
http_socket_timeout = 120

I also tried adding the same snippet to ~/.aws/config and ~/.aws/credentials.

I looked at the docs, and it seems the read_timeout setting is not supported in config files, nor does http_socket_timeout govern the read timeout.

It doesn't even pick up the debug setting… Maybe I'm doing something wrong.

Anyway, as a workaround, we took the dirty path of hard-coding DEFAULT_TIMEOUT = 240 into our local installation of https://github.com/boto/botocore/blob/develop/botocore/endpoint.py#L34
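A less invasive alternative to patching `endpoint.py` is to pass a per-client `botocore.config.Config(read_timeout=...)`, which botocore supports in code even though the setting is not read from config files. The sketch below assumes you control client creation; Custodian would need a change, such as the PR offered above, to pass such a Config to its clients. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def timeout_kwargs(read_timeout: float = 240, connect_timeout: Optional[float] = None) -> Dict[str, Any]:
    """Build Config keyword arguments that raise the read timeout for one client."""
    if read_timeout <= 0:
        raise ValueError("read_timeout must be positive")
    kwargs: Dict[str, Any] = {"read_timeout": read_timeout}
    if connect_timeout is not None:
        kwargs["connect_timeout"] = connect_timeout
    return kwargs


def make_ec2_client(region: str = "us-east-1", read_timeout: float = 240):
    """Create an EC2 client with a longer read timeout, leaving botocore itself unpatched."""
    # Imported lazily so timeout_kwargs() is usable without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client("ec2", region_name=region, config=Config(**timeout_kwargs(read_timeout)))
```

Because the timeout is scoped to one client, the rest of the process keeps botocore's default of 60 seconds (the value shown as "read timeout=60" in the traceback), and upgrading botocore does not silently undo the change.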