cloud-custodian: Make aws connection timeout configurable
I'm getting the error below, but it looks like the boto connection timeout isn't configurable. Ideas welcome; I can probably contribute a PR.
2019-12-17 17:37:02,204: custodian.output:ERROR Error while executing policy
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1344, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 267, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 263, in send
chunked=self._chunked(request.headers),
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 376, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 423, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ec2.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/c7n/policy.py", line 288, in run
resources = self.policy.resource_manager.resources()
File "/usr/local/lib/python3.7/site-packages/c7n/resources/ec2.py", line 93, in resources
return super(EC2, self).resources(query=query)
File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 453, in resources
resources = self.source.resources(query)
File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 230, in resources
return self.query.filter(self.manager, **query)
File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 90, in filter
getattr(resource_manager, 'retry', None)) or []
File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 69, in _invoke_client_enum
data = results.build_full_result()
File "/usr/local/lib/python3.7/site-packages/botocore/paginate.py", line 449, in build_full_result
for response in self:
File "/usr/local/lib/python3.7/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/usr/local/lib/python3.7/site-packages/c7n/query.py", line 679, in _make_request
return self.retry(self._method, **current_kwargs)
File "/usr/local/lib/python3.7/site-packages/c7n/utils.py", line 390, in _retry
return func(*args, **kw)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 272, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 563, in _make_api_call
operation_model, request_dict, request_context)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 582, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 137, in _send_request
success_response, exception):
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 231, in _needs_retry
caught_exception=caught_exception, request_dict=request_dict)
File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 251, in __call__
caught_exception)
File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 317, in __call__
caught_exception)
File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 223, in __call__
attempt_number, caught_exception)
File "/usr/local/lib/python3.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 244, in _send
return self.http_session.send(request)
File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 289, in send
raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"
About this issue
- State: open
- Created 5 years ago
- Comments: 18 (2 by maintainers)
I was digging into this: read timeouts are subject to the default retry behavior in boto. The issue is that boto currently defaults to the legacy retry mode with five retry attempts. Switching to the standard or adaptive mode and raising the limit on max retry attempts will address a wide set of issues.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#defining-a-retry-configuration-in-your-aws-configuration-file
I'll look at exposing these settings via environment variables for custodian. There's some duplication of retry logic, since custodian wraps some of the built-in behavior. Having dug through the new botocore adaptive mode, it actually looks quite nice, and it's perhaps worthwhile to remove some of custodian's retry wrapping and use the built-in capabilities here.
I commented on the upstream issue; I think the boto SDK should fix this upstream, per the comment there.
@sean-zou can you clarify what network conditions cause such repeated timeouts? I've never seen this reproducibly outside of bad network conditions, very long paths, or broken intermediaries. Is it a particular inter-region connection over a slow network? Is there a proxy? What's an example policy that exhibits it? Also, what OS distribution and version are you using? There are tweaks that can be made at a lower level in the OS config, e.g. playing around with the IPv4 TCP keepalive timeouts.
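As a rough sketch of what the retry settings above look like when driven from the environment: botocore reads `AWS_RETRY_MODE` and `AWS_MAX_ATTEMPTS` (the latter counts the initial request as well as the retries). The helper name `set_retry_env` and its defaults are hypothetical and only here for illustration; this is not how custodian exposes the settings.

```python
import os

# Retry modes documented in the boto3 retries guide.
VALID_RETRY_MODES = ("legacy", "standard", "adaptive")


def set_retry_env(mode="standard", max_attempts=10, environ=os.environ):
    """Set botocore's retry environment variables (hypothetical helper).

    AWS_RETRY_MODE and AWS_MAX_ATTEMPTS are read by botocore when a
    session or client is created, so call this before creating clients.
    """
    if mode not in VALID_RETRY_MODES:
        raise ValueError("unknown retry mode: %r" % (mode,))
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    environ["AWS_RETRY_MODE"] = mode
    # Environment values must be strings.
    environ["AWS_MAX_ATTEMPTS"] = str(max_attempts)
    return environ
```

Any process that creates botocore clients afterwards should pick these values up. Whether that is enough on its own for custodian, given its own retry wrapper, is not verified here.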
Hey John, thank you for replying.
I did try writing the following to ~/.boto. I also tried adding the same snippet to ~/.aws/config and ~/.aws/credentials.
I looked at the docs, and it seems that the read_timeout setting is not supported in config files, nor does http_socket_timeout govern read_timeout. It doesn't even take the debug setting… Maybe I'm doing something wrong…
Anyway, as a workaround, we took the dirty path of hard-coding DEFAULT_TIMEOUT = 240 into our local installation of: https://github.com/boto/botocore/blob/develop/botocore/endpoint.py#L34
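For code that creates its own clients, the read timeout can be set per client through `botocore.config.Config` rather than by patching botocore. The sketch below assumes that situation; the helper names `timeout_config_kwargs` and `make_ec2_client` are hypothetical, and it does not change the clients that custodian builds internally, which is the gap this issue is about.

```python
def timeout_config_kwargs(read_timeout=240, connect_timeout=60):
    """Build keyword arguments for botocore.config.Config (hypothetical helper).

    240 mirrors the value hard-coded in the workaround above; 60 is the
    timeout reported in the traceback.
    """
    if read_timeout <= 0 or connect_timeout <= 0:
        raise ValueError("timeouts must be positive")
    return {"read_timeout": read_timeout, "connect_timeout": connect_timeout}


def make_ec2_client(region_name="us-east-1", **timeouts):
    """Create an EC2 client with explicit timeouts (requires boto3)."""
    # Imported lazily so the helper above works without boto3 installed.
    import boto3
    from botocore.config import Config

    config = Config(**timeout_config_kwargs(**timeouts))
    return boto3.client("ec2", region_name=region_name, config=config)
```

A call such as `make_ec2_client(read_timeout=240)` would then apply the longer timeout to that one client and leave the botocore installation untouched.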