terraform-provider-aws: Intermittent network issues (read: connection reset Errors)
Terraform Version
Terraform v0.12.23
We’re running a drift detection workflow using github hosted github actions, which simply runs terraform plan and fails if it outputs anything. This runs on a schedule every hour. We’re getting request errors, causing terraform plan to fail, around 2-3 times a day
Some of the request errors we’ve so far encountered:
Error: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/origin-access-identity/cloudfront/E26H********: read tcp 10.1.0.4:52046->54.239.29.26:443: read: connection reset by peer
Error: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/origin-access-identity/cloudfront/E1B3D********: read tcp 10.1.0.4:33408->54.239.29.51:443: read: connection reset by peer
Error: error listing tags for CloudFront Distribution (E24R********): RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/tagging?Resource=arn%3Aaws%3Acloudfront%3A%3A*********%3Adistribution%2FE24********: read tcp 10.1.0.4:56918->54.239.29.65:443: read: connection reset by peer
Error: error getting S3 Bucket website configuration: RequestError: send request failed
caused by: Get https://******.s3.amazonaws.com/?website=: read tcp 10.1.0.4:59070->52.216.20.56:443: read: connection reset by peer
Error: error getting S3 Bucket replication: RequestError: send request failed
caused by: Get https://*******.s3.amazonaws.com/?replication=: read tcp 10.1.0.4:60534->52.216.138.67:443: read: connection reset by peer
Most of these seem to be CloudFront and S3
Thanks
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 58
- Comments: 38 (12 by maintainers)
I started seeing these today.
https://status.aws.amazon.com/
There is definitely a problem on the AWS side If you go to the cloudfront console and hit refresh a few times, you’re now very likely to encounter this
We haven’t had this happen for more than a week now Could be fixed on aws side?
Hi again 👋 Since it appears that this was handled on the AWS side (both in this issue and lack of Terraform support tickets), our preference will be to leave things as they are for now. If this comes up again, especially since CloudFront seems to very prominently have this issue when it occurs, we can definitely think more about this network connection handling. 👍
for what it’s worth - every time I’ve seen this issue it’s been on read calls to either Cloudfront distribution configs or Cloudfront origin access identities.
It may not be the best way to approach solving the issue, but given that the majority of the connection reset issues seem to be with specific Cloudfront read calls + a few others, it might be worth just adding retries to individual API calls (Cloudfront or otherwise) as they become problematic?
Seeing this same issue in Terraform Cloud, specifically with the cloudfront_distribution and cloudfront_origin_access_identity resources - it’s happening almost daily at this point.
We have a support ticket request open with aws for both this issue and https://github.com/terraform-providers/terraform-provider-aws/issues/14797 - especially in the latter case it would greatly help if
TRACEwould show complete requests + responses for us/aws to understand what is going on.Or maybe even something separate like
HTTP_TRACEthat only shows requests + responses, which in most cases is the more interesting part when debugging these type of issues.We are experiencing this issue on our Jenkins hosted on EC2 - we run multiple nodes behind a natgw (so shared IP for outgoing connections).
Error:
Terraform 0.12.28, AWS provider 2.70.0
We’re experiencing the issue also in Terraform Cloud using
v0.12.28&0.12.29and the AWS provider pinned to~> 2.0.seeing this on
v0.12.29as wellGPG-encrypted logs available at https://gist.github.com/mattburgess/2a00b1e77b00368781360ac8581383b9
analytical-dataset-generation_analytical-dataset-generation-qa_154.log.gpg - this one failed after seeing a single
connection reset by peererror; no retries were attempted.analytical-dataset-generation_analytical-dataset-generation-preprod_136.log.gpg - this one hung/paused/waited for 15 minutes having seen a
connection reset by peererror, then retried and succeeded on its first retry.Having the same issues in our CI/CD pipeline