terraform-provider-aws: Intermittent network issues (read: connection reset errors)

Terraform Version

Terraform v0.12.23

We’re running a drift-detection workflow on GitHub-hosted GitHub Actions runners. It simply runs terraform plan and fails if the plan reports any changes, and it runs on a schedule every hour. We’re seeing request errors that cause terraform plan to fail around 2-3 times a day.
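For context, a common variant of that kind of drift check relies on terraform’s -detailed-exitcode flag rather than inspecting the plan output. The real workflow here runs in GitHub Actions, so the following small Go wrapper is only an illustrative sketch of the check, not the reporter’s setup:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// With -detailed-exitcode, terraform plan exits with:
	//   0 = no changes, 1 = error, 2 = changes (i.e. drift) present.
	cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	err := cmd.Run()
	if err == nil {
		fmt.Println("no drift detected")
		return
	}

	if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 2 {
		fmt.Println("drift detected: plan is not empty")
		os.Exit(2)
	}

	// Exit code 1 (or failing to run terraform at all) is where the
	// intermittent "connection reset by peer" errors surface.
	fmt.Println("terraform plan failed:", err)
	os.Exit(1)
}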

Some of the request errors we’ve so far encountered:

Error: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/origin-access-identity/cloudfront/E26H********: read tcp 10.1.0.4:52046->54.239.29.26:443: read: connection reset by peer

Error: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/origin-access-identity/cloudfront/E1B3D********: read tcp 10.1.0.4:33408->54.239.29.51:443: read: connection reset by peer

Error: error listing tags for CloudFront Distribution (E24R********): RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/tagging?Resource=arn%3Aaws%3Acloudfront%3A%3A*********%3Adistribution%2FE24********: read tcp 10.1.0.4:56918->54.239.29.65:443: read: connection reset by peer

Error: error getting S3 Bucket website configuration: RequestError: send request failed
caused by: Get https://******.s3.amazonaws.com/?website=: read tcp 10.1.0.4:59070->52.216.20.56:443: read: connection reset by peer

Error: error getting S3 Bucket replication: RequestError: send request failed
caused by: Get https://*******.s3.amazonaws.com/?replication=: read tcp 10.1.0.4:60534->52.216.138.67:443: read: connection reset by peer

Most of these seem to involve CloudFront and S3.

Thanks

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 58
  • Comments: 38 (12 by maintainers)

Most upvoted comments

I started seeing these today.

Error: Error reading IAM policy version arn:aws:iam::XXXX:policy/OktaChildAccountPolicy: RequestError: send request failed
caused by: Post https://iam.amazonaws.com/: read tcp 192.168.1.216:52180->52.94.225.3:443: read: connection reset by peer



Error: RequestError: send request failed
caused by: Post https://iam.amazonaws.com/: read tcp 192.168.1.216:52172->52.94.225.3:443: read: connection reset by peer



Error: Error reading IAM Role Okta-Idp-cross-account-role: RequestError: send request failed
caused by: Post https://iam.amazonaws.com/: read tcp 192.168.1.216:52171->52.94.225.3:443: read: connection reset by peer

https://status.aws.amazon.com/

1:50 PM PDT We are investigating increased error rates and latencies affecting IAM. IAM related requests to other AWS services may also be impacted.

There is definitely a problem on the AWS side. If you go to the CloudFront console and hit refresh a few times, you’re now very likely to encounter the same error (screenshot omitted).

We haven’t had this happen for more than a week now. Could it have been fixed on the AWS side?

Hi again 👋 Since it appears that this was handled on the AWS side (based both on this issue and on the lack of Terraform support tickets), our preference is to leave things as they are for now. If this comes up again, especially since CloudFront seems to be affected most prominently when it occurs, we can definitely think more about this network connection handling. 👍

For what it’s worth: every time I’ve seen this issue, it’s been on read calls to either CloudFront distribution configs or CloudFront origin access identities.

It may not be the best way to approach solving the issue, but given that the majority of the connection reset issues seem to be with specific CloudFront read calls plus a few others, it might be worth just adding retries to individual API calls (CloudFront or otherwise) as they become problematic, as sketched below.
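A minimal sketch of what such a per-call retry could look like at the SDK level, assuming aws-sdk-go v1 and retrying only on connection resets. This is illustrative rather than the provider’s actual implementation, and the retryOnConnReset helper and the distribution ID are made up:

package main

import (
	"fmt"
	"strings"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudfront"
)

// retryOnConnReset is a hypothetical helper: it retries fn a few times when
// the error looks like a TCP connection reset, which the SDK otherwise
// surfaces straight away as "RequestError: send request failed".
func retryOnConnReset(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if !strings.Contains(err.Error(), "connection reset by peer") {
			return err // not a transient network error, give up immediately
		}
		time.Sleep(time.Duration(i+1) * time.Second) // simple linear backoff
	}
	return err
}

func main() {
	sess := session.Must(session.NewSession())
	svc := cloudfront.New(sess)

	var out *cloudfront.GetDistributionOutput
	err := retryOnConnReset(3, func() error {
		var callErr error
		// Placeholder distribution ID; the real IDs are redacted in this issue.
		out, callErr = svc.GetDistribution(&cloudfront.GetDistributionInput{
			Id: aws.String("E1234EXAMPLE"),
		})
		return callErr
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("distribution found:", aws.StringValue(out.Distribution.Id))
}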

Seeing this same issue in Terraform Cloud, specifically with the cloudfront_distribution and cloudfront_origin_access_identity resources - it’s happening almost daily at this point.

We have a support ticket open with AWS for both this issue and https://github.com/terraform-providers/terraform-provider-aws/issues/14797. Especially in the latter case, it would greatly help if TRACE logging showed complete requests and responses, so that we/AWS could understand what is going on.

Or maybe even something separate like HTTP_TRACE that only shows requests and responses, which in most cases is the more interesting part when debugging these kinds of issues.
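For what it’s worth, the underlying aws-sdk-go can already be told to log full HTTP requests and responses. A minimal standalone sketch of that SDK-level switch (not the provider’s actual logging wiring, and using ListDistributions purely as an example call) looks like this:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudfront"
)

func main() {
	// LogDebugWithHTTPBody makes the SDK print every HTTP request and
	// response, including bodies, to the configured logger. This is the
	// SDK-level switch a hypothetical HTTP_TRACE setting would toggle.
	sess := session.Must(session.NewSession(&aws.Config{
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
		Logger:   aws.NewDefaultLogger(),
	}))

	svc := cloudfront.New(sess)

	// Any call now dumps the wire-level exchange, which is exactly the kind
	// of detail you would want attached to an AWS support case.
	if _, err := svc.ListDistributions(&cloudfront.ListDistributionsInput{}); err != nil {
		log.Fatal(err)
	}
}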

We are experiencing this issue on our Jenkins instance hosted on EC2; we run multiple nodes behind a NAT gateway (so a shared IP for outgoing connections).

Error:

Error: error waiting until CloudFront Distribution (XXXXX) is deployed: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/distribution/XXXXX: read tcp 10.x.x.x:35832->54.x.x.x:443: read: connection reset by peer

Terraform Resource: aws_cloudfront_distribution
Terraform Operation: Read
AWS Service: CloudFront
API Call: GetDistribution
Terraform Environment: AWS VPC (EC2, Concourse CI)
Terraform Concurrency: 10 (default)
Known HTTP Proxy: No
How Many Resources: 1 in the same configuration
How Often: ~80% of runs (4 out of 5 in 24h)

Terraform 0.12.28, AWS provider 2.70.0

We’re also experiencing the issue in Terraform Cloud, using v0.12.28 and v0.12.29 with the AWS provider pinned to ~> 2.0.

Error: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/origin-access-identity/cloudfront/ABCD1234567: read tcp 10.181.43.96:56350->54.239.29.51:443: read: connection reset by peer

Error: RequestError: send request failed
caused by: Get https://cloudfront.amazonaws.com/2019-03-26/distribution/ABCD1234567: read tcp 10.181.43.96:57570->54.239.29.51:443: read: connection reset by peer

seeing this on v0.12.29 as well

GPG-encrypted logs available at https://gist.github.com/mattburgess/2a00b1e77b00368781360ac8581383b9

analytical-dataset-generation_analytical-dataset-generation-qa_154.log.gpg - this one failed after seeing a single connection reset by peer error; no retries were attempted.

analytical-dataset-generation_analytical-dataset-generation-preprod_136.log.gpg - this one stalled for 15 minutes after seeing a connection reset by peer error, then retried and succeeded on its first retry.

Having the same issues in our CI/CD pipeline