aws-cdk: aws-s3-deployment - intermittent cloudfront "Waiter InvalidationCompleted failed" error
I’ve come across a deployment where the CloudFront invalidation was created but the Lambda timed out with cfn_error: Waiter InvalidationCompleted failed: Max attempts exceeded. ~I suspect a race condition, and that reversing the order of cloudfront.create_invalidation() and cloudfront.get_waiter() would fix this race condition.~
edit: proposed fix of reversing create_invalidation() and get_waiter() is invalid, see https://github.com/aws/aws-cdk/issues/15891#issuecomment-898413309
About this issue
- State: open
- Created 3 years ago
- Reactions: 24
- Comments: 61 (15 by maintainers)
Commits related to this issue
- Add parameter - dont fail deployment on CFront err Related to: https://github.com/aws/aws-cdk/issues/15891 — committed to msheiny/aws-cdk by deleted user a year ago
- Use sleep instead of sleep 180 instead of waiting for invalidation because of https://github.com/aws/aws-cdk/issues/15891 — committed to mdbudnick/personal-website by mdbudnick 8 months ago
Started to see this problem when using s3 bucket deployments with CDK
Can we re-open this issue? It’s still a problem with the underlying Lambda even if it’s related to another service. What if we provide an option to not fail the custom resource if the invalidation fails?
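A minimal sketch of what such an option could look like, written as a hand-rolled polling loop rather than the handler's actual code. The function name `wait_for_invalidation`, the `fail_on_timeout` flag, and the injected `sleep` parameter are hypothetical; the default of 30 attempts at 20-second intervals is an assumption picked to match the roughly ten-minute window reported in this thread. The only AWS call used is `get_invalidation`, so a real `boto3.client("cloudfront")` or a test double can be passed in as `client`.

```python
import time


def wait_for_invalidation(client, distribution_id, invalidation_id,
                          delay=20, max_attempts=30,
                          fail_on_timeout=True, sleep=time.sleep):
    """Poll CloudFront until the invalidation reports Status == "Completed".

    Returns True once the invalidation completes. If every attempt is used up,
    raises TimeoutError when fail_on_timeout is True, otherwise returns False
    so the caller can log a warning and let the deployment succeed.
    """
    for attempt in range(1, max_attempts + 1):
        response = client.get_invalidation(
            DistributionId=distribution_id, Id=invalidation_id
        )
        if response["Invalidation"]["Status"] == "Completed":
            return True
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt < max_attempts:
            sleep(delay)

    if fail_on_timeout:
        raise TimeoutError(
            f"Invalidation {invalidation_id} not completed "
            f"after {max_attempts} attempts"
        )
    return False
```

With `fail_on_timeout=False`, a slow invalidation no longer rolls back the stack, at the cost that the deployment may finish while stale objects are still being served from the edge.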
Is it possible to re-open this issue? We’re experiencing this problem as well.
It seems there’s currently a problem in AWS CloudFront; I get the same timeout errors.
From the CloudFront team:
and
I raised this issue internally with the CloudFront team. I’ll keep you guys updated in this conversation.
My team are also seeing this error regularly!
This issue got worse for us so this is our solution for now:
Still happening in 2024… Not sure why I’m using Cloudfront at this point…
This is happening to us frequently now as well.
Reopening because additional customers have been impacted by this issue. @naseemkullah are you still running into this issue?
From another customer experiencing the issue:
Message returned: Waiter InvalidationCompleted failed: Max attempts exceeded
Customer’s code:
@otaviomacedo Did you ever get an update from them? Just ran into this (also once at deploy, once at rollback), and it’s a major PITA.
We no longer experience this issue after increasing the memory limit of the bucket deployment.
The default memory limit is 128 MiB. (docs)
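For reference, raising the limit might look like the sketch below, assuming the CDK v2 Python bindings (`aws_cdk.aws_s3_deployment`). The helper name `add_site_deployment`, the construct ID, the `./site` asset path, and the value of 512 are illustrative assumptions, not values taken from this thread; `bucket` and `distribution` are expected to be constructs defined elsewhere in the stack.

```python
from aws_cdk import aws_s3_deployment as s3deploy


def add_site_deployment(scope, bucket, distribution):
    """Attach a bucket deployment whose handler Lambda gets more memory."""
    return s3deploy.BucketDeployment(
        scope,
        "DeployWebsite",  # hypothetical construct ID
        sources=[s3deploy.Source.asset("./site")],  # hypothetical local path
        destination_bucket=bucket,
        distribution=distribution,  # triggers the CloudFront invalidation
        distribution_paths=["/*"],
        memory_limit=512,  # in MiB; the default is 128
    )
```

This only changes the resources available to the deployment handler. Whether it helps presumably depends on whether the handler was running out of time before it reached the invalidation step, which the thread does not confirm.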
Encountered the same issue. Some action log timestamps:
It took 10 minutes for the CDK stack to fail, and the invalidation was created 1 minute after the failure.
We are also experiencing this issue intermittently with our cloudfront invalidations (once every two weeks or so) 😞
In my case, two invalidations were kicked off; both stayed in progress for a long time and eventually timed out.
It has happened twice in recent days. Next time it occurs I will try to confirm this. IIRC, the first time this happened I checked and saw that the invalidation had occurred almost immediately, yet the waiter did not see that (that’s why I thought it might be a race condition). Will confirm though!
I think the risk involved in this change is quite low. Please submit the PR and I’ll be happy to review it.