terraform-provider-aws: Cycle error for replacement of aws_api_gateway_deployment with lifecycle create_before_destroy set to true and API Gateway resources in depends_on section
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform Version
Terraform v0.12.18
+ provider.aws v2.42.0
Affected Resource(s)
- aws_api_gateway_deployment
Terraform Configuration Files
I'm not copying the configuration of all API Gateway resources as it's pretty standard, but I'm happy to share the configuration of the whole API Gateway if requested.
resource "aws_api_gateway_deployment" "deployment" {
  depends_on = [
    aws_api_gateway_rest_api.api,
    aws_api_gateway_resource.api_email_health,
    aws_api_gateway_method.api_email_health_get,
    aws_api_gateway_integration.api_email_health_get_integration,
    aws_api_gateway_method.api_email_health_options,
    aws_api_gateway_integration.api_email_health_options_integration,
    aws_api_gateway_integration_response.api_email_health_options_integration_response,
    aws_api_gateway_method_response.api_email_health_options_response,
    aws_api_gateway_resource.api_email_templates,
    aws_api_gateway_method.api_email_templates_get,
    aws_api_gateway_integration.api_email_templates_get_integration,
    aws_api_gateway_method.api_email_templates_options,
    aws_api_gateway_integration.api_email_templates_options_integration,
    aws_api_gateway_integration_response.api_email_templates_options_integration_response,
    aws_api_gateway_method_response.api_email_templates_options_response,
    aws_api_gateway_resource.api_email_emails,
    aws_api_gateway_method.api_email_emails_post,
    aws_api_gateway_integration.api_email_emails_post_integration,
    aws_api_gateway_method.api_email_emails_options,
    aws_api_gateway_integration.api_email_emails_options_integration,
    aws_api_gateway_integration_response.api_email_emails_options_integration_response,
    aws_api_gateway_method_response.api_email_emails_options_response,
    aws_api_gateway_resource.api_email
  ]
  rest_api_id       = aws_api_gateway_rest_api.api.id
  stage_description = "Deployed at ${timestamp()}"
  stage_name        = var.aws_spotlight_environment
  lifecycle {
    create_before_destroy = true
  }
}
Expected Behavior
Because aws_api_gateway_deployment lists all API Gateway resources, methods, integrations, and responses in depends_on, it shouldn't be created before all of them are provisioned. The outcome should be (and was until recently): old API Gateway resources are destroyed, new ones are created, the new deployment is created, and the old deployment is destroyed.
We force replacement of aws_api_gateway_deployment so that the current API Gateway state is always deployed to the main stage.
This was the behaviour in Terraform 0.11.x.
Actual Behavior
Cycle Error
Error: Cycle: aws_api_gateway_integration.api_email_health_get_integration (destroy), aws_api_gateway_integration.api_email_health_options_integration (destroy), aws_api_gateway_integration_response.api_email_health_options_integration_response (destroy),
aws_api_gateway_method_response.api_email_health_options_response (destroy), aws_api_gateway_method.api_email_health_options (destroy), aws_api_gateway_resource.api_email_health (destroy), aws_api_gateway_deployment.deployment, aws_api_gateway_deployment.deployment (destroy deposed 359e79c1),
aws_api_gateway_method.api_email_health_get (destroy)
Removing create_before_destroy = true from the lifecycle block of the aws_api_gateway_deployment resource avoids the cycle, but the apply then fails with a different error:
Error: error deleting API Gateway Deployment (bdq86u): BadRequestException: Active stages pointing to this deployment must be moved or deleted
If I remove the depends_on section instead, the deployment sometimes happens before all API methods are properly configured. Example:
Error: Error creating API Gateway Deployment: BadRequestException: No integration defined for method
I tried adding a separate aws_api_gateway_stage resource for the stage, but the problem persists.
Steps to Reproduce
- Create an API Gateway with an `aws_api_gateway_deployment` which depends on API Gateway resources and is recreated with every `terraform apply`
- Run `terraform apply`
- Change one or more API Gateway resources in a way that forces them to be destroyed and recreated (e.g. change an API Gateway resource path)
- Run `terraform apply`
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 133
- Comments: 55 (11 by maintainers)
Commits related to this issue
- docs/service/apigateway: aws_api_gateway_deployment usage overhaul to discourage stage_name and further encourage create_before_destroy Reference: https://github.com/hashicorp/terraform-provider-aws/... — committed to hashicorp/terraform-provider-aws by bflad 3 years ago
- docs/service/apigateway: aws_api_gateway_deployment usage overhaul to discourage stage_name and further encourage create_before_destroy (#17230) * docs/service/apigateway: aws_api_gateway_deployment ... — committed to hashicorp/terraform-provider-aws by bflad 3 years ago
Hi all! Just a quick note to let you know this is on our radar and we will be taking a look in the near future to arrive at a resolution.
Hi folks. You may have noticed me poking around a few other API Gateway v1 issues and pull requests earlier today to warm up for this one. I wanted to fully context switch into this service and ensure we had a clear runway for any code changes that need to get in so we didn't break other existing contributions.
Apologies for the long delay here and the very frustrating behavior of the API Gateway v1 functionality with regards to deployment. Those aspects of this AWS service, which is unique compared to others, have consistently challenged Terraform's abilities to model it successfully and our ability to document recommended configuration patterns in a discoverable manner. At the end of this, beyond just fixing the reported issue(s) here, it seems necessary that the maintainers take some extra steps to add more robust service-level and use-case examples into the `examples` directory of the repository (with links from the resource-level reference pages) and/or expand the Learn platform content (e.g. Serverless Applications with AWS Lambda and API Gateway). If you all have other ideas in this manner, it would be great to discuss them. That aside, let's dive into this.
First and foremost, I would like to ensure that I'm understanding and covering expectations for the followers here. At a high level, the problem statement seems to be:
- `aws_api_gateway_*` Terraform resources or the OpenAPI specification import ability of the `aws_api_gateway_rest_api` resource `body` argument.
- `BadRequestException: Active stages pointing to this deployment must be moved or deleted`) or requires resource recreation that causes potential downtime.
And what is expected out of this effort, which will be a focus of mine until it's complete:
If I'm missing anything up until this point, please let me know.
To begin these efforts, I will need to reproduce the issues by having self-contained API Gateway configurations ready that match the problem statement along with reproduction steps. The initial report has some good details and I should be able to assemble an all Terraform resource configuration with some minor effort on my part tomorrow morning. https://github.com/hashicorp/terraform-provider-aws/issues/11344#issuecomment-699612070 has a starting configuration for the OpenAPI case. I will reach out if I am having trouble in this regard. In the meantime, if you also have a self-contained configuration handy that displays these issues and that you would like investigated, please feel free to reach out or post a link to a Gist/repository. I cannot promise I'll be able to look at or solve every configuration scenario, but the extra context could be valuable.
It is very late for me now (almost 3am) so I'll pick this up again first thing in the morning. Before I go though, for those attempting to use the resource `lifecycle` `create_before_destroy` behavior, please note that more recent versions of Terraform CLI seem more sensitive to needing that configuration applied to every resource in that portion of the dependency graph for the ordering to be applied successfully. This means not just the `aws_api_gateway_deployment` or `aws_api_gateway_stage` resources where it seems intuitive, but also the upstream `aws_api_gateway_*` resources that are being updated. I only mention this because, as an older practitioner of Terraform, it has tripped me up as seeming different than before. I will try to write up more about how to debug issues like that tomorrow.
TL;DR
- `aws_api_gateway_stage` instead of `aws_api_gateway_deployment` resource `stage_name` argument
- `lifecycle` block `create_before_destroy = true` argument inside `aws_api_gateway_deployment` resource configuration
- `time_static` resource instead of `timestamp()` function, if saving the current time is necessary
Hi again, folks. Here are some updates.
Terraform AWS Provider version 3.25.0, released today, includes some fixes (https://github.com/hashicorp/terraform-provider-aws/pull/17099 / https://github.com/hashicorp/terraform-provider-aws/pull/17209) for the `aws_api_gateway_rest_api` resource to better respect configuration via OpenAPI if you are working in that model. The resource should no longer show plan differences for "missing" Terraform configuration that was sourced from the OpenAPI specification. It should also now handle any Terraform configuration beyond the `body` and `name` arguments as overrides to any OpenAPI specification. Hopefully this should help remove some previously frustrating behavior in that resource.
Now let's turn the focus towards API Gateway REST API Deployments. After some extensive testing, it seemed like most issues captured here and in other similar issues relate to the `aws_api_gateway_deployment` resource also attempting to manage a stage. Terraform and resources are typically designed with a 1:1 mapping and this type of "shadow" resource management has historically been the source of confusion and headaches. The maintainers are now very cognizant not to introduce more of these types of resources, but of course we are stuck with any existing ones until they can be fixed or removed. In the future we may deprecate the problematic behavior.
The good news is that these deployment problems lean towards being fixable via configuration and documentation updates. I'll provide an outline of these below, which should hopefully guide you towards less problematic Terraform environments. You can find proposed API Gateway documentation changes and a new end-to-end example configuration (which I was using to verify my recommendations) here: https://github.com/hashicorp/terraform-provider-aws/pull/17230
I'll also briefly touch on `timestamp()` function usage, since that is not a recommended pattern and can make Terraform edge cases even sharper.
As a quick overview of API Gateway's lifecycle expectations and how they map to the various Terraform resources, REST APIs can be configured via two methods:
- `aws_api_gateway_rest_api` resource `body` argument with other arguments serving as overrides
- `aws_api_gateway_resource`, `aws_api_gateway_method`, `aws_api_gateway_integration`, etc. resources
Once the REST API is configured, the `aws_api_gateway_deployment` resource can be used along with the `aws_api_gateway_stage` resource to snapshot and publish the REST API. Stages can be optionally managed further with the `aws_api_gateway_base_path_mapping`, `aws_api_gateway_domain_name`, and `aws_api_gateway_method_settings` resources.
Both configuration methods achieve the same end goal and operators can choose which style is preferable for their environment or use cases. However, from a deployment standpoint, it is worth noting up front that it is much simpler in Terraform to set up the OpenAPI deployment properly. This is because a direct 1:1 configuration dependency can be set up. The Terraform resource method for configuring REST APIs is not going anywhere, nor is it any less supported; additional care just needs to be taken to set it up properly for deployments.
The deeper explanation here is that Terraform currently only knows about differences when a state value has changed and only performs a node operation when there is a local state value change. There are configuration methods for creating edges on the graph (e.g. attribute references and `depends_on`), but there is not a method (configuration, internally, or protocol-wise) to remotely trigger another node to do something. In practice, this means the local node (`aws_api_gateway_deployment` resource) can only do something when it has local changing state values. Our workaround for this in Terraform Providers is adding a conventional `triggers` map argument that accepts arbitrary keys and values that can implement local value changes. Collecting and acting on node changes from other nodes has not been a design focus in Terraform before as far as I know, but maybe this can be investigated in the future to improve the user experience in this area.
REST API Deployment with OpenAPI
Here is a recommended starter configuration with this method:
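A minimal sketch of what such an OpenAPI-based configuration could look like, following the recommendations in this comment (separate `aws_api_gateway_stage`, `triggers` keyed on the `body`, and `create_before_destroy`). The resource name `example`, the `/path1` route, and the proxied URI are illustrative assumptions, not values from this thread:

```hcl
resource "aws_api_gateway_rest_api" "example" {
  name = "example" # assumed name

  # OpenAPI specification; the path and integration below are placeholders.
  body = jsonencode({
    openapi = "3.0.1"
    info = {
      title   = "example"
      version = "1.0"
    }
    paths = {
      "/path1" = {
        get = {
          x-amazon-apigateway-integration = {
            httpMethod           = "GET"
            payloadFormatVersion = "1.0"
            type                 = "HTTP_PROXY"
            uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
          }
        }
      }
    }
  })
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  # A change to the OpenAPI body changes this hash, which gives the
  # deployment a local value change and forces a new deployment.
  triggers = {
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.example.body))
  }

  lifecycle {
    create_before_destroy = true
  }
}

# The stage is managed on its own instead of via the deployment's stage_name.
resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example" # assumed stage name
}
```

Because the stage only references the deployment by ID, the new deployment can be created and the stage repointed before the old deployment is destroyed.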
There will soon be an end-to-end example available in the repository, which is based off this snippet and expands to include other downstream API Gateway resources to ensure they work as expected. Below you can see this in action, successfully deploying REST API updates without error:
REST API Deployment with Terraform Resources
Here is a recommended starter configuration with this method:
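A minimal sketch of the same pattern with individual Terraform resources, assuming a single mock `GET /example` method. All resource names and the choice of a `MOCK` integration are illustrative assumptions; the point is that `triggers` has to collect every upstream resource whose change should cause a redeployment:

```hcl
resource "aws_api_gateway_rest_api" "example" {
  name = "example" # assumed name
}

resource "aws_api_gateway_resource" "example" {
  parent_id   = aws_api_gateway_rest_api.example.root_resource_id
  path_part   = "example"
  rest_api_id = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_method" "example" {
  authorization = "NONE"
  http_method   = "GET"
  resource_id   = aws_api_gateway_resource.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_integration" "example" {
  http_method = aws_api_gateway_method.example.http_method
  resource_id = aws_api_gateway_resource.example.id
  rest_api_id = aws_api_gateway_rest_api.example.id
  type        = "MOCK"
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  # The attribute references also create the graph edges, so no explicit
  # depends_on is needed. Hashing only IDs redeploys on replacement of a
  # resource, not on in-place updates; hashing whole resources or the
  # configuration file (filesha1) are alternatives with their own caveats.
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.example.id,
      aws_api_gateway_method.example.id,
      aws_api_gateway_integration.example.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example" # assumed stage name
}
```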
As you can see, the triggers configuration is much more complicated, as we need to collect changes from many more sources of configuration to implement it properly. The two additional configuration options about potentially using the `filesha1()` function against the configuration file itself or hashing whole resources are both widely used in the broader ecosystem, but they add some additional complexity/caveats. The HashiCorp Community Forums is likely a better place to discuss those types of configuration choices, where there are far more people ready to help than those watching the issues in this code repository.
As an aside about the `timestamp()` function, please note that it uses a special implementation (overriding the Terraform expectation that plan and apply values must exactly match) which generally translates to it sometimes introducing strange behavior into Terraform plan differences. If you need a static time value in Terraform configurations (e.g. when an API Gateway was deployed), a preferable solution is the `time_static` resource. Since it participates in the Terraform operation graph just like other resources and can store time with a stable value, it should be much more predictable.
Here is an illustrative example (the `aws_api_gateway_deployment` resource already has a `created_date` attribute):
You can see updates by running a command similar to `terraform apply -var 'path=/new'` after the initial `terraform apply`.
Hopefully all this information helps. If these recommendations are not working as expected on Terraform AWS Provider version 3.25.0 or later, please reach out. We will be looking for reproducing configurations and plan output in those cases.
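As a rough sketch of the `time_static` suggestion in this comment: the resource below stores a timestamp once and only replaces it when its `triggers` map changes. The variable name `path`, its default, and the resource and output names are assumptions for illustration, and no AWS resources are involved:

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Hypothetical input standing in for "something about the API changed".
variable "path" {
  type    = string
  default = "/example"
}

# Unlike timestamp(), the stored time stays stable across plans and is
# only regenerated when a value in triggers changes.
resource "time_static" "deployed" {
  triggers = {
    path = var.path
  }
}

output "deployed_at" {
  value = time_static.deployed.rfc3339
}
```

With this sketch, re-running `terraform apply` without changes keeps `deployed_at` unchanged, while `terraform apply -var 'path=/new'` replaces the `time_static` resource and records a new time.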
@breathingdust you assigned @bflad to this over 2 months ago. Since then there has been zero visible activity, no updates, no documentation updates warning people away from using Terraform for API Gateway in a Production environment, nothing.
As I indicated in a private message to Hashicorp directly, I am happy to do a Medium article on how Terraform AWS Support for API Gateway is not ready for Production and should be avoided if possible. I think that is now the only responsible course of action since from reading documentation nothing would indicate to the casual reader that the only way to update an OpenAPI AWS API Gateway is to completely destroy and recreate your entire infrastructure on each minor change.
Hashicorp/Terraform AWS Team should do the responsible thing and update the public documentation to indicate this SEVERE fault and warn people away from using their solution in real world environments. The fact that you have STILL not done this is a huge stain.
Clearly this issue is not important to you, but it is VERY VERY important to the teams (like ours) stupid enough to get suckered into using this broken implementation. I think you need to be proactive to immediately ensure that more teams don't get harmed by this lack of support.
Completely wiping out the value of declarative Infrastructure as Code. If we should manually do a whole bunch of extra work to the top level every single time some minor change in a bottom level happens, what's the point of Terraform?
@shederman are you seriously threatening an open source project? Get a hold of yourself. Yes, the deployment resource is problematic. My company uses API Gateway with Terraform in production very successfully. If I need to remove an integration, I do a manual step of removing the deployment from the state file first. That's it. Yes, I'd like that to not be the case, but I'm not threatening the developers that are the ones most likely to fix this. If you hate terraforming API gateways, stop doing it.
Hello everybody, I did find a solution. Terraform handles resources as singletons: a resource with a specific name can exist only once in a Terraform state. An API Gateway deployment can't be modified; that is a particularity of AWS, and it is quite normal, since a deployment is like a tag. My solution is to remove the resource from the state after each apply:
`terraform state rm aws_api_gateway_deployment.gw_deploy_dev`
Now I can see the history of Terraform deployments on my API. I hope it helps you. Coronavirus is a mess, but thanks to the time I had I was able to reverse engineer the API Gateway. In the end, I think Terraform should add a new type of resource based on the Prototype design pattern.
If, like me, you've come to this issue because you got a cycle error while having implemented the recommended way of doing things in the docs (summarised here), then here follows how I solved things. I was getting this cycle error when running a `terraform plan` to remove a resource from the `body` of the API Gateway REST API resource (OpenAPI definition). I spotted the cause of the cycle fairly easily: a lambda behind a separate API Gateway (let's call it B) referenced the invoke_url of the current stage of the main API Gateway (let's call it A) in its environment variables. The deployment resource of both API Gateways had a lifecycle policy of `create_before_destroy`, which is a must-have to ensure uptime. This caused the cycle, and as such I broke apart the cycle by manually assembling the `invoke_url` based on the ID of the REST API resource of API Gateway A and the variable that was used as the stage name in API Gateway A. Great stuff, but unfortunately I simply had a new cycle to contend with, though a shorter one and one where only resources for API Gateway A were present, mentioning some deposed resources. Basically, the remote state still had this coupling between the two API Gateway deployments (because of the stage invoke_url reference), whereas locally I didn't have it. To solve this, I changed the lifecycle policy of the `aws_api_gateway_deployment` resource of API Gateway A in the same PR as the change to remove the resource from it:
What the above does is simply not trigger a deployment while still removing the resource from the API Gateway in remote state. PR merged, terraform apply executed, and in the next PR I simply removed the `ignore_changes` block to go back to normal.
I also encountered the same issue. I tried two possible compromise solutions.
Wait for a while until all the dependent resources are created
I tried the following solution and I could change `method` and `resource` at least. The drawback is that this will trigger deployment every time you apply even if you don't have any change in the dependent resources.
Pass `variable` for trigger
In this way, we can control when to recreate `deployment`, but you need to separate the resource update and the `deployment` trigger. If you put them in one apply, creating and destroying `deployment` will start before the dependent resources have finished updating.
Isn't it manual wrangling to solve the problem? We use CD software to deploy our TF code so we would prefer to avoid such workarounds. Plus our stage is active as it's attached to a Custom Domain Name, so we can't have it destroyed or left without an existing deployment.
Currently we use a null resource with a sleep command, and the deployment resource is explicitly set to depend on that null resource as a form of workaround. The deployment resource itself isn't set to depend on any API Gateway resources, but the delay gives all the required resources (methods, integrations and so on) time to be provisioned before the deployment is created (the example below uses PowerShell as the command language because that's what we mostly use in our company).
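A rough sketch of the kind of delay-based workaround described here, not the commenter's actual configuration. The resource name `wait_for_api`, the 30-second delay, and the always-changing trigger are assumptions; `aws_api_gateway_rest_api.api` and `var.aws_spotlight_environment` are taken from the configuration in the original report and are assumed to be defined elsewhere:

```hcl
resource "null_resource" "wait_for_api" {
  # Assumption: re-run the delay on every apply by using a value that
  # always changes.
  triggers = {
    always_run = timestamp()
  }

  # Pause so that methods, integrations, etc. have time to be created.
  provisioner "local-exec" {
    command     = "Start-Sleep -Seconds 30"
    interpreter = ["PowerShell", "-Command"]
  }
}

resource "aws_api_gateway_deployment" "deployment" {
  # The only explicit dependency is the delay, not the API Gateway resources.
  depends_on = [null_resource.wait_for_api]

  rest_api_id = aws_api_gateway_rest_api.api.id
  stage_name  = var.aws_spotlight_environment
}
```

A fixed sleep only makes the race less likely; it does not give Terraform a real ordering guarantee, so the delay may need tuning for larger APIs.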
In my testing, if you trigger deployments off changes in id as per the example, that means that the resource will be destroyed and recreated to reflect the ID change (e.g. you change the method from a POST to a PUT). This leads to the following situation:
Essentially, the API gets deployed before the old integration is destroyed, which means your API deployment will contain both the old integration and the new one at the same time. This might not be desired, so unless I've missed something it's worth taking care when triggering deployments off resources that are getting destroyed and recreated as opposed to just being modified in place.
Since it seems this code has zero value, I will post the code that is not working. We have made numerous changes to try and get this working, and not one has worked. This particular variation builds the API Gateway just fine, but any slight change (e.g. to what parameters we validate) results in "Error: error deleting API Gateway Deployment (ufn1gl): BadRequestException: Active stages pointing to this deployment must be moved or deleted"
The only way to make this work in tooling (fully automated) is to destroy the entire API Gateway and recreate it, resulting in a completely new URL. I would not be happy with that solution in a Development environment; in a Production one it's a joke.
The defects related to our issue are:
Just putting it here, in case that helps somebody: I followed the example of bflad (using the REST API Deployment with OpenAPI version), but still had this cycle error.
I finally found that I had some aws_lambda_permission resources to bind lambdas with API gateway that were being updated at the same time as the deployment resource.
After adding `depends_on = [aws_api_gateway_deployment.example]` on my permission resources, the deployment went fine (ex below):
Nobody is threatening anybody. My concern is that people (like us) are using this assuming it will work in Production and it just won't. The team does not seem to be telling anybody about this.
I think it's pretty bad form to know about such a serious issue and not indicate it in their documentation. I think it SHOULD be indicated in their documentation, and I asked them to do that MONTHS ago, and they still haven't.
So what is the responsible thing to do? Ignore this and wait for however long it takes while more and more users get sucked into the same hole? Ask them to update their documentation? Tell people to avoid it because it's broken? And I don't hate terraforming API Gateway; I want to be able to but am blocked by this critical bug.
Any traction on this?
I am using Terraform to deploy and maintain API Gateways in numerous projects. I have not had any production outages. There are very simple ways to handle this particular scenario. You can just break your changes down into multiple applies and they will go through fine. Terraform 0.13.3/0.14 might resolve the cycle issue as there are various changes around cycles and plans.
I definitely detected a Medium blog post threat.
I'm sure a pull request would be appreciated if you fancy mucking in @shederman…
I've been using Terraform for API Gateway in production for a couple of years with daily deployments and it's working very well for me. I appreciate all the efforts people contribute to this project also. Thanks everyone, and Happy Christmas.
@bflad Is there any progress on this issue? Given it is a major issue blocking all usage of AWS API Gateway via Open API in real world Production environments?
@riley-clarkson Do you get any service interruptions like that? We have mission-critical services running on API Gateway and the idea of destroying stages on every deploy is not a popular one I can tell you!
@Glen-Moonpig your solution sounds interesting. The one piece I would dispute is that Terraform supports API Gateway. Terraform is supposed to be a tool to manage infrastructure as code - this is a production focused tool. If Terraform cannot create and manage components like API Gateway without causing production outages not required in normal operation of the component, then I would argue quite vehemently that it is not in fact supported.
Especially since this has been unresolved in one shape or form for over a year.
I also had this issue; the following solution worked well for me. I'm using the `random_uuid` resource to produce a value that is passed to the `triggers` block in the `aws_api_gateway_deployment` resource. The `random_uuid` is re-generated when `keepers` values change, which can be set to anything, e.g. `jsonencode(aws_api_gateway_method.method)` and `jsonencode(aws_api_gateway_integration.integration)`. It is important to make sure that `aws_api_gateway_deployment` is created after everything else; I achieved it by extracting it into a module and using a mandatory variable.
The above resource is placed in its own module.
I placed the stuff required for adding a new method into its own module as well so I don't have to write `"random_uuid" "deployment_trigger"` multiple times. This seems to be working fine for consecutive deployments and changes to the API Gateway integration/method.
I published the modules I use; they are very basic and might not work for all projects, but the code can be adapted for your needs. https://github.com/vladcar/terraform-aws-serverless-common-api-gateway-method https://github.com/vladcar/terraform-aws-serverless-common-api-gateway-deployment
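A minimal single-module sketch of the `random_uuid` approach described in this comment, not the published modules themselves. The `rest_api_id` variable and the `redeployment` trigger key are assumptions; `aws_api_gateway_method.method` and `aws_api_gateway_integration.integration` are the names used in the comment and are assumed to be defined elsewhere:

```hcl
variable "rest_api_id" {
  type = string # hypothetical mandatory input identifying the REST API
}

# A new UUID is generated whenever any keeper value changes.
resource "random_uuid" "deployment_trigger" {
  keepers = {
    method      = jsonencode(aws_api_gateway_method.method)
    integration = jsonencode(aws_api_gateway_integration.integration)
  }
}

resource "aws_api_gateway_deployment" "deployment" {
  rest_api_id = var.rest_api_id

  # The UUID changes only when the method or integration changes, which
  # in turn forces a new deployment.
  triggers = {
    redeployment = random_uuid.deployment_trigger.result
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

Since the keepers reference the method and integration, the UUID (and therefore the deployment) is ordered after them in the graph without an explicit `depends_on`.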