pulumi-gcp: GCP Instance templates / Instance group manager - error deleting
Hello!
- Vote on this issue by adding a 👍 reaction
- To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)
Issue details
When working with GCP instance templates and instance groups, any failure during a run corrupts the state with no clean way to recover. Take the common scenario where you create an instance template and use it in an instance group manager in GCP (along with other resources). If something else fails during provisioning, Pulumi records in the state that the instance group should use the new template (the one assigned during that run), and on the next run it tries to delete the previous template. Because the previous update failed, that template is still in use, so it cannot be deleted. This becomes a real problem with multiple instance groups. The only way to recover is to create new templates, switch the instance groups to use them, and re-run Pulumi so that it can remove the templates marked for deletion (refresh does not help in this scenario). Example:
my-app-instance-template (mystack:group$mystack:mygroup$gcp:compute/instanceTemplate:InstanceTemplate)
error: deleting urn:pulumi:myproject::mystack::mystack:group$mystack:mygroup$gcp:compute/instanceTemplate:InstanceTemplate::my-app-instance-template: 1 error occurred:
* Error waiting for Deleting Instance Template: The instance_template resource 'projects/myproject/global/instanceTemplates/my-app-instance-template-COMMIT_HASH' is already being used by 'projects/myproject/zones/myzone/instanceGroupManagers/my-app'
Steps to reproduce
- Create multiple compute instance templates and instance groups in GCP, setting the instance template name to, say, my-app-instance-template-COMMIT_HASH.
- After provisioning, run Pulumi again with the instance template name set to my-app-instance-template-COMMIT_HASH2 and kill the deployment (process kill, network failure, or anything else) in the middle of it, say when it starts creating instance templates or updating instance groups. The build will be marked as failed.
- Run pulumi refresh/up again.
- If the build was cancelled at the "right" point in time in step 2, the new up command will fail with the above message: it cannot delete my-app-instance-template-COMMIT_HASH because the instance group is still using that template (due to the failed run), yet Pulumi thinks the group should already be using my-app-instance-template-COMMIT_HASH2.
Expected: Pulumi does not mark the template for deletion until the instance group manager has actually been assigned the new template and the previous one has been removed from the group.
Actual: Pulumi records that the instance group is running on my-app-instance-template-COMMIT_HASH2 while it is actually running on my-app-instance-template-COMMIT_HASH, and it tries to remove my-app-instance-template-COMMIT_HASH unsuccessfully, as it's still being used by the instance group manager.
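The mismatch described above can be illustrated with a toy model. This is only a sketch of the behaviour as the reporter describes it, not Pulumi's engine or the GCP API: ToyCloud, InUseError, interrupted_update and the recorded dictionary are hypothetical names introduced for illustration.

```python
class InUseError(Exception):
    """Raised when a template that a group still references is deleted."""


class ToyCloud:
    """Hypothetical stand-in for GCP: tracks templates and each group's template."""

    def __init__(self):
        self.templates = set()
        self.group_template = {}

    def delete_template(self, name):
        users = [g for g, t in self.group_template.items() if t == name]
        if users:
            raise InUseError(f"{name} is already being used by {users[0]}")
        self.templates.discard(name)


def interrupted_update(recorded, group, new_template):
    """Record the new template in the state, then 'crash' before the cloud changes."""
    recorded[group] = new_template  # the state now claims the group was switched
    # The process is killed here, so the real group is never updated.


OLD = "my-app-instance-template-COMMIT_HASH"
NEW = "my-app-instance-template-COMMIT_HASH2"

cloud = ToyCloud()
cloud.templates.add(OLD)
cloud.group_template["my-app"] = OLD
recorded = {"my-app": OLD}  # what the tool believes, initially in sync

interrupted_update(recorded, "my-app", NEW)

# Next run: the state says OLD is unused, so a delete is attempted and rejected.
try:
    cloud.delete_template(OLD)
    delete_error = None
except InUseError as exc:
    delete_error = str(exc)

print(delete_error)
```

In this model the recorded state and the real group disagree after the interrupted run, which is why the delete is attempted and rejected.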
About this issue
- State: closed
- Created 3 years ago
- Reactions: 32
- Comments: 26 (10 by maintainers)
Commits related to this issue
- Upgrade to v4.24.0 of the Google Beta Terraform Provider Fixes: #680 Fixes: #826 Also, please note, upgrades to pulumi-terraform-bridge v3.25.0 This new version of the bridge allows us to control t... — committed to pulumi/pulumi-gcp by stack72 2 years ago
- Upgrade to v4.24.0 of the Google Beta Terraform Provider (#828) Fixes: #680 Fixes: #826 Also, please note, upgrades to pulumi-terraform-bridge v3.25.0 This new version of the bridge allows us ... — committed to pulumi/pulumi-gcp by stack72 2 years ago
I've seen this issue multiple times, but did not have enough details for a report. Thanks @stakauskas!
I ran into this issue today when I was scaling down my instances to zero across three managed groups.
Any chance to get some eyes on this from Pulumi staff? This is the number 1 thumbs up'd bug, with the next closest issue having 5 thumbs.
Not being able to reliably use instance templates and groups is a show stopper on adopting Pulumi for advanced deployments on GCP.
I started investigating this issue today, and have a consistent repro with the latest provider version:
Step 1: Create a stack with the following program
Step 2: Change the name of the InstanceTemplate
Step 3: Run another pulumi up. The update will fail:
This is my experience with the issue.
You can work around this by creating a junk template, manually switching the instance group to use that junk template, and then running pulumi up. Pulumi can then delete the old template and replace the junk template on the instance group with the updated new one.
For some reason Pulumi is not creating the new instance template, attaching it to the instance group, and then deleting the old one. Instead, it tries to delete the old template while it is still attached to the instance group, and that causes the error.
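The difference between the two orderings can be sketched with a small simulation. It is a toy model under stated assumptions, not Pulumi's implementation: ToyCloud and replace_template are hypothetical names, and the only rule modelled is that a template attached to a group cannot be deleted.

```python
class ToyCloud:
    """Hypothetical stand-in for GCP that refuses to delete an in-use template."""

    def __init__(self, template, group):
        self.templates = {template}
        self.group_template = {group: template}

    def create_template(self, name):
        self.templates.add(name)

    def attach(self, group, name):
        if name not in self.templates:
            raise RuntimeError(f"{name} does not exist")
        self.group_template[group] = name

    def delete_template(self, name):
        if name in self.group_template.values():
            raise RuntimeError(f"{name} is already being used")
        self.templates.remove(name)


def replace_template(cloud, group, old, new, create_before_delete):
    """Run the replacement in the chosen order; return the steps that completed."""
    create = ("create", lambda: cloud.create_template(new))
    attach = ("attach", lambda: cloud.attach(group, new))
    delete = ("delete", lambda: cloud.delete_template(old))
    steps = [create, attach, delete] if create_before_delete else [delete, create, attach]

    done = []
    for label, step in steps:
        try:
            step()
        except RuntimeError:
            break  # the update fails at the first rejected step
        done.append(label)
    return done


good_cloud = ToyCloud("tmpl-old", "my-app")
good = replace_template(good_cloud, "my-app", "tmpl-old", "tmpl-new", True)

bad_cloud = ToyCloud("tmpl-old", "my-app")
bad = replace_template(bad_cloud, "my-app", "tmpl-old", "tmpl-new", False)

print(good)
print(bad)
```

With create-first ordering all three steps complete. With delete-first ordering the very first step is rejected, which matches the error reported in this issue.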
Hey @danielrbradley any word on when this can be fixed?
Hi all
I must apologise for the fact that there has been no response on this issue - we are looking into this with immediate effect. I have been able to reproduce this perfectly. The issue happens if we are using a name parameter. It doesn't happen if we are using a namePrefix or an autonaming-based resource.
By changing to use the namePrefix we get the createBeforeDelete behaviour that we need, but we will work on getting the correct functionality when changing the name.
Paul
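For readers using the Python SDK, the namePrefix workaround might look roughly like the sketch below. It is not the program from this thread: the commit hash, machine type, image, network and zone are placeholder assumptions. In Python the argument is spelled name_prefix.

```python
import pulumi
import pulumi_gcp as gcp

# Placeholder for the commit hash used in the template name (assumption).
commit_hash = "abc1234"

template = gcp.compute.InstanceTemplate(
    "my-app-instance-template",
    # name_prefix instead of name: a unique suffix is appended, so the
    # replacement template can exist alongside the old one.
    name_prefix=f"my-app-instance-template-{commit_hash}-",
    machine_type="e2-small",  # assumption, not from the issue
    disks=[gcp.compute.InstanceTemplateDiskArgs(
        source_image="debian-cloud/debian-12",  # assumption
        boot=True,
        auto_delete=True,
    )],
    network_interfaces=[gcp.compute.InstanceTemplateNetworkInterfaceArgs(
        network="default",  # assumption
    )],
)

group = gcp.compute.InstanceGroupManager(
    "my-app",
    base_instance_name="my-app",
    zone="us-central1-a",  # assumption
    target_size=1,
    versions=[gcp.compute.InstanceGroupManagerVersionArgs(
        instance_template=template.id,
    )],
)

pulumi.export("template_name", template.name)
```

Changing commit_hash then replaces the template under a new generated name, which per the comment above gives the create-before-delete ordering instead of the failing delete-first one.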