pulumi-gcp: GCP Instance templates / Instance group manager - error deleting

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you’ve opened one already)

Issue details

When working with GCP instance templates and instance groups, any failure during a run corrupts the state, with no clean way to recover. Take the common scenario where you create an instance template and use it from an instance group manager in GCP (along with other resources). If something else fails during provisioning, Pulumi records in the state that the instance group should use the new template (the one assigned during that run), and on the next run it tries to delete the previous template. That template is still in use because the previous update failed, so it cannot be deleted. This becomes a real problem with multiple instance groups. The only way to recover is to create new templates, switch the instance groups to use them, and re-run Pulumi so that it can remove the templates marked for deletion (refresh does not help in this scenario). Example:

my-app-instance-template (mystack:group$mystack:mygroup$gcp:compute/instanceTemplate:InstanceTemplate)
error: deleting urn:pulumi:myproject::mystack::mystack:group$mystack:mygroup$gcp:compute/instanceTemplate:InstanceTemplate::my-app-instance-template: 1 error occurred:
	* Error waiting for Deleting Instance Template: The instance_template resource 'projects/myproject/global/instanceTemplates/my-app-instance-template-COMMIT_HASH' is already being used by 'projects/myproject/zones/myzone/instanceGroupManagers/my-app'

Steps to reproduce

  1. Create multiple compute instance templates and instance groups in GCP, naming each instance template something like my-app-instance-template-COMMIT_HASH.
  2. After provisioning, run Pulumi again with the instance template name set to my-app-instance-template-COMMIT_HASH2 and kill the deployment partway through (process kill, network failure, or anything else), for example when it starts creating instance templates or updating instance groups. The build will be marked as failed.
  3. Run pulumi refresh/up again.
  4. If the build was cancelled at the “right” point in step 2, the new up command fails with the message above: it cannot delete my-app-instance-template-COMMIT_HASH because the instance group is still using that template (due to the failed run), yet Pulumi thinks the group should already be using my-app-instance-template-COMMIT_HASH2.

Expected: Pulumi does not mark the template for deletion unless the instance group manager has actually been assigned the new template and the previous one has been removed from the group.

Actual: Pulumi records that the instance group is running on my-app-instance-template-COMMIT_HASH2 while it is actually running on my-app-instance-template-COMMIT_HASH, and tries unsuccessfully to remove my-app-instance-template-COMMIT_HASH, as it’s still being used by the instance group manager.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 32
  • Comments: 26 (10 by maintainers)

Most upvoted comments

I’ve seen this issue multiple times, but did not have enough details for a report. Thanks @stakauskas!

I ran into this issue today when I was scaling down my instances to zero across three managed groups.

     Type                             Name                        Status                   Info
     pulumi:pulumi:Stack              internal-testing            **failed**               1 error
 +-  ├─ gcp:compute:InstanceTemplate  instancetemplate-command    **replacing failed**     1 error
 +-  ├─ gcp:compute:InstanceTemplate  instancetemplate-compute    **replacing failed**     1 error
 +-  └─ gcp:compute:InstanceTemplate  instancetemplate-ephemeral  **replacing failed**     1 error
 
Diagnostics:
  gcp:compute:InstanceTemplate (instancetemplate-compute):
    error: deleting urn:pulumi:testing::internal::gcp:compute/instanceTemplate:InstanceTemplate::instancetemplate-compute: 1 error occurred:
      * Error waiting for Deleting Instance Template: The instance_template resource 'projects/eng-internal/global/instanceTemplates/instancetemplate-compute-dc279c0' is already being used by 'projects/eng-internal/regions/us-central1/instanceGroupManagers/mig-compute-98fe1f2'
 
  gcp:compute:InstanceTemplate (instancetemplate-ephemeral):
    error: deleting urn:pulumi:testing::internal::gcp:compute/instanceTemplate:InstanceTemplate::instancetemplate-ephemeral: 1 error occurred:
      * Error waiting for Deleting Instance Template: The instance_template resource 'projects/eng-internal/global/instanceTemplates/instancetemplate-ephemeral-5fd9833' is already being used by 'projects/eng-internal/regions/us-central1/instanceGroupManagers/mig-ephemeral-e21d83a'
 
  gcp:compute:InstanceTemplate (instancetemplate-command):
    error: deleting urn:pulumi:testing::internal::gcp:compute/instanceTemplate:InstanceTemplate::instancetemplate-command: 1 error occurred:
      * Error waiting for Deleting Instance Template: The instance_template resource 'projects/eng-internal/global/instanceTemplates/instancetemplate-command-340eef8' is already being used by 'projects/eng-internal/regions/us-central1/instanceGroupManagers/mig-command-f9e2c73'

Any chance to get some eyes on this from Pulumi staff? This is the number 1 thumbs up’d bug with the next closest issue having 5 thumbs.

Not being able to reliably use instance templates and groups is a show stopper on adopting Pulumi for advanced deployments on GCP.

I started investigating this issue today, and have a consistent repro with the latest provider version:

Step 1: Create a stack with the following program

import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";

const myImage = pulumi.output(gcp.compute.getImage({
    family: "debian-11",
    project: "debian-cloud",
}));
const instanceTemplate = new gcp.compute.InstanceTemplate("it", {
    name: "test1",
    machineType: "e2-medium",
    region: "us-central1",
    networkInterfaces: [{
        network: "default",
    }],
    disks: [{
        sourceImage: myImage.selfLink,
    }],
}, {deleteBeforeReplace: true});

const appserver = new gcp.compute.InstanceGroupManager("appserver", {
    baseInstanceName: "app",
    zone: "us-central1-a",
    versions: [{
        instanceTemplate: instanceTemplate.id,
    }],
    targetSize: 2,
    namedPorts: [{
        name: "customhttp",
        port: 8888,
    }],
});

Step 2: Change the name of the InstanceTemplate

Step 3: Run another pulumi up. The update will fail:

pulumi up
Previewing update (dev)

View Live: https://app.pulumi.com/lblackstone/gcp-test/dev/previews/3ec0535c-0fcd-4c10-9a81-245a2145a8be

     Type                                 Name          Plan        Info
     pulumi:pulumi:Stack                  gcp-test-dev
 +-  ├─ gcp:compute:InstanceTemplate      it            replace     [diff: ~name]
 ~   └─ gcp:compute:InstanceGroupManager  appserver     update      [diff: ~versions]


Resources:
    ~ 1 to update
    +-1 to replace
    2 changes. 1 unchanged

Do you want to perform this update? yes
Updating (dev)

View Live: https://app.pulumi.com/lblackstone/gcp-test/dev/updates/18

     Type                             Name          Status                   Info
     pulumi:pulumi:Stack              gcp-test-dev  **failed**               1 error
 +-  └─ gcp:compute:InstanceTemplate  it            **replacing failed**     1 error


Diagnostics:
  pulumi:pulumi:Stack (gcp-test-dev):
    error: update failed

  gcp:compute:InstanceTemplate (it):
    error: deleting urn:pulumi:dev::gcp-test::gcp:compute/instanceTemplate:InstanceTemplate::it: 1 error occurred:
    	* Error waiting for Deleting Instance Template: The instance_template resource 'projects/pulumi-development/global/instanceTemplates/test1' is already being used by 'projects/pulumi-development/zones/us-central1-a/instanceGroupManagers/appserver-b34c08d'

Resources:
    1 unchanged

Duration: 13s

pulumi about
CLI
Version      3.51.1
Go Version   go1.19.5
Go Compiler  gc

Plugins
NAME    VERSION
gcp     6.47.0
nodejs  unknown

Host
OS       darwin
Version  12.6.1
Arch     arm64

This project is written in nodejs: executable='/Users/levi/.nvm/versions/node/v16.10.0/bin/node' version='v16.10.0'

Current Stack: lblackstone/gcp-test/dev

TYPE                                                   URN
pulumi:pulumi:Stack                                    urn:pulumi:dev::gcp-test::pulumi:pulumi:Stack::gcp-test-dev
pulumi:providers:gcp                                   urn:pulumi:dev::gcp-test::pulumi:providers:gcp::default_6_47_0
gcp:compute/instanceTemplate:InstanceTemplate          urn:pulumi:dev::gcp-test::gcp:compute/instanceTemplate:InstanceTemplate::it
gcp:compute/instanceGroupManager:InstanceGroupManager  urn:pulumi:dev::gcp-test::gcp:compute/instanceGroupManager:InstanceGroupManager::appserver


Found no pending operations associated with dev

Dependencies:
NAME            VERSION
@pulumi/gcp     6.47.0
@pulumi/pulumi  3.52.0
@types/node     14.18.36

Pulumi locates its logs in /var/folders/ny/f_y5fsqd235fpx5bs6ghyk4w0000gn/T/ by default

This is my experience with the issue.

  1. The instance template is updated (user data changes).
  2. Pulumi tries to replace the template but cannot, because it tries to delete the old one while it is still in use by the instance group.
    error: deleting urn:pulumi:production::internal::gcp:compute/instanceTemplate:InstanceTemplate::instancetemplate-nomad-compute-test: 1 error occurred:
    	* Error waiting for Deleting Instance Template: The instance_template resource 'projects/eng-internal/global/instanceTemplates/instancetemplate-nomad-compute-test-bd001d8' is already being used by 'projects/eng-internal/regions/us-central1/instanceGroupManagers/mig-nomad-compute-test-512b433'

You can work around this by creating a junk template and manually switching the instance group to use it. Then run pulumi up: it can delete the old template and replace the junk template on the instance group with the updated one.

For some reason Pulumi is not creating the new instance template, attaching the new one to the instance group, and then deleting the old one. It is trying to delete the old one while it is still attached to the instance group and that causes the error.
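The two orderings contrasted above can be sketched with a toy in-memory model. This is only an illustration: `FakeCompute`, `replaceDeleteFirst`, and `replaceCreateFirst` are hypothetical names, not part of Pulumi or the GCP API, and the model assumes a single instance group and only the one constraint that appears in the error message (a template in use cannot be deleted).

```typescript
// Toy in-memory stand-in for the GCP behaviour described in this issue.
// Hypothetical names; not the real Pulumi or GCP API.
class FakeCompute {
  templates = new Set<string>();
  groupTemplate: string | null = null; // template the instance group currently uses

  createTemplate(name: string): void {
    if (this.templates.has(name)) {
      throw new Error(`template '${name}' already exists`);
    }
    this.templates.add(name);
  }

  setGroupTemplate(name: string): void {
    if (!this.templates.has(name)) {
      throw new Error(`template '${name}' not found`);
    }
    this.groupTemplate = name;
  }

  deleteTemplate(name: string): void {
    // Mirrors the error in this issue: a template that is in use cannot be deleted.
    if (this.groupTemplate === name) {
      throw new Error(`template '${name}' is already being used by the instance group`);
    }
    this.templates.delete(name);
  }
}

// Ordering observed in this issue: delete the old template first.
function replaceDeleteFirst(api: FakeCompute, oldName: string, newName: string): void {
  api.deleteTemplate(oldName); // throws while the group still references oldName
  api.createTemplate(newName);
  api.setGroupTemplate(newName);
}

// Ordering the comment above expects: create, re-point the group, then delete.
function replaceCreateFirst(api: FakeCompute, oldName: string, newName: string): void {
  api.createTemplate(newName);
  api.setGroupTemplate(newName);
  api.deleteTemplate(oldName);
}

// Starting point: one template, already attached to the group.
function freshProject(): FakeCompute {
  const api = new FakeCompute();
  api.createTemplate("test1");
  api.setGroupTemplate("test1");
  return api;
}

// Runs fn and reports whether it completed without throwing.
function succeeds(fn: () => void): boolean {
  try {
    fn();
    return true;
  } catch (_err) {
    return false;
  }
}

const demoDeleteFirst = freshProject();
console.log(succeeds(() => replaceDeleteFirst(demoDeleteFirst, "test1", "test2"))); // false

const demoCreateFirst = freshProject();
console.log(succeeds(() => replaceCreateFirst(demoCreateFirst, "test1", "test2"))); // true
```

In the delete-first run the group is left on the old template and nothing else changes, which matches the "replacing failed" output shown in this thread.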

Hey @danielrbradley any word on when this can be fixed?

Hi all

I must apologise for the lack of response on this issue; we are looking into it with immediate effect. I have been able to reproduce this perfectly. The issue happens when the name parameter is used. It doesn’t happen with a name prefix or an auto-named resource.

import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";

const myImage = pulumi.output(gcp.compute.getImage({
    family: "debian-9",
    project: "debian-cloud",
}));
const instanceTemplate = new gcp.compute.InstanceTemplate("myinstanceTemplatenew", {
    // namePrefix: "instance-template-", // <----- works
    name: "myinstancetemplateanother", // <------ doesn't work
    machineType: "e2-medium",
    region: "us-central1",
    networkInterfaces: [{
        network: "default",
    }],
    disks: [{
        sourceImage: myImage.selfLink,
    }],
});

const autohealing = new gcp.compute.HealthCheck("autohealing", {
    checkIntervalSec: 5,
    timeoutSec: 5,
    healthyThreshold: 2,
    unhealthyThreshold: 10,
    httpHealthCheck: {
        requestPath: "/healthz",
        port: 8080,
    },
});
const appserver = new gcp.compute.InstanceGroupManager("appserver", {
    baseInstanceName: "app",
    zone: "us-central1-a",
    versions: [{
        instanceTemplate: instanceTemplate.id,
    }],
    targetSize: 2,
    namedPorts: [{
        name: "customhttp",
        port: 8888,
    }],
    autoHealingPolicies: {
        healthCheck: autohealing.id,
        initialDelaySec: 300,
    },
});

By changing to use the namePrefix, we get the create-before-delete behaviour that we need, but we will work on getting the correct functionality when changing the name.

Paul
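
For the naming scheme in the original report (my-app-instance-template-COMMIT_HASH), one way to apply the namePrefix workaround is to move the commit hash into the prefix. The helper below is a minimal sketch: `templateNaming` and `TemplateNaming` are hypothetical names, shortening the hash to 7 characters is an arbitrary choice, and it does not check any length limits GCP or the provider may place on the prefix.

```typescript
// Hypothetical helper: builds the naming part of the InstanceTemplate args.
// Only `namePrefix` comes from the thread; everything else is illustrative.
interface TemplateNaming {
  namePrefix: string;
}

function templateNaming(app: string, commitHash: string): TemplateNaming {
  // Lowercase and shorten the hash so the prefix stays short and consistent.
  const shortHash = commitHash.toLowerCase().slice(0, 7);
  // The trailing hyphen separates the prefix from the generated suffix.
  return { namePrefix: `${app}-instance-template-${shortHash}-` };
}

console.log(templateNaming("my-app", "ABCDEF1234567890").namePrefix);
// prints "my-app-instance-template-abcdef1-"
```

The result could then be spread into the arguments of `new gcp.compute.InstanceTemplate(...)` in place of the `name` property, so the template still carries the commit hash while the resource uses a name prefix rather than a fixed name.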