copilot-cli: Second (and following) deployments of services fail after copilot upgrade

Hey,

Last week we upgraded our main copilot app by running copilot app upgrade. Since then we’ve been running into strange issues when redeploying any kind of service in the same app.

We originally created the app with copilot 1.22.0 and have been using the latest release of the copilot-cli for every deployment since. We only ran copilot app upgrade last week in order to use static sites.
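For reference, here is how we checked the versions involved (a sketch; the SSM parameter path is an assumption based on Copilot storing application metadata in SSM Parameter Store, so adjust it for your app):

# Installed CLI version:
copilot --version

# List the application metadata Copilot keeps in SSM Parameter Store to
# confirm the state of the upgraded app:
aws ssm get-parameters-by-path --path /copilot/applications/ --recursive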

For App Runner-based services we see the following error:

deploy service retro to environment staging: deploy service: determine image repository type: image is not supported by App Runner: @sha256:96b7d5824ba87ef965f74db9a4f7babd95832852d3ce9b3b27219b7aa308a2ef
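Note that the image reference in that message is a bare digest. For a private ECR image, App Runner expects a full repository reference, roughly like this (account ID, region, and repository name are placeholders):

# Expected shape of an ECR image reference:
123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-repo@sha256:<digest>
# What the error shows instead, i.e. the repository part is missing:
@sha256:96b7d5824ba87ef965f74db9a4f7babd95832852d3ce9b3b27219b7aa308a2ef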

For ECS-based services we see this error:

- Updating the infrastructure for stack tech-staging-oreo-dl                  [update rollback complete]  [15.3s]
  The following resource(s) failed to update: [TaskDefinition].
  - An ECS service to run and maintain your tasks in the environment cluster  [not started]
  - An ECS task definition to group your containers and run them on ECS       [delete complete]           [0.0s]
    Resource handler returned message: "Invalid request provided: Create TaskDefinition: Container.image repository should not be null or empty. (Service: AmazonECS; Status Code: 400; Error Code: ClientException; Request ID: abc12f74-a49a-42f9-ac85-418debf2f7b2; Proxy: null)" (RequestToken: 1d5e884d-bb98-74b6-fddb-8f2bc2265329, HandlerErrorCode: InvalidRequest)

So both errors seem to be related to the ECR image reference.
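One way to confirm this without triggering another failed deployment is to render the CloudFormation template Copilot would deploy and inspect the task definition’s image value (a sketch using the service and environment names from above):

# Print the CloudFormation template for the service without deploying it:
copilot svc package --name oreo-dl --env staging
# In the output, look at the image value of the ECS task definition; in our
# case it appears to be empty or reduced to a bare digest.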

What we found out already:

  • We can create new services and deploy them once; the second and every following deployment attempt fails with the same error (see the reproduction sketch below).
  • It’s hard to verify now, but we believe some services could still be deployed once after the app upgrade, while for others the very next deployment failed immediately.
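A minimal reproduction of what we are seeing (service and environment names are examples from our app):

# The first deployment of a freshly created service succeeds:
copilot svc deploy --name retro --env staging
# Every subsequent deployment of the same service fails with the errors
# shown above:
copilot svc deploy --name retro --env staging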

I would love to get any feedback on how we can further debug this issue, as it is blocking our teams. I’ll happily provide more information if you tell me what you need.

Most upvoted comments

That’s very good to know. Thanks for addressing this issue.

For me this issue is resolved for now: we know what was causing the problems and how to avoid them in the future. Therefore I’m going to close it, even though the only way we found to fix affected services was to delete and recreate them completely.
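For anyone landing here later, the delete-and-recreate workaround looks roughly like this (a sketch using one of our service names; copilot will prompt for any missing configuration):

# Delete the affected service completely:
copilot svc delete --name retro
# Re-initialize the service and deploy it again:
copilot svc init --name retro
copilot svc deploy --name retro --env staging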

Thanks again for your support. That’s much appreciated.

Hello @schm.

Is this a server-side check, or is it built into the CLI?

It is built into the CLI.

Will this now block clients < 1.29 from interacting with my updated app? Or will this check only apply in the future for all clients >= 1.29 (e.g. blocking a 1.29 client from accessing a 1.30 app)?

I think “blocking a 1.29 client from accessing a 1.30 app” is the correct statement (if by “client” you mean the Copilot CLI), so your 1.29 client won’t be able to accidentally downgrade your 1.30 app (however, this can be overridden by passing the --allow-downgrade flag).
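So if an older client deliberately needs to act on an app that a newer CLI has touched, the guard can be bypassed explicitly, for example:

# Explicitly allow the older client to deploy despite the version check:
copilot svc deploy --name retro --env staging --allow-downgrade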

@iamhopaul123 it’s strange, because I’m using v1.28 and today I hit this problem 3 times (without touching the manifest). In one case the deploy failed, and in the other two the state machine failed with the error “failed to normalize image reference …” when running the job (issue #5032). These were jobs I hadn’t touched for weeks or months, and the previous task revision was likely deployed with an older Copilot version. On the jobs where I ran copilot job delete etc., the following deploys and runs are going fine.

Monday morning I will try the workaround you suggested.
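In case it helps with debugging, the image reference stored in the latest task definition can be inspected directly (the family name below is an assumption based on Copilot’s <app>-<env>-<workload> naming, matching the stack name from the original report):

# Show the image reference(s) of the latest task definition revision:
aws ecs describe-task-definition \
  --task-definition tech-staging-oreo-dl \
  --query 'taskDefinition.containerDefinitions[].image'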