copilot-cli: Unable to deploy existing or new jobs

We have two scheduled jobs deployed to our environments (staging, production) via copilot - JobA and JobB. As of sometime last week, we are unable to deploy JobB due to the following error:

The push refers to repository [1234567890.dkr.ecr.us-west-2.amazonaws.com/our-app/JobB]
1fc38fed3e67: Preparing
....more preparing/waiting....
3e207b409db3: Waiting
b6eb9fce359e: Pushed
1fc38fed3e67: Pushed
6be4f4658087: Pushed
03edd48566ad: Pushed
9b9f23013122: Pushed
2d7b189be53f: Layer already exists
ca45630a0e49: Layer already exists
b2b222e5b623: Layer already exists
6f9d3d2bf332: Pushed
3e207b409db3: Layer already exists
4ae80a939a2a: Pushed
5a54ff35e4df: Pushed
d9c889a: digest: sha256:786d57e6e86e0374d26c1ae901ced96eca1ba6d9a968eab0fd63b9953f9e3fbc size: 2841
latest: digest: sha256:ffd8b2b9ad52d7c444eb8a63dd859df17976990c75feb8110bf9aaf2f56f756a size: 2842
✘ ECR repository not found for service JobB in region us-west-2 and account 12346567890

Images still appear to be pushed and show up in the ECR UI, though many are <untagged>.

While debugging this issue, we deleted the job via copilot job delete. Re-initializing it via copilot job init now fails whenever a cron expression of any kind is used. Providing a cron expression results in:

⠋ Creating ECR repositories for JobB
✘ Failed to create ECR repositories for job JobB.

✘ add job JobB to application our-app: adding job JobB resources to application our-app: operation 25 for stack set our-app-infrastructure failed

However, using a fixed schedule allows the job to init, so we went with that to get our job re-deployed. When running copilot job deploy for the job, we again get ECR repository not found for service JobB in region us-west-2 and account 1234567890. Manually creating the ECR repository and running copilot job deploy again results in the same error, although a new image tagged latest does appear in ECR. Subsequent commands continue to fail with ECR repository not found, and no new images appear in ECR.
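For anyone retracing these steps, here is a minimal sketch of how the two failures above could be inspected with the AWS CLI instead of the console. The helper names (repo_name, check_repo, list_stackset_ops, stackset_op_results) and the APP/REGION defaults are hypothetical. The "<app>/<job>" repository name is taken from the push output, and the stack set name from the init error. The sketch assumes the stack set lives in the same region as the repository and that the active credentials belong to the account in question.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the failures; not part of Copilot.
APP="${APP:-our-app}"          # assumption: application name from the logs
REGION="${REGION:-us-west-2}"  # assumption: region from the error message

# Build the repository name in the "<app>/<job>" form seen in the push output.
repo_name() {
  printf '%s/%s\n' "$1" "$2"
}

# Ask ECR whether the repository exists for the current credentials and region.
check_repo() {
  aws ecr describe-repositories \
    --repository-names "$(repo_name "$APP" "$1")" \
    --region "$REGION"
}

# List recent operations on the stack set named in the init error.
list_stackset_ops() {
  aws cloudformation list-stack-set-operations \
    --stack-set-name "${APP}-infrastructure" \
    --region "$REGION"
}

# Show per-account/region results, including status reasons, for one operation.
stackset_op_results() {
  aws cloudformation list-stack-set-operation-results \
    --stack-set-name "${APP}-infrastructure" \
    --operation-id "$1" \
    --region "$REGION"
}
```

After sourcing the file, check_repo JobB shows whether the repository is visible in that account and region, and stackset_op_results 25 (the operation number from the error) should surface the reason CloudFormation recorded for the failed stack set operation.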

Interestingly, JobA has had no issues. It was created a considerable time and several versions before JobB.

This all happens from a variety of developer machines and via CI systems, so it does not appear to be related to any specific environment/architecture/dependencies. We have tried init/deploy combos with and without existing manifest.yml files and related directories, and found that any new job gives us these errors, not just JobB.

Any and all advice or troubleshooting steps appreciated!

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

We were able to recover everything, though it required some manual intervention with CloudFormation. Our pipelines did fail for a moment due to no changes, but after some local init/deploy we got everything going.

Thank you very much for the help. I would definitely agree on the diagnosis comment 😃

@iamhopaul123 we will test this in the morning and report back. Thank you very much for the detailed response!

❯ copilot --version
copilot version: v1.10.1

@huanjani It’s safe to assume this is the same version on all tested machines/systems as well.