aws-cdk: cdk deploy hangs in an endless loop because the Fargate service can't start its task

I am deploying a CodePipeline stack that deploys to a Fargate service. The problem is that when the Fargate task fails to start, the deployment never returns, because Fargate tries to start the task over and over again (roughly every minute).

Roughly my code is:

  public createEcsDeployAction(vpc: Vpc, ecrRepo: ecr.Repository, buildOutput: Artifact): EcsDeployAction {
    return new EcsDeployAction({
      actionName: 'EcsDeployAction',
      service: this.createLoadBalancedFargateService(this, vpc, ecrRepo).service,
      input: buildOutput,
    });
  }


  createLoadBalancedFargateService(scope: Construct, vpc: Vpc, ecrRepository: ecr.Repository) {
    return new ecspatterns.ApplicationLoadBalancedFargateService(scope, 'myLbFargateService', {
      vpc: vpc,
      serviceName: "HelloWorldFargateService",
      memoryLimitMiB: 512,
      cpu: 256,
      taskImageOptions: {
        image: ecs.ContainerImage.fromEcrRepository(ecrRepository, "latest"),
      },
    });
  }

My problem could be that I define an image in the ApplicationLoadBalancedFargateService that isn't available while the stack is being deployed, because the CodePipeline hasn't run yet. I don't know for sure.

The question remains whether it is wise for “cdk deploy” to never terminate because of never-ending attempts to start a task in the backend.

Reproduction Steps

Hard to reproduce out of context.

Error Log

No error in the console on cdk deploy. It is hard to find the real error. I tried to find it via the AWS console without success.

Environment

  • CLI Version : aws-cli/2.0.10 Python/3.8.2 Darwin/19.4.0 botocore/2.0.0dev14
  • Framework Version: 1.36.1 (build 4df7dac)
  • OS : Mac OS X

This is 🐛 Bug Report

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 19
  • Comments: 21 (3 by maintainers)

Most upvoted comments

		const fargateTask = new ecs.FargateTaskDefinition(this, 'FargateTask', {
			cpu: 256,
			memoryLimitMiB: 512,
		})

		fargateTask.addContainer("GinContainer", {
			image: ecs.ContainerImage.fromAsset('services/api')
		})

		const cluster = new ecs.Cluster(this, 'Cluster', {
			containerInsights: true,
			vpc
		})

		const fargateService = new ecs.FargateService(this, 'FargateService', {
			cluster,
			taskDefinition: fargateTask,
			desiredCount: 1,
			assignPublicIp: true,
			platformVersion: ecs.FargatePlatformVersion.VERSION1_4
		})

I dropped the ECS pattern and built everything from scratch, and the deployment works just fine (no ALB yet).

If I remove assignPublicIp, I get the following error message and the deployment is back to being stuck: Stopped reason ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): RequestError: send request failed caused by: Post https://api.ecr....

Hey @logemann ,

Yes, this is an issue. Basically, the problem is that we’re currently missing a concept in the CDK that represents “an image that doesn’t exist yet, but will be created when the CodePipeline runs”.

In a demo project we’ve done a long time ago, we have a class that represents exactly that. This is how it is used: [1], [2].

Would adding this class to the main CDK project solve your issue @logemann ? If so, I will convert this issue to a feature request.

Thanks, Adam

Yeah, makes sense, and then I got it right that you can't use PipelineContainerImage in isolation. But I still don't think it's developer friendly. Using CloudFormationCreateUpdateStackAction feels like quite a big workaround too if in the end you just want to use EcsDeployAction. I think we can close this one. Adding PipelineContainerImage to the distro would only make sense if there were a ton of documentation on how to use it in conjunction with CloudFormationCreateUpdateStackAction as a kind of replacement for EcsDeployAction for this specific use case, a use case which is IMO quite mainstream.

OK. This is the detailed error when directly referencing a non-existing image with fromEcrRepository():

CannotPullContainerError: Error response from daemon: manifest for 985582282849.dkr.ecr.eu-central-1.amazonaws.com/hello-world-webapp:latest not found

So to me it looks like a placeholder dummy image for the first deployment is the only way to go. If you do it this way, you need to add a policy like the one mentioned in my previous post, because otherwise the TaskExecutionRole created by the CDK does not have enough permissions.
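
The placeholder approach described here might look roughly like the following sketch. It assumes CDK v1 module names to match the framework version reported in this issue, uses the public amazon/amazon-ecs-sample image as an arbitrary stand-in, and calls grantPull on the execution role as one possible way to add the missing permissions. The exact policy from the earlier post is not reproduced here, and the function name createServiceWithPlaceholder is hypothetical.

```typescript
import { Construct } from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecr from '@aws-cdk/aws-ecr';
import * as ecs from '@aws-cdk/aws-ecs';
import * as ecspatterns from '@aws-cdk/aws-ecs-patterns';

// Hypothetical helper: creates the service with a placeholder image so the
// first `cdk deploy` can stabilize before the pipeline has pushed anything.
export function createServiceWithPlaceholder(
  scope: Construct,
  vpc: ec2.IVpc,
  ecrRepository: ecr.IRepository,
): ecspatterns.ApplicationLoadBalancedFargateService {
  const lbService = new ecspatterns.ApplicationLoadBalancedFargateService(scope, 'myLbFargateService', {
    vpc,
    memoryLimitMiB: 512,
    cpu: 256,
    taskImageOptions: {
      // Assumption: any publicly pullable image that listens on port 80 would do.
      image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
    },
  });

  // The task definition no longer references the ECR repository, so the CDK
  // has no reason to grant pull access on its own. Grant it explicitly so
  // that later pipeline deployments can pull the real image.
  ecrRepository.grantPull(lbService.taskDefinition.obtainExecutionRole());

  return lbService;
}
```

With this in place, the first stack deployment starts the placeholder container, and the EcsDeployAction swaps in the real image once the pipeline has built and pushed it.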

I hope I haven't put too much info in here, but this way other people can get an idea of what to do. To the AWS CDK dev team: is there a way to solve this elegantly?

@jonny-rimek that's a different scenario from mine. I don't have IAM problems.

I dug deeper, and to me it looks like a chicken-and-egg problem. When I remove my ECS deployment stage from the CodePipeline and deploy my stack from scratch, everything works. Of course this gets me a Docker image in my ECR repo (because the CodePipeline runs). When I then re-add the ECS deployment stage in my code and redeploy the stack, everything works, because now there is a Docker image in the ECR repo. Subsequent CodePipeline runs triggered by changes to the GitHub repo work too, and I get full auto-deployment.

So currently I must deploy my stack in two steps, first without the deployment stage and then with it included. That looks wrong to me.

IMO the problem is that ApplicationLoadBalancedFargateService wants to bootstrap an image directly via:

taskImageOptions: {
        image: ecs.ContainerImage.fromEcrRepository(ecrRepository, "latest"),
      },

It doesn't know that it's embedded in an EcsDeployAction, where it should act only when there is an imagedefinitions.json in the artifact passed via the input attribute.
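
For readers unfamiliar with imagedefinitions.json: it is a JSON array in which each entry pairs a container name with an image URI. The sketch below builds that content for a single container. The helper name buildImageDefinitions and the container name "web" are assumptions made for illustration; the name has to match whatever the container is called in the task definition.

```typescript
// Shape of one entry in imagedefinitions.json: the container name from the
// task definition plus the full URI of the image to deploy.
interface ImageDefinition {
  name: string;
  imageUri: string;
}

// Hypothetical helper: builds the file content for a single container.
function buildImageDefinitions(containerName: string, repositoryUri: string, tag: string): string {
  const definitions: ImageDefinition[] = [
    { name: containerName, imageUri: `${repositoryUri}:${tag}` },
  ];
  return JSON.stringify(definitions);
}

const content = buildImageDefinitions(
  "web", // assumption: must match the container name in the task definition
  "985582282849.dkr.ecr.eu-central-1.amazonaws.com/hello-world-webapp",
  "latest",
);
console.log(content);
```

A build step would typically write this string to a file named imagedefinitions.json in the output artifact that the deploy action receives as its input.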