aws-cdk: (aws-ecs): ELB TG can't connect to ECS EC2 instances (health check failed)

The ELB TG can't connect to the ECS EC2 instances (the health check fails) when capacity is added with cluster.addAsgCapacityProvider instead of cluster.addCapacity.

Reproduction Steps


const taskDefinition = new ecs.TaskDefinition(this, 'Backend', {
    family: 'someFamily',
    compatibility: ecs.Compatibility.EC2,
    executionRole,
    networkMode: ecs.NetworkMode.BRIDGE,
    taskRole,
});
taskDefinition.addContainer('backend', {
    image: ecs.ContainerImage.fromRegistry('hashicorp/http-echo'),
    memoryLimitMiB: 512,
    command: [
        `-listen=:${containerPort}`,
        '-text="hello world"'
    ],
    environment: {},
    portMappings: [
        {
            containerPort: containerPort,
            protocol: ecs.Protocol.TCP,
        },
    ],
});

const sg = new ec2.SecurityGroup(this, `SG${identifier}`, {
    vpc: this.cluster.vpc,
});

const autoScalingGroup = new autoscaling.AutoScalingGroup(this, `asg${identifier}`, {
    vpc: this.cluster.vpc,
    instanceType: new ec2.InstanceType(instanceType),
    machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
    minCapacity: clusterMinCapacity,
    maxCapacity: clusterMaxCapacity,
    desiredCapacity: clusterDesiredCapacity,
    associatePublicIpAddress: true,
    cooldown: cdk.Duration.minutes(1),
    keyName: clusterKeyName,
    securityGroup: sg,
});

const asgProvider = new ecs.AsgCapacityProvider(this, `AsgProvider${identifier}`, {
    autoScalingGroup,
    canContainersAccessInstanceRole: true,
    enableManagedScaling: false,
    enableManagedTerminationProtection: false,
});

this.cluster.addAsgCapacityProvider(asgProvider);

What did you expect to happen?

I expect the aws-ecs library to automatically create a security group with the required inbound rules, or to provide some method that lets the ELB TG connect to the EC2 instances.

I expect addAsgCapacityProvider to automatically give the ELB TG access to the EC2 instances.

The normal SG, created with cluster.addCapacity:
[image]

What actually happened?

In practice, the EC2 instances are created with only my own security group's inbound rules (SSH).
[image]

How I temporarily fixed this issue: I compared the security group created with cluster.addCapacity() against the SG created with the ASG provider.

The code below fixes the problem, but I think this behavior should be the default in aws-cdk.
Or maybe I misunderstand the latest AWS CDK ECS update and the deprecation of cluster.addCapacity.

this.ecsPatternService.loadBalancer.connections.allowTo(sg, ec2.Port.tcpRange(32768, 65535), `allow ELB TG connect to EC2 ${instanceType}`);

Environment

  • CDK CLI Version : 1.104.0 (build 44d3383)
  • Framework Version : ^1.104.0
  • Node.js Version : v14.16.0
  • OS : Fedora release 33 (Thirty Three)
  • Language (Version) : TypeScript (3.8.3)

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 8
  • Comments: 15 (7 by maintainers)

Most upvoted comments

@MrArnoldPalmer Sure, I will try to create my first PR to open source 😃

@rix0rrr still an issue, specifically when upgrading from CDK v1 to v2 where AddAutoScalingGroup is deprecated. Above workaround from @spg works.

With .AddAutoScalingGroup, the following rules are in place. With .AddAsgCapacityProvider, the following diff appears (i.e. the rules get dropped), making the service unavailable for requests.

Tested with cdk v2.33.0.

  Group                         Dir  Protocol           Peer
- ${alb/SecurityGroup.GroupId}  Out  TCP 32768 - 65535  ${clusterSG.GroupId}
- ${clusterSG.GroupId}          In   TCP 32768 - 65535  ${alb/SecurityGroup.GroupId}
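
For context on why losing these two rules breaks the health check: the task definition in the report uses bridge networking with only a containerPort, so ECS assigns each task a host port dynamically, and the target group has to reach the instance on that port. Below is a toy TypeScript model of that check, not CDK or AWS SDK code. The IngressRule shape, the allows helper, the group IDs, and the example host port 49153 are all hypothetical and only illustrate the matching logic.

```typescript
// Toy model only: this is NOT how CDK or EC2 represent security groups.
interface IngressRule {
  peer: string;     // who may connect (hypothetical identifiers)
  fromPort: number; // first allowed TCP port, inclusive
  toPort: number;   // last allowed TCP port, inclusive
}

// True if any rule admits `peer` on `port`.
function allows(rules: IngressRule[], peer: string, port: number): boolean {
  return rules.some(
    (r) => r.peer === peer && r.fromPort <= port && port <= r.toPort,
  );
}

const ALB_SG = 'sg-alb'; // hypothetical ID for the load balancer's group

// What the instances end up with after addAsgCapacityProvider: SSH only.
const sshOnly: IngressRule[] = [
  { peer: 'admin', fromPort: 22, toPort: 22 },
];

// The same group with the dropped rule restored (range taken from the diff).
const withDynamicPorts: IngressRule[] = [
  ...sshOnly,
  { peer: ALB_SG, fromPort: 32768, toPort: 65535 },
];

// Assumed example of a dynamically assigned host port.
const hostPort = 49153;

console.log(allows(sshOnly, ALB_SG, hostPort));          // false
console.log(allows(withDynamicPorts, ALB_SG, hostPort)); // true
```

In the first case the health check request never reaches the container, which matches the "healthcheck failed" symptom in the report.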

Any news on this?

The latest CDK release is now printing warnings about our usage of Cluster.addCapacity, but we cannot switch to Cluster.addAsgCapacityProvider because of this bug…

Just ran into this because I noticed VSCode mentioned that addCapacity is deprecated. Which I guess happened in a recent cdk update.

After migrating to addAsgCapacityProvider there was this diff on a bunch of security groups which seemed suspect:

Security Group Changes
┌───┬───────────────────────────────────────────────────────────────────────────────────────┬─────┬─────────────────┬───────────────────────────────────────────────────────────────────────────────────────┐
│   │ Group                                                                                 │ Dir │ Protocol        │ Peer                                                                                  │
├───┼───────────────────────────────────────────────────────────────────────────────────────┼─────┼─────────────────┼───────────────────────────────────────────────────────────────────────────────────────┤
│ - │ {"Fn::ImportValue":"*********-All-ALB:ExportsOutputFnGetAttLBSecurityGroup8A41EA2BGro │ Out │ TCP 32768-65535 │ {"Fn::ImportValue":"*********-All-EcsCluster-Production:ExportsOutputFnGetAttClusterS │
│   │ upId851EE1F6"}                                                                        │     │                 │ calingInstanceSecurityGroup8C9BAE52GroupId5449321F"}                                  │
├───┼───────────────────────────────────────────────────────────────────────────────────────┼─────┼─────────────────┼───────────────────────────────────────────────────────────────────────────────────────┤
│ - │ {"Fn::ImportValue":"*********-All-EcsCluster-Production:ExportsOutputFnGetAttClusterS │ In  │ TCP 32768-65535 │ {"Fn::ImportValue":"*********-All-ALB:ExportsOutputFnGetAttLBSecurityGroup8A41EA2BGro │
│   │ calingInstanceSecurityGroup8C9BAE52GroupId5449321F"}                                  │     │                 │ upId851EE1F6"}                                                                        │
└───┴───────────────────────────────────────────────────────────────────────────────────────┴─────┴─────────────────┴───────────────────────────────────────────────────────────────────────────────────────┘

I was able to manually add it back by adding this line below the cluster initialization:

    cluster.connections.connections.addSecurityGroup(
      ...autoScalingGroup.connections.securityGroups
    );

Now the diff properly shows the ports being allowed again.

@Insidexa will you be able to add tests to the PR you opened? If not I could open a new PR with the fix and some tests.