aws-cdk: (vpc): context lookup does not occur/complete before vpc is referenced in other constructs

Describe the bug

This is an odd bug. If I look up a VPC and then select subnets from it, synthesis fails under certain conditions. When I select subnets with subnetFilters after the lookup, I always get the error Cluster requires at least 2 subnets, got 0 as long as cdk.context.json has not been created yet. If I filter by subnetType instead of subnetFilters, the lookup succeeds and the code runs normally.

Expected Behavior

The context lookup should complete before the VPC is referenced by other constructs.

Current Behavior

The lookup does not complete, and synthesis fails because no subnets are found.

Reproduction Steps

Here's a minimal reproduction stack. To create cdk.context.json, comment out subnetFilters, uncomment subnetType, and synthesize. Once the file exists, you can restore subnetFilters and filter subnets successfully.

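    import * as docdb from 'aws-cdk-lib/aws-docdb';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';

    // Inside the constructor of a Stack with an explicit env (account and region),
    // which Vpc.fromLookup requires.
    const vpc = ec2.Vpc.fromLookup(this, 'vpc', {
      vpcName: 'MyVpc',
    });

    const docDbCluster = new docdb.DatabaseCluster(this, 'DocDB', {
      masterUser: {
        username: 'myusername',
        excludeCharacters: ':*/?#[];%@"\'\\',
        secretName: 'mysecretname',
      },
      instanceType: ec2.InstanceType.of(
        ec2.InstanceClass.BURSTABLE3,
        ec2.InstanceSize.MEDIUM,
      ),
      vpcSubnets: {
        // Fails with "Cluster requires at least 2 subnets, got 0" until cdk.context.json exists.
        subnetFilters: [
          ec2.SubnetFilter.availabilityZones(['us-east-1a', 'us-east-1b']),
        ],
        // subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
      },
      vpc: vpc,
    });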

Possible Solution

The lookup should complete fully before any code that references its result runs. However, I'm not sure why this only happens for some ways of selecting subnets from a looked-up VPC.
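One plausible explanation, not confirmed against the CDK source in this thread: lookups are not resolved while constructors run. On the first synthesis, Vpc.fromLookup hands back a placeholder VPC whose subnets sit in AZs with names like dummy1a and dummy1b (the placeholder names reported later in this thread), the app records the missing context, and only then does the CLI perform the lookup, write cdk.context.json, and synthesize again. If a constructor throws during that first pass, the loop would stop before the lookup happens. Filtering by real AZ names matches none of the placeholder subnets, while filtering by subnet type can still match them (assuming the placeholder includes private subnets). The sketch below models this in plain TypeScript; every name in it (runApp, synthLoop, the context key, the placeholder layout) is hypothetical and is not the CDK's implementation.

```typescript
// Hypothetical model of the CLI's "synthesize, look up, synthesize again" loop.
// Nothing here is aws-cdk code; it only illustrates the assumed control flow.

interface FakeSubnet {
  az: string;
  type: "PUBLIC" | "PRIVATE";
}

type ContextCache = Map<string, FakeSubnet[]>;
type SubnetPredicate = (s: FakeSubnet) => boolean;

const VPC_CONTEXT_KEY = "vpc-provider:MyVpc"; // hypothetical context key

// Assumed placeholder layout returned before the lookup has run.
const PLACEHOLDER_SUBNETS: FakeSubnet[] = [
  { az: "dummy1a", type: "PUBLIC" },
  { az: "dummy1b", type: "PUBLIC" },
  { az: "dummy1a", type: "PRIVATE" },
  { az: "dummy1b", type: "PRIVATE" },
];

// Stand-in for the user's app: returns the context keys it could not find,
// or throws like a construct that validates its subnets in its constructor.
function runApp(cache: ContextCache, select: SubnetPredicate): string[] {
  const missing: string[] = [];
  let subnets = cache.get(VPC_CONTEXT_KEY);
  if (subnets === undefined) {
    missing.push(VPC_CONTEXT_KEY);
    subnets = PLACEHOLDER_SUBNETS;
  }
  const selected = subnets.filter(select);
  if (selected.length < 2) {
    throw new Error(`Cluster requires at least 2 subnets, got ${selected.length}`);
  }
  return missing;
}

// Stand-in for the CLI: rerun the app until no context is missing.
function synthLoop(cache: ContextCache, select: SubnetPredicate, lookup: () => FakeSubnet[]): string {
  for (let pass = 1; pass <= 3; pass++) {
    let missing: string[] = [];
    try {
      missing = runApp(cache, select);
    } catch (err) {
      // Assumption: the exception aborts synthesis before the missing keys are
      // reported, so the lookup never runs and nothing is cached.
      return `failed on pass ${pass}: ${(err as Error).message}`;
    }
    if (missing.length === 0) {
      return `succeeded on pass ${pass}`;
    }
    for (const key of missing) {
      cache.set(key, lookup()); // the CLI performs the lookup and caches the result
    }
  }
  return "gave up";
}

const realLookup = (): FakeSubnet[] => [
  { az: "us-east-1a", type: "PRIVATE" },
  { az: "us-east-1b", type: "PRIVATE" },
];

// subnetFilters-style selection by real AZ names, and subnetType-style selection.
const byAz: SubnetPredicate = (s) => ["us-east-1a", "us-east-1b"].includes(s.az);
const byType: SubnetPredicate = (s) => s.type === "PRIVATE";

console.log(synthLoop(new Map(), byAz, realLookup)); // failed on pass 1: Cluster requires at least 2 subnets, got 0
console.log(synthLoop(new Map(), byType, realLookup)); // succeeded on pass 2

// The workaround: populate the cache with the type-based selection first.
const sharedCache: ContextCache = new Map();
synthLoop(sharedCache, byType, realLookup);
console.log(synthLoop(sharedCache, byAz, realLookup)); // succeeded on pass 1
```

Under these assumptions, the type-based predicate survives the placeholder pass and lets the lookup run, while the AZ-name predicate throws first; once the cache is warm, the AZ predicate works, which matches the workaround described in the reproduction steps.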

Additional Information/Context

May be related to #21690

CDK CLI Version

2.41.0

Framework Version

No response

Node.js Version

16

OS

macOS

Language

TypeScript

Language Version

No response

Other information

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I think I was wrong.

I just tested this without an existing cdk.context.json, and it synthesizes successfully:

    new eks.Cluster(scope, 'EksCluster', {
      vpc,
      version: eks.KubernetesVersion.V1_25,
      placeClusterHandlerInVpc: true,
      kubectlLayer: new KubectlLayer(scope, 'KubectlLayer'),
      vpcSubnets: [
        { subnetGroupName: 'Private' },
      ],
    });

Let's first discuss the EKS issue here:

    self.cluster = eks.Cluster(
        self,
        "eks-cluster",
        version=eks.KubernetesVersion.V1_28,
        kubectl_layer=lambda_layer_kubectl_v28.KubectlV28Layer(
            self, "kubectl-layer"
        ),
        place_cluster_handler_in_vpc=True,
        cluster_name=f"{props.customer}-eks-cluster",
        default_capacity_instance=ec2.InstanceType(props.worker_node_instance_type),
        default_capacity=2,
        alb_controller=eks.AlbControllerOptions(
            version=eks.AlbControllerVersion.V2_6_2
        ),
        vpc=ec2.Vpc.from_lookup(
            self, "vpc-lookup", vpc_name=f"{props.customer}-{props.region}/vpc"
        ),
        vpc_subnets=[ec2.SubnetSelection(subnet_group_name="application")],
    )

and you are seeing:

RuntimeError: Cannot place cluster handler in the VPC since no private subnets could be selected

Actually, I can synthesize the following even before cdk.context.json is generated:

    const cluster = new eks.Cluster(scope, 'EksCluster', {
      version: eks.KubernetesVersion.V1_28,
      placeClusterHandlerInVpc: true,
      kubectlLayer: new KubectlLayer(scope, 'KubectlLayer'),
      vpc,
      defaultCapacity: defaultCapacity ?? 1,
      vpcSubnets: [
        { subnetGroupName: 'Private' },
      ],
    });

Looking at the code here:

https://github.com/aws/aws-cdk/blob/4034adb5e4453435b959fde5eea16a7824f21e73/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L630-L636

and

https://github.com/aws/aws-cdk/blob/4034adb5e4453435b959fde5eea16a7824f21e73/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L1557-L1559

this option requires privateSubnets.length != 0, where privateSubnets comes from here:

https://github.com/aws/aws-cdk/blob/4034adb5e4453435b959fde5eea16a7824f21e73/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L1537C5-L1537C48

and it looks like this function does not consider any of your selected subnets to be private, hence the error:

https://github.com/aws/aws-cdk/blob/4034adb5e4453435b959fde5eea16a7824f21e73/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L1933-L1962

Now, let’s experiment a little bit like this:

    import { Stack, StackProps } from 'aws-cdk-lib';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';
    import { Construct } from 'constructs';

    // Mirrors the private-subnet selection logic in the EKS cluster code linked above.
    function selectPrivateSubnets(vpc: ec2.IVpc, vpcSubnets: ec2.SubnetSelection[]): ec2.ISubnet[] {
      const privateSubnets: ec2.ISubnet[] = [];
      const vpcPrivateSubnetIds = vpc.privateSubnets.map(s => s.subnetId);
      const vpcPublicSubnetIds = vpc.publicSubnets.map(s => s.subnetId);

      for (const placement of vpcSubnets) {
        for (const subnet of vpc.selectSubnets(placement).subnets) {
          if (vpcPrivateSubnetIds.includes(subnet.subnetId)) {
            // Definitely private: take it.
            privateSubnets.push(subnet);
            continue;
          }

          if (vpcPublicSubnetIds.includes(subnet.subnetId)) {
            // Definitely public: skip it.
            continue;
          }

          // Neither public nor private: this must be a subnet instance that was passed explicitly
          // in the subnet selection. Since ISubnet carries no type information, assume it is private
          // and let it fail at deploy time (better than filtering it out and blocking a deployment
          // that might succeed).
          privateSubnets.push(subnet);
        }
      }

      return privateSubnets;
    }

    export class DemoStack extends Stack {
      constructor(scope: Construct, id: string, props: StackProps) {
        super(scope, id, props);

        const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { vpcName: 'myVPC' });
        const privateSubnets = selectPrivateSubnets(vpc, [
          { subnetGroupName: 'Private' },
        ]);

        console.log('private subnets number:' + privateSubnets.length);
        console.log('private subnets IDs:' + privateSubnets.map(s => s.subnetId));
      }
    }

(Update the sample above with your own subnet group name and VPC name.)

Now, run cdk diff or cdk synth. It should print the number of private subnets and their subnet IDs, like this:

$ npx cdk synth
private subnets number:3
private subnets IDs:subnet-071c85610846aa9c0,subnet-0ef7ac49e1edb06e4,subnet-0e2177a10a166f87d

Can you verify whether it returns any private subnet IDs? My guess is that, for some reason, selectPrivateSubnets() does not find any private subnets under its logic.
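To reason about the classification without an AWS account, the same logic can be ported to plain types. The sketch below follows the selectPrivateSubnets function above, but replaces ec2.IVpc and ec2.ISubnet with hypothetical minimal interfaces (MiniVpc, MiniSubnet) and takes an already-selected list in place of vpc.selectSubnets(placement). It shows the three outcomes: a subnet the VPC lists as private is kept, one it lists as public is dropped, and one it lists as neither is kept on the assumption that it is private. If every selected subnet lands in the public list, the result is empty, which would match the "no private subnets could be selected" error.

```typescript
// Hypothetical stand-ins for ec2.ISubnet / ec2.IVpc, reduced to the fields the logic reads.
interface MiniSubnet {
  subnetId: string;
}

interface MiniVpc {
  privateSubnets: MiniSubnet[];
  publicSubnets: MiniSubnet[];
}

// Same classification as selectPrivateSubnets() above, applied to an
// already-selected list instead of calling vpc.selectSubnets(placement).
function classifyPrivate(vpc: MiniVpc, selected: MiniSubnet[]): MiniSubnet[] {
  const privateIds = vpc.privateSubnets.map((s) => s.subnetId);
  const publicIds = vpc.publicSubnets.map((s) => s.subnetId);
  const result: MiniSubnet[] = [];
  for (const subnet of selected) {
    if (publicIds.includes(subnet.subnetId) && !privateIds.includes(subnet.subnetId)) {
      continue; // listed as public only: skip
    }
    // Listed as private, or not listed at all (assumed private).
    result.push(subnet);
  }
  return result;
}

const demoVpc: MiniVpc = {
  privateSubnets: [{ subnetId: "subnet-priv-a" }],
  publicSubnets: [{ subnetId: "subnet-pub-a" }],
};

const kept = classifyPrivate(demoVpc, [
  { subnetId: "subnet-priv-a" },
  { subnetId: "subnet-pub-a" },
  { subnetId: "subnet-explicit" },
]);
console.log(kept.map((s) => s.subnetId).join(",")); // subnet-priv-a,subnet-explicit

// If every selected subnet is classified as public, nothing survives.
console.log(classifyPrivate(demoVpc, [{ subnetId: "subnet-pub-a" }]).length); // 0
```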

Hi @peterwoodworth, could you please help? I'm very stuck on this one.

I have the same issue: cdk.context.json isn't getting populated with the AZs for my environment. The difference in my case is that I'm selecting a subnet in which to create an EC2 instance. When I call vpc.select_subnets(subnet_type=ec2.SubnetType.PUBLIC), I get back ['dummy1a', 'dummy1b', 'dummy1c']. Then I try to narrow it down to one subnet when creating the EC2 instance:

    master_db = ec2.CfnInstance(
        self, "PSQL-Server",
        availability_zone="ap-southeast-2a",
        disable_api_termination=True,
        image_id=master_db_ami.image_id,
        instance_type="t3a.large",
        key_name=environment.ssh_keys[environment.deployEnv],
        private_dns_name_options=private_dns_name_opts,
        propagate_tags_to_volume_on_creation=True,
        subnet_id=vpc.select_subnets(
            availability_zones=["ap-southeast-2a"],
            one_per_az=True,
            subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
        ).subnet_ids[0],
        tags=[
            CfnTag(key="Name", value="PSQL-Server"),
            CfnTag(key="Service", value="PostgreSQL"),
        ],
    )

I get an IndexError: list index out of range.

Running `print(self.availability_zones)` returns the expected result, but the value isn't cached in the context file.

Thank you in advance, and let me know if you'd like me to create a separate ticket for this. Scott
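The ['dummy1a', 'dummy1b', 'dummy1c'] values above look like placeholders returned before a lookup has been resolved (a reading of this thread, not a confirmed fact). Selecting ap-southeast-2a from placeholder subnets matches nothing, so subnet_ids[0] raises IndexError; and if an exception during synthesis stops the CLI from running the pending lookups (an assumption), the real AZs are never cached. A minimal sketch of one workaround, written in TypeScript with plain data instead of a real VPC: check the selection before indexing, and on what looks like a placeholder pass return a stand-in value so synthesis can finish. The helper names (firstSubnetIdInAz, looksLikePlaceholder), the "dummy" prefix test, and all subnet IDs are hypothetical; the same check applies to the list returned by select_subnets(...).subnet_ids in Python.

```typescript
// Hypothetical record standing in for a subnet returned by a VPC selection.
interface SubnetInfo {
  az: string;
  subnetId: string;
}

// Heuristic (assumption): the placeholder AZs seen in this thread start with "dummy".
function looksLikePlaceholder(subnets: SubnetInfo[]): boolean {
  return subnets.length > 0 && subnets.every((s) => s.az.startsWith("dummy"));
}

// Hypothetical helper: find the first subnet in `az` instead of indexing [0] blindly.
function firstSubnetIdInAz(subnets: SubnetInfo[], az: string): string {
  const match = subnets.find((s) => s.az === az);
  if (match !== undefined) {
    return match.subnetId;
  }
  if (looksLikePlaceholder(subnets)) {
    // Assumed first synthesis pass: let it finish so the lookup can run.
    // The next pass should see real subnets; this value must never be deployed.
    return "placeholder-subnet";
  }
  throw new Error(`No subnet found in ${az}; available AZs: ${subnets.map((s) => s.az).join(", ")}`);
}

const beforeLookup: SubnetInfo[] = [
  { az: "dummy1a", subnetId: "subnet-dummy-a" },
  { az: "dummy1b", subnetId: "subnet-dummy-b" },
];
const afterLookup: SubnetInfo[] = [
  { az: "ap-southeast-2a", subnetId: "subnet-aaa" },
  { az: "ap-southeast-2b", subnetId: "subnet-bbb" },
];

console.log(firstSubnetIdInAz(beforeLookup, "ap-southeast-2a")); // placeholder-subnet
console.log(firstSubnetIdInAz(afterLookup, "ap-southeast-2a")); // subnet-aaa
```

The design choice is to fail loudly only when the data looks real; on a placeholder pass, the stand-in ID is expected to be replaced once the lookup has run and the app is synthesized again. If the stand-in ever appears in a synthesized template, the lookup did not complete and that template should not be deployed.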

Thanks for posting your findings, @arewa; this is a big help!