aws-cdk: VPC: allow configuring NAT instances instead of gateways (and a 0 NAT gateways bug)

When using ec2.VpcNetwork, the default is to create NAT gateways. I originally scoped this down to a single NAT gateway for my public subnet, and a month later was slugged with a $90 AWS bill, with almost all of that cost attributed to the NAT gateway.

So today I decided to try and rework it to remove the NAT gateway (since my app really doesn’t need it anyway). Tried removing the key, but that uses the defaults (and makes more), so I tried setting the key to 0. Example config:

const vpc = new ec2.VpcNetwork(this, 'Tokenized-VPC', {
    natGateways: 0,
    // natGatewayPlacement: {subnetName: 'Public'},
    subnetConfiguration: [
        {
            cidrMask: 26,
            name: 'Public',
            subnetType: ec2.SubnetType.Public,
        },
        {
            name: 'Application',
            subnetType: ec2.SubnetType.Private,
        },
    ],
    defaultInstanceTenancy: ec2.DefaultInstanceTenancy.Default,
});

When I ran cdk diff, I got a number of errors back:

Exactly one of [NetworkInterfaceId, VpcPeeringConnectionId, GatewayId, EgressOnlyInternetGatewayId, InstanceId, NatGatewayId] must be specified and not empty
 1/9 | 8:27:39 am | UPDATE_FAILED        | AWS::EC2::Route                       | Foo-VPC/ApplicationSubnet2/DefaultRoute (FooVPCApplicationSubnet2DefaultRoute2325F2C6) Exactly one of [NetworkInterfaceId, VpcPeeringConnectionId, GatewayId, EgressOnlyInternetGatewayId, InstanceId, NatGatewayId] must be specified and not empty
	VpcPrivateSubnet.addDefaultRouteToNAT (/foo/bar/deploy/aws-ec2/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:259:9)
	\_ VpcPrivateSubnet.addDefaultNatRouteEntry (/foo/bar/deploy/aws-ec2/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:316:14)
	\_ VpcNetwork.privateSubnets.forEach (/foo/bar/deploy/aws-ec2/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:127:31)
	\_ Array.forEach (<anonymous>)
	\_ new VpcNetwork (/foo/bar/deploy/aws-ec2/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:120:33)
	\_ new TokenizedEC2Stack (/foo/bar/deploy/aws-ec2/bin/tokenized.js:15:21)
	\_ Object.<anonymous> (/foo/bar/deploy/aws-ec2/bin/tokenized.js:243:1)
	\_ Module._compile (internal/modules/cjs/loader.js:689:30)
	\_ Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
	\_ Module.load (internal/modules/cjs/loader.js:599:32)
	\_ tryModuleLoad (internal/modules/cjs/loader.js:538:12)
	\_ Function.Module._load (internal/modules/cjs/loader.js:530:3)
	\_ Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
	\_ startup (internal/bootstrap/node.js:279:19)
	\_ bootstrapNodeJSCore (internal/bootstrap/node.js:696:3)

This implies that there isn’t currently good support for natGateways: 0 (the checks around it may need improving), and as best as I could tell from skimming the docs, there isn’t a great way to use VpcNetwork without a NAT gateway.

Presumably I can use the override methods to ‘reach in’ and patch those keys manually (setting GatewayId or EgressOnlyInternetGatewayId will probably work), but I was wondering whether there is currently a ‘better’ solution than that when:

  • I have a public subnet that I just want to connect directly to the net (and control connections through security groups, etc)
  • I have a private subnet that only requires egress

It may be that most people using CDK have requirements greater than mine and/or don’t mind about the NAT gateway costs, but I feel like someone just playing around may be shockingly surprised at how much $$ the defaults end up costing them. Maybe some doco changes to call this out more explicitly? And/or an example that supports using methods other than the NAT gateway (eg. as mentioned above, or an example that shows how to set it up using the old way with a NAT instance so we can run it on a micro for tiny workloads without costing the world)
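To make the cost concern concrete, here is a minimal sketch of the arithmetic. The helper name and both rates are assumptions for illustration and are not taken from this issue; actual NAT gateway pricing varies by region, so substitute the figures from the pricing page for your region.

```typescript
// Hypothetical helper: estimates the monthly charge for NAT gateways.
// Both rates below are ASSUMED placeholder values, not quoted from the issue.
interface NatGatewayRates {
  hourlyUsd: number; // charge per gateway per hour, whether or not it is used
  perGbUsd: number;  // data processing charge per GB passing through the gateway
}

const AVERAGE_HOURS_PER_MONTH = 730; // 365 * 24 / 12

function estimateNatGatewayMonthlyUsd(
  gatewayCount: number,
  processedGb: number,
  rates: NatGatewayRates,
): number {
  const fixed = gatewayCount * rates.hourlyUsd * AVERAGE_HOURS_PER_MONTH;
  const variable = processedGb * rates.perGbUsd;
  return fixed + variable;
}

// Assumed rates; replace with the values for your region.
const assumedRates: NatGatewayRates = { hourlyUsd: 0.045, perGbUsd: 0.045 };

const oneGateway = estimateNatGatewayMonthlyUsd(1, 0, assumedRates);
const threeGateways = estimateNatGatewayMonthlyUsd(3, 0, assumedRates);
console.log(oneGateway.toFixed(2), threeGateways.toFixed(2));
```

The point of the sketch is that the hourly charge applies even when no traffic flows, and it scales with the number of gateways, so a VPC that creates one gateway per availability zone multiplies the fixed cost accordingly.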

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 12
  • Comments: 25 (6 by maintainers)

Most upvoted comments

Hi all. @rix0rrr, I believe the 0 NAT gateway issue is not fixed.

If I do

    const vpc = new Vpc(this, 'Vpc', {
      cidr: '10.0.0.0/16',
      maxAzs: 1,
      natGateways: 0,
    });

I get the following CDK error:

If you do not want NAT gateways (natGateways=0), make sure you don’t configure any PRIVATE subnets in ‘subnetConfiguration’ (make them PUBLIC or ISOLATED instead)

And if I do

    const vpc = new Vpc(this, 'Vpc', {
      cidr: '10.0.0.0/16',
      maxAzs: 1,
      natGateways: 0,
      subnetConfiguration: [
        {
          cidrMask: 24,
          name: 'Public',
          subnetType: SubnetType.PUBLIC,
        }
      ],
    });

I get the following CDK error:

There are no ‘Private’ subnet groups in this VPC. Available types: Public

What am I missing?
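The two error messages quoted above seem to describe two separate checks: one on the VPC configuration itself, and one that fires when something asks the VPC for a subnet type it does not have. The following is a simplified model of those two rules, not the CDK source. Every name in it (SubnetKind, validateVpc, selectSubnets) is hypothetical, and the logic is only reconstructed from the wording of the errors.

```typescript
// Simplified model, NOT the CDK implementation. All names are hypothetical and
// the rules are inferred from the two error messages quoted in this comment.
type SubnetKind = 'PUBLIC' | 'PRIVATE' | 'ISOLATED';

interface SubnetGroup {
  name: string;
  kind: SubnetKind;
}

// First error: PRIVATE subnets need a NAT for their default route, so
// natGateways = 0 is only consistent with PUBLIC or ISOLATED groups.
function validateVpc(natGateways: number, groups: SubnetGroup[]): void {
  const hasPrivate = groups.some((g) => g.kind === 'PRIVATE');
  if (natGateways === 0 && hasPrivate) {
    throw new Error('natGateways=0 but a PRIVATE subnet group is configured');
  }
}

// Second error: a consumer that asks for a subnet type the VPC does not
// contain fails and lists the types that are available. The default of
// 'PRIVATE' models a consumer that selects private subnets unless told otherwise.
function selectSubnets(groups: SubnetGroup[], kind: SubnetKind = 'PRIVATE'): SubnetGroup[] {
  const selected = groups.filter((g) => g.kind === kind);
  if (selected.length === 0) {
    const available = groups
      .map((g) => g.kind)
      .filter((k, i, all) => all.indexOf(k) === i) // de-duplicate
      .join(', ');
    throw new Error('No ' + kind + ' subnet groups. Available types: ' + available);
  }
  return selected;
}

const publicOnly: SubnetGroup[] = [{ name: 'Public', kind: 'PUBLIC' }];
validateVpc(0, publicOnly); // passes: no PRIVATE groups
const chosen = selectSubnets(publicOnly, 'PUBLIC'); // explicit selection succeeds
console.log(chosen.map((g) => g.name));
```

Under this model, a public-only VPC with natGateways: 0 is valid by itself, and the second error would come from whatever consumes the VPC without an explicit subnet selection.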

This seems to be possible with a recent commit #4898 in version 1.16.0 thanks to rix0rrr.

new Vpc(this, `vpc`, {
  cidr: '10.40.0.0/16',
  maxAzs: 2,
  natGateways: 2,
  natGatewayProvider: NatProvider.instance({
    instanceType: InstanceType.of(InstanceClass.T3A, InstanceSize.NANO),
  }),
  gatewayEndpoints: {
    s3: { service: GatewayVpcEndpointAwsService.S3 },
  },
});

@Pwntus I think this might actually be a gap in the ScheduledFargateTask construct.

It’s lacking a subnetSelection: ec2.SubnetSelection property. If you had that, you could say subnetSelection: { subnetType: ec2.SubnetType.PUBLIC } when creating it. Right now, it’s trying to bind to the private subnets of your VPC, which is the default behavior.

@0xdabbad00 @Pwntus @sallar

I’m pretty much in the same boat as you and I figured out how to get rid of the NAT gateways. The trick is to put a CloudFormation condition on the AWS::EC2::Route resource that references the gateway and force it to evaluate to false. You can do this when the NAT gateway count is 0. Here it is in Python syntax:

from aws_cdk.aws_ec2 import Vpc, CfnRoute
from aws_cdk.core import CfnCondition, Fn

vpc = Vpc(self, 'vpc', nat_gateways=0)
exclude_condition = CfnCondition(
    self, 'exclude-default-route-subnet', expression=Fn.condition_equals(True, False)
)
for subnet in vpc.private_subnets:
    for child in subnet.node.children:
        if type(child) == CfnRoute:
            route: CfnRoute = child
            route.cfn_options.condition = exclude_condition # key point here

@rix0rrr I think the above is a pretty reasonable workaround for this issue.

I have this same problem. I’m trying to deploy a CDK app to spin up an ECS on a nightly job and my biggest cost is going to be the NAT Gateway that I don’t need to be using.

Hi @0xdevalias

I tried following your example, but it doesn’t work in the newest version of the cdk clients. I made some changes to make the typing work, as follows:

    const vpc = new VpcNetwork(stack, "MyApp", {
        natGateways: 0
    });

    const natSecurityGroup = new SecurityGroup(stack, "NATSecurityGroup", {
        vpc,
        groupName: "NATSecurityGroup",
        description: "NAT Instance Security Group",
        allowAllOutbound: true
    });

    natSecurityGroup.tags.setTag("Name", natSecurityGroup.groupName);
    natSecurityGroup.connections.allowFromAnyIPv4(new TcpAllPorts());

    const natInstance = new CfnInstance(stack, "NATInstance", {
        imageId: "ami-d03288a3",
        instanceType: new InstanceTypePair(
            InstanceClass.T2,
            InstanceSize.None
        ).toString(),
        subnetId: vpc.publicSubnets[0].subnetId,
        securityGroupIds: [natSecurityGroup.securityGroupId],
        sourceDestCheck: false, // Required for NAT
        keyName: "myapp-ssh"
    });

    natInstance.propertyOverrides.tags = [
        { key: "Name", value: `${natInstance.stackPath}` }
    ];

    vpc.privateSubnets.forEach(subnet => {
        const defaultRoute = subnet.node.findChild("DefaultRoute") as CfnRoute;
        defaultRoute.propertyOverrides.instanceId = natInstance.instanceId;
    });

This yields the following error:

MyAppInfrastructureStack failed: ValidationError:
Circular dependency between resources: [NATSecurityGroupDB004F3B, MyAppVPCProductionPrivateSubnet2DefaultRoute2D986C86, NATInstance, MyAppPrimaryDBInstance2DD5D4CDB, MyAppVPCProductionPrivateSubnet3DefaultRoute710E04C8, MyAppPrimaryDBInstance171D1C604, MyAppVPCProductionPrivateSubnet1DefaultRouteC69EC64E]

Any suggestions on how I should remove the default NAT Gateways and instead use my own NAT instance here? The default configuration is too expensive for us.

We are trying to deploy an ECS stack (available via an ALB) which doesn’t have much use for so many NAT Gateways.

Thanks a lot

Based on @sallar’s snippet, this stack is working for me on AWS CDK 0.33. It creates a VPC with a t2.nano NAT instance without SSH access. I grabbed the AMI ID (the imageId below) from this page, looking up the ‘HVM (NAT) EBS-Backed 64-bit’ image for my AWS deployment region.

import cdk = require("@aws-cdk/cdk");
import ec2 = require("@aws-cdk/aws-ec2");

export class VpcStack extends cdk.Stack {
  public readonly vpc: ec2.Vpc;

  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.vpc = new ec2.Vpc(this, "VPC", { natGateways: 0 });
    const natSecurityGroup = new ec2.SecurityGroup(this, "NATSecurityGroup", {
      vpc: this.vpc,
      groupName: "NATSecurityGroup",
      description: "NAT Instance Security Group",
      allowAllOutbound: true
    });

    natSecurityGroup.connections.allowFromAnyIPv4(new ec2.TcpAllPorts());

    const natInstance = new ec2.CfnInstance(this, "NATInstance", {
      imageId: "ami-00c1445796bc0a29f",
      instanceType: new ec2.InstanceTypePair(
        ec2.InstanceClass.T2,
        ec2.InstanceSize.Nano
      ).toString(),
      subnetId: this.vpc.publicSubnets[0].subnetId,
      securityGroupIds: [natSecurityGroup.securityGroupId],
      sourceDestCheck: false // Required for NAT
    });

    natInstance.addPropertyOverride("Name", natInstance.stackPath);

    this.vpc.privateSubnets.forEach(subnet => {
      const defaultRoute = subnet.node.findChild(
        "DefaultRoute"
      ) as ec2.CfnRoute;
      defaultRoute.addPropertyOverride("instanceId", natInstance.instanceId);
    });
  }
}

@0xdevalias: for the cost-conscious finding this issue, we should also mention cross-AZ network fees. If you really want the lowest cost and run a single NAT (instance or gateway), you should use only one AZ. Depending on your data usage and instance sizes, the cost implications will shift as well.
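The trade-off in this comment can be sketched as a break-even calculation: one shared NAT saves the fixed hourly charge of the extra gateways, but traffic reaching it from other AZs pays a per-GB cross-AZ fee. The function names and both rates below are assumptions for illustration only. The sketch also ignores NAT data processing charges, on the assumption that they are the same in both layouts.

```typescript
// All rates are ASSUMED placeholder values; look up current pricing for your region.
interface TransferRates {
  natHourlyUsd: number;    // fixed charge per NAT gateway per hour
  crossAzPerGbUsd: number; // charge per GB that crosses an AZ boundary
}

const HOURS_IN_AVERAGE_MONTH = 730; // 365 * 24 / 12

// One NAT shared by every AZ: a single fixed charge plus cross-AZ transfer fees.
function sharedNatMonthlyUsd(crossAzGb: number, rates: TransferRates): number {
  return rates.natHourlyUsd * HOURS_IN_AVERAGE_MONTH + crossAzGb * rates.crossAzPerGbUsd;
}

// One NAT per AZ: no cross-AZ fees for NAT traffic, but a fixed charge per AZ.
function perAzNatMonthlyUsd(azCount: number, rates: TransferRates): number {
  return azCount * rates.natHourlyUsd * HOURS_IN_AVERAGE_MONTH;
}

// Monthly cross-AZ volume (GB) at which both layouts cost the same.
function breakEvenCrossAzGb(azCount: number, rates: TransferRates): number {
  const extraFixedUsd = (azCount - 1) * rates.natHourlyUsd * HOURS_IN_AVERAGE_MONTH;
  return extraFixedUsd / rates.crossAzPerGbUsd;
}

const assumedTransferRates: TransferRates = { natHourlyUsd: 0.045, crossAzPerGbUsd: 0.01 };
const breakEvenGb = breakEvenCrossAzGb(2, assumedTransferRates);
console.log('break-even cross-AZ GB per month:', breakEvenGb.toFixed(0));
```

Below the break-even volume a single shared NAT is cheaper, and above it per-AZ NATs win. Running everything in one AZ, as suggested above, avoids the cross-AZ term entirely.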