autoscaler: Reaction to "failed to launch a new EC2 instance" event

Hello. Our application uses two instance types as Kubernetes nodes: c5 and i3. Pods running on c5 instances (let’s call them ‘core pods’) use EBS volumes, so these pods are bound to a particular AZ. Pods running on i3 instances (let’s call them ‘compute pods’) don’t use EBS volumes and can run in any AZ. However, the application performs better when core pods and compute pods run in the same AZ.

The problem with i3 instances is that they are sometimes unavailable in a given AZ (I have personally seen two cases where i3.2xlarge instances could not be launched in a particular AZ and the AWS console recommended creating instances in other AZs). Ideally we want to run compute pods in the same AZ as the core pods, but if i3 instances are unavailable in that AZ, it is acceptable to run them in another AZ. To configure this scenario and test how the cluster autoscaler behaves, I created two node groups:

nodes-1a
  machineType: i3.xlarge
  maxSize: 10
  minSize: 0
  nodeLabels:
    instance_macro_group: computenodes
    kops.k8s.io/instancegroup: nodes-1a
	
nodes-1b
  machineType: i3.2xlarge
  maxSize: 10
  minSize: 0
  nodeLabels:
    instance_macro_group: computenodes
    kops.k8s.io/instancegroup: nodes-1b
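
(For completeness, the full kops InstanceGroup manifest behind nodes-1a looks roughly like the sketch below; the cluster name, role, and subnet are assumed values added for illustration, since only the fields listed above are relevant here. nodes-1b would mirror it with i3.2xlarge and the 1b subnet.)

    apiVersion: kops.k8s.io/v1alpha2
    kind: InstanceGroup
    metadata:
      labels:
        kops.k8s.io/cluster: my-cluster.example.com   # hypothetical cluster name
      name: nodes-1a
    spec:
      machineType: i3.xlarge
      maxSize: 10
      minSize: 0
      role: Node
      subnets:
      - us-east-1a            # assumed subnet/AZ for this group
      nodeLabels:
        instance_macro_group: computenodes
        kops.k8s.io/instancegroup: nodes-1a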

Then I created a deployment whose pod template spec contains the following:

    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: kops.k8s.io/instancegroup
                operator: In
                values:
                - nodes-1a
            weight: 1
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: instance_macro_group
                operator: In
                values:
                - computenodes

and the container's memory request is configured as follows:

        resources:
          requests:
            memory: 25Gi

The nodes-1a instance group has maxSize = 10; however, the AWS account has a limit of 6 running i3.xlarge instances. (I am not able to reproduce an actual shortage of a particular instance type, so hitting the account limit serves the same purpose for this test.)

When the deployment is scaled to 1 replica, the cluster autoscaler increases nodes-1a to 1 node; at 6 replicas, nodes-1a grows to 6 nodes (a 25Gi request nearly fills an i3.xlarge’s 30.5 GiB of memory, so each pod needs its own node). At 7 replicas, the desired size of nodes-1a is set to 7, but launching the new instance in the Auto Scaling group fails with the following description: “Launching a new EC2 instance. Status Reason: You have requested more instances (7) than your current instance limit of 6 allows for the specified instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit. Launching EC2 instance failed.”

At this point the pod is stuck in the ‘Pending’ state indefinitely. What would be useful here is a mechanism to get “feedback” from the Auto Scaling group that launching a new EC2 instance failed, and then try an alternative instance group (in our case, nodes-1b) to scale out so the pod can be scheduled. Does this request seem reasonable, or is there a way around the problem using existing functionality?

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 18 (13 by maintainers)

Most upvoted comments

Yes, and ideally also mark the ASG size increase as failed, so the autoscaler tries to create a new node in a different AZ and doesn’t wait out the full node-not-coming-up timeout.
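
(The timeout mentioned here corresponds to the cluster-autoscaler's --max-node-provision-time flag: if a requested node does not become ready within that window, the scale-up is treated as failed and the node group is backed off. A minimal sketch of where these flags sit in a cluster-autoscaler Deployment, with the image tag and ASG names as assumed values:)

    # Excerpt from a cluster-autoscaler Deployment spec (sketch only)
    containers:
    - name: cluster-autoscaler
      image: k8s.gcr.io/cluster-autoscaler:v1.3.9      # assumed version
      command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --expander=random                              # strategy for picking among eligible node groups
      - --max-node-provision-time=15m                  # how long to wait for a requested node before giving up
      - --nodes=0:10:nodes-1a.my-cluster.example.com   # assumed ASG names (kops names ASGs <ig>.<cluster>)
      - --nodes=0:10:nodes-1b.my-cluster.example.com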