autoscaler: AWS - CA does not respect balance-similar-node-groups when ASG min and desired capacity are 0
There are 3 autoscaling groups tagged with k8s.io/cluster-autoscaler/SandboxEksCluster and k8s.io/cluster-autoscaler/enabled; SandboxEksCluster is the name of my cluster.
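For reference, a minimal sketch of how such tags can be applied with the AWS CLI (the ASG name is a placeholder; auto-discovery matches on the tag keys, so the values can be anything non-empty):

# tag one ASG so cluster-autoscaler auto-discovery can find it (repeat per group)
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/SandboxEksCluster,Value=owned,PropagateAtLaunch=true"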

The autoscaler is started with the balance flag:
I1031 08:41:01.352107 1 flags.go:52] FLAG: --balance-similar-node-groups="true"
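For context, a sketch of a CA invocation with this flag enabled. Only --balance-similar-node-groups is confirmed by the FLAG line above; --expander=least-waste is implied by the waste.go lines further down, and the remaining flags are assumptions for a typical EKS setup:

cluster-autoscaler \
  --cloud-provider=aws \
  --balance-similar-node-groups=true \
  --expander=least-waste \
  --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/SandboxEksCluster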
I am operating on a simple nginx deployment
kubectl run nginx --image=nginx --replicas=10
and scaling it up so that new worker nodes have to be added
kubectl scale deployment/nginx --replicas xx
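For example, with a hypothetical replica count, plus a couple of commands to watch the pending pods and the AZ spread of the nodes (the zone label is the beta one used by Kubernetes versions of that era):

# hypothetical replica count, large enough to exceed current capacity
kubectl scale deployment/nginx --replicas=20

# pods that cannot be scheduled yet
kubectl get pods --field-selector=status.phase=Pending

# which availability zone each node landed in
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone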
Each time, CA picks a node from the most loaded ASG:
I1031 09:00:36.572074 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-8zhq2 is unschedulable
I1031 09:00:36.572081 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-7kvwn is unschedulable
I1031 09:00:36.572089 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-skm9v is unschedulable
I1031 09:00:36.572095 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-c6hjx is unschedulable
I1031 09:00:36.572132 1 scale_up.go:300] Upcoming 0 nodes
I1031 09:00:36.572607 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 09:00:36.572624 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 09:00:36.572632 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 09:00:36.572644 1 scale_up.go:423] Best option to resize: sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0
I1031 09:00:36.572656 1 scale_up.go:427] Estimated 1 nodes needed in sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0
I1031 09:00:36.572698 1 scale_up.go:529] Final scale-up plan: [{sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 2->3 (max: 5)}]
I1031 09:00:36.572722 1 scale_up.go:694] Scale-up: setting group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 size to 3
I1031 09:00:36.572751 1 auto_scaling_groups.go:211] Setting asg sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 size to 3
The same happens two more times, until the ASG reaches its maximum size of 5.
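The group sizes can be verified from the AWS side as well (a sketch; the JMESPath filter assumes every ASG name contains the cluster name, which matches the names in the logs):

aws autoscaling describe-auto-scaling-groups \
  --query 'AutoScalingGroups[?contains(AutoScalingGroupName, `SandboxEksCluster`)].[AutoScalingGroupName,MinSize,DesiredCapacity,MaxSize]' \
  --output table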
I resized the deployment once again, and this time CA picked a node from the remaining ASGs, as the previous one was already full:
I1031 10:05:37.255691 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-2mhvj is unschedulable
I1031 10:05:37.255696 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-lk7gx is unschedulable
I1031 10:05:37.255737 1 scale_up.go:300] Upcoming 0 nodes
I1031 10:05:37.255751 1 scale_up.go:338] Skipping node group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 - max size reached
I1031 10:05:37.256404 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:05:37.256420 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:05:37.256432 1 scale_up.go:423] Best option to resize: sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:05:37.256439 1 scale_up.go:427] Estimated 1 nodes needed in sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:05:37.256492 1 scale_up.go:521] Splitting scale-up between 2 similar node groups: {sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K}
I1031 10:05:37.256503 1 scale_up.go:529] Final scale-up plan: [{sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW 0->1 (max: 5)}]
I1031 10:05:37.256518 1 scale_up.go:694] Scale-up: setting group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 1
I1031 10:05:37.256547 1 auto_scaling_groups.go:211] Setting asg sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 1
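The next run uses a minimum size of 1 for every group. A sketch of how that can be set with the AWS CLI, with a placeholder group name, repeated for each of the three ASGs:

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> \
  --min-size 1 \
  --desired-capacity 1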
Repeating the scenario from the beginning with all 3 ASGs having their minimum size set to 1:
I1031 10:23:12.752831 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-wb9p8 is unschedulable
I1031 10:23:12.752847 1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-g87s8 is unschedulable
I1031 10:23:12.752914 1 scale_up.go:300] Upcoming 0 nodes
I1031 10:23:12.753948 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:23:12.754003 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:23:12.754026 1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:23:12.754048 1 scale_up.go:423] Best option to resize: sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:23:12.754080 1 scale_up.go:427] Estimated 1 nodes needed in sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:23:12.754185 1 scale_up.go:521] Splitting scale-up between 3 similar node groups: {sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K}
I1031 10:23:12.754221 1 scale_up.go:529] Final scale-up plan: [{sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW 1->2 (max: 5)}]
I1031 10:23:12.754249 1 scale_up.go:694] Scale-up: setting group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 2
I1031 10:23:12.754298 1 auto_scaling_groups.go:211] Setting asg sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 2
Please note the log entry:
I1031 10:23:12.754185 1 scale_up.go:521] Splitting scale-up between 3 similar node groups: {sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K}
which could indicate that CA really does respect the --balance-similar-node-groups property, but only once every ASG has a minimum size of 1.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 29 (10 by maintainers)
I’ve just run into this with our clusters. Even with the random expander, all nodes were placed within a single AZ rather than distributed across the three ASGs. Setting asg_min_size / asg_desired_capacity to 1 causes CA to distribute further nodes across the three ASGs correctly, but it means we need to keep a baseline of 3 instances running when at times we could go lower.

Seriously, for everyone who uses the K8s autoscaler in AWS - try Karpenter (https://karpenter.sh/)
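For completeness, the expander mentioned in the comment above is selected with a CA flag; a minimal sketch (only the expander choice is specific to this comment, the other flags are the same assumptions as earlier):

cluster-autoscaler \
  --cloud-provider=aws \
  --balance-similar-node-groups=true \
  --expander=random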
Hello,
any updates on this?