autoscaler: Scale up from 0 does not work with existing AWS EBS CSI PersistentVolume
Which component are you using?:
- cluster-autoscaler
What version of the component are you using?:
- v1.18.3 (also happened with v1.18.2)
Cluster-Autoscaler Deployment YAML
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::AWS_ACCOUNT_ID_OMMITTED:role/mycompany-iam-k8s-cluster-autoscaler-test
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources:
      - "pods"
      - "services"
      - "replicationcontrollers"
      - "persistentvolumeclaims"
      - "persistentvolumes"
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]
  - apiGroups: ["coordination.k8s.io"]
    resourceNames: ["cluster-autoscaler"]
    resources: ["leases"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
    verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8085'
    spec:
      serviceAccountName: cluster-autoscaler
      priorityClassName: cluster-critical
      containers:
        - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.3  # Major & Minor should match cluster version: https://docs.aws.amazon.com/de_de/eks/latest/userguide/cluster-autoscaler.html#ca-deploy
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mycompany-test-eks
            - --ignore-daemonsets-utilization=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --balance-similar-node-groups=false
            - --min-replica-count=0
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"
What k8s version are you using (kubectl version)?:
kubectl version Output
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?:
- AWS EKS with multiple ASGs
- https://github.com/kubernetes-sigs/aws-ebs-csi-driver installed via Helm chart v0.8.2
What did you expect to happen?:
I have an ASG dedicated to a single CronJob that gets triggered 6 times a day. That ASG is pinned to a specific AWS AZ by its assigned subnet, and the CronJob is pinned to that ASG via a node affinity and a toleration. The job uses a PV that is provisioned (AWS EBS) on the first ever run and then reused on every subsequent run. I expect the ASG to be scaled up to 1 after the Pod is created and scaled back down shortly after the Pod/Job has finished.
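For illustration, a minimal sketch of such a CronJob spec, assuming a hypothetical label/taint pair workload=masterdata-import on the dedicated ASG's nodes and an existing PVC named masterdata-import-pvc (all names and the image are placeholders, not taken from the actual manifests):

apiVersion: batch/v1beta1   # CronJob API version for Kubernetes 1.18
kind: CronJob
metadata:
  name: masterdata-import-cronjob
  namespace: myapp-masterdata
spec:
  schedule: "0 */4 * * *"            # 6 runs per day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: workload                     # hypothetical label carried by the dedicated ASG's nodes
                        operator: In
                        values: ["masterdata-import"]
          tolerations:
            - key: workload                               # hypothetical taint on the dedicated ASG's nodes
              operator: Equal
              value: masterdata-import
              effect: NoSchedule
          containers:
            - name: import
              image: registry.example.com/masterdata-import:latest   # placeholder image
              volumeMounts:
                - name: data
                  mountPath: /data
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: masterdata-import-pvc          # EBS-backed PVC reused on every run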
What happened instead?:
The ASG will not be scaled up by the cluster-autoscaler.
cluster-autoscaler log output after the Job is created and the Pod is pending
2021-01-25T05:19:22.523Z : Starting main loop
2021-01-25T05:19:22.524Z : "Found multiple availability zones for ASG "mycompany-test-eks-myapp-elastic-group-1-20210108154118845300000003" using eu-central-1a"
2021-01-25T05:19:22.525Z : "Found multiple availability zones for ASG "mycompany-test-eks-myapp-worker-group-2-20201029130225136800000004" using eu-central-1a"
2021-01-25T05:19:22.525Z : "Found multiple availability zones for ASG "mycompany-test-eks-worker-group-1-20201029130715836900000005" using eu-central-1a"
2021-01-25T05:19:22.526Z : Filtering out schedulables
2021-01-25T05:19:22.526Z : 0 pods marked as unschedulable can be scheduled.
2021-01-25T05:19:22.526Z : No schedulable pods
2021-01-25T05:19:22.526Z : Pod myapp-masterdata/masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw is unschedulable
2021-01-25T05:19:22.526Z : Upcoming 0 nodes
2021-01-25T05:19:22.526Z : Skipping node group mycompany-test-eks-myapp-elastic-group-1-20210108154118845300000003 - max size reached
2021-01-25T05:19:22.526Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-myapp-elastic-group-2-20201029130715759300000004, predicate checking error: node(s) didn't match node selector predicateName=NodeAffinity reasons: node(s) didn't match node selector debugInfo="
2021-01-25T05:19:22.526Z : No pod can fit to mycompany-test-eks-myapp-elastic-group-2-20201029130715759300000004
2021-01-25T05:19:22.526Z : "Could not get a CSINode object for the node "template-node-for-mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003-8426967936887117836": csinode.storage.k8s.io "template-node-for-mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003-8426967936887117836" not found"
2021-01-25T05:19:22.527Z : "PersistentVolume "pvc-ef85dcce-e63e-42da-b869-c3389bbd948d", Node "template-node-for-mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003-8426967936887117836" mismatch for Pod "myapp-masterdata/masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw": No matching NodeSelectorTerms"
2021-01-25T05:19:22.527Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003, predicate checking error: node(s) had volume node affinity conflict predicateName=VolumeBinding reasons: node(s) had volume node affinity conflict debugInfo="
2021-01-25T05:19:22.527Z : No pod can fit to mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003
2021-01-25T05:19:22.527Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-myapp-worker-group-120200916154409048800000006, predicate checking error: node(s) didn't match node selector predicateName=NodeAffinity reasons: node(s) didn't match node selector debugInfo="
2021-01-25T05:19:22.527Z : No pod can fit to mycompany-test-eks-myapp-worker-group-120200916154409048800000006
2021-01-25T05:19:22.527Z : Skipping node group mycompany-test-eks-myapp-worker-group-2-20201029130225136800000004 - max size reached
2021-01-25T05:19:22.527Z : Skipping node group mycompany-test-eks-worker-group-1-20201029130715836900000005 - max size reached
2021-01-25T05:19:22.527Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-worker-group-220200916162252020100000006, predicate checking error: node(s) didn't match node selector predicateName=NodeAffinity reasons: node(s) didn't match node selector debugInfo="
2021-01-25T05:19:22.527Z : No pod can fit to mycompany-test-eks-worker-group-220200916162252020100000006
2021-01-25T05:19:22.527Z : No expansion options
2021-01-25T05:19:22.527Z : Calculating unneeded nodes
[...]
2021-01-25T05:19:22.528Z : Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
2021-01-25T05:19:22.528Z : Scale down status: unneededOnly=false lastScaleUpTime=2021-01-25 05:00:14.980160831 +0000 UTC m=+6970.760701246 lastScaleDownDeleteTime=2021-01-25 03:04:22.928996296 +0000 UTC m=+18.709536671 lastScaleDownFailTime=2021-01-25 03:04:22.928996376 +0000 UTC m=+18.709536751 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
2021-01-25T05:19:22.528Z : Starting scale down
2021-01-25T05:19:22.528Z : No candidates for scale down
2021-01-25T05:19:22.528Z : "Event(v1.ObjectReference{Kind:"Pod", Namespace:"myapp-masterdata", Name:"masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw", UID:"97956c38-55f3-4749-ab74-7e7fc674e832", APIVersion:"v1", ResourceVersion:"217276797", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 max node group size reached, 3 node(s) didn't match node selector, 1 node(s) had volume node affinity conflict"
2021-01-25T05:19:22.946Z : k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Watch close - *v1beta1.PodDisruptionBudget total 0 items received
2021-01-25T05:19:32.542Z : Starting main loop
Anything else we need to know?:
Basically this works fine without the volume. With the volume, it works as long as the volume has not been provisioned yet, but fails once it already has been provisioned. The job also gets scheduled right away when I manually scale up the ASG.
I noticed the node affinity on the provisioned PV:
Node Affinity:
  Required Terms:
    Term 0:  topology.ebs.csi.aws.com/zone in [eu-central-1b]
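For reference, this is how the same constraint typically appears in the PV spec itself (reconstructed here from the describe output above, not copied from the actual object):

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - eu-central-1b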
That label is probably set on the node by the "ebs-csi-node" DaemonSet and is therefore unknown to the cluster-autoscaler.
Am I expected to tag the ASG with k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone?
If so, how am I supposed to set that tag on a multi-AZ ASG?
Possibly related: https://github.com/kubernetes/autoscaler/issues/3230
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 13
- Comments: 26 (2 by maintainers)
Yes, but when the ASG is at 0, there are no nodes. cluster-autoscaler needs the labels tagged on the ASG to know which labels a node would have if it were to scale the ASG up from 0.
k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone is the approach I am taking and it works like a charm. I can do some footwork in Terraform to get the tags set up. Not sure what you're using to provision your cluster.
That said, it would be nice to have these labels generated automatically from the list of AZs assigned to an ASG.
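For the single-AZ ASG in this issue, a sketch of that tag in CloudFormation-style YAML (just an illustration; the same key/value pair can be set from a Terraform tag block or with aws autoscaling create-or-update-tags, and the zone value is the one the PV is bound to):

Tags:
  - Key: k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone
    Value: eu-central-1b
    PropagateAtLaunch: false   # cluster-autoscaler reads the tag from the ASG itself; propagating it to instances is not required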
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity once lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Also, from your comment, what do you mean by "when your ASG is at 0"? You mean if I set the desired count to '0'?
@FarhanSajid1 you should have one node group (and thus one ASG) for each AZ. The above tag needs to be applied to each of those ASGs.
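A sketch of what that per-AZ layout implies for the tags, in eksctl-style notation purely for illustration (the names are placeholders; the same ASG tags can be created from Terraform or CloudFormation if that is how the node groups are managed):

nodeGroups:
  - name: myapp-elastic-a
    availabilityZones: ["eu-central-1a"]
    tags:
      k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone: eu-central-1a
  - name: myapp-elastic-b
    availabilityZones: ["eu-central-1b"]
    tags:
      k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone: eu-central-1b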