autoscaler: Could not get a CSINode object for the node
Which component are you using?: cluster-autoscaler (CA)
What version of the component are you using?: 1.20.1
Component version: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.1
What k8s version are you using (kubectl version)?: 1.23.5
What environment is this in?: AWS
Could someone please tell me what this error is about? I’ve found that it sometimes takes ages for the cluster to scale up, and I’m wondering whether this is related somehow:
I0412 08:06:16.062769 1 scheduler_binder.go:775] Could not get a CSINode object for the node "template-node-for-nodes-a.domain.net-7982597919630627426-0": csinode.storage.k8s.io "template-node-for-nodes-a.domain.net-7982597919630627426-0" not found
I0412 08:06:16.062801 1 scheduler_binder.go:801] All bound volumes for Pod "namespace/pod-75b64dff96-99vxn" match with Node "template-node-for-nodes-a.domain.net-7982597919630627426-0"
I0412 08:06:16.062828 1 filter_out_schedulable.go:157] Pod namespace.pod-75b64dff96-99vxn marked as unschedulable can be scheduled on node template-node-for-nodes-a.domain.net-7982597919630627426-0. Ignoring in scale up.
I0412 08:06:16.063127 1 scheduler_binder.go:775] Could not get a CSINode object for the node "template-node-for-nodes-c.domain.net-4246696157256546175-0": csinode.storage.k8s.io "template-node-for-nodes-c.domain.net-4246696157256546175-0" not found
I0412 08:06:16.063143 1 scheduler_binder.go:801] All bound volumes for Pod "namespace/pod-64755c698f-ghcdt" match with Node "template-node-for-nodes-c.domain.net-4246696157256546175-0"
I0412 08:06:16.063166 1 filter_out_schedulable.go:157] Pod namespace.pod-64755c698f-ghcdt marked as unschedulable can be scheduled on node template-node-for-nodes-c.domain.net-4246696157256546175-0. Ignoring in scale up.
The thing is that each node group still has room for new nodes, at least 5 in each.
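A note on what the message itself means: the "Could not get a CSINode object" lines above are logged by the scheduler code the autoscaler runs in simulation, and the node they mention is the synthetic "template-node-for-…" node, which never has a CSINode object, so on its own the message is informational. If you want to check whether your real nodes each have a CSINode object (the kubelet normally creates one once a CSI driver such as aws-ebs-csi-driver registers), here is a minimal sketch using the official kubernetes Python client; it assumes a working kubeconfig and is not taken from this thread.

```python
# Minimal sketch (not from this thread): list real nodes that have no
# matching CSINode object. Assumes the official `kubernetes` Python client
# and a working kubeconfig. The "template-node-for-..." nodes in the log
# only exist inside the autoscaler's simulation and will never show up here.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

nodes = {n.metadata.name for n in client.CoreV1Api().list_node().items}
csinodes = {c.metadata.name for c in client.StorageV1Api().list_csi_node().items}

missing = nodes - csinodes
print("Nodes without a CSINode object:", sorted(missing) or "none")
```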
About this issue
- State: closed
- Created 2 years ago
- Reactions: 26
- Comments: 17 (3 by maintainers)
I had a brand new AWS ASG scaled to 0 and had the same issue at deploy time. It was solved by manually scaling up. Afterwards, the CAS started working as expected.
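If anyone wants to script that one-off nudge instead of doing it in the console, a minimal boto3 sketch might look like this; the ASG name and region are placeholders, not values from this issue.

```python
# Sketch of the "manually scale up once" workaround described above.
# The ASG name and region below are placeholders.
import boto3

asg = boto3.client("autoscaling", region_name="eu-west-1")

# Bring one node up so the group has a real node (with labels and a CSINode
# object) for the autoscaler to build its template from.
asg.set_desired_capacity(
    AutoScalingGroupName="nodes-a.domain.net",
    DesiredCapacity=1,
    HonorCooldown=False,
)
```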
Which version of CA has the fix? I’m still seeing: Could not get a CSINode object for the node "ip-10.xxx.x.xx…ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-xxx.xx-ap-south-1.compute.internal" not found
That way of segregating node pools by zone is far older than the aws-ebs-csi-driver. For as long as I can remember, at least 4 years, I’ve always done it that way, because scaling never worked 100% for multi-zone pools.
@bcouetil what you do in your example is create the node group in only one availability zone. This is the same thing @afirth pointed out in the comment above.
We’re still observing this issue on AWS EKS 1.24 with Cluster Autoscaler 1.26.1.
I think this happens when the pod requests a PVC on AWS (or other providers) that is not available in the AZ of the node. The real scheduler sees that this won’t work, but the CAS “fake scheduler run” doesn’t. After a while CAS marks the node as underutilized, kills it, and scales up again; eventually the scale-up node lands in the right AZ and the pod is scheduled. On providers that support multi-zone storage, this is not a problem.

Solution: make a separate node group for each AZ.

Caveat: scale to/from 0 is broken in default EKS. Workarounds and the tracking issue are at https://github.com/aws/containers-roadmap/issues/608
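On the scale-to/from-0 caveat: one commonly cited workaround (discussed in the linked roadmap issue and in the cluster-autoscaler AWS docs) is to tag each per-AZ ASG with cluster-autoscaler node-template labels, so the synthetic template node carries the zone topology labels the simulated scheduler needs even when the group is at 0. A hedged boto3 sketch follows; the ASG names, zones, and region are placeholders, and whether this applies to a given setup (e.g. EKS managed node groups) is an assumption.

```python
# Sketch (placeholders, not from this thread): tag each per-AZ ASG with
# cluster-autoscaler "node-template" labels so a group scaled to 0 still
# advertises its zone to the autoscaler's simulated scheduler.
import boto3

asg = boto3.client("autoscaling", region_name="eu-west-1")

per_az_groups = {
    "nodes-a.domain.net": "eu-west-1a",  # placeholder ASG -> AZ mapping
    "nodes-c.domain.net": "eu-west-1c",
}

tags = []
for group, zone in per_az_groups.items():
    for key in (
        "k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone",
        "k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone",
    ):
        tags.append({
            "ResourceId": group,
            "ResourceType": "auto-scaling-group",
            "Key": key,
            "Value": zone,
            # The autoscaler reads these tags from the ASG itself, so they
            # do not need to propagate to the EC2 instances.
            "PropagateAtLaunch": False,
        })

asg.create_or_update_tags(Tags=tags)
```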
If a CAS update really did fix it, I’m very interested in how. If it’s caused by something else, feel free to chime in here. And feel free to chat with your AWS AM about this: https://github.com/aws/containers-roadmap/issues/608 and 724 have some of the most 👍 reactions of anything in the roadmap and aren’t particularly hard to fix.