autoscaler: cluster-autoscaler 1.15.5 does not work with IAM Roles for Service Accounts

Just upgraded my EKS cluster to 1.15, and upgraded cluster-autoscaler along with it.

It appears that 1.15.5 has the same problem that we used to have with 1.14. It does not work with IAM Roles for Service Accounts.

Getting the following error:

E0311 03:36:42.052777 1 aws_manager.go:259] Failed to regenerate ASG cache: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::<my-account>:assumed-role/eks-worker/i-059d924ed6f52a032 is not authorized to perform: autoscaling:DescribeTags

This should not be happening, as the autoscaling:DescribeTags permission is assigned to the service-account-based IAM Role, not to the eks-worker instance profile.

This problem was fixed in 1.14.6, but the fix doesn’t seem to have made it into 1.15.5.
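One way to confirm which identity the autoscaler is actually calling AWS with is to read the role name out of the ARN in the error. The following is only a sketch: `caller_role` and `irsa_env_present` are hypothetical helper names, and the regex assumes the usual `arn:aws:sts::<account>:assumed-role/<role>/<session>` layout. The second helper checks for the two environment variables (`AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`) that EKS injects into pods bound to a service-account role.

```python
import os
import re
from typing import Mapping, Optional

# Matches an STS assumed-role ARN:
#   arn:aws:sts::<account>:assumed-role/<role>/<session>
_ASSUMED_ROLE = re.compile(r"arn:aws:sts::[^:\s]*:assumed-role/([^/\s]+)/(\S+)")


def caller_role(log_line: str) -> Optional[str]:
    """Return the role name from the first assumed-role ARN in log_line, if any."""
    match = _ASSUMED_ROLE.search(log_line)
    return match.group(1) if match else None


def irsa_env_present(env: Mapping[str, str] = os.environ) -> bool:
    """True if both variables injected for service-account roles are set and non-empty."""
    return bool(env.get("AWS_ROLE_ARN")) and bool(env.get("AWS_WEB_IDENTITY_TOKEN_FILE"))


if __name__ == "__main__":
    line = (
        "AccessDenied: User: arn:aws:sts::<my-account>:assumed-role/"
        "eks-worker/i-059d924ed6f52a032 is not authorized to perform: "
        "autoscaling:DescribeTags"
    )
    print("role in error:", caller_role(line))
    print("IRSA variables present:", irsa_env_present())
```

If the role in the error is the node role (`eks-worker` here) while the IRSA variables are present in the pod, that suggests the SDK ignored the web identity token and fell back to the instance profile, which matches the behaviour described in this issue.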

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 24
  • Comments: 27 (11 by maintainers)

Most upvoted comments

FYI: We’ll be doing new patch releases on Monday: #2988.

For folks who cannot wait for the upstream release, please either build cluster-autoscaler-release-1.15 yourself or use my image seedjeffwan/cluster-autoscaler:1.15-dev to mitigate the problem in the short term.

I will fix this issue. I remember doing the cherry-pick, but I need to double-check.

Update: #2323 is missing in 1.15. I cherry-picked the change that uses the new Session, but not the go modules one. I will make the change.

@Jeffwan, is it available for the 1.18 cluster-autoscaler?

I am using image k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.3

My EKS k8s version is v1.18

I am getting this error:

E0730 14:18:49.622795       1 aws_manager.go:259] Failed to regenerate ASG cache: cannot autodiscover ASGs: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
	status code: 403, request id: c3d5fdbd-fb4e-4e6f-a329-b687f8ccc234
F0730 14:18:49.622817       1 aws_cloud_provider.go:379] Failed to create AWS Manager: cannot autodiscover ASGs: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
	status code: 403, request id: c3d5fdbd-fb4e-4e6f-a329-b687f8ccc234
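The thread does not diagnose this second error. A common cause of `AccessDenied` on `sts:AssumeRoleWithWebIdentity` is an IAM role trust policy that does not match the pod's service account, so that is the assumption behind the sketch below. `allows_web_identity` is a hypothetical helper that inspects a trust policy document (already loaded as a dict) for an `Allow` statement with a federated principal, the `sts:AssumeRoleWithWebIdentity` action, and a `<oidc-provider>:sub` condition equal to `system:serviceaccount:<namespace>:<name>`.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def allows_web_identity(trust_policy: Dict[str, Any], service_account_sub: str) -> bool:
    """Check whether a trust policy lets the given service account assume the role.

    Only exact string matches are checked; wildcard patterns under StringLike
    are not expanded, so this is a rough sanity check rather than a full
    policy evaluation.
    """
    for stmt in _as_list(trust_policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in _as_list(stmt.get("Action")):
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict) or not _as_list(principal.get("Federated")):
            continue
        conditions = stmt.get("Condition", {})
        for operator in ("StringEquals", "StringLike"):
            for key, value in conditions.get(operator, {}).items():
                # Condition keys look like "<oidc-provider>:sub".
                if key.endswith(":sub") and service_account_sub in _as_list(value):
                    return True
    return False
```

For a cluster-autoscaler deployed in `kube-system` under a service account named `cluster-autoscaler` (both names are assumptions about the deployment), the value to check would be `system:serviceaccount:kube-system:cluster-autoscaler`.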

Note that a similar patch will be required in the cluster-autoscaler 1.16 branch (since 1.16 is using aws-sdk-go 1.23.12, and IRSA support was introduced in 1.23.13). Not an immediate issue, but will be once AWS releases EKS with 1.16; if you’re already doing a PR for 1.15, might as well do the same thing for 1.16.
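The version constraint in the comment above can be expressed as a small check. This is a minimal sketch: `sdk_supports_irsa` and `parse_version` are hypothetical names, and the minimum version `1.23.13` is taken directly from that comment rather than verified independently.

```python
from typing import Tuple

# Minimum aws-sdk-go version with IRSA support, as stated in the comment above.
MIN_IRSA_SDK: Tuple[int, ...] = (1, 23, 13)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.23.12' or 'v1.23.12' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def sdk_supports_irsa(sdk_version: str) -> bool:
    """True if the vendored aws-sdk-go version is at least the IRSA minimum."""
    return parse_version(sdk_version) >= MIN_IRSA_SDK


if __name__ == "__main__":
    for version in ("1.23.12", "1.23.13"):
        print(version, "supports IRSA:", sdk_supports_irsa(version))
```

Comparing integer tuples avoids the pitfall of plain string comparison, where `"1.9.0"` would sort after `"1.23.13"`.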

We have been scoping 1.18 recently and it will be available soon. Currently, there’s no support. Users have to choose the cluster-autoscaler release whose minor version corresponds to their cluster’s, with the largest patch version. CA uses scheduler logic to run its simulations, so keep in mind that this is important! 😄
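The selection rule in that comment (same minor version as the cluster, highest patch) could be sketched as follows. `pick_autoscaler_version` is a hypothetical helper, and the list of available releases is illustrative only, using version numbers mentioned in this thread.

```python
from typing import Iterable, Optional, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Turn '1.15.5' or 'v1.15.5' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def pick_autoscaler_version(cluster_version: str, available: Iterable[str]) -> Optional[str]:
    """Return the release matching the cluster's major.minor with the largest patch.

    Returns None when no release exists for that minor version.
    """
    wanted = _parse(cluster_version)[:2]
    candidates = [v for v in available if _parse(v)[:2] == wanted]
    return max(candidates, key=_parse, default=None)


if __name__ == "__main__":
    releases = ["1.14.6", "1.14.7", "1.15.5"]  # illustrative list, not a full catalogue
    print(pick_autoscaler_version("1.15", releases))
    print(pick_autoscaler_version("1.16", releases))
```

Returning `None` instead of falling back to a neighbouring minor version reflects the point made above: the autoscaler simulates scheduling with the scheduler logic of its own Kubernetes version, so a mismatched minor version is not a safe substitute.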

Thank you @Jeffwan !!!

@canhnt I synced with the release team. We backported ./hack/update-vendor.sh from a later branch and updated the base for those changes… I updated to v1.28.14 and the problem should be resolved. Thanks for the feedback!

I hit the same issue; downgrading to 1.14.7 seems to work fine. I believe we need to cherry-pick #2323 (or a more recent AWS SDK update).