cluster-api-provider-aws: "The workload cluster kubeconfig (for use by controllers, not end users) causes unauthorized errors after exactly 15 minutes"

What steps did you take and what happened: With cluster-api-provider-aws, the kubeconfig secret contains a hinted token that is retrieved from the AWS STS service (check here for reference) and, in the current configuration, needs to be refreshed every 15 minutes.

Instead of recreating the client for the observed cluster before (or after) the token expires, cluster-api fails with unauthorized errors and stops working for a few minutes. Interestingly, this also affects the cluster-autoscaler, which fails with unauthorized errors, restarts, and then continues. It might also be a bug in the client-go dependency.

What did you expect to happen: The provided kubeconfig allows the client to refresh its credentials once they have expired.

Anything else you would like to add: Logs of all cap* components: https://gist.github.com/xvzf/78d47ce6c3d6fabd49bc902e5d22d467 Manifests to reproduce it: https://gist.github.com/xvzf/87c934945ad97fe3bb9ee4934c6478ce

Environment:

  • Cluster-api version: v1.0.2
  • Cluster-api-aws version: v1.2.0
  • Minikube/KIND version: EKS 1.21.2 (should be irrelevant)
  • Kubernetes version: (use kubectl version): 1.21.2
  • OS (e.g. from /etc/os-release): AmazonLinux

/kind bug

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

/retitle “The workload cluster kubeconfig (for use by controllers, not end users) causes unauthorized errors after exactly 15 minutes”

This only affects managed (EKS) clusters.

/area provider/eks

This is a known issue affecting cluster-autoscaler as well: https://github.com/kubernetes/autoscaler/issues/4784

/triage accepted

/priority important-soon