kube2iam: Pods are sometimes assigned to the incorrect IAM role
In our cluster, pods that have a proper “iam.amazonaws.com/role” annotation sometimes do not receive their role when they start up. Instead, kube2iam returns the default role, which in our case has no permissions. Some time later, when the application requests the credentials again, it gets the correct role. Relevant log messages:
level=info msg="Requesting /latest/meta-data/iam/security-credentials/"
level=warning msg="Using fallback role for IP 10.233.109.12"
level=info msg="Requesting /latest/meta-data/iam/security-credentials/kube.no-permissions"
level=warning msg="Using fallback role for IP 10.233.109.12"
.... some time later ....
level=info msg="Requesting /latest/meta-data/iam/security-credentials/"
level=info msg="Requesting /latest/meta-data/iam/security-credentials/kube.kube-system.route53-kubernetes"
I am not sure how to debug this further; it might be related to the issues described in #32.
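The log pattern (fallback first, correct role on a later request) is consistent with a race between the credential request and the pod watcher: if a pod asks for credentials before the watcher has recorded its IP, the lookup misses and the default role is used. The minimal Go sketch below assumes that model. `podRoles`, `roleForIP` and `defaultRole` are hypothetical names for illustration, not kube2iam's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultRole is a hypothetical fallback role with no permissions.
const defaultRole = "kube.no-permissions"

// podRoles is a hypothetical IP -> role map populated by a pod watcher.
type podRoles struct {
	mu    sync.RWMutex
	roles map[string]string
}

func newPodRoles() *podRoles {
	return &podRoles{roles: make(map[string]string)}
}

// add records the role annotation for a pod IP (called by the watcher).
func (p *podRoles) add(ip, role string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.roles[ip] = role
}

// roleForIP returns the role for ip, or defaultRole if the watcher has
// not recorded the pod yet. The bool reports whether the fallback was used.
func (p *podRoles) roleForIP(ip string) (string, bool) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if role, ok := p.roles[ip]; ok {
		return role, false
	}
	return defaultRole, true
}

func main() {
	cache := newPodRoles()
	ip := "10.233.109.12"

	// The pod requests credentials before the watcher has seen it.
	role, fallback := cache.roleForIP(ip)
	fmt.Println(role, fallback) // kube.no-permissions true

	// Later the watcher processes the pod's event...
	cache.add(ip, "kube.kube-system.route53-kubernetes")

	// ...and a retry gets the annotated role.
	role, fallback = cache.roleForIP(ip)
	fmt.Println(role, fallback) // kube.kube-system.route53-kubernetes false
}
```

Under this model, locking alone does not remove the window; the lookup would also need to wait briefly for the pod to appear instead of falling back immediately. That is one possible mitigation, not necessarily what kube2iam does.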
About this issue
- State: closed
- Created 7 years ago
- Reactions: 5
- Comments: 21 (11 by maintainers)
We had this problem in one of our production clusters just over a week ago: some processes would periodically go unhealthy, and the logs showed they were attempting to use credentials for a different role (and were therefore denied access to the AWS resources they needed). We'd been fine for a few months, but once the cluster crossed some load threshold, things started breaking.
I started moving code around to support broader locks, but made enough changes (both to address this and to incorporate prefetching and other bits) that it ended up easier to start from scratch. To address this issue (the incorrect IAM role), we found a thread-safe cache inside k8s.io/client-go that can watch and sync pod state. kiam (which I'd like to think of as something like the child of kube2iam 😄) is at https://github.com/uswitch/kiam if it's helpful.
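As a rough illustration of what such a cache has to get right, here is a stdlib-only Go sketch of a thread-safe index from pod IP to the pod currently holding it. Its handler methods mirror the add/update/delete callbacks a client-go informer delivers, but the `Pod` type, `ipIndex` and its methods are hypothetical; this is not kiam's or kube2iam's implementation. It assumes (the thread does not say so) that pod IP reuse is part of the problem: because events for different pods sharing an IP can arrive in either order, deletes are guarded by pod UID so a late delete for an old pod cannot evict the new owner.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical, trimmed-down pod record.
type Pod struct {
	UID  string
	IP   string
	Role string // value of the iam.amazonaws.com/role annotation
}

// ipIndex is a thread-safe cache from pod IP to the pod holding that IP.
type ipIndex struct {
	mu   sync.RWMutex
	byIP map[string]Pod
}

func newIPIndex() *ipIndex {
	return &ipIndex{byIP: make(map[string]Pod)}
}

func (x *ipIndex) OnAdd(p Pod) {
	if p.IP == "" {
		return // no IP assigned yet; an update event will carry it
	}
	x.mu.Lock()
	defer x.mu.Unlock()
	x.byIP[p.IP] = p
}

func (x *ipIndex) OnUpdate(old, cur Pod) {
	x.mu.Lock()
	defer x.mu.Unlock()
	// If the IP changed, drop the old entry only if it still belongs to this pod.
	if old.IP != cur.IP {
		if e, ok := x.byIP[old.IP]; ok && e.UID == old.UID {
			delete(x.byIP, old.IP)
		}
	}
	if cur.IP != "" {
		x.byIP[cur.IP] = cur
	}
}

func (x *ipIndex) OnDelete(p Pod) {
	x.mu.Lock()
	defer x.mu.Unlock()
	// A late delete for a pod whose IP was reused must not evict the new pod.
	if e, ok := x.byIP[p.IP]; ok && e.UID == p.UID {
		delete(x.byIP, p.IP)
	}
}

// RoleForIP returns the role of the pod currently holding ip.
func (x *ipIndex) RoleForIP(ip string) (string, bool) {
	x.mu.RLock()
	defer x.mu.RUnlock()
	p, ok := x.byIP[ip]
	return p.Role, ok
}

func main() {
	idx := newIPIndex()
	a := Pod{UID: "a", IP: "10.233.109.12", Role: "role-a"}
	b := Pod{UID: "b", IP: "10.233.109.12", Role: "role-b"}

	idx.OnAdd(a)
	// Pod b reuses a's IP; b's add is processed before a's delete.
	idx.OnAdd(b)
	idx.OnDelete(a)

	role, ok := idx.RoleForIP("10.233.109.12")
	fmt.Println(role, ok) // role-b true
}
```

The sketch does not cover every interleaving (for example, a late update event for the old pod would still overwrite the new pod's entry), which is one reason to lean on a maintained informer cache rather than hand-rolled locking.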
Thanks for fixing this, @jtblin! Can you already say when there will be a new release that includes the fix?
IndexInformer will probably fix this too, so maybe we should wait.