terraform-provider-kubernetes: v2.0.1 Authentication failures with token retrieved via aws_eks_cluster_auth
Terraform Version, Provider Version and Kubernetes Version
Terraform version: 0.12.24
Kubernetes provider version: 2.0.1
Kubernetes version: v1.16.15-eks-ad4801
Affected Resource(s)
Terraform Configuration Files
data "aws_eks_cluster" "c" {
  name = var.k8s_name
}
data "aws_eks_cluster_auth" "c" {
  name = var.k8s_name
}
provider "kubernetes" {
  host                   = data.aws_eks_cluster.c.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.c.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.c.token
}
Debug Output
Panic Output
Steps to Reproduce
Expected Behavior
What should have happened? Resources should have been created/modified/deleted.
Actual Behavior
What actually happened?
Error: the server has asked for the client to provide credentials
Error: Failed to update daemonset: Unauthorized
Error: Failed to update deployment: Unauthorized
Error: Failed to update deployment: Unauthorized
Error: Failed to update service account: Unauthorized
Error: Failed to update service account: Unauthorized
Error: Failed to delete Job! API error: Unauthorized
Error: Failed to update service account: Unauthorized
Error: the server has asked for the client to provide credentials
Error: the server has asked for the client to provide credentials
Error: Failed to update deployment: Unauthorized
Error: Failed to update service account: Unauthorized
Error: the server has asked for the client to provide credentials
Error: Failed to delete Job! API error: Unauthorized
Error: Failed to update daemonset: Unauthorized
Important Factoids
No, we’re just using EKS.
References
- GH-1234
Community Note
- Please vote on this issue by adding a +1 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 39
- Comments: 37 (10 by maintainers)
Commits related to this issue
- Add terraform refresh to destroy eks_components This is to fix the authentication error, caused by this issue: https://github.com/hashicorp/terraform-provider-kubernetes/issues/1131 — committed to ministryofjustice/cloud-platform-infrastructure by vijay-veeranki 3 years ago
- Add terraform refresh to destroy eks_components (#1373) This is to fix the authentication error, caused by this issue: https://github.com/hashicorp/terraform-provider-kubernetes/issues/1131 — committed to ministryofjustice/cloud-platform-infrastructure by vijay-veeranki 3 years ago
Using exec is not a viable solution when running in Terraform Cloud using remote execution. Our current thinking is to implement a workaround to essentially taint the aws_eks_cluster_auth data source so it gets refreshed for every plan. It would be ideal if the Kubernetes provider had native support for getting and refreshing managed Kubernetes service authentication tokens / credentials in order to support environments in which the only guaranteed tooling is Terraform itself.
Can you try running terraform refresh to see if that pulls in a new token? The token generated by aws_eks_cluster_auth is only valid for 15 minutes. For this reason, we recommend using an exec plugin to keep the token up to date automatically. Here’s an example of that configuration:
Alternatively, running the Kubernetes provider in a separate terraform apply from the EKS cluster creation should work every time. (I’m not sure offhand if your EKS cluster is being created in the same apply, but just guessing since it’s a common configuration).
There’s also a working EKS example you can compare with your configs. There are some improvements coming soon for the example, since we’re working on related authentication issues.
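The exec-plugin configuration recommended in the comment above could be sketched roughly as follows, reusing the data source names from the original report. This is a sketch under assumptions, not the maintainers' exact example: it assumes the aws CLI is installed wherever Terraform runs, and the api_version shown has to match what the installed CLI emits (older AWS CLI releases emit client.authentication.k8s.io/v1alpha1 instead).

```hcl
data "aws_eks_cluster" "c" {
  name = var.k8s_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.c.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.c.certificate_authority.0.data)

  # Instead of a static token from aws_eks_cluster_auth, ask the AWS CLI
  # for a token whenever the provider is configured.
  exec {
    # Assumption: must match the apiVersion printed by `aws eks get-token`
    # for your CLI version (v1alpha1 on older releases).
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.k8s_name]
  }
}
```

With this in place the aws_eks_cluster_auth data source is no longer needed, and a fresh token is obtained whenever the provider is configured, including at apply time.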
This issue should at the very least prompt a review of all of the official documentation, since you cannot actually use the provider in its documented state.
I’m just using local exec to deploy the few Kubernetes resources I want to “manage” with Terraform. At the moment I don’t want to split my rather small Terraform state into at least two layers just to be able to use the Kubernetes provider properly with an AWS EKS Kubernetes cluster 💁♀️
@dak1n1 This config worked for me. Thanks!
We run into this issue with virtually every apply now that we use Atlantis:
- Atlantis runs terraform plan and comments the output on the PR
- atlantis apply is commented on the PR (which causes Atlantis to run terraform apply)
- The apply fails on kubernetes_* resources
This happens whenever more than 15 minutes pass between the plan and the apply.
The workaround of calling aws eks get-token from the provider configuration would only work if we add the AWS CLI to the Atlantis container image. We can do that but it seems like a bit of a hack.
Is it a limitation of Terraform that this provider cannot refresh the token during apply? Is there a related Terraform issue?
I ran into the same problem in TFC. The cause was that I used an IAM role for the AWS provider.
I solved this problem by explicitly specifying the IAM role when I get a token such as:
Also, you may have to add your AWS region.
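A minimal sketch of what that could look like, assuming the exec approach discussed earlier in the thread. The variables var.role_arn and var.region are hypothetical names introduced here for illustration; --role-arn is an option of aws eks get-token and --region is a global AWS CLI option. The api_version is again an assumption that must match your CLI version.

```hcl
# Hypothetical inputs, not part of the original report.
variable "role_arn" {
  type        = string
  description = "IAM role to assume when requesting the EKS token"
}

variable "region" {
  type        = string
  description = "AWS region of the EKS cluster"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.c.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.c.certificate_authority.0.data)

  exec {
    # Assumption: use v1alpha1 here if your AWS CLI release emits that version.
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = [
      "eks", "get-token",
      "--cluster-name", var.k8s_name,
      "--role-arn", var.role_arn, # assume the same role the AWS provider uses
      "--region", var.region,
    ]
  }
}
```

The point of passing the role explicitly is that the token is then issued for the same identity the AWS provider assumed, rather than for the underlying credentials of the runner.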
I would not call the exec solution a hack. It’s the default and preferred mechanism for credentials access on both EKS and GKE and it’s what the official tooling from both cloud providers uses by default.
Have a look at the contents of a kubeconfig file produced by the AWS CLI:
The same happens on GKE, and for good reason.
Most IAM systems advise using short-lived credentials obtained via some sort of dynamic role impersonation. EKS doesn’t allow setting the lifespan of the token for the same reason. They want users to adopt role impersonation, which is the least risky way to handle credentials. This really isn’t a hack.
Back on the topic of Terraform, there is a solid reason why the data source is not refreshed before apply in your scenario. Since Atlantis is supplying a pre-generated plan to the terraform apply command, the contract implies that those should be the only changes enacted by Terraform during the apply. If it were to refresh data sources, that could propagate new values through the plan and cause changes to resources after the plan had been reviewed and approved, negating the value of that process.
In conclusion, there really isn’t any better way of handling these short-lived credentials than auth plugins.
Did some further digging and we may be barking in the wrong place: https://github.com/hashicorp/terraform-provider-aws/issues/10269#issuecomment-777906069
You can get around this with Kubernetes Service Account Tokens. The code snippet would look something like this:
Please see authn-authz example from the aidanmelen/kubernetes/rbac module for more information.
⚠️ This comes with the security trade-off since this token will need to be manually rotated.
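The original snippet did not survive, so the following is only a guess at the general shape of the service account token approach, not the code from the referenced module. It assumes a long-lived service account token has already been created out of band and is passed in through a hypothetical variable named k8s_service_account_token.

```hcl
# Hypothetical variable holding a long-lived Kubernetes service account
# token, created outside of this configuration. On Terraform 0.14 and
# later it could additionally be marked sensitive.
variable "k8s_service_account_token" {
  type = string
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.c.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.c.certificate_authority.0.data)

  # Static token: it does not expire after 15 minutes, which is also
  # why it has to be rotated manually.
  token = var.k8s_service_account_token
}
```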
A related issue is that this provider seems to update the state with the changes that it attempted to apply, as if the apply was successful, even though the authentication failed due to expired credentials.
So if you plan a change, wait 15 minutes, and then try to apply the plan, you will get an error like “Error: the server has asked for the client to provide credentials”. If you then plan again with -refresh=false, the result is “No changes. Your infrastructure matches the configuration”. On large states this increases the pain of this issue considerably, as it creates the need for repeated refreshing of the state, which can take tens of minutes or more.
TFC allows one to use custom agents, run as Docker containers. It should be easy to add the auth plugins to those. It implies managing your own worker pool, which may not be what everyone wants to do. The TFC development team is aware of this limitation, but they may not be aware of the number of users affected. It may help to add weight to the issue by letting them know about it through a support request.
Thanks, this is the key insight I was missing, it is indeed not possible for the data source to be refreshed at apply time.
It’s unfortunate though that this means Terraform Cloud users are out of luck. We can build the AWS CLI into our Atlantis image and set up processes for keeping it up to date; it’s an inconvenience but not that bad. On some platforms, however, there is no similar solution that would allow the exec approach to be used.
@jbg without logs and samples of your configuration, there isn’t a lot to go on in your report. The Terraform, provider, and cluster versions involved are also missing. Please help us help you.
@dak1n1 I am considering this as a temporary workaround.