terraform-provider-kubernetes: AKS - dial tcp [::1]:80: connect: connection refused on all plans modifying the azurerm_kubernetes_cluster resource
Please see this comment for the explanation of the root cause.
Terraform Version, Provider Version and Kubernetes Version
Terraform version: 0.15, 1.0
Kubernetes provider version: 2.3.0
Kubernetes version: 1.19
Affected Resource(s)
- all resources
Terraform Configuration Files
Our configuration is almost identical to your aks example code, so I tried using that and replicated the behaviour.
Note - I simulated this by modifying the workers_count variable in the aks-cluster directory; however, that variable isn’t actually wired up in your code. Modify line 23 to be node_count = var.workers_count, then pass a new value in via the aks-cluster module in main.tf (sketched below).
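For reference, a rough sketch of the change I’m describing (the default and vm_size values below are illustrative, not necessarily what the example uses):

```hcl
# aks-cluster/variables.tf (added for the reproduction)
variable "workers_count" {
  type    = number
  default = 3
}

# aks-cluster/main.tf: wire the variable into the node pool
# (only the relevant block is shown; other cluster arguments stay as in the example)
resource "azurerm_kubernetes_cluster" "default" {
  # ...

  default_node_pool {
    name       = "default"
    node_count = var.workers_count # around line 23; previously a hard-coded count
    vm_size    = "Standard_D2_v2"  # illustrative value
  }
}

# main.tf (root module): pass a new value through the module to trigger the change
module "aks-cluster" {
  source        = "./aks-cluster"
  workers_count = 4
  # other module arguments unchanged
}
```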
Steps to Reproduce
- Update the `aks-cluster` module as directed above to support `workers_count`
- `terraform apply`
- Change the `workers_count` variable to any other value
- `terraform plan`
Expected Behavior
Terraform should display a plan showing the updated node pool count.
Actual Behavior
The following error is reported:
$ terraform plan
random_id.cluster_name: Refreshing state... [id=aE_9C3A]
module.aks-cluster.azurerm_resource_group.default: Refreshing state... [id=/subscriptions/ff83a9d2-8d6e-4c4a-8b34-641163f8c99f/resourceGroups/tf-k8s-684ffd0b70]
module.aks-cluster.azurerm_kubernetes_cluster.default: Refreshing state... [id=/subscriptions/ff83a9d2-8d6e-4c4a-8b34-641163f8c99f/resourcegroups/tf-k8s-684ffd0b70/providers/Microsoft.ContainerService/managedClusters/tf-k8s-684ffd0b70]
module.kubernetes-config.local_file.kubeconfig: Refreshing state... [id=1ca8ad3c1c7f4aff65e5eda0038b619788b0956a]
module.kubernetes-config.helm_release.nginx_ingress: Refreshing state... [id=nginx-ingress-controller]
module.kubernetes-config.kubernetes_namespace.test: Refreshing state... [id=test]
╷
│ Error: Get "http://localhost/api/v1/namespaces/test": dial tcp [::1]:80: connect: connection refused
│
│ with module.kubernetes-config.kubernetes_namespace.test,
│ on kubernetes-config/main.tf line 14, in resource "kubernetes_namespace" "test":
│ 14: resource "kubernetes_namespace" "test" {
│
╵
╷
│ Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
│
│ with module.kubernetes-config.helm_release.nginx_ingress,
│ on kubernetes-config/main.tf line 59, in resource "helm_release" "nginx_ingress":
│ 59: resource helm_release nginx_ingress {
The data source is clearly not passing back valid data, even though there is a dependency on the aks-cluster module.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 25
- Comments: 17 (5 by maintainers)
Hi, I’m sorry to hear you all are struggling with this dependency issue. I’ve done extensive research in this area and come across similar scenarios. The cause has to do with passing an unknown value to a provider configuration block, which is not supported in Terraform core: as their docs note, provider configuration arguments can only reference values that are known before the configuration is applied.
When you make a change to the underlying infrastructure, such as node count, you’re passing an unknown value into the Kubernetes provider configuration block, since the full scope of the cluster infrastructure is not known until after the change has been applied to the AKS cluster. That’s why Terraform is behaving as if it’s not reading the cluster’s data source properly.
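To make that concrete, the pattern that breaks looks roughly like the following (a sketch only; the resource and attribute names are assumed, not copied from the example):

```hcl
# Sketch of the problematic pattern: the kubernetes provider is configured from
# attributes that depend on the AKS cluster. When the cluster (or a data source
# reading it) has pending changes, these values are unknown during the plan, and
# the provider falls back to its default endpoint, which is why the errors point
# at localhost.
provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.default.kube_config.0.host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}
```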
Although I did write the initial guide to show that it can be possible to work around some of these issues, as you’ve found from experience, there are many edge cases that make it an unreliable and unintuitive process to get the Kubernetes provider working alongside the underlying infrastructure. This is due to a long-standing limitation in Terraform that can’t be fixed in any provider, but we do have plans to smooth out the bumps a little by adding better error messages upfront, so that users don’t run into this on subsequent applies.
I thought at first that I could list out every work-around to help users keep their preferred workflow of having the cluster in the same Terraform state as the Kubernetes resources. Most cases can be worked around using `terraform state rm module.kubernetes-config` or `terraform apply -target=module.aks-cluster`, but I think encouraging this kind of work-around will cause more headaches in the long run, as it puts the user in charge of figuring out when to use special one-off apply commands, rather than setting up Terraform to behave reliably and predictably from the start. Plus it can have unintended side-effects, like orphaning cloud resources. That’s why I have a new guide in progress here, which shows the most reliable method that we have so far: the cluster infrastructure needs to be kept in a state separate from the Kubernetes and Helm provider resources.
https://github.com/hashicorp/terraform-provider-kubernetes/tree/e058e225e621f06e393bcb6407e7737fd43817bd/_examples/aks
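Very roughly, the shape of that approach is a second root module with its own state that only ever reads an already-existing cluster (a sketch; the variable names below are placeholders, not the guide’s actual code):

```hcl
# Second configuration / state (sketch): the cluster already exists by the time
# this is planned, so the provider configuration never receives unknown values.
provider "azurerm" {
  features {}
}

variable "cluster_name" {}
variable "resource_group_name" {}

data "azurerm_kubernetes_cluster" "default" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.default.kube_config.0.host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}
```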
I know this is inconvenient, which is why we continue to try and accommodate users in single-apply scenarios, and scenarios which contain the Kubernetes and cluster resources in the same Terraform state. However, until upstream Terraform can add support for this, the single-apply workflow will remain buggy and less reliable than separating cluster infrastructure from Kubernetes resources.
Running `terraform apply -refresh=false` will just silently do the right thing BTW, as it won’t try to refresh the current state. (And in general, since this only happens when the cluster gets recreated, no refresh isn’t as bad as it sounds.)

I encountered the same problem on GKE @dak1n1. `terraform state rm -target modules.my-gke-module`, `terraform plan -target modules.my-gke-module` and `terraform apply -target modules.my-gke-module` helped to fix my problem. Thanks. I will try your new guide with GKE.

@dak1n1 this error message is unintuitive, as it doesn’t explain why the error is occurring, and it leads to significant lost time tracking down the cause. Furthermore, if the error here is correct, it means Terraform has attempted to connect to a cluster on localhost, which could have unintended consequences if such a cluster exists.
I have been testing using an aliased azurerm provider for the data query, and this seems to be a viable workaround for the issue. More testing is obviously required… Starting from Steph’s code example (https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/_examples/aks/main.tf), the change will look something like the sketch below.
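A rough sketch of that kind of aliased-provider arrangement (the alias name and module outputs here are hypothetical, not taken from the linked main.tf):

```hcl
# Default azurerm provider, used by the aks-cluster module as before.
provider "azurerm" {
  features {}
}

# Hypothetical aliased provider used only for the cluster data lookup that
# feeds the kubernetes/helm provider configuration.
provider "azurerm" {
  alias    = "aks_data"
  features {}
}

data "azurerm_kubernetes_cluster" "default" {
  provider            = azurerm.aks_data
  name                = module.aks-cluster.cluster_name        # assumed module output
  resource_group_name = module.aks-cluster.resource_group_name # assumed module output
}

# The kubernetes and helm providers are then configured from this data source
# in the same way as in the example.
```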
Looking forward to the community’s feedback. Thank you.