terraform-provider-kubernetes: AKS - dial tcp [::1]:80: connect: connection refused on all plans modifying the azurerm_kubernetes_cluster resource
Please see this comment for the explanation of the root cause.
Terraform Version, Provider Version and Kubernetes Version
Terraform version: 0.15, 1.0
Kubernetes provider version: 2.3.0
Kubernetes version: 1.19
Affected Resource(s)
- all resources
Terraform Configuration Files
Our configuration is almost identical to your aks example code, so I tried using that and replicated the behaviour.
Note - I simulated this by modifying the workers_count variable in the aks-cluster directory; however, that variable isn’t actually wired up in your code. Modify line 23 to be node_count = var.workers_count, then pass a new value in via the aks-cluster module in main.tf (sketched below).
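For reference, a rough sketch of the change I’m describing (the default and vm_size values below are illustrative, not necessarily what the example uses):

```hcl
# aks-cluster/variables.tf (added for the reproduction)
variable "workers_count" {
  type    = number
  default = 3
}

# aks-cluster/main.tf: wire the variable into the node pool
# (only the relevant block is shown; other cluster arguments stay as in the example)
resource "azurerm_kubernetes_cluster" "default" {
  # ...

  default_node_pool {
    name       = "default"
    node_count = var.workers_count # around line 23; previously a hard-coded count
    vm_size    = "Standard_D2_v2"  # illustrative value
  }
}

# main.tf (root module): pass a new value through the module to trigger the change
module "aks-cluster" {
  source        = "./aks-cluster"
  workers_count = 4
  # other module arguments unchanged
}
```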
Steps to Reproduce
- Update the `aks-cluster` module as directed above to support `workers_count`
- `terraform apply`
- Change the `workers_count` variable to any other value
- `terraform plan`
Expected Behavior
Terraform should display a plan showing the updated node pool count.
Actual Behavior
The following error is reported:
$ terraform plan
random_id.cluster_name: Refreshing state... [id=aE_9C3A]
module.aks-cluster.azurerm_resource_group.default: Refreshing state... [id=/subscriptions/ff83a9d2-8d6e-4c4a-8b34-641163f8c99f/resourceGroups/tf-k8s-684ffd0b70]
module.aks-cluster.azurerm_kubernetes_cluster.default: Refreshing state... [id=/subscriptions/ff83a9d2-8d6e-4c4a-8b34-641163f8c99f/resourcegroups/tf-k8s-684ffd0b70/providers/Microsoft.ContainerService/managedClusters/tf-k8s-684ffd0b70]
module.kubernetes-config.local_file.kubeconfig: Refreshing state... [id=1ca8ad3c1c7f4aff65e5eda0038b619788b0956a]
module.kubernetes-config.helm_release.nginx_ingress: Refreshing state... [id=nginx-ingress-controller]
module.kubernetes-config.kubernetes_namespace.test: Refreshing state... [id=test]
╷
│ Error: Get "http://localhost/api/v1/namespaces/test": dial tcp [::1]:80: connect: connection refused
│
│ with module.kubernetes-config.kubernetes_namespace.test,
│ on kubernetes-config/main.tf line 14, in resource "kubernetes_namespace" "test":
│ 14: resource "kubernetes_namespace" "test" {
│
╵
╷
│ Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
│
│ with module.kubernetes-config.helm_release.nginx_ingress,
│ on kubernetes-config/main.tf line 59, in resource "helm_release" "nginx_ingress":
│ 59: resource helm_release nginx_ingress {
The data source is clearly not passing back valid data, even though there is a dependency on the aks-cluster module.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 25
- Comments: 17 (5 by maintainers)
Hi, I’m sorry to hear you all are struggling with this dependency issue. I’ve done extensive research in this area and come across similar scenarios. The cause has to do with passing an unknown value to a provider configuration block, which is not supported in Terraform core: as their docs note, provider configuration arguments can only reference values that are known before the configuration is applied.
When you make a change to the underlying infrastructure, such as node count, you’re passing an unknown value into the Kubernetes provider configuration block, since the full scope of the cluster infrastructure is not known until after the change has been applied to the AKS cluster. That’s why Terraform is behaving as if it’s not reading the cluster’s data source properly.
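To make that concrete, the pattern that breaks looks roughly like the following (a sketch only; the resource and attribute names are assumed, not copied from the example):

```hcl
# Sketch of the problematic pattern: the kubernetes provider is configured from
# attributes that depend on the AKS cluster. When the cluster (or a data source
# reading it) has pending changes, these values are unknown during the plan, and
# the provider falls back to its default endpoint, which is why the errors point
# at localhost.
provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.default.kube_config.0.host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}
```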
Although I did write the initial guide to show that it can be possible to work around some of these issues, as you’ve found from experience, there are many edge cases that make it an unreliable and unintuitive process to get the Kubernetes provider working alongside the underlying infrastructure. This is due to a long-standing limitation in Terraform that can’t be fixed in any provider, but we do have plans to smooth out the bumps a little by adding better error messages upfront, so that users don’t run into this on subsequent applies.
I thought at first that I could list out every work-around to help users keep their preferred workflow of having the cluster in the same Terraform state as the Kubernetes resources. Most cases can be worked around using `terraform state rm module.kubernetes-config` or `terraform apply -target=module.aks-cluster`, but I think encouraging this kind of work-around will cause more headaches in the long run, as it puts the user in charge of figuring out when to use special one-off apply commands, rather than setting up Terraform to behave reliably and predictably from the start. Plus it can have unintended side-effects, like orphaning cloud resources. That’s why I have a new guide in progress here, which shows the most reliable method that we have so far: the cluster infrastructure needs to be kept in a state separate from the Kubernetes and Helm provider resources.
https://github.com/hashicorp/terraform-provider-kubernetes/tree/e058e225e621f06e393bcb6407e7737fd43817bd/_examples/aks
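Very roughly, the shape of that approach is a second root module with its own state that only ever reads an already-existing cluster (a sketch; the variable names below are placeholders, not the guide’s actual code):

```hcl
# Second configuration / state (sketch): the cluster already exists by the time
# this is planned, so the provider configuration never receives unknown values.
provider "azurerm" {
  features {}
}

variable "cluster_name" {}
variable "resource_group_name" {}

data "azurerm_kubernetes_cluster" "default" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.default.kube_config.0.host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}
```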
I know this is inconvenient, which is why we continue to try and accommodate users in single-apply scenarios, and scenarios which contain the Kubernetes and cluster resources in the same Terraform state. However, until upstream Terraform can add support for this, the single-apply workflow will remain buggy and less reliable than separating cluster infrastructure from Kubernetes resources.
Running `terraform apply -refresh=false` will just silently do the right thing BTW, as it won’t try to refresh the current state. (And in general, since this only happens when the cluster gets recreated, no refresh isn’t as bad as it sounds.)

I encountered the same problem on GKE @dak1n1. `terraform state rm -target modules.my-gke-module`, `terraform plan -target modules.my-gke-module` and `terraform apply -target modules.my-gke-module` helped to fix my problem. Thanks. I will try your new guide with GKE.

@dak1n1 this error message is unintuitive, as it doesn’t explain why the error is occurring, and it leads to significant lost time tracking down the cause. Furthermore, if the error here is correct, it means Terraform has attempted to connect to a cluster on localhost, which could have unintended consequences if such a cluster exists.
I have been testing using an aliased azurerm provider for the data query, and this seems to be a viable workaround for the issue. More testing is obviously required… Starting from Steph’s code example (https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/_examples/aks/main.tf), the change will look something like the sketch below.
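A rough sketch of that kind of aliased-provider arrangement (the alias name and module outputs here are hypothetical, not taken from the linked main.tf):

```hcl
# Default azurerm provider, used by the aks-cluster module as before.
provider "azurerm" {
  features {}
}

# Hypothetical aliased provider used only for the cluster data lookup that
# feeds the kubernetes/helm provider configuration.
provider "azurerm" {
  alias    = "aks_data"
  features {}
}

data "azurerm_kubernetes_cluster" "default" {
  provider            = azurerm.aks_data
  name                = module.aks-cluster.cluster_name        # assumed module output
  resource_group_name = module.aks-cluster.resource_group_name # assumed module output
}

# The kubernetes and helm providers are then configured from this data source
# in the same way as in the example.
```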
Looking forward to the community’s feedback. Thank you.