terraform-provider-azurerm: AKS node pool k8s version not being updated

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or “me too” comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

  • terraform version: 0.12.8
  • azurerm provider version: 1.41

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "k8s" {
  name                = var.cluster_info.name
  location            = azurerm_resource_group.k8s.location
  dns_prefix          = var.cluster_info.dns_prefix
  resource_group_name = azurerm_resource_group.k8s.name
  kubernetes_version  = var.kubernetes_version

  role_based_access_control {
    enabled = ! local.aks_aad_skip_rbac
    dynamic "azure_active_directory" {
      for_each = "${! local.aks_aad_skip_rbac ? list(local.aks_rbac_setting) : []}"
      content {
        client_app_id     = local.aks_rbac_setting.client_app_id
        server_app_id     = local.aks_rbac_setting.server_app_id
        server_app_secret = local.aks_rbac_setting.server_app_secret
        tenant_id         = local.aks_rbac_setting.tenant_id
      }
    }
  }

  default_node_pool {
    name               = var.agent_pool.name
    node_count         = var.agent_pool.count
    vm_size            = "Standard_DS2_v2"
    type               = "VirtualMachineScaleSets"
    os_disk_size_gb    = 30
    max_pods           = 30
    availability_zones = local.sanitized_availability_zones

    enable_auto_scaling = true
    min_count           = 3
    max_count           = 12
  }

  service_principal {
    client_id     = var.aks_login.client_id
    client_secret = var.aks_login.client_secret
  }

  addon_profile {
    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = azurerm_log_analytics_workspace.k8s_logs.id
    }
  }

  tags = {
    CreatedBy   = var.tags.created_by != "" ? var.tags.created_by : null
    ChangedBy   = var.tags.changed_by != "" ? var.tags.changed_by : null
    Environment = var.tags.environment != "" ? var.tags.environment : null
  }

  lifecycle {
    ignore_changes = [
      role_based_access_control,
      role_based_access_control["azure_active_directory"],
      agent_pool_profile.0.count,
    ]
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = var.aks_load_balancer_sku
  }
}

Expected Behavior

With the azurerm provider at 1.39, changing var.kubernetes_version updated the kubelet version on our cluster’s node pool (in addition to the AKS Kubernetes version).

Actual Behavior

With the azurerm provider at 1.41, changing var.kubernetes_version updates only the AKS Kubernetes version; the kubelet version on our cluster’s node pool is not updated.

Important Factoids

  • Switching back to the azurerm provider at 1.39 and running terraform apply seems to restore the expected behavior, even when the exact same config was previously applied with 1.41.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 61
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Using the azurerm provider at version “=2.1.0”, I’ve upgraded an azurerm_kubernetes_cluster resource from 1.14 to 1.15. The control plane appears to have upgraded to 1.15, but the VMSS node pool has stayed behind at 1.14.

What is the currently expected behavior for Kubernetes upgrades and node pools? I understand that https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools#validation-rules-for-upgrades describes certain validation conditions.

Will the node pool upgrade when it is required to, or can we control this with an input, as @jstevans suggested?

I’d like to know this!

Coupling by default could work, as long as it can be disabled. Upgrading the control plane and the default node pool in one terraform apply could have a large impact on a cluster. I’m commenting to vote for exposing OrchestratorVersion on the default node pool as well as on the node pool resource.
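As a rough illustration of what that could look like (a sketch only, not current provider syntax when this was written: the orchestrator_version argument is the requested addition, and the variable and resource names are hypothetical), the node pool version would be pinned separately from the control plane version:

# Hypothetical sketch: pin node pool versions independently of the control plane.
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = "westeurope"
  resource_group_name = "example-rg"
  dns_prefix          = "example"
  kubernetes_version  = var.control_plane_version # control plane only

  default_node_pool {
    name                 = "default"
    vm_size              = "Standard_DS2_v2"
    node_count           = 3
    orchestrator_version = var.node_pool_version # proposed: upgrades only this pool
  }

  service_principal {
    client_id     = var.aks_login.client_id
    client_secret = var.aks_login.client_secret
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  name                  = "extra"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 3
  orchestrator_version  = var.node_pool_version # proposed
}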

I would rather have OrchestratorVersion exposed ASAP, even without the proper locking, if it means I have to control it myself. My current plan is to use the REST API directly to upgrade node pools: https://docs.microsoft.com/en-us/rest/api/aks/agentpools/createorupdate

I’m available (with Azure resources as well) to test potential patches, and I can write Go, but I lack experience with Terraform internals. Let me know how I can help.
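In the meantime, one way to drive that upgrade from the same Terraform run is sketched below (not from this thread; it assumes the null provider is available and an authenticated Azure CLI is installed where Terraform runs, and it reuses the resource names from the config in the report). az aks nodepool upgrade is the CLI front end for the agent pool API linked above.

# Sketch of an interim workaround: upgrade the default node pool out of band
# whenever var.kubernetes_version changes. Requires the null provider and an
# authenticated az CLI; names are taken from the configuration above.
resource "null_resource" "upgrade_default_node_pool" {
  triggers = {
    kubernetes_version = var.kubernetes_version
  }

  provisioner "local-exec" {
    command = "az aks nodepool upgrade --resource-group ${azurerm_resource_group.k8s.name} --cluster-name ${azurerm_kubernetes_cluster.k8s.name} --name ${var.agent_pool.name} --kubernetes-version ${var.kubernetes_version}"
  }

  depends_on = [azurerm_kubernetes_cluster.k8s]
}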

The rules governing version skew between the control plane and agent pools are defined in this public document; there is a window of drift you are allowed to have between the control plane and each agent pool: https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools#upgrade-a-cluster-control-plane-with-multiple-node-pools

Rules for valid node pool upgrade versions:

  • The node pool version must have the same major version as the control plane.
  • The node pool minor version must be within two minor versions of the control plane version.
  • The node pool version cannot be greater than the control plane major.minor.patch version.
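For illustration (version numbers are hypothetical): with the control plane at 1.17.3, node pools at 1.17.0 through 1.17.3, 1.16.x, or 1.15.x satisfy these rules, while 1.14.x is rejected (more than two minor versions behind) and 1.17.4 or 1.18.x are rejected (newer than the control plane).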

Hope this helps.

@tombuildsstuff 🆙 any info on the above?