terraform-provider-rancher2: [BUG] kubelet-args are not supported via TF

Rancher Server Setup

  • Rancher version: v2.7.0
  • Installation option (Docker install/Helm Chart):
    • If Helm Chart, Kubernetes Cluster and version: 3-node cluster, rancher-stable, v1.24.9+rke2r1
  • Proxy/Cert Details: no proxy, self-signed

Information about the Cluster

  • In process of creating downstream custom cluster

User Information

  • What is the role of the user logged in?: Admin

Provider Information

  • What is the version of the Rancher v2 Terraform Provider in use?
    • 1.25.0
  • What is the version of Terraform in use?
    • v1.2.9

Describe the bug

Using kubelet-arg via the Terraform rancher2 provider results in unexpected behavior: the cluster does not start. This bug is described in https://github.com/rancher/rancher/issues/38112; I think it is specific to the Terraform provider, not to Rancher itself.

kubelet-arg, which is part of config under machine_selector_config, should be a list type, not a string. In other words, config should be able to accept list values.
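For illustration, here is a minimal sketch of the map-style config the v1.25.0 provider accepts here (mirroring the reproduction config further down in this thread); there is no way to pass kubelet-arg as a list, and even a single string value triggers the behavior described in rancher/rancher#38112:

rke_config {
  machine_selector_config {
    # map-style config: kubelet-arg can only be given a single string here,
    # not a list; this shape reproduces rancher/rancher#38112
    config = {
      kubelet-arg = "cloud-provider=external"
    }
  }
}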

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 20 (17 by maintainers)

Most upvoted comments

Above test results have been updated and completed. Closing out this issue now.

Ticket rancher/terraform-provider-rancher2#1074 - Test Results - ✅

Verified on Rancher v2.7.8-rc1:

Scenario | Test Case | Result
1 | Provision a downstream rke2 cluster with Machine Selector Config and 2 kubelet args set | ✅
2 | Update: add/remove a kubelet-arg via tf | ✅
3 | Provision a downstream rke2 cluster with tfp-rancher2 v3.1.0 => upgrade to v3.2.0-rc3 and add machine selector config with 2 kubelet args => update/modify the kubelet args once more and verify they are successfully accepted and functional | pending/blocked (completed below)

Scenario 1 - ✅

  1. Fresh install of Rancher v2.7.8-rc1
  2. Using tfp-rancher2 v3.2.0-rc4, provision a downstream RKE2 AWS Node driver cluster, using a machine_selector_config block and defining 2 kubelet arguments (I used the main.tf shown below)
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "3.2.0-rc3"
    }
  }
}

provider "rancher2" {
  api_url   = "<REDACTED>"
  token_key = "<REDACTED>"
  insecure  = true
}

resource "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = "tf-creds-rke2"
  amazonec2_credential_config {
    access_key = "<REDACTED>"
    secret_key = "<REDACTED>"
  }
}

resource "rancher2_machine_config_v2" "rancher2_machine_config_v2" {
  generate_name = "tf-rke2"
  amazonec2_config {
    ami            = ""
    region         = "<REDACTED>"
    security_group = ["<REDACTED>"]
    subnet_id      = "<REDACTED>"
    vpc_id         = "<REDACTED>"
    zone           = "<REDACTED>"
  }
}

resource "rancher2_cluster_v2" "rancher2_cluster_v2" {
  name                                     = "jkeslar"
  kubernetes_version                       = "v1.26.8+rke2r1"
  enable_network_policy                    = false
  default_cluster_role_for_project_members = "user"
  rke_config {
    machine_selector_config {
      config = <<EOF
        kubelet-arg:
          - cloud-provider=external
          - max-pods=250
    EOF
    }
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = false
      etcd_role                    = true
      worker_role                  = false
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
    machine_pools {
      name                         = "pool2"
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = true
      etcd_role                    = false
      worker_role                  = false
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
    machine_pools {
      name                         = "pool3"
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = false
      etcd_role                    = false
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
  }
}
  3. Verified - cluster successfully provisions and kubelet args are set and visible in the Rancher UI; as expected

Scenario 2 - ✅

  1. Resuming where Scenario 1 left off, update the max-pods limit to 255 using tfp-rancher2 v3.2.0-rc4 and re-run terraform apply (see the sketch after this list)
  2. Verified - max-pods kubelet arg successfully updated; as expected
  3. Using tfp-rancher2 v3.2.0-rc4, remove the kubelet args
  4. Verified - kubelet args successfully removed; as expected
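For reference, the update in step 1 only touches the machine_selector_config block from the Scenario 1 main.tf; a minimal sketch of the modified block, assuming the same heredoc YAML format, with 255 being the only change:

machine_selector_config {
  # same block as Scenario 1, with max-pods bumped from 250 to 255
  config = <<EOF
    kubelet-arg:
      - cloud-provider=external
      - max-pods=255
EOF
}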

Scenario 3 - ✅

  1. Fresh install of Rancher v2.7.8
  2. Using tfp-rancher2 v3.1.0, provision a downstream RKE2 AWS Node driver cluster
  3. Once active, update tfp-rancher2 to v3.2.0-rc5 (see the version pin sketch after this list)
  4. Using tfp-rancher2 v3.2.0-rc5, define a machine_selector_config block and set multiple kubelet-args under config
  5. Verified - cluster successfully and accurately updates w/ kubelet args; verified via cluster.yml
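For reference, the provider upgrade in step 3 is just a version bump in the required_providers block; a minimal sketch, assuming the same local mirror source used in the Scenario 1 main.tf:

terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      # bumped from "3.1.0" to the release candidate under test
      version = "3.2.0-rc5"
    }
  }
}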

@Josh-Diamond There’s some confusion about how/which arguments to pass via TF to the kubelet for a working v2 cluster. Here’s a working example. I updated the Test Template.

// Working example

machine_selector_config {
  config = <<EOF
    kubelet-arg:
      - protect-kernel-defaults=true
      - cloud-provider=external
EOF
}

I also missed a backport to release/v3. After I get that in and cut a new RC, please re-test this on v3.2.0-rc4.

@boris-stojnev Sorry, I was asking about machine_global_config earlier, which supports YAML; I made a spelling error in that comment. After investigating, I don't think machine_global_config is an ideal workaround because it deviates from the options exposed in the Rancher UI.
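For context, the machine_global_config workaround mentioned here would look roughly like the sketch below; this assumes it accepts the same heredoc YAML form as machine_selector_config. Because it is global config applied to every machine rather than scoped through a machine selector, it presumably differs from the machine-selector options the Rancher UI exposes:

rke_config {
  # global config applied to all machines; no machine selector involved
  machine_global_config = <<EOF
    kubelet-arg:
      - cloud-provider=external
EOF
}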

I tried with the following config

machine_selector_config {
  config = {
    kubelet-arg = "cloud-provider=external"
  }
}

with the rancher2 provider v1.25.0 and reproduced https://github.com/rancher/rancher/issues/38112.

The kubelet-arg = "--protect-kernel-defaults" setting causes Rancher to treat every character as an array element, which makes it look like this in Rancher:

machineSelectorConfig:
  - config:
      kubelet-arg:
        - '-'
        - '-'
        - p
        - r
        - o
        - t
        - e
        - c
        - t
        - '-'
        - k
        - e
        - r
        - "n"
        - e
        - l
        - '-'
        - d
        - e
        - f
        - a
        - u
        - l
        - t
        - s

causing the UI to render each character as a separate kubelet-arg entry.

TF needs to have parity with Rancher, so it needs to support passing multiple kubelet-arg values to machine_selector_config. From your comment https://github.com/rancher/terraform-provider-rancher2/issues/1074#issuecomment-1648412146 and from what I see, this doesn't appear possible without a fix, so this is a confirmed bug. I think updating machine_selector_config.Config to a TypeList should solve it, but that will need testing. We may also need to add state migration logic, or users with clusters already provisioned with an earlier version of the provider may see them break.