terraform-provider-rancher2: Imports of EKS clusters intermittently never finish
Versions
- Rancher version: 2.6.8
- Rancher Terraform provider: 1.24.0
- Terraform: 1.2.2
Information about the Cluster
- Kubernetes version: 1.21
- Cluster Type (Local/Downstream): Downstream
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Hosted EKS
Describe the bug
Sometimes importing an EKS cluster never completes (Terraform reports “Still creating…” for 30 minutes and then times out), even though the cluster is active in the Rancher instance. Other times it finishes in seconds.
To Reproduce
The following code is used to import the cluster. The aws-auth ConfigMap has already been updated with the user referred to by the cloud_credential.
resource "rancher2_cloud_credential" "this" {
  name        = var.name_prefix
  description = "Credentials used for managing ${var.name_prefix}"

  amazonec2_credential_config {
    access_key = aws_iam_access_key.rancher.id
    secret_key = aws_iam_access_key.rancher.secret
  }
}

resource "rancher2_cluster" "imported_eks_cluster" {
  name        = var.cluster_id
  description = "Terraform EKS cluster"

  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.this.id
    name                = var.cluster_id
    region              = var.region
    imported            = true
  }
}
Result
Sometimes the following output repeats until the timeout, even though the cluster is already active in Rancher:
module.import_to_rancher[0].rancher2_cluster.imported_eks_cluster: Still creating... [10m40s elapsed]
module.import_to_rancher[0].rancher2_cluster.imported_eks_cluster: Still creating... [10m50s elapsed]
module.import_to_rancher[0].rancher2_cluster.imported_eks_cluster: Still creating... [11m0s elapsed]
module.import_to_rancher[0].rancher2_cluster.imported_eks_cluster: Still creating... [11m10s elapsed]
...
Error: [ERROR] waiting for cluster (c-xfbkg) to be created: timeout while waiting for state to become 'pending' (last state: 'active', timeout: 30m0s)
│
│ with module.import_to_rancher[0].rancher2_cluster.imported_eks_cluster,
│ on .terraform/modules/import_to_rancher/main.tf line 27, in resource "rancher2_cluster" "imported_eks_cluster":
│ 27: resource "rancher2_cluster" "imported_eks_cluster" {
Expected Result
The cluster is consistently imported in a few seconds.
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 17 (6 by maintainers)
Will be tested via https://github.com/rancher/eks-operator/issues/84
We also sometimes encountered the problem described at the top: the state expected by the Terraform provider (pending) did not match the status of the Rancher import (active).
Our previous workaround was to avoid the “active” state by, for example, setting up the authorisation or the network connection at a later time. In the end, however, this only left the status in Rancher at “waiting”, which also did not match the “pending” state expected by the Terraform provider.
In my opinion, “active” should definitely be included in the expected states. Whether “waiting” should also be included is open to discussion, and depends on whether the check is meant to confirm a successful import or only a successfully created import “resource”. The latter would also include “waiting”, since the import continues as soon as all prerequisites have been met and then hopefully moves to the “active” state.
Currently our solution is to use the implemented fix from PR https://github.com/rancher/terraform-provider-rancher2/pull/1114 and we can confirm that it works fine.
Versions used:
@cpinjani - please link your testplan here once you start working on rancher/eks-operator#84
Hey,
I have been testing this locally and was not able to reproduce it after trying and applying it multiple times (maybe I was lucky).
All the tests I ran used these versions:
Test 1:
Test 2:
Also, I submitted a PR, https://github.com/rancher/terraform-provider-rancher2/pull/1114, that tries to fix this issue.
As a workaround for those who think they need to destroy their entire state in order to re-import: I was able to get away with just removing rancher2_cluster from the state via
and then importing it via
That way I didn’t need to destroy everything Terraform had managed to provision so far. It seemed to work.
Ran into this today. From provider config: https://github.com/rancher/terraform-provider-rancher2/blob/master/rancher2/resource_rancher2_cluster.go#L135
it appears the provider expects the state to become pending first. However, if the Rancher side is faster than the provider's polling loop, the cluster may become active so quickly that the provider never observes the pending state. From my limited understanding of Go, it would actually be possible to wait for multiple targets in
If EKS imports were allowed to match both the pending and active targets, this could probably be fixed?
I have been seeing this as well, on successful runs it takes seconds, but occasionally this hangs.