terraform-provider-rancher2: rancher2_cluster_sync failed with [file-deployer] is still running
Hi,
We have a problem: the rancher2_cluster_sync resource fails with an error.
Versions:
terraform 1.0.0
rancher2 = {
  source  = "rancher/rancher2"
  version = "1.15.1"
}
rke = {
  source  = "rancher/rke"
  version = "1.2.1"
}
Resource:
resource "rancher2_cluster_sync" "wait-for-cluster-is-ready" {
  depends_on = [
    null_resource.init_cluster,
  ]
  cluster_id      = rancher2_cluster.rancher-cluster.id
  state_confirm   = 2
  wait_alerting   = false
  wait_catalogs   = false
  wait_monitoring = false
}
Error: Cluster ID c-5srfq: Container [file-deployer] is still running on host [10.106.84.13]: stderr: [], stdout: []
  with module.ahoj-cluster.rancher2_cluster_sync.wait-for-cluster-is-ready,
  on modules/rancher-worker/rancher-cluster.tf line 190, in resource "rancher2_cluster_sync" "wait-for-cluster-is-ready":
 190: resource "rancher2_cluster_sync" "wait-for-cluster-is-ready" {
About this issue
- State: closed
- Created 3 years ago
- Comments: 25 (11 by maintainers)
It seems the issue is caused by the cluster state flapping.
Submitted PR https://github.com/rancher/terraform-provider-rancher2/pull/732 to fix the issue. When the condition is false, the PR adds an extra check before returning an error: the error is returned only if more than 120s have passed since Condition.LastUpdateTime.
@gohumble, unfortunately this is expected, because 1.17.1 doesn't contain any other fix for this.
The issue seems to be a race condition in Rancher, not in the tf provider. The condition should remain Unknown until the transition is finished; otherwise there is no way to check the cluster state reliably from the API. Investigating along these lines.
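Suppose Rancher kept the condition Unknown until the transition finished, as suggested above. A waiter could then make a three-way decision on each poll without needing a grace period. The Go sketch below assumes that behavior. The names evaluate and pollResult, and the simulated status sequence, are hypothetical and not part of Rancher or the provider.

```go
package main

import "fmt"

// pollResult is a hypothetical outcome of a single poll of the cluster state.
type pollResult int

const (
	keepWaiting pollResult = iota
	ready
	failed
)

// evaluate maps a condition status to a poll outcome. It assumes Rancher
// reports "Unknown" for as long as a transition is in progress, so any
// other value, including "False", is treated as terminal.
func evaluate(status string) pollResult {
	switch status {
	case "True":
		return ready
	case "Unknown":
		return keepWaiting
	default:
		return failed
	}
}

func main() {
	// Simulated sequence of statuses observed while a cluster transitions.
	observed := []string{"Unknown", "Unknown", "True"}
	for i, s := range observed {
		switch evaluate(s) {
		case keepWaiting:
			fmt.Printf("poll %d: %s, still transitioning\n", i+1, s)
		case ready:
			fmt.Printf("poll %d: %s, cluster ready\n", i+1, s)
			return
		case failed:
			fmt.Printf("poll %d: %s, giving up\n", i+1, s)
			return
		}
	}
}
```

Under this assumption the client never has to guess whether a False is transient. That is why fixing the condition semantics in Rancher would make a client-side timeout unnecessary.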