terraform-provider-rancher2: rancher2_cluster_sync wait_catalogs=true causing 500 errors
Hi.
We are trying the new wait_catalogs=true attribute on our rancher2_cluster_sync resource in order to resolve this issue: rancher/terraform-provider-rancher2#627. (I believe this is the suggested fix, since simply upgrading to rancher2 Terraform provider v1.14.0 did not resolve that issue.)
With wait_catalogs=true, terraform apply fails with a 500 error. After the Terraform run, we can verify that the URL mentioned in the error is working. I think the retry count should be increased or made configurable.
resource "rancher2_cluster_sync" "this" {
  cluster_id    = rancher2_cluster.this.id
  wait_catalogs = true
}
module.stellar.rancher2_cluster_sync.this: Still creating... [10s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [20s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [30s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [40s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [50s elapsed]
Error: [ERROR] waiting for cluster ID (c-98b2w) downloading catalogs: [ERROR] getting catalog V2 list at cluster ID (c-98b2w): Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [dial tcp 127.0.0.1:6080: connect: connection refused] from [https://redacted/k8s/clusters/c-98b2w/v1]
on ../rancher_cluster.tf line 33, in resource "rancher2_cluster_sync" "this":
What Happened
The rancher2_cluster_sync resource fails with a 500 status code when wait_catalogs=true.
What I Expected
The rancher2_cluster_sync resource should be more tolerant of errors, or the provider should make retry counts configurable.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 7
- Comments: 16 (8 by maintainers)
I am also seeing this issue. Any updates on how to fix this?
Released tfp v1.15.1 including the PR #668 to fix the issue.
Hi @armsnyder, the retry logic seems to be working fine, but I agree with you that it should be configurable. As you mentioned, the default retries (3 retries with 5s ticks) are not enough, so you are getting 500 errors. I've submitted PR https://github.com/rancher/terraform-provider-rancher2/pull/663, deprecating the retries argument in favour of a new timeout argument. The main difference is that the timeout can be configured in a more intuitive way (Golang duration format), and the same timeout is applied when there are Rancher connection issues and when getting 500 and Unknown schema type errors. Please take a look.
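
For anyone landing here later, a minimal sketch of what the new setting might look like. It assumes the timeout argument from the PR is set on the provider block and takes a Golang duration string. The variable names and the "300s" value are placeholders, not recommendations from the maintainers.

```hcl
# Sketch only: assumes the timeout argument from the PR lives on the provider block.
provider "rancher2" {
  api_url   = var.rancher_api_url   # hypothetical variable
  token_key = var.rancher_token_key # hypothetical variable

  # Golang duration format; "300s" is an illustrative value.
  timeout = "300s"
}

resource "rancher2_cluster_sync" "this" {
  cluster_id    = rancher2_cluster.this.id
  wait_catalogs = true
}
```

Check the provider documentation for the version you are running before relying on this, since the argument name and default may differ from what is assumed here.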