terraform-provider-rancher2: rancher2_cluster_sync wait_catalogs=true causing 500 errors

Hi.

We are trying the new wait_catalogs=true attribute on our rancher2_cluster_sync resource in order to resolve this issue: rancher/terraform-provider-rancher2#627 (I believe this is the suggested fix, as simply upgrading to the rancher2 Terraform provider v1.14.0 did not resolve that issue.)

With wait_catalogs=true, terraform apply fails with a 500 error. Once the Terraform run has failed, we can verify that the URL mentioned in the error is working, so the failure looks transient. I think the retry count should be increased or made configurable.

resource "rancher2_cluster_sync" "this" {
  cluster_id    = rancher2_cluster.this.id
  wait_catalogs = true
}

module.stellar.rancher2_cluster_sync.this: Still creating... [10s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [20s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [30s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [40s elapsed]
module.stellar.rancher2_cluster_sync.this: Still creating... [50s elapsed]
Error: [ERROR] waiting for cluster ID (c-98b2w) downloading catalogs: [ERROR] getting catalog V2 list at cluster ID (c-98b2w): Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [dial tcp 127.0.0.1:6080: connect: connection refused] from [https://redacted/k8s/clusters/c-98b2w/v1]
  on ../rancher_cluster.tf line 33, in resource "rancher2_cluster_sync" "this":

What Happened

The rancher2_cluster_sync resource fails with a 500 status code when wait_catalogs=true.

What I Expected

The rancher2_cluster_sync resource should be more tolerant of errors, or the provider should make the retry count configurable.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Still seeing this issue under tf1.1.7 / provider version 1.22.2. I'm actually adding catalogs after creating a cluster and using cluster_sync to ensure the cluster is up. I need to set an insanely high state_confirm value (currently at 100, going to try to decrease it) to make it wait long enough; otherwise I get the following error. Edit: it does not work with any value below 20.

Error: Creating Catalog V2: Unknown schema type [catalog.cattle.io.clusterrepo]
 
   with rancher2_catalog_v2.helm_catalogs[2],
   on main.tf line 18, in resource "rancher2_catalog_v2" "helm_catalogs":
   18: resource "rancher2_catalog_v2" "helm_catalogs" {
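For reference, the state_confirm workaround described in the comment above might look roughly like the sketch below. The value 20 is the lower bound reported in that comment, not a recommended setting, and the catalog name and URL are hypothetical placeholders. The rancher2_cluster.this resource is the one from the original report and is not shown here.

```hcl
# Sketch only: wait for the cluster to be confirmed active several times
# before any catalog is created against it.
resource "rancher2_cluster_sync" "this" {
  cluster_id = rancher2_cluster.this.id

  # Number of consecutive "active" confirmations to wait for.
  # 20 is the smallest value the commenter reported as working.
  state_confirm = 20
}

resource "rancher2_catalog_v2" "helm_catalog" {
  # Referencing the sync resource makes Terraform create the catalog
  # only after the sync has completed.
  cluster_id = rancher2_cluster_sync.this.cluster_id

  name = "example-charts"             # hypothetical name
  url  = "https://charts.example.com" # hypothetical URL
}
```

Taking cluster_id from rancher2_cluster_sync rather than from rancher2_cluster is what orders the two resources; an explicit depends_on would have the same effect.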

I’m still seeing this issue with TF 1.3.2 and rancher provider 1.24.1.

I am also seeing this issue. Any updates on how to fix this?

Released tfp v1.15.1 including the PR #668 to fix the issue.

Hi @armsnyder, the retry logic seems to be working fine, but I agree with you that it should be configurable. As you mentioned, the default retries (3 retries with 5s ticks) are not enough, which is why you are getting 500 errors.

I’ve submitted PR https://github.com/rancher/terraform-provider-rancher2/pull/663, deprecating the retries argument in favour of a new timeout argument. The main difference is that timeout can be configured in a more intuitive way (golang duration format), and the same timeout is applied both when there are rancher connection issues and when getting 500 and Unknown schema type errors. Please take a look.
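As a rough illustration of the timeout argument described above, a provider block might be configured as in the sketch below. This assumes a provider release that includes that PR. The variable names and the "10m" value are hypothetical and chosen only for the example.

```hcl
# Hypothetical input variables for the Rancher connection.
variable "rancher_api_url" {
  type = string
}

variable "rancher_token_key" {
  type      = string
  sensitive = true
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_token_key

  # Go duration string (for example "90s", "10m").
  # "10m" is an illustrative value, not a recommendation.
  timeout = "10m"
}
```

Because the same timeout covers connection problems as well as 500 and Unknown schema type errors, a single larger value should take the place of tuning a retry count.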