terraform-provider-ibm: Intermittently ibm_container_vpc_cluster fails saying: "A cluster with the same name already exists"

Internal Project Golden Eye

Intermittently (but not very frequently), when creating a cluster with ibm_container_vpc_cluster, the IKS API responds with the following error:

Error: Request failed with status code: 409, ServerErrorResponse: {"incidentID":"1a1240b1-c05e-481d-90f4-eefdb89a03b0,1a1240b1-c05e-481d-90f4-eefdb89a03b0","code":"E0007","description":"A cluster with the same name already exists. Choose another name.","type":"Provisioning"}

Every time this has happened, we have logged in and checked for a cluster with that name. In every case, we found that ibm_container_vpc_cluster did in fact create the cluster, and its creation timestamp matches the timestamp in the logs. So why is the IKS API failing with that error?

We reproduced the issue with trace logs, but to be honest I am struggling to see what the root cause is. Is it possible that somewhere in the provider code the request to provision the cluster was sent once and then, due to some glitch, sent a second time, so that the IKS API responded that a cluster with that name already exists?

Here is a screenshot showing the creation time of the cluster (in UTC+1): 12:04 pm. (Screenshot not included.)

And below I have attached the logs (including trace log) which show the timestamp matches the cluster creation (these logs are in UTC time):

TestOCPSMBasic/TestOCPSMBasic_0 2021-10-13T11:04:11Z logger.go:66: ibm_container_vpc_cluster.cluster: Creating...
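
The two timestamps can be reconciled with a quick conversion. This is a minimal sketch, assuming the console screenshot uses a fixed UTC+1 offset; the variable names are illustrative only:

```python
from datetime import datetime, timedelta, timezone

# Timestamp of the "Creating..." log line above (UTC).
log_time = datetime(2021, 10, 13, 11, 4, 11, tzinfo=timezone.utc)

# Assumption: the console displays times with a fixed UTC+1 offset.
console_tz = timezone(timedelta(hours=1))
console_time = log_time.astimezone(console_tz)

print(console_time.strftime("%H:%M"))  # 12:04
```

The converted log time falls in the same minute as the creation time shown in the console.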

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform IBM Provider Version

Terraform v1.0.8 on darwin_amd64

  • provider registry.terraform.io/ibm-cloud/ibm v1.34.0

Affected Resource(s)

  • ibm_container_vpc_cluster

Terraform Configuration Files

Internal url: https://github.ibm.com/GoldenEye/ocp-service-mesh-module/tree/master/examples/basic

Debug Output

stdout.txt trace.log

Expected Behavior

The API should not return an error if cluster provisioning succeeded.

Actual Behavior

The API responds with a 409 saying the cluster already exists, even though the cluster was created.

Steps to Reproduce

  1. terraform apply

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 19

Most upvoted comments

Thanks for confirming this. We will update our provider to a newer version so that we get correct logging. I am working with IKS to find out the root cause of the 500. I don't think that any changes are needed on the provider side: IKS needs to ensure that the cluster provisioning process does not proceed if it returns a 500 response.
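
One sequence consistent with this comment can be sketched as follows. It rests on two assumptions: the first create call starts provisioning but still returns a 500, and the client retries on 5xx responses. Everything here (`FakeClusterAPI`, `create_with_retry`, the status codes chosen for success) is a hypothetical in-memory model, not IKS or provider code:

```python
class FakeClusterAPI:
    """Hypothetical stand-in for a cluster API; not the real IKS API."""

    def __init__(self):
        self.clusters = set()
        self.fail_next = True  # the first create call commits, then reports a 500

    def create(self, name):
        if name in self.clusters:
            return 409  # "A cluster with the same name already exists"
        self.clusters.add(name)  # provisioning proceeds...
        if self.fail_next:
            self.fail_next = False
            return 500  # ...but the caller is told the request failed
        return 201


def create_with_retry(api, name, attempts=2):
    """Retry on 5xx (an assumed client behaviour). Returns the status codes seen."""
    seen = []
    for _ in range(attempts):
        status = api.create(name)
        seen.append(status)
        if status < 500:
            break
    return seen


api = FakeClusterAPI()
statuses = create_with_retry(api, "my-cluster")
print(statuses)                      # [500, 409]
print("my-cluster" in api.clusters)  # True
```

In this model the caller ends up with a 409 even though the cluster exists and was created by its own first request, which matches the symptom reported above. If the server did not commit the cluster before returning the 500, the retry would succeed instead.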