terraform-provider-helm: Multiple helm_release resources concurrency issue (fatal error: concurrent map read and map write)

TL;DR: I created my own wrapper module (see below) for the Helm provider's helm_release resource. I use that wrapper module multiple times in my main.tf to roll out the complete stack for my cluster (see below). I experience random crashes during terraform plan and apply:

Error: rpc error: code = Unavailable desc = transport is closing
Error: rpc error: code = Unavailable desc = transport is closing
Error: rpc error: code = Unavailable desc = transport is closing
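For reference, the wrapper is roughly shaped like this (a hypothetical sketch — the names and variables are illustrative; the real module is in the attached zip):

```hcl
# modules/helm-service/main.tf — illustrative shape of the wrapper module
variable "name" {
  type = string
}

variable "chart" {
  type = string
}

variable "namespace" {
  type = string
}

variable "values" {
  type    = list(string)
  default = []
}

resource "helm_release" "this" {
  name      = var.name
  chart     = var.chart
  namespace = var.namespace
  values    = var.values
}
```

main.tf then instantiates the module once per service, so during plan/refresh many helm_release instances are handled concurrently by the same provider process.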

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version and Provider Version

  • Terraform v0.12.24 and v0.12.25 (always hangs)
  • terragrunt version v0.23.10 and v0.23.18

Provider Version

  • terraform-provider-helm_v1.2.1_x4

Affected Resource(s)

  • helm_release

Terraform Configuration Files

kubernetes-services-copy.zip

Debug Output

Stack Trace
fatal error: concurrent map read and map write

goroutine 31 [running]:
runtime.throw(0x2c94295, 0x21)
	/opt/goenv/versions/1.13.7/src/runtime/panic.go:774 +0x72 fp=0xc000aa7020 sp=0xc000aa6ff0 pc=0x102e112
runtime.mapaccess2_faststr(0x2957ac0, 0xc0002aa360, 0xc000432700, 0x1b, 0x1, 0xc000432700)
	/opt/goenv/versions/1.13.7/src/runtime/map_faststr.go:116 +0x48f fp=0xc000aa7090 sp=0xc000aa7020 pc=0x1012bdf
github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*DiffFieldReader).ReadField(0xc0002aa300, 0xc0006d2720, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x5555555555555555, ...)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/helper/schema/field_reader_diff.go:51 +0x10d fp=0xc000aa71d0 sp=0xc000aa7090 pc=0x197623d
github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*MultiLevelFieldReader).ReadFieldMerge(0xc000b05940, 0xc0006d2720, 0x3, 0x3, 0x2c5b89f, 0x3, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/helper/schema/field_reader_multi.go:45 +0x1d8 fp=0xc000aa72e0 sp=0xc000aa71d0 pc=0x19796b8
github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*ResourceData).get(0xc000116540, 0xc0006d2720, 0x3, 0x3, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/helper/schema/resource_data.go:537 +0x2f8 fp=0xc000aa73c8 sp=0xc000aa72e0 pc=0x1986128
github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*ResourceData).getRaw(0xc000116540, 0xc0004326e0, 0x1b, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/helper/schema/resource_data.go:128 +0x75 fp=0xc000aa7430 sp=0xc000aa73c8 pc=0x19837c5
github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*ResourceData).GetOk(0xc000116540, 0xc0004326e0, 0x1b, 0x2c6c3d0, 0xe, 0xc0004326e0)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/helper/schema/resource_data.go:94 +0x5f fp=0xc000aa74e0 sp=0xc000aa7430 pc=0x198351f
github.com/terraform-providers/terraform-provider-helm/helm.k8sGetOk(0xc000116540, 0x2c6c3d0, 0xe, 0x24, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/helm/provider.go:268 +0x93 fp=0xc000aa7588 sp=0xc000aa74e0 pc=0x27709b3
github.com/terraform-providers/terraform-provider-helm/helm.(*KubeConfig).toRawKubeConfigLoader(0xc0006d2060, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/helm/structure_kubeconfig.go:86 +0xd38 fp=0xc000aa7718 sp=0xc000aa7588 pc=0x277bc68
github.com/terraform-providers/terraform-provider-helm/helm.(*KubeConfig).ToRawKubeConfigLoader(0xc0006d2060, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/helm/structure_kubeconfig.go:68 +0xab fp=0xc000aa7780 sp=0xc000aa7718 pc=0x277aecb
github.com/terraform-providers/terraform-provider-helm/helm.(*KubeConfig).ToRESTConfig(0xc0006d2060, 0x0, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/helm/structure_kubeconfig.go:31 +0x2b fp=0xc000aa77b0 sp=0xc000aa7780 pc=0x277ab1b
k8s.io/kubectl/pkg/cmd/util.(*factoryImpl).ToRESTConfig(...)
	github.com/terraform-providers/terraform-provider-helm/vendor/k8s.io/kubectl/pkg/cmd/util/factory_client_access.go:63
k8s.io/kubectl/pkg/cmd/util.(*factoryImpl).KubernetesClientSet(0xc0006d2090, 0x0, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/vendor/k8s.io/kubectl/pkg/cmd/util/factory_client_access.go:79 +0x38 fp=0xc000aa77e0 sp=0xc000aa77b0 pc=0x25167e8
helm.sh/helm/v3/pkg/kube.(*Client).IsReachable(0xc0006d20c0, 0x1983656, 0x28d1360)
	github.com/terraform-providers/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/kube/client.go:91 +0x37 fp=0xc000aa7840 sp=0xc000aa77e0 pc=0x2518be7
helm.sh/helm/v3/pkg/action.(*Get).Run(0xc000aa78d0, 0xc000744960, 0x12, 0x28d1360, 0x28d1360, 0x3012010)
	github.com/terraform-providers/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/action/get.go:41 +0x3b fp=0xc000aa7888 sp=0xc000aa7840 pc=0x271acab
github.com/terraform-providers/terraform-provider-helm/helm.getRelease(0xc000a192c0, 0xc000744960, 0x12, 0x28d1360, 0xc0006ced50, 0x1)
	github.com/terraform-providers/terraform-provider-helm/helm/resource_release.go:894 +0x64 fp=0xc000aa7900 sp=0xc000aa7888 pc=0x2778fe4
github.com/terraform-providers/terraform-provider-helm/helm.resourceReleaseExists(0xc0002aca10, 0x2a55f40, 0xc0002aa2a0, 0xc0002aca10, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/helm/resource_release.go:730 +0x10b fp=0xc000aa7958 sp=0xc000aa7900 pc=0x277748b
github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc000596c60, 0xc000b685a0, 0x2a55f40, 0xc0002aa2a0, 0xc000146960, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/helper/schema/resource.go:445 +0x24d fp=0xc000aa79c8 sp=0xc000aa7958 pc=0x19812dd
github.com/hashicorp/terraform-plugin-sdk/internal/helper/plugin.(*GRPCProviderServer).ReadResource(0xc00000ebc8, 0x3098960, 0xc000573020, 0xc00013c720, 0xc00000ebc8, 0xc000573020, 0xc0006b5b30)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/internal/helper/plugin/grpc_provider.go:525 +0x3d8 fp=0xc000aa7ad0 sp=0xc000aa79c8 pc=0x19a0b28
github.com/hashicorp/terraform-plugin-sdk/internal/tfplugin5._Provider_ReadResource_Handler(0x2bc4c00, 0xc00000ebc8, 0x3098960, 0xc000573020, 0xc00013c6c0, 0x0, 0x3098960, 0xc000573020, 0xc000684580, 0x54a)
	github.com/terraform-providers/terraform-provider-helm/vendor/github.com/hashicorp/terraform-plugin-sdk/internal/tfplugin5/tfplugin5.pb.go:3269 +0x217 fp=0xc000aa7b40 sp=0xc000aa7ad0 pc=0x18b2787
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00053fb00, 0x30c10e0, 0xc000532780, 0xc00013a200, 0xc0003063c0, 0x4108710, 0x0, 0x0, 0x0)
	github.com/terraform-providers/terraform-provider-helm/vendor/google.golang.org/grpc/server.go:1024 +0x4f4 fp=0xc000aa7e18 sp=0xc000aa7b40 pc=0x147ad24
google.golang.org/grpc.(*Server).handleStream(0xc00053fb00, 0x30c10e0, 0xc000532780, 0xc00013a200, 0x0)
	github.com/terraform-providers/terraform-provider-helm/vendor/google.golang.org/grpc/server.go:1313 +0xd97 fp=0xc000aa7f48 sp=0xc000aa7e18 pc=0x147ea47
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0004ba890, 0xc00053fb00, 0x30c10e0, 0xc000532780, 0xc00013a200)
	github.com/terraform-providers/terraform-provider-helm/vendor/google.golang.org/grpc/server.go:722 +0xbb fp=0xc000aa7fb8 sp=0xc000aa7f48 pc=0x148be9b
runtime.goexit()
	/opt/goenv/versions/1.13.7/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc000aa7fc0 sp=0xc000aa7fb8 pc=0x105b761
created by google.golang.org/grpc.(*Server).serveStreams.func1
	github.com/terraform-providers/terraform-provider-helm/vendor/google.golang.org/grpc/server.go:720 +0xa1
Full Stack Trace

https://gist.github.com/krzysztof-miemiec/2d44d187a75a2f106ccd9c71fbf67883

Expected Behavior

Helm properly plans and applies changes to multiple releases.

Actual Behavior

Random crashes occur during the plan phase (stack trace above), or sometimes everything gets stuck during deployment with no logs 😞.

Steps to Reproduce

  1. Have a Kubernetes cluster up and running, freely accessible via a configured context
  2. Download zip file attached
  3. terraform plan

Important Factoids

I never debugged or contributed to any Go tool, so these are mostly guesses based on my knowledge of Terraform:

  • It happens randomly; it seems to occur more often the more helm_release resources you define (it probably doesn't occur unless multiple instances of the Helm provider run at the same time 🤔)
  • I noticed that it takes ages to refresh the state of Helm resources, especially ones that use CRDs (not the ones that define them). The -parallelism=1 flag helps a bit with performance, and with it I didn't encounter crashes or the infinite wait during the plan phase.
  • The infinite wait sometimes happens during terraform apply, even with wait = false.
  • If I understand the stack trace correctly, it is a concurrency issue in the Helm provider or Terraform itself.
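The fatal error in the trace belongs to a well-known bug class in Go: one goroutine reading a map while another writes it aborts the whole process with "fatal error: concurrent map read and map write". A hypothetical sketch (not the provider's actual code) of that pattern and the usual fix, a sync.RWMutex guarding the shared map:

```go
package main

import (
	"fmt"
	"sync"
)

// safeAttrs wraps a map the way shared state must be wrapped when several
// goroutines (here, one per release being refreshed) touch it concurrently.
// Without the mutex, simultaneous reads and writes crash the runtime with
// exactly the fatal error shown in the stack trace above.
type safeAttrs struct {
	mu sync.RWMutex
	m  map[string]string
}

func (s *safeAttrs) get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

func (s *safeAttrs) set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	attrs := &safeAttrs{m: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("release-%d", i)
			attrs.set(key, "deployed") // concurrent write, safely locked
			attrs.get(key)             // concurrent read, safely locked
		}(i)
	}
	wg.Wait()
	fmt.Println(len(attrs.m)) // 8
}
```

Running multiple goroutines against a plain `map[string]string` without the mutex would be the unsafe variant; -parallelism=1 avoids the crash because it removes the concurrency rather than the race.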

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 18
  • Comments: 16 (8 by maintainers)

Most upvoted comments

The error is not constant. It fails 50-60% of the time. The success rate might be related to the performance of the systems.

Today Terraform version 0.12.25 was released, which fixed a concurrency bug. https://github.com/hashicorp/terraform/blob/v0.12.25/CHANGELOG.md There’s a chance that could have fixed this issue, but we still need to test using the reproducer.

I have encountered a similar problem. I created a module that deploys Helm releases into a Rancher namespace (Helm provider plus rancher2 provider).

At first I tried running this with 80-90 module instances and encountered the problem as well, so I went down to 10 modules.

Deploying >= 10 modules consistently produces errors (I did not test fewer than 10). Some of the Helm releases were reported as deployed by the Helm provider but were in the "pending" state. When I checked in Kubernetes I could not see any resources belonging to the specific Helm release (pods, deployments, etc.), and helm ls did not show anything either.

The namespaces were created successfully since this is done by the rancher2 provider.

When running with -parallelism=1 everything works fine; -parallelism=2 already fails.

Running two parallel terraform apply -parallelism=1 processes against two separate states with 10 modules each also works fine. So I don't think this is an issue in Kubernetes itself; it seems related to Terraform or the Helm provider.

Tested with Terraform versions:

  • 0.12.16
  • 0.12.26

and Helm provider versions:

  • 1.0.0
  • 1.2.2

Any idea of how I could provide more useful information to debug this?

@krzysztof-miemiec 1.0.0 also works for me; the issue appeared right after updating to 1.1.0 and above.

I'm using 1.0.0 in production atm (without any additional flags).

I've tested the Helm provider again with native Kubernetes 1.16.9.

So basically it works 15 out of 15 times with a token specified in the kubernetes block of the provider configuration.

Other ways (like an exec command, or a direct kubeconfig with or without an exec path) cause the described problem.
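In other words, a provider block roughly like this was the reliable variant (a sketch with placeholder variables — fill in whatever your cluster actually uses), while exec-based or kubeconfig-based auth triggered the crash:

```hcl
# Static token auth: worked 15/15 in my tests (values are placeholders)
provider "helm" {
  kubernetes {
    host                   = var.cluster_endpoint
    cluster_ca_certificate = base64decode(var.cluster_ca_cert)
    token                  = var.cluster_token
  }
}
```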

I also encounter this error.

It seems the Helm provider cannot talk to the Kubernetes API. When I point my local kubeconfig at a working Kubernetes context, terraform plan works. When I rely on the Kubernetes provider configuration only and set KUBE_LOAD_CONFIG_FILE=false, I see the issue above.