helm-controller: High Memory Usage after helm-controller v0.12.0 upgrade

I updated to helm-controller v0.12.1 and started using ReconcileStrategy Revision for all my local Helm charts. Now helm-controller restarts each time I push a commit to the GitRepository source, because it uses too much memory and is killed by Kubernetes (OOMKilled). As a result, some Helm releases get stuck in the upgrade process and must be rolled back manually (https://github.com/helm/helm/issues/8987).

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 39 (21 by maintainers)

Most upvoted comments

We’ve pushed a release candidate for #352, here is the image: ghcr.io/fluxcd/helm-controller:rc-4fe7a7c8

Please take it for a spin and let us know if it fixes the issue.

I’ve spent the day digging around to find the root cause of the sudden increase in memory usage. Here is what I’ve found:

We can’t do much in Flux: we have to wait for that PR to get merged, then for a Kubernetes release, then for a Helm release that uses the latest Kubernetes release, and finally update Helm in Flux to fix the OOM issues.

I propose we revert Helm to v3.6.3 for a couple of months until the kube-openapi fixes end up in Helm.

We’re seeing a 75% drop in helm-controller memory (peaks and base level) since picking up this version 👍

Helm released a patch yesterday which likely addresses this issue.

With the holiday period approaching, however, I am hesitant to release this, as I will be on leave for 3 weeks. If someone specifically needs the v3.7.x release range, I can provide an RC.

Thanks all for testing. This should now also be solved by updating the helm-controller Deployment image to v0.12.2.

CLI release for flux bootstrap, etc. will arrive later today.
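
For anyone applying the image bump by hand, here is a minimal sketch of the patch body. The image repository comes from the release candidate mentioned above and the tag from this comment; the container name `manager` and the `flux-system` namespace are assumptions about a default Flux install, and `image_patch` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def image_patch(container: str, image: str) -> Dict[str, Any]:
    """Build a strategic merge patch that sets one container's image
    in a Deployment's pod template."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": container, "image": image}]
                }
            }
        }
    }


# "manager" is an assumed container name; adjust it to match your Deployment.
patch = image_patch("manager", "ghcr.io/fluxcd/helm-controller:v0.12.2")
print(json.dumps(patch))
```

The printed JSON could then be passed to something like `kubectl -n flux-system patch deployment helm-controller --patch '<json>'`.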

Awesome, and thanks a lot for helping out. This seems to indicate that we can at least temporarily work around the upstream problems by forcing the replacement of that specific Helm dependency, without having to stop receiving new Helm updates.

@stefanprodan I have deployed helm-controller with the image provided and will monitor it for a couple of hours, looking for restarts due to OOM kills.

flux: v0.21.0
helm-controller: rc-725fd784
image-automation-controller: v0.16.0
image-reflector-controller: v0.13.0
kustomize-controller: v0.16.0
notification-controller: v0.18.1
source-controller: v0.17.1
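
The restart check described above can be scripted rather than eyeballed. This is a rough sketch, not something from the thread: it reads the standard Pod status fields `containerStatuses[].restartCount` and `lastState.terminated.reason`, and `oom_killed_containers` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def oom_killed_containers(pod: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (container name, restart count) for every container in the
    pod whose most recent termination reason was OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty (or lacks "terminated") if the container never restarted.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append((status["name"], status.get("restartCount", 0)))
    return hits
```

Each element of `items` from `kubectl -n flux-system get pods -l app=helm-controller -o json` can be passed to the function after `json.load`; the namespace and label selector are assumptions about a default install.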

No, I think your observations are correct based on other reports on Slack.

Did a quick dive into it with the limited time I had available, but helm-controller didn’t really change much besides Helm, kustomize and controller-runtime updates. It would be useful if someone could pinpoint the resource behavior change to an exact helm-controller version, which would help identify the issue.

I am at present working on Helm improvements for the source-controller in the area of Helm repository index, dependency, and chart build memory consumption. Once that’s done, I have time (and am planning) to look in much greater detail at the current shape of the helm-controller (as part of https://github.com/fluxcd/helm-controller/milestone/1).