helm: Helm v3.4 Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

After upgrading from Helm 3.3 to Helm 3.4, existing charts started failing to upgrade with the message: Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

At the same time, the chart disappeared from helm list -n myns and didn't show up in the list at all.

This is a chart that has been upgraded successfully over 800 times; the only change was the Helm version bump. The chart failed twice when attempting to deploy with the command:

helm upgrade --install --namespace myns --timeout 1800s --atomic mychart charts/app/standalone --values values-override.yaml

Once I rolled back to 3.3 I was able to upgrade the chart successfully.

Output of helm version:

version.BuildInfo{Version:"v3.4.0", GitCommit:"7090a89efc8a18f3d8178bf47d2462450349a004", GitTreeState:"dirty", GoVersion:"go1.15.3"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.13", GitCommit:"39a145ca3413079bcb9c80846488786fed5fe1cb", GitTreeState:"clean", BuildDate:"2020-07-15T16:18:19Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-gke.401", GitCommit:"eb94c181eea5290e9da1238db02cfef263542f5f", GitTreeState:"clean", BuildDate:"2020-09-09T00:57:35Z", GoVersion:"go1.13.9b4", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 33
  • Comments: 51 (8 by maintainers)

Most upvoted comments

Hi @bacongobbler, the described workaround did indeed work!

$ helm history kyc-api
REVISION	UPDATED                 	STATUS         	CHART             	APP VERSION	DESCRIPTION             
1       	Mon Nov  9 14:57:36 2020	pending-install	generic-base-0.2.1	0.1.0      	Initial install underway

$ helm rollback kyc-api 1
Rollback was a success! Happy Helming!

 $ helm history kyc-api
REVISION	UPDATED                 	STATUS         	CHART             	APP VERSION	DESCRIPTION             
1       	Mon Nov  9 14:57:36 2020	pending-install	generic-base-0.2.1	0.1.0      	Initial install underway
2       	Mon Nov  9 15:06:15 2020	deployed       	generic-base-0.2.1	0.1.0      	Rollback to 1  

Looking at the werf/helm PR pretty much confirms that CTRL+C breaks the helm installation on 3.4.0.

Experienced this on Helm v3.5.2 caused by CTRL+C pressed during upgrade. Workaround: kubectl delete secret sh.helm.release.v1.<RELEASE_NAME>.v<LATEST_REVISION>
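If it's not obvious which revision is stuck, a quick way to check (a sketch; <NAMESPACE>, <RELEASE_NAME> and <LATEST_REVISION> are placeholders) is to list the release secrets Helm keeps and look at their status labels before deleting anything:

# Helm 3 stores each revision as a Secret labelled with owner, name, version and status
kubectl get secrets -n <NAMESPACE> -l owner=helm,name=<RELEASE_NAME> --show-labels

# Delete only the latest revision, the one stuck in a pending-* status
kubectl delete secret sh.helm.release.v1.<RELEASE_NAME>.v<LATEST_REVISION> -n <NAMESPACE>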

I also have the same issue described on this thread with v3.4.1

helm rollback can be a workaround for development machines, but it is unacceptable in production CI/CD pipelines.

When the helm CLI receives a SIGTERM signal, it should exit gracefully, leaving the Helm labels in a stable state and allowing further deployments without issues.

The issue is not fixed yet and should be reopened for further research.

The problem is that helm rollback && helm upgrade is not a suitable solution for production deployments.

I can reproduce this with helm upgrade --install --atomic by interrupting it during execution. The second run will always return an error:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

Can be solved by:

helm rollback
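
Spelled out with the release and namespace from the original report as placeholders, the rollback workaround looks roughly like this:

helm history mychart -n myns                        # the stuck revision shows up as pending-install/pending-upgrade
helm rollback mychart <LAST_GOOD_REVISION> -n myns
helm upgrade --install mychart charts/app/standalone -n myns --values values-override.yaml   # then re-run the original upgrade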

I guess this proposal could solve this issue https://github.com/helm/helm/issues/8040

same on v3.5.4, fixed by

kubectl delete secret sh.helm.release.v1.asdf.v1 -n asdf

I have the same issues. I’ll try to get a debug output if possible.

I noticed this issue usually happens when you upgrade a helm chart with --wait and the upgrade clearly fails (a CrashLoopBackOff or something like that): helm waits until it reaches the timeout, but the user presses CTRL+C before the timeout is reached. After that I get the same error as posted above:

STDERR:
  Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

I'm using helmfile, not helm directly; maybe it's a separate problem with helmfile not sending the SIGTERM correctly.

Run 'helm history --all'; the job is probably pending, and you'll have to roll back to the last successful deployment.

On Nov 29, 2020, at 5:39 PM, Victor Login notifications@github.com wrote:

my out:

helm upgrade shortlink-api ops/Helm/shortlink-api --install --wait --namespace=shortlink --set deploy.image.tag=0.7.0.16 --debug -v 6

history.go:53: [debug] getting history for release shortlink-api
upgrade.go:121: [debug] preparing upgrade for shortlink-api
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
helm.go:81: [debug] another operation (install/upgrade/rollback) is in progress
helm.sh/helm/v3/pkg/action.init
    /home/circleci/helm.sh/helm/pkg/action/action.go:62
runtime.doInit
    /usr/local/go/src/runtime/proc.go:5474
runtime.doInit
    /usr/local/go/src/runtime/proc.go:5469
runtime.main
    /usr/local/go/src/runtime/proc.go:190
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1373
UPGRADE FAILED
main.newUpgradeCmd.func2
    /home/circleci/helm.sh/helm/cmd/helm/upgrade.go:156
github.com/spf13/cobra.(*Command).execute
    /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
    /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
    /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
    /home/circleci/helm.sh/helm/cmd/helm/helm.go:80
runtime.main
    /usr/local/go/src/runtime/proc.go:203
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1373

helm ls --all

NAME	NAMESPACE	REVISION	UPDATED	STATUS	CHART	APP VERSION

GitLab CI Pipeline Job: https://gitlab.com/shortlink-org/shortlink/-/jobs/879030237


Why is this issue still closed? Repro, on an empty cluster:

helm upgrade --install --version=3.13.0 --create-namespace --namespace ingress-nginx-2 --set controller.kind=DaemonSet --set controller.service.type=LoadBalancer --set controller.service.loadBalancerIP=127.0.0.1 ingress-nginx-2 ingress-nginx/ingress-nginx

and ^C it

This problem happens because skaffold/helm was interrupted; I fixed it by deleting the broken namespace.

“fixed” by deleting all helm secrets

I still think there is a regression in 3.4.0 since I never had this issue before.

@bacongobbler I haven't had luck reproducing the issue. I've just bumped to 3.4.1 and upgraded the same deployment that previously failed under 3.4.0, so I'll assume the issue is resolved unless I see something else. Thanks for everything.

On v3.3.4 such a case was handled fine (see the picture attached). We use Helm in GitLab CI, and job cancellation became a problem after upgrading to 3.4.0; v3.5.1 has the same issue too.

same on v3.5.4, fixed by

kubectl delete secret sh.helm.release.v1.asdf.v1 -n asdf

This works for me. Thanks! @okunc

same on v3.5.4, fixed by

kubectl delete secret sh.helm.release.v1.asdf.v1 -n asdf

Thank you, deleting the last secret in the list fixed it in my case too. I'm also on helm v3.5.4 and using the fluxcd helm-controller; to bring it back completely, after deleting that secret I also had to run:

flux resume helmrelease asdf -n asdf

And now all the failed / stuck releases are working again!
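
Put together, the flux recovery looks roughly like this (a sketch using the same asdf release/namespace as above; <REV> stands for the stuck revision and is a placeholder):

kubectl delete secret sh.helm.release.v1.asdf.v<REV> -n asdf   # drop the secret for the pending revision
flux resume helmrelease asdf -n asdf                           # let helm-controller pick the release back up
flux get helmreleases -n asdf                                  # optional: confirm it reconciles and becomes Ready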

helm history -n (ns) xxx
helm rollback -n (ns) xxx (num)

“fixed” by deleting all helm secrets

You don't need to delete all the Helm secrets, only the last one. That sounds like a workaround, though, not a fix.

Same in 3.8.1; it still happens when killing helm upgrade --install hard enough.

Workaround:

  1. Write the secret representing the release out to a file:
kubectl get secret -n <whatever> sh.helm.release.v1.<name>.v<version> -o yaml > release.yaml
  2. Unpack the secret so you can manually edit it:
cat release.yaml | yq .data.release | base64 -d | base64 -d | gunzip > release-contents.json
  3. Edit the unpacked JSON document, changing the "description" key to "Deployed" and the "status" key to the value "deployed".
  4. Repack the JSON document:
cat release-contents.json | gzip | base64 -w 0 > newreleasevalue
  5. Edit the file release.yaml, changing the status label to deployed and the data.release key to have as its value the contents of the file newreleasevalue created in step 4.
  6. Update the secret using the file release.yaml which you just updated:
kubectl apply -n <whatever> -f release.yaml
  7. Run helm list to verify that your release shows up in the list of deployed releases.
  8. Now you can re-run your helm command and it won't die.

Getting this error when running kubectl apply after following those steps:

Warning: resource secrets/sh.helm.release.v1.nextcloud.v11 is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
The Secret "sh.helm.release.v1.nextcloud.v11" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

Also, the command you provided for repacking the release contents only encodes it once; I believe it should do that twice. The yq command also gives output with quotes around it. I used these commands instead:

cat release.yaml | yq .data.release | sed 's/"//g' | base64 -d | base64 -d | gunzip > release-contents.json
cat release-contents.json | gzip | base64 -w 0 | base64 -w 0 > newreleasevalue

UPDATE: found a workaround. Use kubectl edit to apply the changes instead of kubectl apply. Make sure you know how to paste in your terminal editor. I'd recommend just commenting out the old line instead of deleting it by hand; it does the same thing.
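
For reference, a consolidated sketch of the corrected secret-editing workaround (release name, namespace and revision are placeholders; assumes GNU base64 for the -w flag):

SECRET=sh.helm.release.v1.<RELEASE_NAME>.v<REVISION>

# Decode the release payload (base64 twice: once for the Secret data field, once for Helm's own encoding)
kubectl get secret "$SECRET" -n <NAMESPACE> -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip > release-contents.json

# Edit release-contents.json by hand: set the status to "deployed" and adjust the description

# Re-encode (gzip, then base64 twice) and patch the secret, including the status label Helm filters on
NEW=$(gzip -c release-contents.json | base64 -w 0 | base64 -w 0)
kubectl patch secret "$SECRET" -n <NAMESPACE> --type merge -p "{\"metadata\":{\"labels\":{\"status\":\"deployed\"}},\"data\":{\"release\":\"$NEW\"}}"

# The release should show up as deployed again
helm list -n <NAMESPACE>

Using kubectl patch (like kubectl edit, as noted above) also sidesteps the "metadata.annotations: Too long" error, since it doesn't write the last-applied-configuration annotation that kubectl apply does.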

Curiously, the way we currently fix it in our cluster is by running the same upgrade command with Helm version 3.2.1 (the Helm plugin for our CI uses 3.6.2, previously 3.2.4, which had no such issues). Not only does it go through just fine, but after that it works again with the current version.

v3.6.3 brought me here, same behavior.

@VengefulAncient - Can you provide some more clarity on your fix please? Does v3.6.2 fix this issue? FYI - We are using v3.4.2 on our CI and seeing this issue. Thanks!

Sure. v3.6.2 is the problematic version. It's the older version, v3.2.4 (not v3.4.2 - AFAIK that one is already affected by the bug), that gets around it. I run exactly the same helm upgrade --install command as our CI from my local machine using this older version when this error comes up, and it updates the release just fine - and our CI with the newer Helm version also works afterwards.

If it were that easy it would’ve been fixed by now.

But by all means, if you know how to fix it, we'd welcome a pull request. There are instructions in the documentation for getting started.

I wonder what happened from v3.3 to v3.4 that caused this issue

Had this issue because I tried to cancel a helm deployment from the command line. The workaround suggested by @Skaronator using a rollback got me past the error. Helm history looks like this now:

35              Thu Mar 18 10:09:15 2021        superseded      wordpress-0.1.6 5.4.2           Upgrade complete 
36              Thu Mar 18 10:32:16 2021        superseded      wordpress-0.1.6 5.4.2           Rollback to 34   
37              Thu Mar 18 10:42:48 2021        pending-upgrade wordpress-0.1.6 5.4.2           Preparing upgrade
38              Thu Mar 18 10:48:11 2021        superseded      wordpress-0.1.6 5.4.2           Rollback to 36   
39              Thu Mar 18 10:49:02 2021        deployed        wordpress-0.1.6 5.4.2           Upgrade complete 

The OP determined his issue was a duplicate of #4558. As #4558 describes, there are a few cases where a helm upgrade can enter the PENDING_UPGRADE state in the event of a timeout. A helm rollback && helm upgrade resolves the issue; hence why it was closed as a duplicate of #4558 (the symptoms and the workaround are identical).

If you do not believe you are experiencing the same issue as the OP, please open a new ticket.

I just tried it with

▶ helm3 version --short
v3.5.1+g32c2223

the end result is the same.

Why was this issue closed?