helm: helm upgrade > timeout on pre-upgrade hook > revision stuck in `PENDING_UPGRADE`, and multiple `DEPLOYED` revisions soon follow

Reproduction and symptom

  1. helm upgrade with a helm pre-upgrade hook that times out.
  2. Error: UPGRADE FAILED: timed out waiting for the condition.
  3. helm history my-release-name
    # the last line...
    22      	Wed Aug 29 17:59:48 2018	PENDING_UPGRADE	jupyterhub-0.7-04ccf1a 	Preparing upgrade
    

Expected outcome

The revision should end up in FAILED rather than PENDING_UPGRADE, right?

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 71
  • Comments: 60 (13 by maintainers)

Most upvoted comments

This happened to me when I SIGTERM’d an upgrade. I solved it by deleting the helm secret associated with this release, e.g.

$ k get secrets
NAME                                 TYPE                                  DATA   AGE
sh.helm.release.v1.app.v1            helm.sh/release.v1                    1      366d
sh.helm.release.v1.app.v2            helm.sh/release.v1                    1      331d
sh.helm.release.v1.app.v3            helm.sh/release.v1                    1      247d
sh.helm.release.v1.app.v4            helm.sh/release.v1                    1      77d
sh.helm.release.v1.app.v5            helm.sh/release.v1                    1      77d
sh.helm.release.v1.app.v6            helm.sh/release.v1                    1      15m
sh.helm.release.v1.app.v7            helm.sh/release.v1                    1      66s

$ k delete secret sh.helm.release.v1.app.v7

If you are deleting the secret in a pipeline, you could use the following before deploying:

kubectl -n NS delete secret -l name=release-name,status=pending-upgrade

That way you don’t need to query for the version when deleting the secret.
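If it helps, here is a rough sketch of how that can look as a pre-deploy cleanup step in a pipeline ($NAMESPACE, $RELEASE_NAME and the chart path are placeholders, not taken from this thread):

# remove any release record stuck in a pending state, then deploy as usual
kubectl -n "$NAMESPACE" delete secret -l "name=$RELEASE_NAME,status in (pending-install, pending-upgrade)" || true
helm upgrade "$RELEASE_NAME" ./chart -n "$NAMESPACE" --install --wait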

We have the same problem in our GitLab pipelines. The workaround (running rollback) is not a good solution for prod CI/CD pipelines.

Is there a workaround for this? Is upgrading to helm3 a solution?

I’ve just run into this issue and worked around it by performing a helm rollback to a previous release as follows:

problem:

26      	Mon Jun 15 14:13:24 2020	superseded     	elasticsearch-7.5.1	7.5.1      	Upgrade complete
27      	Mon Jun 15 17:52:09 2020	pending-upgrade	elasticsearch-7.5.1	7.5.1      	Preparing upgrade

fix:

$ helm rollback elasticsearch-release 26
Rollback was a success! Happy Helming!
$ helm history elasticsearch-release
26      	Mon Jun 15 14:13:24 2020	superseded     	elasticsearch-7.5.1	7.5.1      	Upgrade complete
27      	Mon Jun 15 17:52:09 2020	pending-upgrade	elasticsearch-7.5.1	7.5.1      	Preparing upgrade
28      	Tue Jun 23 14:51:11 2020	deployed       	elasticsearch-7.5.1	7.5.1      	Rollback to 26

Can we add a parameter to helm to control whether to continue execution or return an error message when the pending-upgrade state appears?

The best “manual” solution:

kubectl --namespace $NAMESPACE get secrets -l owner=helm
# could get really specific with owner=helm,status=pending-upgrade
helm --namespace NS history RELEASE

It should be the latest release that’s blocking. You can use these to match up and check release versions. Delete the secret or rollback. The fastest is to just delete the secret and run a fresh build on your pipeline.

kubectl --namespace $NAMESPACE delete secret sh.helm.release.v1.$RELEASE.v$OFFENDING_REVISION
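If you want to script the matching step instead of eyeballing helm history, a rough sketch (assuming jq is available, exactly one pending revision, and placeholder $RELEASE / $NAMESPACE variables):

# find the revision stuck in pending-upgrade and delete its release secret
PENDING_REV=$(helm --namespace "$NAMESPACE" history "$RELEASE" -o json | jq -r '.[] | select(.status == "pending-upgrade") | .revision')
kubectl --namespace "$NAMESPACE" delete secret "sh.helm.release.v1.$RELEASE.v$PENDING_REV"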

We are running into this same issue with helm 3. The pipeline gets canceled and the helm operation is stuck in pending-upgrade. The current workaround of running a rollback does work, but it isn’t great for an automated pipeline unless we add a pre-deploy check that rolls back first, roughly like the sketch below.

Is there any way to just bypass the “pending-upgrade” status on a new deploy without running a rollback?
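One way such a pre-deploy guard could look (only a sketch, assuming jq is available and $RELEASE / $NAMESPACE / the chart path are placeholders; helm status -o json exposes the release state):

# if the last operation is stuck in pending-upgrade, roll back to the previous revision first
STATUS=$(helm status "$RELEASE" -n "$NAMESPACE" -o json | jq -r '.info.status')
if [ "$STATUS" = "pending-upgrade" ]; then
  helm rollback "$RELEASE" -n "$NAMESPACE"
fi
helm upgrade "$RELEASE" ./chart -n "$NAMESPACE" --install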

Same problem, coming here searching for a reason/fix 👍

Same problem here. Our pipeline gets canceled while a new version is rolling out, and afterwards we can’t deploy anymore because of Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress.

How about recording the timeout flag to the release data (if it isn’t there already)? That way, if a release

  • has status pending
  • and has timeout of N minutes
  • but started over N minutes ago

then we could treat it as failed, not pending. This behavior could be optional behind a flag.

In my case, the issue was related to the lack of permissions for the role that was performing helm upgrade. I resolved it by adding the “update” verb to all resources that have to be changed during the deployment, example:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ci-cd-cluster-role
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["create","get","list","update","watch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create","delete","get","list","patch","update","watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get","create","patch","update"]

How to get an undeployable Deployment deployed

I tried to create a minimal reproduction and ended up with something slightly different, but I bet this is related.

The following chart’s Deployment should never be deployed, right? Because it has a hook that should keep running for eternity. But it will be deployed if you run two upgrades in succession while a hook resource with the same name already exists and is about to terminate.

Chart.yaml:

apiVersion: v1
appVersion: "1.0"
description: A Helm chart for Kubernetes
name: issue-4558
version: 0.1.0

templates/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: never-to-be-installed-deployment
spec:
  selector:
    matchLabels:
      dummy: dummy
  template:
    metadata:
      labels:
        dummy: dummy
    spec:
      containers:
        - name: never-to-be-installed-deployment
          image: "gcr.io/google_containers/pause:3.1"

templates/job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: never-finishing-job
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: never-finishing-job
          image: "gcr.io/google_containers/pause:3.1"

Reproduction commands:

helm upgrade issue . --install --namespace issue
# abort
helm upgrade issue . --install --namespace issue

(animated capture: messy-helm-upgrading)
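To inspect the resulting state, something like the following should show the Deployment created even though the pre-upgrade hook never finished, plus the release record stuck in a pending state (just the obvious inspection commands; nothing assumed beyond the names above):

helm history issue --namespace issue
kubectl get deployments,jobs --namespace issue
kubectl get secrets --namespace issue -l owner=helm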

Use this before any upgrade/install (maybe already posted in this issue):

# delete failed previous deployment if any (that would else require a helm delete)
kubectl -n $NAMESPACE delete secret -l "name=$APP_NAME,status in (pending-install, pending-upgrade)" || true

I initially found it here: https://github.com/helm/helm/issues/5595#issuecomment-700742563

Deleting the secret removes the release record that is in status=pending-upgrade, which means future executions of the Helm command will start the 3-way merge process with the previous release.

Any side-effects?

In the use case you are reporting, it will have a nasty side effect: it deletes a release record that a helm command is still manipulating.

But I think that is the same side effect as the one you are proposing. So let’s check this scenario: Alice starts an upgrade of release foo with the wait flag. Meanwhile, Bob starts another upgrade of release foo with your switch to continue execution. That means Bob’s release will overwrite Alice’s upgrade. So what happens to Alice’s upgrade? Will it fail? And for Bob’s release, will the 3-way merge use the previous successful release or the pending one?

To clarify because others have asked: yes, this is a bug. PRs are welcome to help fix/mitigate this issue. It appears https://github.com/helm/helm/pull/9180 has stalled out, so if someone wants to help out @Moser-ss with a fix, please feel free.

For now, helm rollback will help you return back to a known working state before re-attempting an upgrade.

You can recreate this. I’m using helm v3.5.4 to do this.

  1. Start the install, upgrade, etc in one terminal.
  2. After the pending status appears in helm history, run this in another terminal: ps -ax | grep [h]elm | cut -d' ' -f1 | xargs kill -9

Why is this an issue? Because any additional install, upgrade, etc. will then fail, while helm history shows:

737     Thu Jun 17 08:14:33 2021    pending-upgrade    pagerinc-11.15.2    11.15.2    Preparing upgrade

What makes this difficult for developers? We have everything in CI tools and when a build or run is canceled, punted, times out, etc, it puts helm in this state. The issue is that it now requires manual intervention to correct. Is there a solution that we can build into our pipelines to bypass or correct this? Something along the lines of ignoring a pending build that is more than a configurable time like 20 minutes.

Previously, I was able to reproduce the reported behavior by:

  1. starting a helm upgrade with --wait
  2. killing the process from another terminal

The above resulted in a pending-upgrade status when checked with helm history. Using the same steps with helm v3.8.0 results in a failed status when checked with helm history, based on about a dozen tests since Friday, and I’m yet to see the pending-upgrade status.

There is no way to stop releases getting stuck in pending upgrade state. The reason is that the local helm client is responsible for updating the progress of a chart upgrade by writing to the k8s API. If the network connection drops / client exits unexpectedly / k8s API stops responding etc. then the “pending upgrade” status simply cannot be updated to “failed”, because there is nothing to do the update.

https://github.com/helm/helm/issues/4558#issuecomment-1004477657 seems like a reasonable way of handling this situation IMO, but there is no way to stop it from happening in the first place.

How about recording the timeout flag to the release data (if it isn’t there already)? That way, if a release

  • has status pending
  • and has timeout of N minutes
  • but started over N minutes ago

then we could treat it as failed, not pending. This behavior could be optional behind a flag.

@Artemkulish yes, however that seems to be more related to the discussion in #7139. This ticket discusses issues when a timeout occurs during an upgrade. #7139 discusses issues around improper role-based access controls in place.

We are running on Helm 3.4.1 and are running into the same issue as here from time to time. Worth mentioning that the previous version 3.3.x had no such trouble with the deployments… Can someone from the Helm team take a look at this and give an update or something?

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

If you know the previous timeout, it is better to delete only the pending secrets that have exceeded that timeout, to avoid an unnecessary race condition.

something like

kubectl -n $NAMESPACE get secrets -o=custom-columns=NAME:.metadata.name,AGE:.metadata.creationTimestamp --no-headers=true --sort-by=.metadata.creationTimestamp -l "name=${RELEASE_NAME},status in (pending-install, pending-upgrade)" | awk '$2 < "'`date -d "${TIMEOUT_IN_MINS} minutes ago" -Ins --utc | sed 's/+0000/Z/'`'" { print $1 }' | xargs --no-run-if-empty kubectl delete secret -n $NAMESPACE 

provided the clocks are in sync

If you deploy the same module in production multiple times at the exact same time, you have bigger problems than this one, my friend. For other environments, just deploy again.

Before avoiding problems that occur once in a million, there are other everyday problems to solve, generally speaking 😉

Use this before any upgrade/install (maybe already posted in this issue):

# delete failed previous deployment if any (that would else require a helm delete)
kubectl -n $NAMESPACE delete secret -l "name=$APP_NAME,status in (pending-install, pending-upgrade)" || true

I initially found it here: #5595 (comment)

Brilliant, this is exactly what I was looking for.

2+ really required

1+ really required

Yes, this issue still exists with the new version. We got the same with v3.8.2: version.BuildInfo{Version:"v3.8.2", GitCommit:"6e3701edea09e5d55a8ca2aae03a68917630e91b", GitTreeState:"clean", GoVersion:"go1.17.5"}

It didn’t fix it for me; I cancelled a deployment using v3.8.2 and it still got stuck in pending-upgrade.

Can we add a parameter to helm to control whether to continue execution or return an error message when the pending-upgrade state appears?

What do you mean by that? A flag to force an upgrade when the pending-upgrade state appears? Isn’t that dangerous? The error appears because helm identifies another helm instance that is executing an upgrade. Basically, it is a mechanism to avoid data corruption.