helm: helm install --wait does not wait for deployment pod readiness properly
I recently discovered and started using `--wait`:

> `--wait`: if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment are in a ready state before marking the release as successful. It will wait for as long as `--timeout`
It’s great!
However, when I run `helm install --wait`, the command exits successfully before the pods in my deployments pass their configured readiness probes.
I did a little spelunking to confirm that my expectations were correct, and found this suspicious-looking code attempting to do the deployment readiness check:
```go
func (c *Client) deploymentsReady(deployments []deployment) bool {
	for _, v := range deployments {
		if !(v.replicaSets.Status.ReadyReplicas >= *v.deployment.Spec.Replicas-deploymentutil.MaxUnavailable(*v.deployment)) {
			c.Log("Deployment is not ready: %s/%s", v.deployment.GetNamespace(), v.deployment.GetName())
			return false
		}
	}
	return true
}
```
I’m happy to send a PR to fix the if-check, but might need some advice on testing the fix.
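For what it's worth, here is a minimal, self-contained sketch (my own illustration, not Helm's actual code) of why the quoted condition can pass before any pod is ready, assuming a single-replica Deployment whose rolling-update settings default to `maxUnavailable: 1`:

```go
package main

import "fmt"

// ready mirrors the condition in deploymentsReady above: a Deployment
// counts as ready once ReadyReplicas >= Replicas - MaxUnavailable.
func ready(readyReplicas, replicas, maxUnavailable int32) bool {
	return readyReplicas >= replicas-maxUnavailable
}

func main() {
	// Assumed values: replicas=1, maxUnavailable=1. The threshold is
	// 1-1 = 0, so the check succeeds with zero ready pods.
	fmt.Println(ready(0, 1, 1)) // true: release marked successful too early
}
```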
Commits related to this issue
- Merge pull request #3407 from paolomainardi/feature/3173_fix_wait_for_deployments refs #3173: add appsv1beta1 and appsv1beta2 apiVersion — committed to helm/helm by mattfarina 6 years ago
- [elasticsearch] Add upgrade integration test This test will install the latest chart release and then attempt to upgrade to what is currently in git. This is designed to test two things. 1. That any... — committed to elastic/helm-charts by Crazybus 5 years ago
- Add Sysinv enhancements for FluxCD In this commit we added two enhancements to the FluxCD functionality: 1. Dynamically change the host in the default helm repository with the system controller net... — committed to starlingx/config by deleted user 2 years ago
@thomastaylor312 I still have the problem with Helm 3. I install a Helm chart (service catalog, for example, but this is not important) with the flag `--wait`, and helm terminates before the pods are ready. The chart contains one deployment of one pod, and the update strategy is set to type `RollingUpdate`. The problem seems to be in the function `deploymentReady`: https://github.com/helm/helm/blob/ec1d1a3d3eb672232f896f9d3b3d0797e4f519e3/pkg/kube/wait.go#L221-L228

The deployment does not provide configuration for the rolling update, so some default values are used; a `kubectl describe` shows these values. The calculation of the `expectedReady` variable in the `deploymentReady` function gives a value of 0, as `replicas` is 1 and `maxUnavailable` is also 1. So the condition `rs.Status.ReadyReplicas >= expectedReady` is always true, even when no pod has started yet. I think the variable `expectedReady` should have a minimum value of 1 when it computes to 0.

Please reopen the issue @Crazybus. I am having this problem with version 2.14 of Helm.
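A minimal sketch of the floor proposed above (illustrative only; the function name is mine, not Helm's):

```go
package main

import "fmt"

// expectedReady computes replicas - maxUnavailable, but never lets the
// result drop below 1, so even a single-replica Deployment must have at
// least one ready pod before the release is marked successful.
func expectedReady(replicas, maxUnavailable int32) int32 {
	if expected := replicas - maxUnavailable; expected >= 1 {
		return expected
	}
	return 1
}

func main() {
	fmt.Println(expectedReady(1, 1)) // 1 instead of 0: --wait now actually blocks
}
```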
To reproduce, just try the script. It installs an `apiVersion: apps/v1` Deployment: https://github.com/storageos/charts/blob/master/stable/storageos-operator/templates/operator.yaml

If you wait a few seconds, you can patch the created StorageClass.
I can also reproduce it when trying to deploy the nginx operator: the chart is considered deployed while the pod is still in "Creating container…" status.

The `--wait` option was enabled and the timeout increased to 15 min, with no effect (I also tried the wait-for-jobs option).
Hello @thomastaylor312
I understand this issue is closed, but I have a few questions about it; maybe there is some misunderstanding on my side.

The wait option waits until the pods are ready according to the rule `Status.ReadyReplicas >= expectedReady`. But how can we separate the behaviour during the rollout from the expected final state? For instance, with a strategy like the one below:
```yaml
replicas: 2
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
  type: RollingUpdate
```
We accept having one pod unavailable during the rollout, since the pods are restarted one after another (for a zero-downtime scenario), but we would like the deployment to be considered finished and OK only when all pods are up and ready. If the two pods do not restart correctly, the deployment should not be considered OK.
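To make that concrete, here is a quick sketch of the current rule applied to the strategy above (hypothetical pod counts; not Helm code):

```go
package main

import "fmt"

func main() {
	replicas := int32(2)       // from the strategy above
	maxUnavailable := int32(1) // from the strategy above
	readyReplicas := int32(1)  // only one of the two pods has come up

	expectedReady := replicas - maxUnavailable // 2 - 1 = 1
	// true: the wait ends here, before the second pod is ready,
	// even though the desired final state is two ready pods.
	fmt.Println(readyReplicas >= expectedReady)
}
```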
For now, we use `kubectl rollout status`, as it waits until all pods are ready, and if they are not, the deployment is rolled back.

Is this behaviour possible, or planned?
ping @jgiles @cueBall123