helm: helm install --wait does not wait for deployment pod readiness properly
I recently discovered and started using `--wait`:

> `--wait`: if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment are in a ready state before marking the release as successful. It will wait for as long as `--timeout`
It’s great!
However, when I run `helm install --wait`, the command exits successfully before the pods in my deployments pass their configured readiness probes.
I did a little spelunking to confirm that my expectations were correct, and found this suspicious-looking code attempting to do the deployment readiness check:
```go
func (c *Client) deploymentsReady(deployments []deployment) bool {
	for _, v := range deployments {
		if !(v.replicaSets.Status.ReadyReplicas >= *v.deployment.Spec.Replicas-deploymentutil.MaxUnavailable(*v.deployment)) {
			c.Log("Deployment is not ready: %s/%s", v.deployment.GetNamespace(), v.deployment.GetName())
			return false
		}
	}
	return true
}
```
I’m happy to send a PR to fix the if-check, but might need some advice on testing the fix.
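For what it's worth, here is a minimal, self-contained sketch (my own illustration, not Helm's actual code) of why the quoted condition can pass before any pod is ready, assuming a single-replica Deployment whose rolling-update settings default to `maxUnavailable: 1`:

```go
package main

import "fmt"

// ready mirrors the condition in deploymentsReady above: a Deployment
// counts as ready once ReadyReplicas >= Replicas - MaxUnavailable.
func ready(readyReplicas, replicas, maxUnavailable int32) bool {
	return readyReplicas >= replicas-maxUnavailable
}

func main() {
	// Assumed values: replicas=1, maxUnavailable=1. The threshold is
	// 1-1 = 0, so the check succeeds with zero ready pods.
	fmt.Println(ready(0, 1, 1)) // true: release marked successful too early
}
```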
Commits related to this issue
- Merge pull request #3407 from paolomainardi/feature/3173_fix_wait_for_deployments refs #3173: add appsv1beta1 and appsv1beta2 apiVersion — committed to helm/helm by mattfarina 6 years ago
- [elasticsearch] Add upgrade integration test This test will install the latest chart release and then attempt to upgrade to what is currently in git. This is designed to test two things. 1. That any... — committed to elastic/helm-charts by Crazybus 5 years ago
- Add Sysinv enhancements for FluxCD In this commit we added two enhancements to the FluxCD functionality: 1. Dynamically change the host in the default helm repository with the system controller net... — committed to starlingx/config by deleted user 2 years ago
@thomastaylor312 I still have the problem with Helm 3. I install a Helm chart (service catalog, for example, but this is not important) with the flag `--wait`, and helm terminates before the pods are ready. The chart contains one deployment of one pod, and the update strategy is set to type `RollingUpdate`. The problem seems to be in the function `deploymentReady`: https://github.com/helm/helm/blob/ec1d1a3d3eb672232f896f9d3b3d0797e4f519e3/pkg/kube/wait.go#L221-L228

The deployment does not provide configuration for the rolling update, so some default values are used; a `kubectl describe` shows these values. The calculation of the `expectedReady` variable in the `deploymentReady` function gives a value of 0, as `replicas` is 1 and `maxUnavailable` is also 1. So the condition `rs.Status.ReadyReplicas >= expectedReady` is always true, even when no pod has started yet. I think the variable `expectedReady` should have a minimum value of 1 when it computes to 0.

Please reopen the issue @Crazybus. I am having this problem with version 2.14 of Helm.
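A minimal sketch of the floor proposed above (illustrative only; the function name is mine, not Helm's):

```go
package main

import "fmt"

// expectedReady computes replicas - maxUnavailable, but never lets the
// result drop below 1, so even a single-replica Deployment must have at
// least one ready pod before the release is marked successful.
func expectedReady(replicas, maxUnavailable int32) int32 {
	if expected := replicas - maxUnavailable; expected >= 1 {
		return expected
	}
	return 1
}

func main() {
	fmt.Println(expectedReady(1, 1)) // 1 instead of 0: --wait now actually blocks
}
```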
To reproduce, just try the script. It installs an `apiVersion: apps/v1` Deployment: https://github.com/storageos/charts/blob/master/stable/storageos-operator/templates/operator.yaml

If you wait a few seconds, you can patch the created StorageClass.
I can also reproduce it when trying to deploy the nginx operator: the chart is considered deployed while the pod is still in "Creating container…" status.

The `--wait` option was enabled and the timeout increased to 15 min, with no effect (I also tried the wait-for-jobs option).
Hello @thomastaylor312
I understand this issue is closed, but I have a few questions about it; maybe there is some misunderstanding on my side.

The wait option waits until the pods are ready according to the rule `Status.ReadyReplicas >= expectedReady`. But how can we separate the behaviour during the rollout from the expected final state? For instance, with a strategy like the one below:
```yaml
replicas: 2
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
  type: RollingUpdate
```
We accept having one pod unavailable during the rollout, since the pods are restarted one after another (for a zero-downtime scenario), but we would like the deployment to be considered finished and OK only when all pods are up and ready. If the two pods do not restart correctly, the deployment should not be considered OK.
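To make that concrete, here is a quick sketch of the current rule applied to the strategy above (hypothetical pod counts; not Helm code):

```go
package main

import "fmt"

func main() {
	replicas := int32(2)       // from the strategy above
	maxUnavailable := int32(1) // from the strategy above
	readyReplicas := int32(1)  // only one of the two pods has come up

	expectedReady := replicas - maxUnavailable // 2 - 1 = 1
	// true: the wait ends here, before the second pod is ready,
	// even though the desired final state is two ready pods.
	fmt.Println(readyReplicas >= expectedReady)
}
```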
For now, we use `kubectl rollout status`, as it waits until all pods are ready, and if they are not, the deployment is rolled back.

Is this behaviour possible, or planned?
ping @jgiles @cueBall123