kubernetes: Incorrect Status: 'deployment "testapp" exceeded its progress deadline'

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: When setting an image using the command below and following it up with a status check, kubectl rollout status incorrectly reported that the deployment had exceeded its progress deadline.

I'm using a shell script for deploys; the relevant commands look similar to this:

kubectl set image deployments/${deployment_name} ${container_name}=${image_name}:${tag} --record
kubectl rollout status deployments/${deployment_name} --namespace ${environment}
if [[ $? != 0 ]]
then
  echo "The deploy has failed"
fi

During a deploy, sometimes the output looks like this:

deployment "testapp" image updated
error: deployment "testapp" exceeded its progress deadline
The deploy has failed

In fact, the deployment continued to roll out just fine; we never hit the limit of our progressDeadlineSeconds at all.
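
One way to double-check that is to dump the deployment's status conditions right after a false failure. A sketch of such a check, using the testapp deployment from the repro steps below:

kubectl get deploy/testapp -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
# A healthy rollout keeps Progressing=True (ReplicaSetUpdated, then NewReplicaSetAvailable);
# a genuine deadline failure would show Progressing=False (ProgressDeadlineExceeded).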

The real problem is that my deploy script is larger than the above: I rely on the exit code of the status command to decide whether it should perform a rollback. When the status command runs, it sometimes receives the wrong information and exits with status 1.
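
Simplified, the rollback portion looks roughly like this (the undo step here is illustrative; variable names as above):

kubectl set image deployments/${deployment_name} ${container_name}=${image_name}:${tag} --record
if ! kubectl rollout status deployments/${deployment_name} --namespace ${environment}
then
  echo "The deploy has failed, rolling back"
  # revert to the previously recorded revision
  kubectl rollout undo deployments/${deployment_name} --namespace ${environment}
  exit 1
fi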

What you expected to happen: I expected the status command to report the actual status of the ongoing deployment. Example of successful output:

deployment "testapp" image updated
Waiting for rollout to finish: 1 old replicas are pending termination...
Waiting for rollout to finish: 1 old replicas are pending termination...
deployment "testapp" successfully rolled out

How to reproduce it (as minimally and precisely as possible):

  1. Create a deployment with .spec.progressDeadlineSeconds set to something reasonable; in my case it's set to 100. I'm using this gist for testing (a minimal manifest sketch is shown after these steps).
  2. In one terminal execute: while true; do kubectl rollout status deploy/testapp; done
  3. In another terminal update the deployment, example in my case: kubectl set image deploy/testapp testapp=jtslear/testapp:0.0.15
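
For reference, a minimal manifest in that shape can be applied like this (the apiVersion and image tag are placeholders, not the exact gist contents; on 1.7 the apiVersion would be apps/v1beta1):

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testapp
spec:
  progressDeadlineSeconds: 100   # mark the rollout as failed if it stalls for 100s
  replicas: 1
  selector:
    matchLabels:
      app: testapp
  template:
    metadata:
      labels:
        app: testapp
    spec:
      containers:
      - name: testapp
        image: jtslear/testapp:0.0.14   # placeholder starting tag
EOF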

Anything else we need to know?: This is flaky. I've seen it in our production clusters only a few times, and even recreating it is rather tricky; I've had to repeat the steps above a couple of times before I ran into it.

To work around this, I've added a 3-second sleep between the set and status commands.
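
In script form the workaround is just (illustrative):

kubectl set image deployments/${deployment_name} ${container_name}=${image_name}:${tag} --record
sleep 3   # give the controller a moment to observe the new spec before polling status
kubectl rollout status deployments/${deployment_name} --namespace ${environment}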

Environment:

  • Kubernetes version (use kubectl version): seen on versions 1.7.0 and 1.7.1
  • Cloud provider or hardware configuration: minikube/GKE
  • OS (e.g. from /etc/os-release):
    • GKE: COS, build: 9460.64.0 version: 59
    • minikube: Buildroot, version 2017.02
  • Kernel (e.g. uname -a):
    • GKE: Linux gke-production-pool-1-7-1-2a195619-19nh 4.4.52+ #1 SMP Thu Jun 15 15:23:01 PDT 2017 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
    • minikube: Linux minikube 4.9.13 #1 SMP Tue Jul 18 22:17:02 UTC 2017 x86_64 GNU/Linux
  • Install tools: minikube/GKE
  • Others:

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 5
  • Comments: 28 (16 by maintainers)

Most upvoted comments

Hey, so automated rollbacks through Kubernetes are currently done only with the "undo" command in kubectl.

Is there a way Kubernetes can identify the failed deployment and roll back to the previously running deployment by itself, rather than someone having to run kubectl rollout undo deploy/

Also, even with the command kubectl patch deployment/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":100}}', Kubernetes does not by default roll back to the previous successful deployment after the new deployment fails and exhausts the 100s progress deadline.

Can we make Kubernetes roll back by itself depending on progressDeadlineSeconds?

I have tried kubectl patch deployment/nginx-deployment -p '{"spec":{"spec.autoRollback":true}}', which didn't seem to work.

Please provide me with a solution if there is a way Kubernetes can automatically roll back to the previous successful deployment by itself. Thank you.
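
For what it's worth, there is no built-in autoRollback field that I know of; a rough external approximation, based on the deployment's Progressing condition, might look like this sketch (nginx-deployment as in the example above):

reason=$(kubectl get deploy/nginx-deployment -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
if [[ "${reason}" == "ProgressDeadlineExceeded" ]]
then
  # deadline exceeded: revert to the last successful revision
  kubectl rollout undo deploy/nginx-deployment
fi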

Yeah, the condition flipping from False to True is OK, since the actual state of the Deployment is True. Are you also running on GKE?
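
A quick way to observe that flip, assuming the testapp deployment from the repro above:

while true
do
  # print the current status (True/False/Unknown) of the Progressing condition
  kubectl get deploy/testapp -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}'
  echo
  sleep 1
done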