kubernetes: DaemonSet RollingUpdate hang

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I have a DaemonSet with 250 pods and maxUnavailable set to 20%. When I update its image, the DaemonSet controller deletes 50 old pods, waits for SatisfiedExpectations, and then creates 50 new pods. But if a pod deletion fails (for example, the node is NotReady and the pod is stuck in the Terminating state), the controller never satisfies its expectations and the rollout hangs at that point.

What you expected to happen:

When a pod is stuck in Terminating, the DaemonSet controller should start new pods the way the Deployment controller does, instead of waiting for the whole deletion batch to finish before starting the next one.

How to reproduce it (as minimally and precisely as possible):
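For illustration, a DaemonSet along the lines described above can be built with the client-go apps/v1 types. This is a sketch only: the name, labels, and image are placeholders, and the cluster in this report is 1.8, where DaemonSet was still served from apps/v1beta2 / extensions/v1beta1 rather than apps/v1.

```go
// Sketch only: a DaemonSet with a 20% maxUnavailable rolling update.
// Name, labels, and image are placeholders, not taken from the report.
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func exampleDaemonSet() *appsv1.DaemonSet {
	maxUnavailable := intstr.FromString("20%")
	labels := map[string]string{"app": "example-agent"}
	return &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "example-agent", Namespace: "default"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			UpdateStrategy: appsv1.DaemonSetUpdateStrategy{
				Type: appsv1.RollingUpdateDaemonSetStrategyType,
				// 20% of 250 pods means the controller rolls up to 50 pods at a time.
				RollingUpdate: &appsv1.RollingUpdateDaemonSet{MaxUnavailable: &maxUnavailable},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						// Changing this image triggers the rolling update described above.
						{Name: "agent", Image: "example.com/agent:v2"},
					},
				},
			},
		},
	}
}
```

Based on the description above, updating the image and then making one node NotReady, so that its old pod gets stuck in Terminating, should hit the hang.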

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.1-8+fca89bca262f6d", GitCommit:"fca89bca262f6de6f8fcc5942f28b30bcf29edad", GitTreeState:"clean", BuildDate:"2017-10-25T13:04:04Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 19 (13 by maintainers)

Most upvoted comments

@janetkuo why does the DaemonSet controller wait for all 50 pods to be done before moving on to the NEXT 50? I completely understand that we want at most 1 pod per node. But if a pod deletion fails on a node, move on to the next pod as long as the total number of unavailable pods stays below maxUnavailable. This would allow us to make progress in the face of bad nodes, or of pod deletions that fail because tearing down bad volumes fails, or for other reasons. A rough sketch of that idea follows.
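Very roughly, the suggested behavior amounts to something like the following (made-up types and helper names, not the actual DaemonSet controller code):

```go
// Pod is a simplified stand-in for the real pod object; Terminating marks a
// pod whose deletion was requested but has not completed (e.g. its node is
// NotReady).
type Pod struct {
	Name        string
	Ready       bool
	Terminating bool
}

// oldPodsToDelete sketches the proposed behavior: keep deleting old pods as
// long as the total number of unavailable pods stays below maxUnavailable,
// and skip pods whose deletion is already pending instead of blocking the
// whole batch on them.
func oldPodsToDelete(oldPods []Pod, maxUnavailable int) []string {
	var toDelete []string
	unavailable := 0
	for _, p := range oldPods {
		if p.Terminating || !p.Ready {
			// Already counts against the budget, but must not stall progress
			// on other nodes.
			unavailable++
			continue
		}
		if unavailable >= maxUnavailable {
			break // budget exhausted; try again on the next sync
		}
		toDelete = append(toDelete, p.Name)
		unavailable++
	}
	return toDelete
}
```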