kubernetes: Pod startup errors bork StatefulSet rolling update

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

I’d like to think this is a bug, but I fear it may actually be a feature request.

What happened:

Pod startup failed during creation of, or a rolling update to, a StatefulSet (for any of a number of reasons, e.g., a typo in an image name or a crash bug in the code being run). The rollout remains stalled even if the update is rolled back or the StatefulSet is updated again with a corrected image. The only way to recover is manual intervention (e.g., deleting the failing pod so that it is recreated from the corrected spec, allowing subsequent pods to start).

What you expected to happen:

I expected updating the StatefulSet configuration with a corrected or reverted image spec to correct the problem.

How to reproduce it (as minimally and precisely as possible):

1. Define and create a StatefulSet with 2 replicas of a single-container pod using a non-working image.
2. Observe that the StatefulSet comes up with pod 0 in a restart loop (e.g., Waiting: CrashLoopBackOff) and pod 1 never starts.
3. Patch the StatefulSet to update the image to a working one.
4. Observe that the condition of the pods does not change, even though the StatefulSet is now configured with the new, corrected image.
5. Manually delete pod 0.
6. Observe that pods 0 and 1 now both come up successfully running the new image.
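As a concrete sketch of the first reproduction (the StatefulSet name, labels, and images below are made up for illustration, and the headless Service is omitted for brevity):

```shell
# Step 1: create a 2-replica StatefulSet whose image tag does not exist.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web          # assumes a headless Service "web" exists; omitted here
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx:no-such-tag   # deliberately broken
EOF

# Step 2: web-0 sits in a back-off state and web-1 is never created.
kubectl get pods -l app=web

# Step 3: update the StatefulSet with a working image.
kubectl set image statefulset/web app=nginx:1.25

# Steps 4-6: nothing changes until the stuck pod is deleted by hand,
# after which both pods come up with the corrected image.
kubectl delete pod web-0
```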

Alternatively:

1. Define and create a StatefulSet with 2 replicas of a single-container pod using a working image.
2. Observe that the StatefulSet comes up with pods 0 and 1 both running successfully.
3. Patch the StatefulSet to update the image to a non-working one.
4. Observe that pod 1 is updated to the new image and goes into a restart loop (e.g., Waiting: CrashLoopBackOff) while pod 0 keeps running the old image.
5. Either (a) patch the StatefulSet again to update the image to a third, working image, or (b) revert the change from step 3, e.g., using kubectl rollout undo.
6. Observe that the condition of the pods does not change, even though the StatefulSet is now configured with the corrected image.
7. Manually delete pod 1.
8. Observe that pods 0 and 1 now both come up successfully running the new image.
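The alternative reproduction can be driven with the same hypothetical StatefulSet, this time starting from a working image; again, only a sketch:

```shell
# Step 3: roll out a broken image; web-1 is updated first and starts crash-looping.
kubectl set image statefulset/web app=nginx:no-such-tag

# Step 5b: revert to the previous, working revision...
kubectl rollout undo statefulset/web

# Steps 6-8: ...the StatefulSet spec is corrected, but web-1 stays stuck
# until it is deleted by hand, after which both pods come up successfully.
kubectl delete pod web-1
```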

Anything else we need to know?:

To my sensibilities this is just a bug: why would StatefulSet support rolling updates at all if there’s no way for it to handle the failure case? However, as I mentioned above, I fear this is actually a feature request. Our use case is a CI/CD system, so end-to-end automation is essential. I’m sure that with some effort we could implement the necessary monitoring and intervention ourselves via a kludgy combination of polling and timeouts, but the whole reason for switching to Kubernetes in the first place was to avoid writing and debugging that kind of fussy, error-prone glue. StatefulSet is in exactly the right place to know what needs to be done and to do it, but it currently appears to handle only the happy path.
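To make that concrete, the kind of external babysitting I mean looks roughly like the following; the StatefulSet name, label, and timeout are made up for illustration, and this is a sketch rather than a recommendation:

```shell
# Rough sketch of the polling-and-timeout workaround: wait for the rollout,
# and if it stalls, delete any pod stuck in a back-off state so the
# controller can recreate it from the corrected spec.
if ! kubectl rollout status statefulset/web --timeout=5m; then
  kubectl get pods -l app=web \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].state.waiting.reason}{"\n"}{end}' |
    awk '$2 ~ /CrashLoopBackOff|ImagePullBackOff|ErrImagePull/ {print $1}' |
    xargs -r kubectl delete pod
fi
```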

Edited to add: another effective manual intervention is, rather than deleting the pod that is stuck in a restart loop, to update the image on the pod itself directly. This results in a working pod and allows the rollout to resume. Nothing requires the image used here to be the same one the StatefulSet was updated with, but if this image update were performed by the StatefulSet controller we’d get exactly the behavior I’d like to see. From an uninformed outsider’s perspective this seems like a small change to the rolling-update logic.
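For example, with the hypothetical names used above (a pod's container image is one of the few fields that can be changed in place on a running pod):

```shell
# Patch the image on the stuck pod directly; the pod restarts with the
# working image and the StatefulSet rollout is then able to proceed.
kubectl set image pod/web-1 app=nginx:1.25
```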

Environment:

  • Kubernetes version (use kubectl version): 1.9
  • Cloud provider or hardware configuration/OS/etc: macOS running minikube v0.24.1

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 13
  • Comments: 42 (17 by maintainers)

Most upvoted comments

I can still see this in 1.14. Should we reopen this issue?

This is an annoying issue as it requires manual intervention, and can really bog down a team that relies on automated workflows. I +1 reopening this

/reopen

I think there’s a process issue here. Automatically closing an issue if developers just ignore it for 3 months is astonishingly dysfunctional. Closing your eyes does not make the problem go away.

It’s been more than 5 years…

@kmangla9: You can’t reopen an issue/PR unless you authored it or you are a collaborator.

Your wish is my command…

/reopen

It’s been 2 1/2 years. Sheesh already.

I just tried to reproduce this (on 1.11.1). AFAICT it only affects StatefulSets with the default podManagementPolicy of OrderedReady, but not Parallel.
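For anyone checking their own setup, the policy in effect can be read back like this (the StatefulSet name is hypothetical); note that the field is immutable, so Parallel has to be set in the manifest at creation time:

```shell
# Prints "OrderedReady" (the default) or "Parallel".
kubectl get statefulset web -o jsonpath='{.spec.podManagementPolicy}'
```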

/reopen

I can reproduce this issue; it is one of the biggest operational overheads in our setup. We usually delete the pod after patching in a new image.

I have also run into this issue and cannot find a way to recover the StatefulSet without losing the PV-to-pod mapping. This is very unfortunate and is going to cause data loss while trying to fix it.

I cannot reproduce losing the PV-to-pod mapping.

/reopen
/lifecycle frozen

I personally have experienced this issue and agree it’s super annoying.

(I guess it’s sig apps now)

This is not api machinery or architecture.

/sig workloads
