kubernetes: Pod startup errors bork StatefulSet rolling update

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

I’d like to think this is a bug, but I fear it may actually be a feature request.

What happened:

Pod startup failed during creation of, or a rolling update to, a StatefulSet (for any of a number of reasons, e.g., a typo in an image name or a crash bug in the code being run). The rollout remains stalled even if the update is rolled back or the StatefulSet is updated again with a corrected image. The only way to recover is manual intervention (e.g., deleting the failing pod so that it is recreated from the corrected spec, allowing subsequent pods to start).

What you expected to happen:

I expected updating the StatefulSet configuration with a corrected or reverted image spec to correct the problem.

How to reproduce it (as minimally and precisely as possible):

1. Define and create a StatefulSet with 2 replicas of a single-container pod using a non-working image.
2. Observe that the StatefulSet comes up with pod 0 in a restart loop (e.g., Waiting: CrashLoopBackOff) and pod 1 never starts.
3. Patch the StatefulSet to update the image to a working one.
4. Observe that the condition of the pods does not change, even though the StatefulSet is now configured with the new, corrected image.
5. Manually delete pod 0.
6. Observe that pods 0 and 1 now both come up successfully running the new image.
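As a concrete sketch of the first reproduction (the StatefulSet name, labels, and images below are made up for illustration, and the headless Service is omitted for brevity):

```shell
# Step 1: create a 2-replica StatefulSet whose image tag does not exist.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web          # assumes a headless Service "web" exists; omitted here
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx:no-such-tag   # deliberately broken
EOF

# Step 2: web-0 sits in a back-off state and web-1 is never created.
kubectl get pods -l app=web

# Step 3: update the StatefulSet with a working image.
kubectl set image statefulset/web app=nginx:1.25

# Steps 4-6: nothing changes until the stuck pod is deleted by hand,
# after which both pods come up with the corrected image.
kubectl delete pod web-0
```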

Alternatively:

1. Define and create a StatefulSet with 2 replicas of a single-container pod using a working image.
2. Observe that the StatefulSet comes up with pods 0 and 1 both running successfully.
3. Patch the StatefulSet to update the image to a non-working one.
4. Observe that pod 1 is updated to the new image and goes into a restart loop (e.g., Waiting: CrashLoopBackOff) while pod 0 keeps running the old image.
5. Either (a) patch the StatefulSet again to update the image to a third, working image, or (b) revert the change from step 3, e.g., using kubectl rollout undo.
6. Observe that the condition of the pods does not change, even though the StatefulSet is now configured with the corrected image.
7. Manually delete pod 1.
8. Observe that pods 0 and 1 now both come up successfully running the new image.
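The alternative reproduction can be driven with the same hypothetical StatefulSet, this time starting from a working image; again, only a sketch:

```shell
# Step 3: roll out a broken image; web-1 is updated first and starts crash-looping.
kubectl set image statefulset/web app=nginx:no-such-tag

# Step 5b: revert to the previous, working revision...
kubectl rollout undo statefulset/web

# Steps 6-8: ...the StatefulSet spec is corrected, but web-1 stays stuck
# until it is deleted by hand, after which both pods come up successfully.
kubectl delete pod web-1
```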

Anything else we need to know?:

To my sensibilities this is just a bug: why would StatefulSet support rolling updates at all if there’s no way for it to handle the failure case? However, as I mentioned above, I fear this is actually a feature request. Our use case is a CI/CD system, so end-to-end automation is essential. I’m sure that with some effort we could implement the necessary monitoring and intervention ourselves via a kludgy combination of polling and timeouts, but the whole reason for switching to Kubernetes in the first place was to avoid writing and debugging that kind of fussy, error-prone glue. StatefulSet is in exactly the right place to know what needs to be done and to do it, but it currently appears to handle only the happy path.
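To make that concrete, the kind of external babysitting I mean looks roughly like the following; the StatefulSet name, label, and timeout are made up for illustration, and this is a sketch rather than a recommendation:

```shell
# Rough sketch of the polling-and-timeout workaround: wait for the rollout,
# and if it stalls, delete any pod stuck in a back-off state so the
# controller can recreate it from the corrected spec.
if ! kubectl rollout status statefulset/web --timeout=5m; then
  kubectl get pods -l app=web \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].state.waiting.reason}{"\n"}{end}' |
    awk '$2 ~ /CrashLoopBackOff|ImagePullBackOff|ErrImagePull/ {print $1}' |
    xargs -r kubectl delete pod
fi
```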

Edited to add: another effective manual intervention is, rather than deleting the pod that is stuck in a restart loop, to update the image on the pod itself directly. This results in a working pod and allows the rollout to resume. Nothing requires the image used here to be the same one the StatefulSet was updated with, but if this image update were performed by the StatefulSet controller we’d get exactly the behavior I’d like to see. From an uninformed outsider’s perspective this seems like a small change to the rolling-update logic.
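For example, with the hypothetical names used above (a pod's container image is one of the few fields that can be changed in place on a running pod):

```shell
# Patch the image on the stuck pod directly; the pod restarts with the
# working image and the StatefulSet rollout is then able to proceed.
kubectl set image pod/web-1 app=nginx:1.25
```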

Environment:

  • Kubernetes version (use kubectl version): 1.9
  • Cloud provider or hardware configuration/OS/etc: macOS running minikube v0.24.1

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 13
  • Comments: 42 (17 by maintainers)

Most upvoted comments

I can still see this in 1.14. Should we reopen this issue?

This is an annoying issue as it requires manual intervention, and can really bog down a team that relies on automated workflows. I +1 reopening this

/reopen

I think there’s a process issue here. Automatically closing an issue if developers just ignore it for 3 months is astonishingly dysfunctional. Closing your eyes does not make the problem go away.

It’s been more than 5 years…

@kmangla9: You can’t reopen an issue/PR unless you authored it or you are a collaborator.

Your wish is my command…

/reopen

It’s been 2 1/2 years. Sheesh already.

I just tried to reproduce this (on 1.11.1). AFAICT it only affects StatefulSets with the default podManagementPolicy of OrderedReady, but not Parallel.
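For anyone checking their own setup, the policy in effect can be read back like this (the StatefulSet name is hypothetical); note that the field is immutable, so Parallel has to be set in the manifest at creation time:

```shell
# Prints "OrderedReady" (the default) or "Parallel".
kubectl get statefulset web -o jsonpath='{.spec.podManagementPolicy}'
```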

/reopen

I can reproduce this issue; it is one of the biggest operational overheads in our setup. We usually delete the pod after patching in a new image.

I have also run into this issue and cannot find a way to recover the StatefulSet without losing the PV-to-pod mapping. This is very unfortunate and is going to cause data loss while trying to fix it.

I cannot reproduce losing the PV-to-pod mapping.

/reopen
/lifecycle frozen

I personally have experienced this issue and agree it’s super annoying.

(I guess it’s sig apps now)

This is not api machinery or architecture.

/sig workloads
