kubernetes: Scaling or deleting statefulset in 1.5 cluster with 1.6 kubectl fails

BUG REPORT

When running either kubectl delete statefulset <statefulset name> or kubectl scale statefulset <statefulset name> --replicas=<count> with kubectl 1.6.2 against a GKE 1.5.6 cluster, it fails with an error like the following.

error: Scaling the resource failed with: StatefulSet.apps "<statefulset name>" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas' are forbidden.; Current resource version 100392498

This also happens the other way around, with kubectl 1.5.x against a GKE 1.6.x cluster.

For scaling there’s a working alternative: running kubectl patch statefulsets <statefulset name> -p '{"spec":{"replicas":<count>}}' (see the example below). Unfortunately I haven’t found a workaround yet for deleting the statefulset.
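For example, with a hypothetical statefulset named web and a target of 3 replicas, the workaround looks like:

# Patches only spec.replicas, so it passes the server-side check that
# forbids changes to any other statefulset spec field.
kubectl patch statefulsets web -p '{"spec":{"replicas":3}}'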

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.6", GitCommit:"114f8911f9597be669a747ab72787e0bd74c9359", GitTreeState:"clean", BuildDate:"2017-03-28T13:36:31Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Google Container Engine 1.6.2 with cos image:
  • OS (e.g. from /etc/os-release):
BUILD_ID=9000.84.2
NAME="Container-Optimized OS"
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=56
BUG_REPORT_URL=https://crbug.com/new
PRETTY_NAME="Container-Optimized OS from Google"
VERSION=56
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/"
ID=cos
  • Kernel (e.g. uname -a):
Linux gke-production-europe-we-auto-scaling-917da0af-zzzj 4.4.21+ #1 SMP Fri Feb 17 15:34:45 PST 2017 x86_64 Intel(R) Xeon(R) CPU @ 2.50GHz GenuineIntel GNU/Linux

What happened:

  • kubectl delete statefulset <statefulset name> failed with the updates to statefulset spec for fields other than 'replicas' are forbidden error
  • kubectl scale statefulset <statefulset name> --replicas=<count> failed with the same error

What you expected to happen:

Both actions to succeed

How to reproduce it (as minimally and precisely as possible):

  • Create a statefulset in a GKE 1.5.6 cluster
  • Run the above commands with kubectl 1.6.2 (a minimal sketch of both steps follows this list)
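A minimal sketch of the reproduction, assuming a hypothetical statefulset named web; the nginx image and all names here are arbitrary examples, not from the original report:

# Create a small statefulset; apps/v1beta1 was the StatefulSet API
# group/version in the 1.5/1.6 timeframe. (A matching headless service
# is normally required for pod DNS, omitted here for brevity.)
kubectl apply -f - <<EOF
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 2
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.11
EOF

# With kubectl 1.6.2 against the 1.5.6 server, both of these fail
# with the Forbidden error quoted above.
kubectl scale statefulset web --replicas=3
kubectl delete statefulset web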

Anything else we need to know:

Happens for kubectl 1.5.x with GKE 1.6.x as well.

Most upvoted comments

Deleting them through the dashboard works fine, so I have a workaround without downgrading my kubectl version 😃

This may be an older issue, but it still affected us because we have some older kube clusters running 1.5.1.
I was trying to use statefulsets and ran into the same issues noted above; for me the blocker was that I couldn’t delete the statefulsets.
So, effectively I had defunct statefulsets lingering, which prevented me from redeploying and thus made me very sad.

What worked for me (YMMV) and allowed me to purge the defunct statefulsets (that could not be deleted) was to effectively redeploy the manifest files via helm (in the same namespace) and then delete the helm release.

I did this on a development cluster, please don’t use this method in prod or if you don’t understand the consequences.

I expected that the helm deployment would fail due to the existing statefulsets, which it did. The next step was to simply delete the now-failed helm release, and it purged all the lingering statefulsets for me, which couldn’t be deleted with anything I tried, including --grace-period=0 and --cascade=true.

In my case this was pretty simple, where foo is the previous namespace with the defunct lingering statefulsets…

mkdir -p ./foo/templates
mv *.yaml !$   # !$ expands to the last argument of the previous command, i.e. ./foo/templates
# create a simple Chart.yaml file in ./foo (a minimal sketch follows below)
helm lint ./foo   # not that I really care here but being nice
helm install --debug ./foo --namespace=foo
sleep 30
foo_release=$(helm list | grep 'foo' | awk '{print $1}')
helm delete "$foo_release"
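For completeness, the "create simple Chart.yaml" step above can be as small as this sketch (the name, version, and description are arbitrary, not from the original comment; Helm 2, current at the time, requires only name and version):

# Minimal chart metadata so helm will accept the directory.
cat > ./foo/Chart.yaml <<EOF
name: foo
version: 0.1.0
description: throwaway chart used only to adopt and purge defunct statefulsets
EOF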
kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T08:51:21Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+coreos.0", GitCommit:"cc65f5321f9230bf9a3fa171155c1213d6e3480e", GitTreeState:"clean", BuildDate:"2016-12-14T04:08:28Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Sadly quite dirty, but if you can’t get rid of the statefulsets, it was helm to the rescue (at least for me/us). YMMV, hope this helps someone else out.

For future reference, the problem is due to the immutability of the InitContainers and Affinity pod spec fields, which fail the validation check. It can be fixed by (1) using the same version of client and server, or (2) using a newer version of Kubernetes.
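To illustrate the mechanism, a minimal sketch assuming a hypothetical statefulset named web and the mismatched versions above: even a read-then-write round trip with no intended edits passes through the 1.6 client's conversion and defaulting of those fields, which the 1.5 server then rejects as a forbidden spec change.

# No-op GET-then-PUT through a 1.6 kubectl; this would likely fail
# with the same Forbidden error quoted at the top of this issue.
kubectl get statefulset web -o json | kubectl replace -f -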

As there are two convenient fixes, I think it is not justified to backport the change to v1.5. The scale endpoint will be fixed and rolled out in future releases. Please refer to this issue (work in progress) for more information: https://github.com/kubernetes/kubernetes/issues/46005

Sure, putting a hack into a beta feature is OK. But it’s a requirement for GA anyway

@crimsonfaith91 @lcjlcj kubectl 1.5.7 works

curl -O https://storage.googleapis.com/kubernetes-release/release/v1.5.7/bin/linux/amd64/kubectl && chmod +x kubectl
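Once downloaded, the previously failing commands work with the older client (hypothetical statefulset name web):

# A 1.5.x client matches the 1.5.x server, so the spec round-trips
# unchanged and the validation passes.
./kubectl delete statefulset web
./kubectl scale statefulset web --replicas=3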

Sanity check - are stateful sets using the scale endpoint? If not… that’s what we should be fixing. The scale endpoint is allowed to do more efficient things, but fundamentally all edits are going to result in some defaulting.

@lcjlcj If the issue still blocks you, you may consider using kubectl v1.5. Tell me if you need any help. Thanks!

@crimsonfaith91 it was the last 1.5.x kubectl bundled with gcloud before it got updated to 1.6.x

If I look at the release notes at https://cloud.google.com/sdk/docs/release-notes#15000_2017-04-05 it must have been 1.5.4:

Updated Google Container Engine’s kubectl from version 1.5.4 to 1.6.0.

Container Engine had just been upgraded to 1.6.0 at that time, looking at the release notes at https://cloud.google.com/container-engine/release-notes#april_4_2017

/assign @foxish who is helping @crimsonfaith91