kubernetes: Modifying nodeSelector on StatefulSet doesn't reschedule Pods

/kind bug

What happened: Changing the nodeSelector of a StatefulSet doesn't trigger rescheduling of its existing Pods. I kubectl apply the StatefulSet below and wait for its Pods to get scheduled onto Nodes with the label node_type: type1. Then I change the nodeSelector label to node_type: type2 and run kubectl apply again.

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: sstest
  labels:
    app: sstest
spec:
  replicas: 2
  serviceName: "service"
  template:
    metadata:
      labels:
        app: sstest
    spec:
      nodeSelector:
        node_type: type1
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8

What you expected to happen:

I expect the Pods to be rescheduled to the type2 Nodes, but nothing happens. The Pods only get rescheduled if I manually kill them using kubectl delete pod sstest-0.

How to reproduce it (as minimally and precisely as possible):

  1. Create a cluster in which some nodes have the label node_type: type1 while some other nodes have the label node_type: type2.
  2. Apply the above StatefulSet definition to the cluster.
  3. Change the nodeSelector from node_type: type1 to node_type: type2 in the StatefulSet definition file.
  4. Apply the file again.
  5. Kill a Pod manually to verify that it gets rescheduled to another node (a command sketch of these steps follows this list).
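
A minimal command sketch of the steps above, assuming the manifest is saved as sstest.yaml and using placeholder node names; both the filename and the node names are assumptions, not from the original report:

# Step 1: label some nodes (placeholder node names)
kubectl label nodes <type1-node> node_type=type1
kubectl label nodes <type2-node> node_type=type2

# Step 2: apply the StatefulSet and check where the Pods land
kubectl apply -f sstest.yaml
kubectl get pods -l app=sstest -o wide   # Pods are on node_type=type1 nodes

# Steps 3-4: edit nodeSelector in sstest.yaml to node_type: type2, then re-apply
kubectl apply -f sstest.yaml
kubectl get pods -l app=sstest -o wide   # Pods are still on the type1 nodes

# Step 5: only a manual delete gets a Pod rescheduled onto a type2 node
kubectl delete pod sstest-0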

Anything else we need to know?:

I tested the same scenario with Deployments. In that case the rescheduling worked as expected.
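
For comparison, a hedged sketch of how the same check can be done with a Deployment; the name dtest and the patch payloads are illustrative assumptions, not taken from the original test:

kubectl create deployment dtest --image=k8s.gcr.io/nginx-slim:0.8
kubectl patch deployment dtest --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node_type":"type1"}}}}}'
# Changing the selector rolls the Deployment's Pods onto type2 nodes,
# unlike the StatefulSet above
kubectl patch deployment dtest --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node_type":"type2"}}}}}'
kubectl get pods -l app=dtest -o wide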

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:28:34Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4-gke.1", GitCommit:"04502ae78d522a3d410de3710e1550cfb16dad4a", GitTreeState:"clean", BuildDate:"2017-12-08T17:24:53Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): Container-Optimized OS (cos)

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 10
  • Comments: 37 (14 by maintainers)

Most upvoted comments

Is it safe to assume this is not the intended behavior and will be fixed sometime in the future?

Same issue here 😕

It seems to be even worse in 1.21.2. Not even deleting a pod works; the new pod still has the old nodeSelector. I had to resort to deleting the StatefulSet and creating it again.
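
A sketch of the delete-and-recreate workaround described in this comment, assuming the original manifest is kept in sstest.yaml (the filename is an assumption):

# Deleting individual Pods was not enough on 1.21.2; recreate the whole StatefulSet
kubectl delete statefulset sstest
kubectl apply -f sstest.yaml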

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

Yup, this is still a problem. Just witnessed it on 1.19.11, which happens to be my blocker to upgrading 😢

Can this issue be re-opened please?

Yup, makes sense.

/remove-lifecycle rotten /reopen

Can this issue be re-opened please? cc @nikhita

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

/lifecycle frozen

/remove-lifecycle stale

/remove-lifecycle rotten

@adam-sandor @dims I’ll have a look