descheduler: nodeFit = false doesn't work as expected with RemovePodsViolatingNodeAffinity

What version of descheduler are you using?

descheduler version: 0.22

Does this issue reproduce with the latest release? Yes

Which descheduler CLI options are you using?

        - "--policy-config-file"
        - "/policy-dir/policy.yaml"
        - "--descheduling-interval"
        - "30s"
        - "--v"
        - "4"

Please provide a copy of your descheduler policy config file

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemovePodsViolatingNodeAffinity":
        enabled: true
        params:
          nodeAffinityType:
            - "requiredDuringSchedulingIgnoredDuringExecution"
          labelSelector:
            matchExpressions:
              - {key: "foo", operator: In, values: [bar]}
          nodeFit: false

What k8s version are you using (kubectl version)?

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"7a576bc3935a6b555e33346fd73ad77c925e9e4a", GitTreeState:"clean", BuildDate:"2021-07-15T21:01:38Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"7a576bc3935a6b555e33346fd73ad77c925e9e4a", GitTreeState:"clean", BuildDate:"2021-07-15T20:56:38Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I created a deployment with a node selector that forced its pods onto a single specific node. The pod template also had a requiredDuringSchedulingIgnoredDuringExecution node affinity rule, which the node satisfied at scheduling time. After the pods were scheduled, I changed the node's labels so that the affinity rule no longer matched, which should trigger descheduling. I also labeled the pods with foo=bar so that they match the strategy's labelSelector.

What did you expect to see? I expected the descheduler to evict my pod, and the pod to end up in the Pending state because there is no node where it could fit.

What did you see instead? The descheduler logs that the pod will not fit on the node, but it does not evict the pod.

The culprit is line 89 of the strategy in node_affinity.go: https://github.com/kubernetes-sigs/descheduler/blob/5b557941fac60cd85c1a3b709eb49a08d648fd8d/pkg/descheduler/strategies/node_affinity.go#L89

The node affinity strategy always checks whether the pod fits on some node, regardless of how nodeFit is set. The README's description of nodeFit filtering suggests that, by default, node fit is not checked: https://github.com/kubernetes-sigs/descheduler/blob/5b557941fac60cd85c1a3b709eb49a08d648fd8d/README.md#L703-L707

It would be logical that, when nodeFit=false, the pod is evicted regardless of whether it fits anywhere. A one-line change would fix this: line 89 could be changed to something like (!nodeFit || nodeutil.PodFitsAnyNode(pod, nodes)).
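To make the difference concrete, here is a simplified, self-contained Go model of the eviction filter before and after the suggested change. It is a sketch, not the descheduler source: the Pod and Node types and the functions podFitsNode, podFitsAnyNode, currentFilter and proposedFilter are hypothetical stand-ins (roughly for v1.Pod, v1.Node, nodeutil.PodFitsCurrentNode and nodeutil.PodFitsAnyNode), node affinity is reduced to an exact label match, and the evictability checks are left out.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for v1.Node.
type Node struct {
	Name   string
	Labels map[string]string
}

// Pod is a hypothetical, simplified stand-in for v1.Pod. RequiredNodeLabels
// models a requiredDuringSchedulingIgnoredDuringExecution node affinity as an
// exact label match, which is a deliberate simplification.
type Pod struct {
	Name               string
	RequiredNodeLabels map[string]string
}

// podFitsNode plays the role of nodeutil.PodFitsCurrentNode in this sketch:
// the pod fits if the node carries every required label with the same value.
func podFitsNode(pod Pod, node Node) bool {
	for key, want := range pod.RequiredNodeLabels {
		if got, ok := node.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// podFitsAnyNode plays the role of nodeutil.PodFitsAnyNode in this sketch.
func podFitsAnyNode(pod Pod, nodes []Node) bool {
	for _, n := range nodes {
		if podFitsNode(pod, n) {
			return true
		}
	}
	return false
}

// currentFilter models the reported behavior: the pod is only selected for
// eviction if some node fits it; the nodeFit argument is ignored.
func currentFilter(pod Pod, current Node, nodes []Node, nodeFit bool) bool {
	return !podFitsNode(pod, current) && podFitsAnyNode(pod, nodes)
}

// proposedFilter models the suggested change: the "fits any node" check
// only applies when nodeFit is true.
func proposedFilter(pod Pod, current Node, nodes []Node, nodeFit bool) bool {
	return !podFitsNode(pod, current) && (!nodeFit || podFitsAnyNode(pod, nodes))
}

func main() {
	// The only node was relabeled, so the pod's affinity no longer matches it.
	node := Node{Name: "node-1", Labels: map[string]string{"zone": "b"}}
	pod := Pod{Name: "app", RequiredNodeLabels: map[string]string{"zone": "a"}}
	nodes := []Node{node}

	fmt.Println(currentFilter(pod, node, nodes, false))  // false: never selected
	fmt.Println(proposedFilter(pod, node, nodes, false)) // true: selected for eviction
	fmt.Println(proposedFilter(pod, node, nodes, true))  // false: same as today
}
```

In this model, with nodeFit=true the proposed filter reduces to the current one, so only the nodeFit=false case changes behavior.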

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 32 (19 by maintainers)

Most upvoted comments

Adding those options to NodeFit sounds good to me, but maybe we should split that into a separate follow-up task after converting the current NodeFit to a plugin.

Then at that point it’s just a discussion of a change for a single plugin. I don’t think that has any architectural impact on the framework refactor.

I think we should freeze all feature requests that we are sure we will tackle at some point in the future. It does not make sense to keep managing stale or rotten labels on those if we are sure we want to address them eventually.

@RyanDevlin I verified that your patches fix this