kubernetes: pod stuck with NodeAffinity status // using spot VMs under K8s `1.22.x` and `1.23.x`

The same problem on 1.22.3-gke.700

_Originally posted by @maxpain in https://github.com/kubernetes/kubernetes/issues/98534#issuecomment-1003169567_

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 4
  • Comments: 18 (7 by maintainers)

Most upvoted comments

This happens with 1.25.8-gke.500 as well.

Steps to reproduce:

  1. Create a cluster with a managed Spot node pool that has 1 node.
  2. Create an alpine deployment with 1 replica and a node selector on it:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
  3. Add a pause or sleep command to the container so it keeps running (a full manifest sketch follows the pod listing below).
  4. Simulate host maintenance on the node the pod is scheduled on. Ref: https://cloud.google.com/compute/docs/instances/simulating-host-maintenance#gcloud
  5. Once the pod comes back up, one pod will be left dangling with NodeAffinity status and the other will be in the Running state:
NAME                                  READY   STATUS         RESTARTS   AGE
spot-graceful-test-868b9dd54f-9sh7j   1/1     Running        0          23s
spot-graceful-test-ff77554fd-fv42r    0/1     NodeAffinity   0          3m36s
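
For reference, a minimal sketch of the deployment used in steps 2-3; the name, image tag, and sleep command are illustrative, not the exact manifest from the report:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spot-graceful-test
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: spot-graceful-test
      template:
        metadata:
          labels:
            app: spot-graceful-test
        spec:
          # Pin the pod to GKE Spot nodes (step 2)
          nodeSelector:
            cloud.google.com/gke-spot: "true"
          containers:
          - name: alpine
            image: alpine:3.18
            # Keep the container running (step 3)
            command: ["sleep", "infinity"]

And the host-maintenance simulation from step 4, per the linked Google doc (NODE_INSTANCE_NAME and ZONE are placeholders for the GCE instance backing the node):

    gcloud compute instances simulate-maintenance-event NODE_INSTANCE_NAME --zone=ZONE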

/sig node

We are aware this was supposed to be fixed as of k8s 1.21, but we hit it in the same context on newer K8s versions. All pods on the affected node are stuck with NodeAffinity status and remain so until deleted, after which they are re-scheduled. The node is ready and otherwise healthy.
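
A hedged cleanup sketch: on GKE the stuck pods appear to land in the Failed phase (with reason NodeAffinity), so they can be listed and deleted by phase. This assumes there are no other Failed pods in the cluster you want to keep:

    # list pods the kubelet rejected; stuck NodeAffinity pods show status.phase=Failed
    kubectl get pods -A --field-selector=status.phase=Failed

    # delete them; the Deployment controller brings up replacements
    kubectl delete pods -A --field-selector=status.phase=Failed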

k8s version: v1.22.12-gke.1200, spot VMs: enabled

Same experience on v1.25.10-gke.1400; lots of NodeAffinity pods after spot nodes are preempted.

This was also happening on 1.24.13-gke.2500; we upgraded in an attempt to reduce the noise.

Google says this is ‘fixed’ in 1.25.7-gke.1000 and later https://cloud.google.com/kubernetes-engine/docs/release-notes#April_14_2023 (but it’s not)

[Screenshot: sliced output of the equivalent of kubectl get po,no]

+1. After adding a new preemptible (not spot) node pool to a 1.23.13-gke.900 cluster and scheduling a deployment there, I’ve also noticed this behaviour on the first couple of preemptions.

@jonpulsifer Do you mind giving some more detail on how to reproduce the issue?

Good and/or bad news: I’ve seen the NodeAffinity status occur once under K8s 1.23, but now I’m having trouble reproducing it.

This is stalled because I don’t have a stable set of steps to reproduce the issue.

+1 This happens in our GKE cluster to about 5% of the pods that run on preemptible nodes.

1.23.14-gke.401/1.23.12-gke.100