kubernetes: pod stuck with NodeAffinity status // using spot VMs under K8s `1.22.x` and `1.23.x`
The same problem on 1.22.3-gke.700
_Originally posted by @maxpain in https://github.com/kubernetes/kubernetes/issues/98534#issuecomment-1003169567_
This happens with 1.25.8-gke.500 as well.
Steps to reproduce:
/sig node
We are aware this was supposed to be fixed as of Kubernetes 1.21, but we hit it in the same context on newer versions. All pods on a given node get stuck with the NodeAffinity status and remain that way until deleted, after which they are re-scheduled (see the cleanup sketch below). The node is Ready and otherwise healthy.
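Since the only workaround mentioned in this thread is deleting the stuck pods so their controllers recreate them, here is a minimal cleanup sketch. It assumes the rejected pods show up with `status.phase=Failed` and `status.reason=NodeAffinity` (as they did for us) and that `jq` and GNU `xargs` are available:

```bash
# List pods the kubelet rejected with reason NodeAffinity (they end up in phase Failed).
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select(.status.phase == "Failed" and .status.reason == "NodeAffinity")
      | "\(.metadata.namespace)/\(.metadata.name)"'

# Delete them so their Deployments/ReplicaSets schedule replacements.
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select(.status.phase == "Failed" and .status.reason == "NodeAffinity")
      | "--namespace=\(.metadata.namespace) \(.metadata.name)"' \
  | xargs -r -L1 kubectl delete pod
```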
k8s version: `v1.22.12-gke.1200`; spot VMs: enabled
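For anyone trying to reproduce this without waiting for a real capacity preemption, a rough sketch follows. The cluster name, zone, node pool name, and instance name are all placeholders, and simulating a maintenance event is only an approximation of an actual spot preemption:

```bash
# Create a small spot node pool (names/zone are placeholders).
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --spot \
  --num-nodes=1

# Schedule some pods onto it, then force a preemption of the backing VM.
gcloud compute instances simulate-maintenance-event INSTANCE_NAME \
  --zone=us-central1-a

# After the node comes back, look for pods stuck in the Failed/NodeAffinity state.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed
```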
Same experience on `v1.25.10-gke.1400`; lots of NodeAffinity pods after spot nodes are preempted. This was also happening on `1.24.13-gke.2500`, and we upgraded to attempt to reduce the noise. Google says this is "fixed" from `1.25.7-gke.1000` or later (https://cloud.google.com/kubernetes-engine/docs/release-notes#April_14_2023), but it's not.
[Screenshot: sliced output of the equivalent of `kubectl get po,no`]
@jonpulsifer Would you mind giving some more detail on how to reproduce the issue?
The investigation is stalled because I don't have stable steps to reproduce it.
+1 This happens in our GKE cluster to about 5% of the pods that run on preemptible nodes.
`1.23.14-gke.401` / `1.23.12-gke.100`