kubernetes: Pods fail with "NodeAffinity failed" after kubelet restarts
What happened:
The issue is basically same as https://github.com/kubernetes/kubernetes/issues/92067.
With the fix https://github.com/kubernetes/kubernetes/pull/94087 in place, kubelet waits for the node lister to sync in GetNode().
However, in the case of a kubelet restart, pods scheduled on the node before the restart might still fail with “NodeAffinity failed” after the restart. Looking at the code, this is probably because the pod admission check (canAdmitPod()) can run before GetNode().
What you expected to happen:
After kubelet restart, old pods (pods scheduled on the node before the restart) do not see “NodeAffinity failed”.
How to reproduce it (as minimally and precisely as possible):
This issue does not happen every time. To reproduce it, keep restarting the kubelet; eventually you may see a previously running Pod start to fail with “Predicate NodeAffinity failed”.
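A rough repro loop along these lines can help trigger the race (a sketch, assuming a systemd-managed kubelet and kubectl access from the node; unit names, node naming, and the status string may differ in your environment):

```shell
#!/usr/bin/env bash
# Repeatedly restart the kubelet and watch for previously Running pods
# on this node flipping to a NodeAffinity failure status.
# Assumption: the node name equals the hostname; adjust if it does not.
NODE="$(hostname)"

for i in $(seq 1 20); do
  echo "restart attempt $i"
  sudo systemctl restart kubelet
  sleep 30
  # Stop as soon as any pod on this node reports a NodeAffinity failure.
  if kubectl get pods --all-namespaces \
      --field-selector "spec.nodeName=${NODE}" \
      | grep -i nodeaffinity; then
    echo "reproduced on attempt $i"
    break
  fi
done
```

Because the race is timing-dependent, it may take many restarts (or none at all on a given node) before a pod hits it.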
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 3 years ago
- Reactions: 10
- Comments: 28 (16 by maintainers)
This is reproducible on GKE v1.19.11-gke.2101 as well; @phantooom please consider re-opening.

The issue is still reproduced in GKE 1.19.8-gke.1600.
This sounds like something that regressed after the node sync changes, but the second change (the one I did) did not fix it.
The change I made, https://github.com/kubernetes/kubernetes/pull/99336, was technically a refactor of what was already established by the previous change: https://github.com/kubernetes/kubernetes/pull/94087.
This seems racy and should be brought up for discussion at the SIG Node meeting: https://github.com/kubernetes/community/tree/master/sig-node#meetings
Kubelet maintainers who are more familiar with this code path should be able to reproduce it.
We have a lot of GKE reporters in this ticket. Has anyone seen the problem on non-GKE clusters?
After upgrading GKE to v1.18.19-gke.1700 I experienced the same issue - some of the pods after node preemption moved to NodeAffinity status
It should be fixed in 1.18.19, v1.19.10, v1.20.7 and v1.21.1.
As for the GKE upgrade, I think that should be asked through GKE support.
/triage duplicate
/close
Same here; as far as I know this is fixed in 1.18.19.
The fix in https://github.com/kubernetes/kubernetes/pull/99336#issuecomment-824441152 was cherry-picked to 1.18 in https://github.com/kubernetes/kubernetes/pull/101343.
This also affects releases up to 1.21, by the way; check that PR to see the commit for each version.
We first got affected by this issue after upgrading our GKE cluster from v1.17.17-gke.2800 to 1.18.17-gke.700, for pods running on preemptible nodes. Is this k8s 1.18+ specific?

FYI, this is also present in GKE 1.18.17-gke.700. I had hoped they would backport the patch, since yesterday the .700 release was promoted to the stable channel, but that is not the case. Luckily for us, this is only an issue with preemptible nodes, since preemption is effectively a node restart. Will wait for 1.18.19 impatiently. 🤞
same