kubernetes: Kubelet rejects pod scheduled based on newly added node labels which have not been observed by the kubelet yet

What happened:

Whenever a node is added or updated, there is a small window in which pods can be scheduled to that node before its beta labels have been applied. This causes problems for pending pods that have a NodeAffinity (in our case) on the now-deprecated beta.kubernetes.io/os label.

What you expected to happen:

The proper labels should be applied to a worker before any pods are scheduled to it.

How to reproduce it (as minimally and precisely as possible):

(This does not reproduce every time.)

1. Deploy a 1.19 cluster with no workers.
2. Apply a Deployment with a node affinity on the beta.kubernetes.io/os label.
3. Add a worker.
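To make the failure condition concrete, here is a minimal, hypothetical Python sketch of how a required node affinity term like the one in step 2 is evaluated against a node's labels. It only models the `In` operator, and the label sets for a freshly joined worker versus a fully labeled one are assumptions for illustration; it is not the scheduler's or kubelet's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Requirement:
    """Hypothetical stand-in for one node selector requirement."""
    key: str
    operator: str  # only "In" is modeled in this sketch
    values: list[str] = field(default_factory=list)


def term_matches(labels: dict[str, str], requirements: list[Requirement]) -> bool:
    """All requirements in a single term must hold (logical AND)."""
    for req in requirements:
        if req.operator != "In":
            raise ValueError(f"unsupported operator in sketch: {req.operator}")
        # A missing label yields None, which is never in the allowed values.
        if labels.get(req.key) not in req.values:
            return False
    return True


# Affinity from the reproduction steps (value "linux" is an assumption).
affinity = [Requirement("beta.kubernetes.io/os", "In", ["linux"])]

# Assumed label sets: a just-joined worker carrying only the stable label,
# and the same worker after the beta label has been applied.
new_worker = {"kubernetes.io/os": "linux"}
labeled_worker = {"kubernetes.io/os": "linux", "beta.kubernetes.io/os": "linux"}

print(term_matches(new_worker, affinity))      # False
print(term_matches(labeled_worker, affinity))  # True
```

A pod with this affinity only fits the worker once the beta label is present, which is why the timing of that label matters.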

Anything else we need to know?:

I have been told this labeling step used to be done on the worker side but is now done on the master side, which could explain why this is happening: https://github.com/kubernetes/kubernetes/blob/v1.19.0-rc.2/pkg/controller/nodelifecycle/node_lifecycle_controller.go#L1534-L1578
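The idea of the master side applying the beta label can be sketched as a reconciliation step that copies the stable `kubernetes.io/os` label onto `beta.kubernetes.io/os` when the latter is missing or differs. This Python sketch is a hypothetical simplification based on our reading of the linked controller lines, not a transcription of the Go source; `LABEL_PAIRS` and `reconcile_labels` are illustrative names. The point is that the beta label arrives in a later update, after the node already exists.

```python
from __future__ import annotations

# Hypothetical (stable, beta) label pair; only the OS pair from this
# issue is modeled here.
LABEL_PAIRS = [
    ("kubernetes.io/os", "beta.kubernetes.io/os"),
]


def reconcile_labels(labels: dict[str, str]) -> dict[str, str]:
    """Return a copy of labels with missing or mismatched beta labels filled in."""
    updated = dict(labels)
    for stable, beta in LABEL_PAIRS:
        if stable not in labels:
            continue  # nothing to copy from
        if labels.get(beta) == labels[stable]:
            continue  # already in sync
        updated[beta] = labels[stable]
    return updated


# A node as first registered (assumed label set): stable label only.
registered = {"kubernetes.io/os": "linux"}
after_controller = reconcile_labels(registered)
print(after_controller["beta.kubernetes.io/os"])  # linux
```

Between registration and the controller's update, any component reading the node sees only the stable label.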

/sig scheduling

About this issue

State: closed
Created 4 years ago
Comments: 32 (21 by maintainers)

Commits related to this issue

salt: Use `beta.kubernetes.io/os` in NodeSelector For Deployment and DaemonSet if we rely on some label not created by kubelet directly like `kubernetes.io/os` the Pod can be scheduled on a Node runn... — committed to scality/metalk8s by TeddyAndrieux 4 years ago

Most upvoted comments

Oh, I see now where the issue is: the scheduler sees that the label is applied but the kubelet doesn't, so the kubelet does not admit the pod after the scheduler has scheduled it.


ahg-g on Jul 24, 2020
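The race described in the comment above can be modeled with a short, hypothetical Python sketch: the scheduler and the kubelet each evaluate the pod's node affinity against their own cached copy of the node, and the kubelet's copy may lag behind. The names, label values, and the single `In` check are assumptions for illustration; this is not the kubelet's actual admission code.

```python
from __future__ import annotations

KEY, ALLOWED = "beta.kubernetes.io/os", ["linux"]


def affinity_ok(labels: dict[str, str]) -> bool:
    # Simplified single "In" requirement; a missing label never matches.
    return labels.get(KEY) in ALLOWED


def kubelet_admit(labels: dict[str, str]) -> str:
    # "NodeAffinity" mirrors the pod state reported in this issue.
    return "Admitted" if affinity_ok(labels) else "NodeAffinity"


# Version 1 of the node: registered without the beta label.
node_v1 = {"kubernetes.io/os": "linux"}
# Version 2: the control plane has added the beta label.
node_v2 = {**node_v1, "beta.kubernetes.io/os": "linux"}

scheduler_view = node_v2  # the scheduler has already observed the update
kubelet_view = node_v1    # the kubelet's cache has not caught up yet

scheduled = affinity_ok(scheduler_view)
admitted = affinity_ok(kubelet_view)
print(scheduled, admitted)          # True False
print(kubelet_admit(kubelet_view))  # NodeAffinity
```

The pod is bound because the scheduler's view satisfies the affinity, then rejected because the kubelet's older view does not.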

I've hit the same issue with v1.19.3: after a normal node reboot, the pod enters the NodeAffinity state.

The fix for this issue was released in v1.19.8+ via https://github.com/kubernetes/kubernetes/pull/97996

liggitt on Feb 25, 2021