kubernetes: Kubelet rejects pod scheduled based on newly added node labels which have not been observed by the kubelet yet

What happened:

Whenever a node is added or updated, there is a small window in which pods can be scheduled to that node before the beta labels have been applied to it. This causes problems for pods that are queued for scheduling and have a NodeAffinity on (in our case) the now-deprecated beta.kubernetes.io/os label.

What you expected to happen:

The proper labels should be applied to a worker before any pods are scheduled on that node.

How to reproduce it (as minimally and precisely as possible):

(Does not reproduce 100 percent of the time)

  • Deploy a 1.19 cluster with no workers
  • Apply a deployment with a node affinity for the beta.kubernetes.io/os label
  • Add a worker
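
As a rough illustration of the second step, the sketch below generates a Deployment manifest whose pods require the beta.kubernetes.io/os label through node affinity. The deployment name, the app label, and the container image are placeholders chosen for this example and are not taken from the original report.

```python
import json


def build_deployment(os_value="linux"):
    """Return a Deployment manifest that requires the beta OS label on the node."""
    # Hard requirement: the pod may only run on nodes carrying this label value.
    node_affinity = {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {
                    "matchExpressions": [
                        {
                            "key": "beta.kubernetes.io/os",
                            "operator": "In",
                            "values": [os_value],
                        }
                    ]
                }
            ]
        }
    }
    # Placeholder label and name, used only for this reproduction sketch.
    labels = {"app": "affinity-repro"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "affinity-repro"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {"nodeAffinity": node_affinity},
                    # Placeholder image; any small image should do.
                    "containers": [
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_deployment(), indent=2))
```

If the script is saved under a name of your choosing, its output can be piped into `kubectl apply -f -` while the cluster still has no workers, so that the pod is already pending when the worker joins.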

Anything else we need to know?:

I have been told that this labeling step used to be done on the worker side but is now done on the master side, which could explain why this is happening. https://github.com/kubernetes/kubernetes/blob/v1.19.0-rc.2/pkg/controller/nodelifecycle/node_lifecycle_controller.go#L1534-L1578
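
To make the suspected mechanism concrete, here is a simplified model of a master-side label reconciliation step. It assumes that the controller copies the stable kubernetes.io/os and kubernetes.io/arch labels onto their beta counterparts after the node object already exists; this is an illustration of that assumption, not a transcription of the linked Go code, and the function and mapping names are invented for the example.

```python
# Assumed mapping from stable label keys to their deprecated beta equivalents.
STABLE_TO_BETA = {
    "kubernetes.io/os": "beta.kubernetes.io/os",
    "kubernetes.io/arch": "beta.kubernetes.io/arch",
}


def reconcile_labels(labels):
    """Return a copy of the node labels with beta labels mirrored from stable ones."""
    updated = dict(labels)
    for stable_key, beta_key in STABLE_TO_BETA.items():
        # Only act when the stable label exists and the beta label is missing or differs.
        if stable_key in labels and updated.get(beta_key) != labels[stable_key]:
            updated[beta_key] = labels[stable_key]
    return updated


if __name__ == "__main__":
    # Labels as a freshly registered node might report them (hypothetical values).
    freshly_registered = {"kubernetes.io/os": "linux", "kubernetes.io/arch": "amd64"}
    print(reconcile_labels(freshly_registered))
```

Under this model the beta labels only appear after a separate controller pass, so any component that reads the node before or independently of that pass sees a different label set.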

/sig scheduling

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 32 (21 by maintainers)

Most upvoted comments

Oh, I see where the issue is now: the scheduler sees that the label is applied but the kubelet doesn’t, so the kubelet does not admit the pod after the scheduler has scheduled it.
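
The mismatch described in this comment can be modeled in a few lines. The sketch assumes that the scheduler evaluates the pod's required labels against the node as stored in the API server, while the kubelet evaluates them against its own, possibly stale, copy of the node labels. The helper name and the label values are hypothetical.

```python
def matches_required_labels(required, node_labels):
    """True if every required key is present with one of its allowed values."""
    return all(node_labels.get(key) in allowed for key, allowed in required.items())


# The pod requires the deprecated beta OS label.
required = {"beta.kubernetes.io/os": {"linux"}}

# View held by the API server after the beta label has been added (assumed).
api_server_labels = {"kubernetes.io/os": "linux", "beta.kubernetes.io/os": "linux"}

# View held by the kubelet, which has not observed the new label yet (assumed).
kubelet_cached_labels = {"kubernetes.io/os": "linux"}

scheduler_fits = matches_required_labels(required, api_server_labels)
kubelet_admits = matches_required_labels(required, kubelet_cached_labels)

if __name__ == "__main__":
    print("scheduler places pod on node:", scheduler_fits)
    print("kubelet admits pod:", kubelet_admits)
```

In this model the scheduler binds the pod to the node, and the kubelet then rejects it because the same affinity check fails against its older label set.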

I’ve hit the same issue with v1.19.3: after a normal node reboot, the pod enters the NodeAffinity state.

The fix for this issue was released in v1.19.8+ via https://github.com/kubernetes/kubernetes/pull/97996