kubernetes: Scheduler fails to schedule due to cyclic errors on pod forget and assume

What happened:

Several pods were stuck pending in the scheduler queue. The scheduler logged the following errors for them:

E0510 11:08:36.825986       1 framework.go:777] "Failed running Bind plugin" err="Operation cannot be fulfilled on pods/binding \"test-pod-6b9b7db86c-rjr4m\": rpc error: code = Unavailable desc = transport is closing" plugin="DefaultBinder" pod="test-namespace/test-pod-6b9b7db86c-rjr4m"
E0510 11:08:36.826066       1 scheduler.go:586] scheduler cache ForgetPod failed: pod c79ffa51-1b35-4eb6-8e09-783fad5fb5bf wasn't assumed so cannot be forgotten
...
...
E0510 11:08:38.107538       1 scheduler.go:367] scheduler cache AssumePod failed: pod c79ffa51-1b35-4eb6-8e09-783fad5fb5bf is in the cache, so can't be assumed

What you expected to happen:

Expected the pod to be scheduled (assumed) on retry, since the earlier message "wasn't assumed so cannot be forgotten" indicates it had not been added to the cache. In other words, expected schedulerCache.assumedPods and schedulerCache.podStates to always be in sync.
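To make the suspected inconsistency concrete, here is a minimal toy model in Go of a cache that tracks pods in two maps, one for every known pod and one for assumed pods only. It is a sketch, not the real kube-scheduler code: toyCache, Assume, Confirm and Forget are hypothetical names. It also assumes that the bind took effect on the API server even though the client saw "transport is closing", so the pod was confirmed in the cache before the error path ran.

```go
package main

import "fmt"

// toyCache is a simplified, hypothetical stand-in for the scheduler cache.
type toyCache struct {
	podStates   map[string]bool // every pod the cache knows about
	assumedPods map[string]bool // only pods assumed and not yet confirmed
}

func newToyCache() *toyCache {
	return &toyCache{
		podStates:   map[string]bool{},
		assumedPods: map[string]bool{},
	}
}

// Assume rejects any pod that already has an entry in podStates.
func (c *toyCache) Assume(uid string) error {
	if c.podStates[uid] {
		return fmt.Errorf("pod %s is in the cache, so can't be assumed", uid)
	}
	c.podStates[uid] = true
	c.assumedPods[uid] = true
	return nil
}

// Confirm models (as an assumption) an informer event that reports the pod
// as bound: the pod stays in podStates but is no longer marked as assumed.
func (c *toyCache) Confirm(uid string) {
	if c.podStates[uid] {
		delete(c.assumedPods, uid)
	}
}

// Forget only removes pods that are still marked as assumed.
func (c *toyCache) Forget(uid string) error {
	if c.podStates[uid] && c.assumedPods[uid] {
		delete(c.assumedPods, uid)
		delete(c.podStates, uid)
		return nil
	}
	return fmt.Errorf("pod %s wasn't assumed so cannot be forgotten", uid)
}

func main() {
	c := newToyCache()
	uid := "c79ffa51-1b35-4eb6-8e09-783fad5fb5bf"

	_ = c.Assume(uid)          // scheduling cycle assumes the pod
	c.Confirm(uid)             // assumption: the bind succeeded server-side
	fmt.Println(c.Forget(uid)) // bind error path tries to forget the pod
	fmt.Println(c.Assume(uid)) // retry tries to assume the pod again
}
```

In this model the pod is present in podStates but absent from assumedPods, so Forget and the retried Assume both fail, in the same order as the two cache errors in the log above. Whether this is what happened in the reported cluster is not confirmed here.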

How to reproduce it (as minimally and precisely as possible):

Not sure

Environment:

  • Kubernetes version (use kubectl version): 1.20.2
  • Cloud provider or hardware configuration: GCP
  • OS (e.g: cat /etc/os-release): Ubuntu 16.04.7 LTS

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (7 by maintainers)

Most upvoted comments

@ahg-g Wouldn’t a subsequent OnUpdate event from kubelet after updating nodeName make the cache consistent?

The pod should be completely removed from the queue if it was assigned a node; this is not happening right now.