kubernetes: Scheduler fails to schedule due to cyclic errors on pod forget and assume
What happened:
Several pods were stuck pending in the scheduler queue. Scheduling them failed with the following errors:
E0510 11:08:36.825986 1 framework.go:777] "Failed running Bind plugin" err="Operation cannot be fulfilled on pods/binding \"test-pod-6b9b7db86c-rjr4m\": rpc error: code = Unavailable desc = transport is closing" plugin="DefaultBinder" pod="test-namespace/test-pod-6b9b7db86c-rjr4m"
E0510 11:08:36.826066 1 scheduler.go:586] scheduler cache ForgetPod failed: pod c79ffa51-1b35-4eb6-8e09-783fad5fb5bf wasn't assumed so cannot be forgotten
...
...
E0510 11:08:38.107538 1 scheduler.go:367] scheduler cache AssumePod failed: pod c79ffa51-1b35-4eb6-8e09-783fad5fb5bf is in the cache, so can't be assumed
What you expected to happen:
Expected the pod to be assumed and scheduled on retry, because the earlier message "wasn't assumed so cannot be forgotten" indicates that it had not been added to the cache.
In other words, expected schedulerCache.assumedPods and schedulerCache.podStates to always be in sync.
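To make the two cache errors concrete, here is a minimal Go sketch of a toy cache that tracks assumedPods and podStates. It is not the kube-scheduler implementation: toyCache, ConfirmPod, the node name, and the simplified checks are assumptions made for illustration, and only the two field names and the error wording come from the report. It walks through one hypothetical sequence in which a pod has an entry in podStates but not in assumedPods, so that ForgetPod and a retried AssumePod both fail.

```go
package main

import "fmt"

// toyCache is a hypothetical, heavily simplified stand-in for a scheduler
// cache. Only the field names mirror the ones mentioned in the report.
type toyCache struct {
	assumedPods map[string]bool   // UIDs of pods assumed but not yet confirmed
	podStates   map[string]string // UID -> node name for every pod the cache knows
}

func newToyCache() *toyCache {
	return &toyCache{assumedPods: map[string]bool{}, podStates: map[string]string{}}
}

// AssumePod optimistically records a placement; it refuses any pod already known.
func (c *toyCache) AssumePod(uid, node string) error {
	if _, known := c.podStates[uid]; known {
		return fmt.Errorf("pod %s is in the cache, so can't be assumed", uid)
	}
	c.podStates[uid] = node
	c.assumedPods[uid] = true
	return nil
}

// ForgetPod undoes an assumption; it refuses pods that are not currently assumed.
func (c *toyCache) ForgetPod(uid string) error {
	if _, known := c.podStates[uid]; !known || !c.assumedPods[uid] {
		return fmt.Errorf("pod %s wasn't assumed so cannot be forgotten", uid)
	}
	delete(c.podStates, uid)
	delete(c.assumedPods, uid)
	return nil
}

// ConfirmPod (assumed helper) models the cache learning that the pod is bound:
// the state entry stays, the assumed flag is cleared.
func (c *toyCache) ConfirmPod(uid, node string) {
	c.podStates[uid] = node
	delete(c.assumedPods, uid)
}

func main() {
	c := newToyCache()
	const uid = "c79ffa51-1b35-4eb6-8e09-783fad5fb5bf"
	_ = c.AssumePod(uid, "node-a") // "node-a" is a made-up node name
	c.ConfirmPod(uid, "node-a")    // pod is now known but no longer assumed
	fmt.Println(c.ForgetPod(uid))  // fails: not assumed
	fmt.Println(c.AssumePod(uid, "node-a")) // fails: already in the cache
}
```

In this model, a pod that is known but not assumed can be neither forgotten nor assumed again, which matches the pair of messages in the logs. Whether the real cache reached that state the same way is not established by the report.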
How to reproduce it (as minimally and precisely as possible):
Not sure
Environment:
- Kubernetes version (use kubectl version): 1.20.2
- Cloud provider or hardware configuration: GCP
- OS (e.g: cat /etc/os-release): Ubuntu 16.04.7 LTS
About this issue
- State: closed
- Created 3 years ago
- Comments: 25 (7 by maintainers)
A pod should be removed from the scheduling queue entirely once it has been assigned a node; that is not happening right now.
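As a rough illustration of the behavior described in this comment, the following Go sketch filters a pending queue so that pods which already have a node are dropped instead of retried. The pendingPod type, its fields, and dropAssigned are hypothetical names chosen for this example, not kube-scheduler APIs.

```go
package main

import "fmt"

// pendingPod is a hypothetical queue entry; NodeName is empty while the pod
// is still unscheduled.
type pendingPod struct {
	UID      string
	NodeName string
}

// dropAssigned returns only the pods that still need scheduling, preserving
// their order. Pods that already carry a node name are removed from the queue.
func dropAssigned(queue []pendingPod) []pendingPod {
	kept := make([]pendingPod, 0, len(queue))
	for _, p := range queue {
		if p.NodeName == "" {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	queue := []pendingPod{
		{UID: "pod-a"},
		{UID: "pod-b", NodeName: "node-1"}, // already bound, should not be retried
		{UID: "pod-c"},
	}
	for _, p := range dropAssigned(queue) {
		fmt.Println("still pending:", p.UID)
	}
}
```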