kubernetes: rethink sorting order of scheduling failure reasons
What would you like to be added?
Currently, the scheduler surfaces a pod's scheduling failure reasons by sorting the messages alphabetically:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4s default-scheduler 0/500 nodes are available: 1 node(s) had taint {node-001: }, that the pod didn't tolerate, 1 node(s) had taint {node-002: }, that the pod didn't tolerate, 1 node(s) had taint {node-003: }, that the pod didn't tolerate, 1 node(s) had taint {node-004: }, that the pod didn't tolerate, 1 node(s) had taint {node-005: }, that the pod didn't tolerate, 1 node(s) had taint {node-006: }, that the pod didn't tolerate, 1 node(s) had taint {node-007: }, that the pod didn't tolerate, 1 node(s) had taint {node-008: }, that the pod didn't tolerate, 1 node(s) had taint {node-009: }, that the pod didn't tolerate, 1 node(s) had taint {node-010: }, that the pod didn't tolerate, 1 node(s) had taint {node-011: }, that the pod didn't tolerate, 1 node(s) had taint {node-012: }, that the pod didn't tolerate, 1 node(s) had taint {node-013: }, that the pod didn't tolerate, 1 node(s) had taint {node-014: }, that the pod didn't tolerate, 1 node(s) had taint {node-015: }, that the pod didn't tolerate, 1 node(s) had taint {node-016: }, ...
However, on a large cluster, or when the failure reasons are diverse, the message gets truncated and you sometimes cannot spot the most common failure reason, which makes troubleshooting hard. For now, the only option is to dig into the scheduler logs, which is not that friendly.
It'd be nice to sort the failure reasons by the frequency of their occurrences, so the event becomes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4s default-scheduler 0/500 nodes are available: 400 node(s) had taint {foo: }, that the pod didn't tolerate, 1 node(s) had taint {node-001: }, that the pod didn't tolerate, 1 node(s) had taint {node-002: }, that the pod didn't tolerate, 1 node(s) had taint {node-003: }, that the pod didn't tolerate, 1 node(s) had taint {node-004: }, that the pod didn't tolerate, 1 node(s) had taint {node-005: }, that the pod didn't tolerate, 1 node(s) had taint {node-006: }, that the pod didn't tolerate, 1 node(s) had taint {node-007: }, that the pod didn't tolerate, 1 node(s) had taint {node-008: }, that the pod didn't tolerate, 1 node(s) had taint {node-009: }, that the pod didn't tolerate, 1 node(s) had taint {node-010: }, that the pod didn't tolerate, 1 node(s) had taint {node-011: }, that the pod didn't tolerate, 1 node(s) had taint {node-012: }, that the pod didn't tolerate, 1 node(s) had taint {node-013: }, that the pod didn't tolerate, 1 node(s) had taint {node-014: }, that the pod didn't tolerate, 1 node(s) had taint {node-015: }, ...
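A minimal sketch of what frequency-based ordering could look like, assuming the scheduler has a per-node map of failure reasons to aggregate. This is not the scheduler's actual code; `reasonCount` and `sortReasonsByFrequency` are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type reasonCount struct {
	reason string
	count  int
}

// sortReasonsByFrequency collapses identical failure reasons, then sorts them
// by descending count, breaking ties alphabetically to keep the output deterministic.
func sortReasonsByFrequency(nodeReasons map[string]string) []reasonCount {
	counts := map[string]int{}
	for _, reason := range nodeReasons {
		counts[reason]++
	}
	out := make([]reasonCount, 0, len(counts))
	for r, c := range counts {
		out = append(out, reasonCount{reason: r, count: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].count != out[j].count {
			return out[i].count > out[j].count
		}
		return out[i].reason < out[j].reason
	})
	return out
}

func main() {
	// Hypothetical per-node failure reasons collected during filtering.
	nodeReasons := map[string]string{
		"node-001": "node(s) had taint {node-001: }, that the pod didn't tolerate",
		"node-002": "node(s) had taint {foo: }, that the pod didn't tolerate",
		"node-003": "node(s) had taint {foo: }, that the pod didn't tolerate",
	}
	parts := []string{}
	for _, rc := range sortReasonsByFrequency(nodeReasons) {
		parts = append(parts, fmt.Sprintf("%d %s", rc.count, rc.reason))
	}
	fmt.Printf("0/%d nodes are available: %s.\n", len(nodeReasons), strings.Join(parts, ", "))
}
```

With this ordering, the reason shared by the most nodes (e.g. the `{foo: }` taint in the example above) surfaces first, so it survives truncation.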
Why is this needed?
It helps end users quickly spot the most common scheduling failures, without asking an SRE to look into the scheduler logs.
About this issue
- State: closed
- Created 2 years ago
- Comments: 29 (25 by maintainers)
I'm going to make this minimal change in 1.24, and in 1.25 come up with a way to make the message compact and adaptive (e.g., setting a maximum length limit for each plugin's failure message and, if it's exceeded, falling back to a compact messaging mode).
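A rough sketch of that compact/adaptive idea, under the assumptions described above; `maxPluginMsgLen` and `compactPluginMessage` are hypothetical names, not an existing scheduler API:

```go
package main

import "fmt"

const maxPluginMsgLen = 100 // hypothetical per-plugin length cap

// compactPluginMessage returns the verbose message when it fits within the cap,
// otherwise a compact summary that only reports how many nodes the plugin rejected.
func compactPluginMessage(plugin, verbose string, rejectedNodes int) string {
	if len(verbose) <= maxPluginMsgLen {
		return verbose
	}
	return fmt.Sprintf("%d node(s) rejected by plugin %s", rejectedNodes, plugin)
}

func main() {
	verbose := "1 node(s) had taint {node-001: }, that the pod didn't tolerate, " +
		"1 node(s) had taint {node-002: }, that the pod didn't tolerate, " +
		"1 node(s) had taint {node-003: }, that the pod didn't tolerate"
	fmt.Println(compactPluginMessage("TaintToleration", verbose, 3))
}
```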
I think we can fix a few more:
- node(s) didn't satisfy existing pods anti-affinity (remove the word "rules")
- node(s) didn't have free ports for the incoming pod