kubernetes: rethink sorting order of scheduling failure reasons

What would you like to be added?

Currently, the scheduler surfaces a pod’s scheduling failure reasons in a single event message, with the individual reasons sorted alphabetically:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4s    default-scheduler  0/500 nodes are available: 1 node(s) had taint {node-001: }, that the pod didn't tolerate, 1 node(s) had taint {node-002: }, that the pod didn't tolerate, 1 node(s) had taint {node-003: }, that the pod didn't tolerate, 1 node(s) had taint {node-004: }, that the pod didn't tolerate, 1 node(s) had taint {node-005: }, that the pod didn't tolerate, 1 node(s) had taint {node-006: }, that the pod didn't tolerate, 1 node(s) had taint {node-007: }, that the pod didn't tolerate, 1 node(s) had taint {node-008: }, that the pod didn't tolerate, 1 node(s) had taint {node-009: }, that the pod didn't tolerate, 1 node(s) had taint {node-010: }, that the pod didn't tolerate, 1 node(s) had taint {node-011: }, that the pod didn't tolerate, 1 node(s) had taint {node-012: }, that the pod didn't tolerate, 1 node(s) had taint {node-013: }, that the pod didn't tolerate, 1 node(s) had taint {node-014: }, that the pod didn't tolerate, 1 node(s) had taint {node-015: }, that the pod didn't tolerate, 1 node(s) had taint {node-016: }, ...

However, on a large cluster, or when the failure reasons are diverse, the message gets truncated and the most common failure reasons can be pushed out of view entirely, which makes troubleshooting hard. Today the only way around this is to dig into the scheduler logs, which is not very user-friendly.

It’d be nice to sort the failure reasons by how frequently they occur instead, so the event becomes (a rough sketch of the sorting idea follows the example):

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4s    default-scheduler  0/500 nodes are available: 400 node(s) had taint {foo: }, that the pod didn't tolerate, 1 node(s) had taint {node-001: }, that the pod didn't tolerate, 1 node(s) had taint {node-002: }, that the pod didn't tolerate, 1 node(s) had taint {node-003: }, that the pod didn't tolerate, 1 node(s) had taint {node-004: }, that the pod didn't tolerate, 1 node(s) had taint {node-005: }, that the pod didn't tolerate, 1 node(s) had taint {node-006: }, that the pod didn't tolerate, 1 node(s) had taint {node-007: }, that the pod didn't tolerate, 1 node(s) had taint {node-008: }, that the pod didn't tolerate, 1 node(s) had taint {node-009: }, that the pod didn't tolerate, 1 node(s) had taint {node-010: }, that the pod didn't tolerate, 1 node(s) had taint {node-011: }, that the pod didn't tolerate, 1 node(s) had taint {node-012: }, that the pod didn't tolerate, 1 node(s) had taint {node-013: }, that the pod didn't tolerate, 1 node(s) had taint {node-014: }, that the pod didn't tolerate, 1 node(s) had taint {node-015: }, ...
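As a rough illustration of the proposal (not the scheduler's actual code), the aggregation could count identical reasons and emit them most-frequent-first. The function name, the reason strings, and the counts below are all illustrative:

package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortReasonsByFrequency turns a reason->count histogram into the message
// suffix of a FailedScheduling event, ordered by descending count.
func sortReasonsByFrequency(reasons map[string]int) string {
	keys := make([]string, 0, len(reasons))
	for r := range reasons {
		keys = append(keys, r)
	}
	sort.Slice(keys, func(i, j int) bool {
		if reasons[keys[i]] != reasons[keys[j]] {
			return reasons[keys[i]] > reasons[keys[j]] // higher count first
		}
		return keys[i] < keys[j] // tie-break alphabetically for deterministic output
	})

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%d %s", reasons[k], k))
	}
	return strings.Join(parts, ", ")
}

func main() {
	reasons := map[string]int{
		"node(s) had taint {foo: }, that the pod didn't tolerate":      400,
		"node(s) had taint {node-001: }, that the pod didn't tolerate": 1,
		"node(s) had taint {node-002: }, that the pod didn't tolerate": 1,
	}
	fmt.Println("0/500 nodes are available: " + sortReasonsByFrequency(reasons))
}

With this ordering, the dominant reason (400 nodes with the foo taint) leads the message, so it survives truncation; the alphabetical tie-break only matters for reasons that occur equally often.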

Why is this needed?

It helps end users quickly spot the most critical scheduling failures, without asking SREs to look into the scheduler logs.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 29 (25 by maintainers)

Most upvoted comments

node(s) had untolerated taints {key, val}

I’m going to make this minimal change in 1.24. Then in 1.25, I'll come up with a way to make the message compact and adaptive (e.g. setting a maximum length limit for each plugin's failure message and, if that limit is exceeded, falling back to a compact messaging mode).
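To make the "compact and adaptive" idea concrete, here is a hypothetical sketch, not an agreed-on design: cap each plugin's aggregate message and fall back to a count-only form when the cap is exceeded. The helper formatPluginFailures, the constant maxPerPluginLen, and the limit value are all made up for illustration:

package main

import (
	"fmt"
	"strings"
)

// maxPerPluginLen is a made-up cap; the real limit (if any) would be decided
// in the follow-up work described above.
const maxPerPluginLen = 120

// formatPluginFailures joins one plugin's per-node reasons; when the detailed
// message would exceed the cap, it falls back to a compact form that keeps
// only the node count and a generic reason.
func formatPluginFailures(compactReason string, detailedReasons []string) string {
	detailed := strings.Join(detailedReasons, ", ")
	if len(detailed) <= maxPerPluginLen {
		return detailed
	}
	return fmt.Sprintf("%d node(s) %s", len(detailedReasons), compactReason)
}

func main() {
	reasons := []string{
		"1 node(s) had taint {node-001: }, that the pod didn't tolerate",
		"1 node(s) had taint {node-002: }, that the pod didn't tolerate",
		"1 node(s) had taint {node-003: }, that the pod didn't tolerate",
	}
	// The detailed form exceeds the cap, so this prints the compact fallback:
	// "3 node(s) had untolerated taints"
	fmt.Println(formatPluginFailures("had untolerated taints", reasons))
}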

I think we can fix a few more:

node(s) didn't satisfy existing pods anti-affinity (i.e. remove the word "rules")

node(s) didn't have free ports for the incoming pod