kubernetes: pod_scheduling_duration_seconds includes the time a Pod fails PreEnqueue

What happened?

pod_scheduling_duration_seconds includes the time during which a Pod is gated.

We use the timestamp when the scheduler inserts the pod into the queue: https://github.com/kubernetes/kubernetes/blob/84c8abfb8bf900ce36f7ebfbc52794bad972d8cc/pkg/scheduler/internal/queue/scheduling_queue.go#L402

What did you expect to happen?

The period during which a Pod fails PreEnqueue (for example, while it is gated) shouldn’t be counted in pod_scheduling_duration_seconds.

How can we reproduce it (as minimally and precisely as possible)?

Create a Pod with scheduling gates. Wait some time before removing the gates. Observe the pod_scheduling_duration_seconds metric.

Anything else we need to know?

No response

Kubernetes version

1.26+

Cloud provider

Any

OS version

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 23 (22 by maintainers)

Most upvoted comments

We don’t have a conclusion yet, but as a bug, it should qualify for changes after the freezes.

@helayoty would you want to take a stab at this issue?