scheduler-plugins: the states in PodGroup are not accurate
Area
- Scheduler
- Controller
- Helm Chart
- Documents
Other components
No response
What happened?
After migrating to controller-runtime, podGroup.status.scheduled is no longer updated by PostBind, and the phase transitions do not seem to work as expected.
What did you expect to happen?
PodGroup should react to pod state changes and reflect them in PodGroup.status.phase,
but it is currently inaccurate and sometimes wrong. We need to discuss the expected state changes and make them always correct.
Let's discuss the state flow before working on it:
currently, we have the following states:
- Pending: pod group has been accepted by the system
- Running: minMember pods of the pod group are in running phase.
- PreScheduling: all pods of the pod group have been enqueued and are waiting to be scheduled
- Scheduling: some pods have been scheduled and are in running phase, but fewer than minMember
- Scheduled: minMember pods have been scheduled and are in running phase. @Huang-Wei, is this right? It seems to duplicate Running
- Unknown: some of the pods are scheduled, and some are not
- Finished: minMember pods are successfully finished
- Failed: at least one of the pods has failed
Please note that the following diagram only shows my understanding of the phases defined in the code; it may be a misunderstanding, or it could be discussed and improved.
stateDiagram-v2
state if_minMember <<choice>>
[*] --> Pending
Pending --> PreScheduling: pods added
PreScheduling --> Scheduling: some of the pods scheduled
Scheduling --> Scheduled: minMember pods scheduled, but not running
Scheduled --> Running: minMember pods scheduled and running
Running --> Failed: at least one of the pods failed
Failed --> if_minMember: failed pods fixed
if_minMember --> Scheduling: minMember not met
if_minMember --> Scheduled: minMember met
Running --> Finished: all pods successfully finished
Finished --> [*]
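To make the proposed flow easier to discuss, here is a minimal Go sketch that derives a phase from pod counts. It only encodes the reading of the phases given above and is not the plugin's actual controller code. The names `PodCounts` and `derivePhase` are hypothetical, `Scheduled` is assumed to count pods bound to a node whether or not they are running yet, and `minMember` is assumed to be at least 1. The Unknown phase is left out because its trigger is not clearly defined above.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the PodGroup phase type.
type Phase string

const (
	PhasePending       Phase = "Pending"
	PhasePreScheduling Phase = "PreScheduling"
	PhaseScheduling    Phase = "Scheduling"
	PhaseScheduled     Phase = "Scheduled"
	PhaseRunning       Phase = "Running"
	PhaseFinished      Phase = "Finished"
	PhaseFailed        Phase = "Failed"
)

// PodCounts is a hypothetical summary of the pods belonging to one PodGroup.
type PodCounts struct {
	Total     int // pods created for the group
	Scheduled int // pods bound to a node, running or not (assumption)
	Running   int
	Succeeded int
	Failed    int
}

// derivePhase maps pod counts to a phase, checking the most specific
// conditions first. It assumes minMember >= 1.
func derivePhase(minMember int, c PodCounts) Phase {
	switch {
	case c.Total == 0:
		return PhasePending // group accepted, no pods yet
	case c.Failed > 0:
		return PhaseFailed // at least one pod failed
	case c.Succeeded >= minMember:
		return PhaseFinished
	case c.Running >= minMember:
		return PhaseRunning
	case c.Scheduled >= minMember:
		return PhaseScheduled // enough pods bound, not enough running
	case c.Scheduled > 0:
		return PhaseScheduling // some pods bound, minMember not met
	default:
		return PhasePreScheduling // pods exist, none bound yet
	}
}

func main() {
	// minMember 3, three pods, one of them unschedulable.
	fmt.Println(derivePhase(3, PodCounts{Total: 3, Scheduled: 2, Running: 2})) // prints "Scheduling"
}
```

Under these assumptions, a group with minMember 3 and one unschedulable pod stays in Scheduling, which is one candidate for the expected behavior in the reproduction case.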
How can we reproduce it (as minimally and precisely as possible)?
- create a podGroup with minMember 3
- create 3 pods in podGroup
- change 1 of the pods to make it unschedulable
- the phase does not change as expected, and the scheduled count is wrong.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
1.25.7
Scheduler Plugins version
0.25.7
About this issue
- State: closed
- Created a year ago
- Comments: 20 (20 by maintainers)
I may delegate the review work to @denkensk as he was the original author. My 2 cents about the status revamp work:
/assign @denkensk as primary reviewer.
thanks @Gekko0114
/close
We can close this since I completed the PR.
https://github.com/kubernetes-sigs/scheduler-plugins/pull/574 Updated the PR.
@zwpaper, @denkensk Sure, I agree with you. Thanks for clarifying the discussion. I will implement it.
Hi @denkensk, I would like to hear your suggestions regarding this issue. Could you comment at your convenience?
Could Scheduled mean scheduled but not yet running? That may have been the original design intention.
If we keep the Scheduled phase, then status.scheduled would also be kept.
The question may be whether there is a phase in which pods are scheduled but not running, and whether we should expose this phase to users,
for example, pods that are scheduled but stuck in ContainerCreating or some other status.