volcano: Fair sharing not working

What happened: My cluster has 11 CPUs in total. I’m trying to create 2 queues (excluding the default queue), each with weight 5. Queue manifests:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test
spec:
  weight: 5

---

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 5

Queue list:

Name                     Weight  State   Inqueue Pending Running Unknown
default                  1       Open    0       0       0       0
test                     5       Open    0       0       0       0
test1                    5       Open    0       0       0       0

I created 3 Jobs in the test queue with the following CPU requests: job1 -> CPU 5, job2 -> CPU 5, job3 -> CPU 1.

All 3 Jobs are now running and using the whole cluster.

Now I’m creating a new Job in the test1 queue with CPU 2. I expect one Job to be evicted from the test queue so that the Job in the test1 queue can run, but the Job in the test1 queue stays in the Inqueue state:

Name                     Weight  State   Inqueue Pending Running Unknown
default                  1       Open    0       0       0       0
test                     5       Open    0       0       3       0
test1                    5       Open    1       0       0       0

Configuration:

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: conformance
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack

What you expected to happen: I expected one Job to be evicted from the test queue and the Job in the test1 queue to start running. Instead, the Job in the test1 queue stays in the Inqueue state.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Volcano Version: v1.3.0
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (20 by maintainers)

Most upvoted comments

@shinytang6 Thank you for the response. I built a Docker image from the master branch and tested it, but it still doesn’t work.

Ideally, the test queue should deserve 5000, test1 should deserve 5000, and the default queue should deserve 1000. Do I need to share any other logs?
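
The expectation above corresponds to a static split of the cluster by queue weight. As a rough sketch, assuming the 11 CPU (11000 millicores) from the original report and the three weights from the queue list (the function name `static_split` is made up for illustration and is not part of Volcano):

```python
def static_split(total_millicpu, weights):
    """Split total_millicpu across queues in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_millicpu * w // total_weight for name, w in weights.items()}


# Weights taken from the queue list in the report: default=1, test=5, test1=5.
expected = static_split(11000, {"default": 1, "test": 5, "test1": 5})
print(expected)
```

Note that the scheduler log reports 12000 millicores in total and only adds attributes for queues that have jobs (test and test1), so its numbers differ from this static split.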

Below is the log from the master branch Docker image:

I0806 22:58:42.564073       1 scheduler.go:91] Start scheduling ...
I0806 22:58:42.564206       1 cache.go:840] The priority of job <default/test-job-new> is <high-pri/0>
I0806 22:58:42.564223       1 cache.go:840] The priority of job <default/test-job2> is </0>
I0806 22:58:42.564224       1 cache.go:840] The priority of job <default/test-job1> is </0>
I0806 22:58:42.564254       1 cache.go:840] The priority of job <default/test-job> is </0>
I0806 22:58:42.564305       1 cache.go:878] There are <4> Jobs, <3> Queues and <1> Nodes in total for scheduling.
I0806 22:58:42.564329       1 session.go:165] Open Session d1cb963c-8e81-4e8c-b5e5-e920535ee55f with <4> Job and <3> Queues
I0806 22:58:42.564345       1 proportion.go:73] The total resource is <cpu 12000.00, memory 26644484096.00, hugepages-2Mi 0.00>
I0806 22:58:42.564357       1 proportion.go:77] Considering Job <default/test-job2>.
I0806 22:58:42.564362       1 proportion.go:95] Added Queue <test> attributes.
I0806 22:58:42.564366       1 proportion.go:77] Considering Job <default/test-job1>.
I0806 22:58:42.564368       1 proportion.go:77] Considering Job <default/test-job>.
I0806 22:58:42.564371       1 proportion.go:77] Considering Job <default/test-job-new>.
I0806 22:58:42.564374       1 proportion.go:95] Added Queue <test1> attributes.
I0806 22:58:42.564385       1 proportion.go:153] Considering Queue <test>: weight <5>, total weight <10>.
I0806 22:58:42.564391       1 proportion.go:173] Format queue <test> deserved resource to <cpu 6000.00, memory 0.00, hugepages-2Mi 0.00>
I0806 22:58:42.564398       1 proportion.go:177] The attributes of queue <test> in proportion: deserved <cpu 6000.00, memory 0.00, hugepages-2Mi 0.00>, allocate <cpu 11000.00, memory 0.00>, request <cpu 11000.00, memory 0.00>, share <1.83>
I0806 22:58:42.564406       1 proportion.go:153] Considering Queue <test1>: weight <5>, total weight <10>.
I0806 22:58:42.564414       1 proportion.go:170] queue <test1> is meet
I0806 22:58:42.564418       1 proportion.go:177] The attributes of queue <test1> in proportion: deserved <cpu 500.00, memory 0.00>, allocate <cpu 0.00, memory 0.00>, request <cpu 500.00, memory 0.00>, share <0.00>
I0806 22:58:42.564442       1 proportion.go:189] Remaining resource is  <cpu 5500.00, memory 26644484096.00, hugepages-2Mi 0.00>
I0806 22:58:42.564457       1 proportion.go:153] Considering Queue <test>: weight <5>, total weight <5>.
I0806 22:58:42.564467       1 proportion.go:170] queue <test> is meet
I0806 22:58:42.564476       1 proportion.go:177] The attributes of queue <test> in proportion: deserved <cpu 11000.00, memory 0.00>, allocate <cpu 11000.00, memory 0.00>, request <cpu 11000.00, memory 0.00>, share <1.00>
I0806 22:58:42.564484       1 proportion.go:153] Considering Queue <test1>: weight <5>, total weight <5>.
I0806 22:58:42.564490       1 proportion.go:189] Remaining resource is  <cpu 500.00, memory 26644484096.00, hugepages-2Mi 0.00>
I0806 22:58:42.564514       1 proportion.go:142] Exiting when total weight is 0
I0806 22:58:42.564524       1 drf.go:206] Total Allocatable cpu 12000.00, memory 26644484096.00, hugepages-2Mi 0.00
I0806 22:58:42.564794       1 binpack.go:158] Enter binpack plugin ...
I0806 22:58:42.564814       1 binpack.go:177] resources [] record in weight but not found on any node
I0806 22:58:42.564820       1 binpack.go:161] Leaving binpack plugin. binpack.weight[1], binpack.cpu[1], binpack.memory[1], no extend resources. ...
I0806 22:58:42.564826       1 enqueue.go:44] Enter Enqueue ...
I0806 22:58:42.564830       1 enqueue.go:62] Added Queue <test1> for Job <default/test-job-new>
I0806 22:58:42.564834       1 enqueue.go:62] Added Queue <test> for Job <default/test-job2>
I0806 22:58:42.564838       1 enqueue.go:78] Try to enqueue PodGroup to 0 Queues
I0806 22:58:42.564842       1 enqueue.go:103] Leaving Enqueue ...
I0806 22:58:42.564846       1 allocate.go:43] Enter Allocate ...
I0806 22:58:42.564851       1 job_info.go:561] job test-job-new/default actual: map[bash-new:1], ji.TaskMinAvailable: map[bash-new:1]
I0806 22:58:42.564859       1 allocate.go:90] Added Job <default/test-job-new> into Queue <test1>
I0806 22:58:42.564862       1 job_info.go:561] job test-job2/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.564866       1 allocate.go:90] Added Job <default/test-job2> into Queue <test>
I0806 22:58:42.564869       1 job_info.go:561] job test-job1/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.564872       1 allocate.go:90] Added Job <default/test-job1> into Queue <test>
I0806 22:58:42.564875       1 priority.go:70] Priority JobOrderFn: <default/test-job1> priority: 0, <default/test-job2> priority: 0
I0806 22:58:42.564878       1 drf.go:413] DRF JobOrderFn: <default/test-job1> share state: 0.4166666666666667, <default/test-job2> share state: 0.08333333333333333
I0806 22:58:42.564882       1 job_info.go:561] job test-job/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.564906       1 allocate.go:90] Added Job <default/test-job> into Queue <test>
I0806 22:58:42.564916       1 priority.go:70] Priority JobOrderFn: <default/test-job> priority: 0, <default/test-job2> priority: 0
I0806 22:58:42.564934       1 drf.go:413] DRF JobOrderFn: <default/test-job> share state: 0.4166666666666667, <default/test-job2> share state: 0.08333333333333333
I0806 22:58:42.564955       1 allocate.go:94] Try to allocate resource to 1 Namespaces
I0806 22:58:42.564958       1 allocate.go:109] unlockedNode ID: 62db948c-9907-4163-b4cc-a03f9741ea2d, Name: docker-desktop
I0806 22:58:42.564987       1 allocate.go:162] Try to allocate resource to Jobs in Namespace <default> Queue <test1>
I0806 22:58:42.564994       1 allocate.go:196] Try to allocate resource to 1 tasks of Job <default/test-job-new>
I0806 22:58:42.564998       1 allocate.go:204] There are <1> nodes for Job <default/test-job-new>
I0806 22:58:42.565063       1 scheduler_helper.go:97] Considering Task <default/test-job-new-bash-new-0> on node <docker-desktop>: <cpu 500.00, memory 0.00> vs. <cpu 150.00, memory 26392825856.00, hugepages-2Mi 0.00>
I0806 22:58:42.565084       1 scheduler_helper.go:102] Predicates failed for task <default/test-job-new-bash-new-0> on node <docker-desktop>: task default/test-job-new-bash-new-0 on node docker-desktop fit failed: node(s) resource fit failed
I0806 22:58:42.565107       1 statement.go:351] Discarding operations ...
I0806 22:58:42.565115       1 allocate.go:162] Try to allocate resource to Jobs in Namespace <default> Queue <test>
I0806 22:58:42.565119       1 priority.go:70] Priority JobOrderFn: <default/test-job1> priority: 0, <default/test-job> priority: 0
I0806 22:58:42.565121       1 drf.go:413] DRF JobOrderFn: <default/test-job1> share state: 0.4166666666666667, <default/test-job> share state: 0.4166666666666667
I0806 22:58:42.565125       1 gang.go:118] Gang JobOrderFn: <default/test-job1> is ready: true, <default/test-job> is ready: true
I0806 22:58:42.565131       1 allocate.go:196] Try to allocate resource to 0 tasks of Job <default/test-job2>
I0806 22:58:42.565135       1 statement.go:376] Committing operations ...
I0806 22:58:42.565142       1 allocate.go:162] Try to allocate resource to Jobs in Namespace <default> Queue <test>
I0806 22:58:42.565148       1 allocate.go:196] Try to allocate resource to 0 tasks of Job <default/test-job>
I0806 22:58:42.565150       1 statement.go:376] Committing operations ...
I0806 22:58:42.565182       1 allocate.go:162] Try to allocate resource to Jobs in Namespace <default> Queue <test>
I0806 22:58:42.565226       1 allocate.go:196] Try to allocate resource to 0 tasks of Job <default/test-job1>
I0806 22:58:42.565232       1 statement.go:376] Committing operations ...
I0806 22:58:42.565248       1 allocate.go:158] Namespace <default> have no queue, skip it
I0806 22:58:42.565272       1 allocate.go:275] Leaving Allocate ...
I0806 22:58:42.565278       1 backfill.go:41] Enter Backfill ...
I0806 22:58:42.565281       1 job_info.go:561] job test-job/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.565287       1 job_info.go:561] job test-job-new/default actual: map[bash-new:1], ji.TaskMinAvailable: map[bash-new:1]
I0806 22:58:42.565294       1 job_info.go:561] job test-job2/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.565319       1 job_info.go:561] job test-job1/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.565348       1 backfill.go:90] Leaving Backfill ...
I0806 22:58:42.565352       1 reclaim.go:41] Enter Reclaim ...
I0806 22:58:42.565355       1 reclaim.go:50] There are <4> Jobs and <3> Queues in total for scheduling.
I0806 22:58:42.565359       1 job_info.go:561] job test-job-new/default actual: map[bash-new:1], ji.TaskMinAvailable: map[bash-new:1]
I0806 22:58:42.565364       1 reclaim.go:67] Added Queue <test1> for Job <default/test-job-new>
I0806 22:58:42.565367       1 job_info.go:561] job test-job2/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.565372       1 reclaim.go:67] Added Queue <test> for Job <default/test-job2>
I0806 22:58:42.565375       1 job_info.go:561] job test-job1/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.565379       1 job_info.go:561] job test-job/default actual: map[bash:1], ji.TaskMinAvailable: map[bash:1]
I0806 22:58:42.565393       1 reclaim.go:121] Considering Task <default/test-job-new-bash-new-0> on Node <docker-desktop>.
I0806 22:58:42.565429       1 proportion.go:234] Victims from proportion plugins are []
I0806 22:58:42.565434       1 gang.go:97] Can not preempt task <default/test-job1-bash-0> because job test-job1 ready num(1) <= MinAvailable(1) for gang-scheduling
I0806 22:58:42.565437       1 gang.go:97] Can not preempt task <default/test-job2-bash-0> because job test-job2 ready num(1) <= MinAvailable(1) for gang-scheduling
I0806 22:58:42.565460       1 gang.go:97] Can not preempt task <default/test-job-bash-0> because job test-job ready num(1) <= MinAvailable(1) for gang-scheduling
I0806 22:58:42.565463       1 gang.go:102] Victims from Gang plugins are []
I0806 22:58:42.565470       1 reclaim.go:145] No validated victims on Node <docker-desktop>: no victims
I0806 22:58:42.565479       1 reclaim.go:189] Leaving Reclaim ...
I0806 22:58:42.565582       1 cache.go:730] task unscheduleable default/test-job-new-bash-new-0, message: all nodes are unavailable: 1 node(s) resource fit failed., skip by no condition update
I0806 22:58:42.565653       1 session.go:187] Close Session d1cb963c-8e81-4e8c-b5e5-e920535ee55f
I0806 22:58:42.565661       1 scheduler.go:110] End scheduling ...
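
The proportion.go lines in the log above can be reproduced with a small simulation. This is only a sketch inferred from the log output, not Volcano's actual code: it models CPU only, assumes each queue's deserved share is capped at what the queue requests, and assumes the unclaimed remainder is redistributed among queues that are not yet met. The names `deserved_shares`, `met`, and `handed_out` are hypothetical.

```python
def deserved_shares(total_millicpu, queues):
    """queues maps name -> (weight, requested_millicpu).

    Returns (deserved per queue, leftover millicpu).
    """
    deserved = {name: 0.0 for name in queues}
    met = set()  # queues whose deserved share already covers their request
    remaining = float(total_millicpu)

    while remaining > 1e-6:
        total_weight = sum(w for name, (w, _) in queues.items() if name not in met)
        if total_weight == 0:
            break  # corresponds to "Exiting when total weight is 0" in the log

        handed_out = 0.0
        for name, (weight, request) in queues.items():
            if name in met:
                continue
            before = deserved[name]
            share = remaining * weight / total_weight
            # Cap the deserved share at the queue's request.
            deserved[name] = min(before + share, request)
            if deserved[name] >= request:
                met.add(name)
            handed_out += deserved[name] - before

        remaining -= handed_out

    return deserved, remaining


# Inputs read from the log: total cpu 12000, test requests 11000, test1 requests 500.
deserved, leftover = deserved_shares(12000, {"test": (5, 11000), "test1": (5, 500)})
print(deserved, leftover)
```

With these inputs the sketch ends with deserved 11000 for test and 500 for test1, with 500 left over, matching the log. The share that test1 does not request is handed to test, so test's allocation of 11000 does not exceed its deserved share.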

I will take a look at that. My intuition is that there are still some potential bugs in the proportion plugin…

Reclaim only works when several conditions are all met; you can check them via AddReclaimableFn:

  1. gang plugin: preemptable := job.MinAvailable == 0 || job.MinAvailable <= job.ReadyTaskNum()-1
  2. conformance plugin: the evictor cannot reclaim pods in a system namespace or with a system priority
  3. drf plugin:
  4. proportion plugin: the victim’s queue has more allocated than it deserves (that is, the queue is overused)
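
The gang and proportion conditions in the list above can be checked against the values in the log with a short sketch. It is a simplification under stated assumptions: it looks at CPU only, skips the conformance and drf checks, and combines the two conditions with a plain AND, which may not match exactly how the scheduler merges plugin results. All function names are hypothetical.

```python
def gang_allows_eviction(min_available, ready_tasks):
    """Gang rule quoted above: MinAvailable is 0, or the job still has
    at least MinAvailable ready tasks after losing one."""
    return min_available == 0 or min_available <= ready_tasks - 1


def queue_is_overused(allocated, deserved):
    """Proportion rule, single-resource simplification: allocated exceeds deserved."""
    return allocated > deserved


def can_reclaim(min_available, ready_tasks, allocated, deserved):
    # Simplified: a task is a victim only if both checks pass.
    return (gang_allows_eviction(min_available, ready_tasks)
            and queue_is_overused(allocated, deserved))


# Values read from the log: each job in queue test has 1 ready task and
# MinAvailable 1; queue test has allocated 11000 and deserved 11000 millicores.
print(gang_allows_eviction(1, 1))       # False
print(queue_is_overused(11000, 11000))  # False
print(can_reclaim(1, 1, 11000, 11000))  # False
```

Both checks fail for the jobs in the test queue, which is consistent with the empty victim lists reported by the proportion and gang plugins in the log.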