volcano: PodGroup doesn't trigger scale-up in Kubernetes when using Cluster Autoscaler

What happened: A PodGroup that doesn't fit within the current resource capacity of the cluster won't trigger a scale-up of the node pool, even when Cluster Autoscaler is enabled.

What you expected to happen: I expect Kubernetes Cluster Autoscaler to detect the increased workload and trigger a scale-up.

How to reproduce it (as minimally and precisely as possible): Apply a PodGroup that doesn't fit within the current resource capacity of the node pool.
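For anyone trying to reproduce this, here is a minimal sketch of an oversized PodGroup, generated as JSON from Python. The name `oversized-podgroup`, the `default` namespace and queue, and the per-pod sizes are all assumptions chosen for illustration. Pick values that exceed what your node pool can currently provide. It only assumes the Volcano CRDs (`scheduling.volcano.sh/v1beta1`) are installed.

```python
import json


def build_podgroup(name, min_member, cpu_per_pod, memory_per_pod_gi, queue="default"):
    """Build a Volcano PodGroup manifest; minResources is the total for the whole gang."""
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "PodGroup",
        # Name and namespace are placeholders, not taken from the original report.
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            # All min_member pods must be schedulable together.
            "minMember": min_member,
            "queue": queue,
            "minResources": {
                "cpu": str(min_member * cpu_per_pod),
                "memory": f"{min_member * memory_per_pod_gi}Gi",
            },
        },
    }


# Hypothetical sizing: 8 pods x 16 CPU / 64Gi each, meant to exceed current capacity.
manifest = build_podgroup(
    "oversized-podgroup", min_member=8, cpu_per_pod=16, memory_per_pod_gi=64
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Since kubectl accepts JSON manifests, the output can be piped straight in, e.g. `python podgroup.py | kubectl apply -f -` (the file name is a placeholder), and then you can watch whether the node pool scales.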

Anything else we need to know?: Microsoft has decided to use Volcano in its Azure ML platform for scheduling training jobs in Kubernetes, so this will be an issue for a lot of people in the future.

Environment:

  • Kubernetes version (use kubectl version): v1.22.15

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 6
  • Comments: 24 (10 by maintainers)

Most upvoted comments

Can PR #2602 solve the current problem?