kubedl: [BUG] DAGScheduling and GangScheduling (volcano) conflict in mpijob
What happened: The mpijob worker pods are stuck in Pending, and no launcher pod is created.
mpi-demo-worker-0 0/1 Pending 0 13s
mpi-demo-worker-1 0/1 Pending 0 13s
The events of the worker pods are as follows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 65s volcano 3/2 tasks in gang unschedulable: pod group is not ready, 2 Pending, 3 minAvailable.
I think the root cause is a conflict between DAGScheduling and GangScheduling (volcano) in mpijob: the pod group requires 3 pods (minAvailable), but only the 2 worker pods exist.
I can fix this problem by adding these args to the kubedl deployment:
- --feature-gates
- DAGScheduling=false
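The suspected conflict can be illustrated with a toy model. This is only a sketch under stated assumptions, not KubeDL or Volcano code: it assumes that DAG scheduling creates the launcher pod only after every worker is Running, and that the gang scheduler places nothing until at least minAvailable pods (workers plus one launcher) exist. The function `simulate` and its parameters are hypothetical names used for illustration.

```python
def simulate(workers: int, dag_enabled: bool, gang_enabled: bool,
             max_rounds: int = 5) -> dict:
    """Toy model of the controller/scheduler loop for one mpijob."""
    # Assumption: the gang counts every worker plus one launcher.
    min_available = workers + 1

    # The controller always creates the worker pods up front.
    pods = {f"worker-{i}": "Pending" for i in range(workers)}
    if not dag_enabled:
        # Without DAG scheduling, the launcher is created together with the workers.
        pods["launcher"] = "Pending"

    for _ in range(max_rounds):
        # Controller step (assumed DAG rule): the launcher is created
        # only once all workers are Running.
        if dag_enabled and "launcher" not in pods:
            if all(state == "Running" for state in pods.values()):
                pods["launcher"] = "Pending"

        # Scheduler step (assumed gang rule): schedule all pods or none,
        # and only when enough pods exist to satisfy minAvailable.
        if not gang_enabled or len(pods) >= min_available:
            for name in pods:
                pods[name] = "Running"

    return pods


# Both features on: workers wait for the gang, the launcher waits for the workers.
print(simulate(2, dag_enabled=True, gang_enabled=True))
# → {'worker-0': 'Pending', 'worker-1': 'Pending'}

# DAGScheduling=false: all three pods exist at once, so the gang is satisfied.
print(simulate(2, dag_enabled=False, gang_enabled=True))
# → {'worker-0': 'Running', 'worker-1': 'Running', 'launcher': 'Running'}
```

In this model the first case never makes progress regardless of the number of rounds, which matches the "2 Pending, 3 minAvailable" event above; whether the real controllers behave exactly this way is an assumption.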
What you expected to happen:
No pods stuck in Pending.
How to reproduce it: Enable DAGScheduling and GangScheduling (volcano), then run an mpijob.
Anything else we need to know?:
Environment:
- KubeDL version:
- Kubernetes version (use kubectl version):
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (7 by maintainers)
@HeGaoYuan I have posted an issue and will refactor it soon: https://github.com/kubedl-io/kubedl/issues/194