argo-workflows: Exceeded Quota Causes Failed Workflows
Checklist:
- I’ve included the version.
- I’ve included reproduction steps.
- I’ve included the workflow YAML.
- I’ve included the logs.
What happened: The workflow fails immediately when creating its pod would exceed the namespace CPU quota. The same happens when the memory quota would be exceeded.
What you expected to happen: The pod should stay in the Pending state until the quota leaves room for the resources it requests.
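The expected behaviour amounts to treating a quota rejection as a transient condition rather than a permanent failure. A minimal sketch of that classification, assuming the rejection can be recognised by the "is forbidden: exceeded quota" text that Kubernetes puts in the error; `is_quota_error` and `desired_phase` are hypothetical helper names, not part of Argo:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a pod-creation error message looks like
# a ResourceQuota rejection (matched on the message text only).
is_quota_error() {
  case "$1" in
    *"is forbidden: exceeded quota"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: map an error message to the phase the reporter expects.
desired_phase() {
  if is_quota_error "$1"; then
    echo "Pending"   # transient: retry pod creation once quota frees up
  else
    echo "Error"     # anything else: fail as before
  fi
}
```

Matching on message text is fragile; it only illustrates which errors would be retried, not how the controller decides.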
How to reproduce it (as minimally and precisely as possible):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cpu-limit-
spec:
  serviceAccountName: argo
  entrypoint: wait
  templates:
    - name: wait
      resubmitPendingPods: True
      script:
        image: alpine:latest
        command: [sh, -c]
        args: ["sleep 30s"]
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 200m
for i in {1..20}
do
  argo submit test-workflow.yaml
done
To test the memory quota, replace the cpu request and limit with memory.
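The cpu-to-memory swap can be done mechanically. A small sketch using sed on the resources block from the workflow above; the 1Gi value is an assumption (the report does not state the memory size used), so pick a size large enough to hit your quota:

```shell
#!/bin/sh
# The resources block from the reproduction workflow (cpu variant).
cpu_block='resources:
  requests:
    cpu: 200m
  limits:
    cpu: 200m'

# Swap both cpu entries for memory entries. 1Gi is an assumed value.
memory_block=$(printf '%s\n' "$cpu_block" | sed 's/cpu: 200m/memory: 1Gi/')
printf '%s\n' "$memory_block"
```

The same sed expression can be applied to the whole manifest, for example `sed 's/cpu: 200m/memory: 1Gi/' test-workflow.yaml > test-workflow-memory.yaml`, where the output file name is only illustrative.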
Anything else we need to know?:
Environment:
- Argo version:
$ argo version
argo: 2.8.2+8a151ae.dirty
  BuildDate: 2020-06-18T23:50:58Z
  GitCommit: 8a151aec6538c9442cf2380c2544ba3efb60ff60
  GitTreeState: dirty
  GitTag: 2.8.2
  GoVersion: go1.13
  Compiler: gc
  Platform: linux/amd64
- Kubernetes version:
$ kubectl version -o yaml
clientVersion:
  buildDate: 2020-01-29T21:26:39Z
  compiler: gc
  gitCommit: d4cacc0
  gitTreeState: clean
  gitVersion: v1.10.0+d4cacc0
  goVersion: go1.14beta1
  major: "1"
  minor: 10+
  platform: linux/amd64
serverVersion:
  buildDate: 2020-05-04T12:54:43Z
  compiler: gc
  gitCommit: a3ec9df
  gitTreeState: clean
  gitVersion: v1.16.2
  goVersion: go1.12.12
  major: "1"
  minor: 16+
  platform: linux/amd64
Other debugging information (if applicable):
- workflow result:
$ argo --loglevel DEBUG get <workflowname>
DEBU[0000] CLI version version="{2.8.2+8a151ae.dirty 2020-06-18T23:50:58Z 8a151aec6538c9442cf2380c2544ba3efb60ff60 2.8.2 dirty go1.13 gc linux/amd64}"
DEBU[0000] Client options opts="{{ false false} 0x1574670 0xc000117900}"
Name: cpu-limit-r4jsz
Namespace: thoth-test-core
ServiceAccount: argo
Status: Error
Message: pods "cpu-limit-r4jsz" is forbidden: exceeded quota: thoth-test-core-quota, requested: limits.memory=3048Mi, used: limits.memory=30096Mi, limited: limits.memory=32Gi
Conditions:
Completed True
Created: Thu Jul 30 14:11:30 -0400 (11 minutes ago)
Started: Thu Jul 30 14:11:30 -0400 (11 minutes ago)
Finished: Thu Jul 30 14:11:31 -0400 (11 minutes ago)
Duration: 1 second
STEP TEMPLATE PODNAME DURATION MESSAGE
⚠ cpu-limit-r4jsz wait cpu-limit-r4jsz 0s pods "cpu-limit-r4jsz" is forbidden: exceeded quota: thoth-test-core-quota, requested: limits.memory=3048Mi, used: limits.memory=30096Mi, limited: limits.memory=32Gi
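The rejection in the message above can be checked by hand. A minimal sketch of the arithmetic, with all three values copied from the message and expressed in Mi (the variable names are illustrative only):

```shell
#!/bin/sh
# Values taken from the error message, in Mi.
requested=3048
used=30096
limit=$((32 * 1024))   # 32Gi expressed in Mi = 32768

total=$((used + requested))
if [ "$total" -gt "$limit" ]; then
  verdict="exceeds quota by $((total - limit))Mi"
else
  verdict="fits within quota"
fi
echo "$verdict"   # prints "exceeds quota by 376Mi"
```

Admitting the pod would bring usage to 33144Mi against a limit of 32768Mi, so the API server refuses to create it and the workflow is marked as Error instead of waiting.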
Related #3419 #3490
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 17 (10 by maintainers)
I’ll take a look