argo-workflows: v2.11: workflow-controller-configmap parallelism not being honored.

Summary

I set the “across-workflow” parallelism in the workflow-controller-configmap to 5:

apiVersion: v1
data:
  config: |
    parallelism: 5
    nodeEvents:
      enabled: true
    workflowDefaults:
      spec:
        ttlStrategy:
          secondsAfterSuccess: 5
        parallelism: 5
kind: ConfigMap
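
The ConfigMap above sets parallelism in two places, and the two settings limit different things. As I understand Argo's configuration, the top-level parallelism caps how many workflows the controller runs at once. workflowDefaults.spec.parallelism is merged into each workflow's spec (the dumped workflow further down does show spec.parallelism: 5) and caps concurrent pods inside a single workflow. Only the top-level value bears on the "5 of 10 workflows" expectation. The Python sketch reads both values from a hand-transcribed copy of the config. The function name parallelism_limits and its labels are illustrative and are not part of Argo.

```python
# The `config` key of the ConfigMap above, transcribed by hand into a dict.
# A YAML parser such as PyYAML would produce the same nested structure.
CONFIG = {
    "parallelism": 5,
    "nodeEvents": {"enabled": True},
    "workflowDefaults": {
        "spec": {
            "ttlStrategy": {"secondsAfterSuccess": 5},
            "parallelism": 5,
        }
    },
}


def parallelism_limits(config):
    """Pick out the two parallelism settings and label what each one caps."""
    defaults_spec = config.get("workflowDefaults", {}).get("spec", {})
    return {
        # Controller-wide limit on concurrently running workflows.
        "workflows running at once (controller-wide)": config.get("parallelism"),
        # Default for each workflow's spec.parallelism (pods within one workflow).
        "pods running at once per workflow (default spec)": defaults_spec.get("parallelism"),
    }


for meaning, value in parallelism_limits(CONFIG).items():
    print(f"{meaning}: {value}")
```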

I then submitted 10 workflows at once, and all of them went into the Running state:

abmallic@MININT-8H2H4GR:~/argo/argo_workflows$ kg workflow
NAME                        STATUS    AGE
dag-diamon-coinflip-5c55t   Running   7s
dag-diamon-coinflip-fczh5   Running   13s
dag-diamon-coinflip-hr4k7   Running   8s
dag-diamon-coinflip-m46cv   Running   11s
dag-diamon-coinflip-mq4rl   Running   10s
dag-diamon-coinflip-rx2nv   Running   12s
dag-diamon-coinflip-vcx66   Running   12s
dag-diamon-coinflip-wkf6r   Running   6s
dag-diamon-coinflip-xvjpj   Running   6s
dag-diamon-coinflip-zsmqs   Running   9s
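
To check the listing above mechanically, a small sketch can count workflows per STATUS value and compare the Running count with the limit of 5 from the ConfigMap. It assumes kg is a shell alias for kubectl get. The regex keys on the dag-diamon-coinflip- generateName prefix and does not depend on line breaks, so it also works on the listing pasted as a single line.

```python
import re
from collections import Counter

# Output of `kg workflow` from this report (kg assumed to alias `kubectl get`).
OUTPUT = """\
NAME                        STATUS    AGE
dag-diamon-coinflip-5c55t   Running   7s
dag-diamon-coinflip-fczh5   Running   13s
dag-diamon-coinflip-hr4k7   Running   8s
dag-diamon-coinflip-m46cv   Running   11s
dag-diamon-coinflip-mq4rl   Running   10s
dag-diamon-coinflip-rx2nv   Running   12s
dag-diamon-coinflip-vcx66   Running   12s
dag-diamon-coinflip-wkf6r   Running   6s
dag-diamon-coinflip-xvjpj   Running   6s
dag-diamon-coinflip-zsmqs   Running   9s
"""

# A data row is "<name> <status> <age>". The header row is skipped because
# it does not start with the generateName prefix.
ROW = re.compile(r"(dag-diamon-coinflip-\w+)\s+(\w+)\s+(\S+)")


def count_by_status(text):
    """Count workflows per STATUS column value."""
    return Counter(status for _name, status, _age in ROW.findall(text))


PARALLELISM = 5  # top-level `parallelism` from the ConfigMap above

counts = count_by_status(OUTPUT)
print(dict(counts))  # {'Running': 10}
print("limit exceeded:", counts["Running"] > PARALLELISM)  # limit exceeded: True
```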

What happened/what you expected to happen? All 10 workflows went into the Running state. Only 5 of them should have, with the remaining 5 waiting until a running workflow finished.
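
The expected behaviour can be stated as a simple admission rule: at most 5 workflows run at once, and the rest wait. The sketch below is a hypothetical model of that rule. It is not Argo's controller code, and the oldest-first ordering is an assumption. It uses the ten workflows listed above, treating a larger AGE as an earlier submission.

```python
# Hypothetical model of a controller-wide workflow limit. This is NOT Argo's
# implementation, only the behaviour this report expects: at most
# `parallelism` workflows run at once (oldest first) and the rest wait.

# (name, age in seconds) as listed by `kg workflow` in this report.
SUBMITTED = [
    ("dag-diamon-coinflip-5c55t", 7),
    ("dag-diamon-coinflip-fczh5", 13),
    ("dag-diamon-coinflip-hr4k7", 8),
    ("dag-diamon-coinflip-m46cv", 11),
    ("dag-diamon-coinflip-mq4rl", 10),
    ("dag-diamon-coinflip-rx2nv", 12),
    ("dag-diamon-coinflip-vcx66", 12),
    ("dag-diamon-coinflip-wkf6r", 6),
    ("dag-diamon-coinflip-xvjpj", 6),
    ("dag-diamon-coinflip-zsmqs", 9),
]


def admit(workflows, parallelism):
    """Return (running, waiting) name lists, admitting the oldest first."""
    # A larger age means an earlier submission. sorted() is stable, so
    # workflows with equal ages keep their listed order.
    oldest_first = [name for name, _age in sorted(workflows, key=lambda wf: -wf[1])]
    return oldest_first[:parallelism], oldest_first[parallelism:]


running, waiting = admit(SUBMITTED, parallelism=5)
print(len(running), "running,", len(waiting), "waiting")  # 5 running, 5 waiting
```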

Diagnostics

What version of Argo Workflows are you running? 2.11

YAML of one of the running workflows, including its status:

kind: Workflow
metadata:
  creationTimestamp: "2020-09-21T08:28:24Z"
  generateName: dag-diamon-coinflip-
  generation: 4
  labels:
    workflows.argoproj.io/phase: Running
  managedFields:
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
      f:spec:
        .: {}
        f:arguments: {}
        f:entrypoint: {}
        f:podGC:
          .: {}
          f:strategy: {}
        f:templates: {}
      f:status:
        .: {}
        f:finishedAt: {}
    manager: argo
    operation: Update
    time: "2020-09-21T08:28:24Z"
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:workflows.argoproj.io/phase: {}
      f:spec:
        f:parallelism: {}
        f:ttlStrategy:
          .: {}
          f:secondsAfterSuccess: {}
      f:status:
        f:nodes:
          .: {}
          f:dag-diamon-coinflip-5c55t:
            .: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:dag-diamon-coinflip-5c55t-758823825:
            .: {}
            f:boundaryID: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:dag-diamon-coinflip-5c55t-2710044213:
            .: {}
            f:boundaryID: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:dag-diamon-coinflip-5c55t-3805981268:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:hostNodeName: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
        f:phase: {}
        f:startedAt: {}
    manager: workflow-controller
    operation: Update
    time: "2020-09-21T08:28:28Z"
  name: dag-diamon-coinflip-5c55t
  namespace: argo
  resourceVersion: "2958114"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/argo/workflows/dag-diamon-coinflip-5c55t
  uid: 12cb8123-a6be-4676-8902-b8b6b4fdc0cc
spec:
  arguments: {}
  entrypoint: diamond
  parallelism: 5
  podGC:
    strategy: OnWorkflowSuccess
  templates:
  - arguments: {}
    dag:
      tasks:
      - arguments: {}
        name: A
        template: coinflip
      - arguments: {}
        dependencies:
        - A
        name: B
        template: coinflip
      - arguments: {}
        dependencies:
        - A
        name: C
        template: coinflip
      - arguments: {}
        dependencies:
        - B
        - C
        name: D
        template: coinflip
    inputs: {}
    metadata: {}
    name: diamond
    nodeSelector:
      agentpool: argoworkload
    outputs: {}
  - arguments: {}
    inputs: {}
    metadata: {}
    name: coinflip
    nodeSelector:
      agentpool: argoworkload
    outputs: {}
    steps:
    - - arguments: {}
        name: flip-coin
        template: flip-coin
    - - arguments: {}
        name: heads
        template: found-heads
        when: '{{steps.flip-coin.outputs.result}} == heads'
      - arguments: {}
        name: tails
        template: coinflip
        when: '{{steps.flip-coin.outputs.result}} == tails'
  - arguments: {}
    inputs: {}
    metadata: {}
    name: flip-coin
    nodeSelector:
      agentpool: argoworkload
    outputs: {}
    script:
      command:
      - python
      image: python:alpine3.6
      name: ""
      resources: {}
      source: |
        import random
        import time
        time.sleep(60)
        result = "heads" if random.randint(0,1) == 0 else "tails"
        print(result)
  - arguments: {}
    container:
      args:
      - echo "it was heads"
      command:
      - sh
      - -c
      image: alpine:3.6
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: found-heads
    nodeSelector:
      agentpool: argoworkload
    outputs: {}
  ttlStrategy:
    secondsAfterSuccess: 5
status:
  finishedAt: null
  nodes:
    dag-diamon-coinflip-5c55t:
      children:
      - dag-diamon-coinflip-5c55t-758823825
      displayName: dag-diamon-coinflip-5c55t
      finishedAt: null
      id: dag-diamon-coinflip-5c55t
      name: dag-diamon-coinflip-5c55t
      phase: Running
      startedAt: "2020-09-21T08:28:24Z"
      templateName: diamond
      templateScope: local/dag-diamon-coinflip-5c55t
      type: DAG
    dag-diamon-coinflip-5c55t-758823825:
      boundaryID: dag-diamon-coinflip-5c55t
      children:
      - dag-diamon-coinflip-5c55t-2710044213
      displayName: A
      finishedAt: null
      id: dag-diamon-coinflip-5c55t-758823825
      name: dag-diamon-coinflip-5c55t.A
      phase: Running
      startedAt: "2020-09-21T08:28:24Z"
      templateName: coinflip
      templateScope: local/dag-diamon-coinflip-5c55t
      type: Steps
    dag-diamon-coinflip-5c55t-2710044213:
      boundaryID: dag-diamon-coinflip-5c55t-758823825
      children:
      - dag-diamon-coinflip-5c55t-3805981268
      displayName: '[0]'
      finishedAt: null
      id: dag-diamon-coinflip-5c55t-2710044213
      name: dag-diamon-coinflip-5c55t.A[0]
      phase: Running
      startedAt: "2020-09-21T08:28:24Z"
      templateName: coinflip
      templateScope: local/dag-diamon-coinflip-5c55t
      type: StepGroup
    dag-diamon-coinflip-5c55t-3805981268:
      boundaryID: dag-diamon-coinflip-5c55t-758823825
      displayName: flip-coin
      finishedAt: null
      hostNodeName: aks-argoworkload-29252569-vmss00001o
      id: dag-diamon-coinflip-5c55t-3805981268
      name: dag-diamon-coinflip-5c55t.A[0].flip-coin
      phase: Running
      startedAt: "2020-09-21T08:28:24Z"
      templateName: flip-coin
      templateScope: local/dag-diamon-coinflip-5c55t
      type: Pod
  phase: Running
  startedAt: "2020-09-21T08:28:24Z"


About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 25 (14 by maintainers)

Most upvoted comments

It seems that with this change you still need to restart the controller for configuration changes to be picked up.

Correct. Few people use this config item, so I want to keep it simple to reduce bugs; not supporting reconfiguration at runtime is part of that.

@alexec I had noticed that bug earlier as well, so as a precaution I restarted the Argo workflow controller before starting my actual workload. The controller logs confirm that it picked up the latest ConfigMap with parallelism=100. Even after the restart, the controller does not always respect the parallelism limit.

Argo controller logs

level=info msg="Configuration:\nartifactRepository: {}\nmetricsConfig: {}\nnodeEvents:\n  enabled: false\nparallelism: 100\npodSpecLogStrategy: {}\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\ntelemetryConfig: {}\nworkflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n    arguments: {}\n    parallelism: 20\n    ttlStrategy:\n      secondsAfterSuccess: 3600\n  status:\n    finishedAt: null\n    startedAt: null\n"
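
Because the logged configuration is a single escaped string, it is easy to misread which parallelism value is which. The sketch below decodes the \n and \" escapes and walks the lines by indentation to report every parallelism key with its full path. The parser is deliberately naive and assumes simple block-style YAML with no lists or multi-line scalars. A real YAML parser would be more robust. The helper names decode and parallelism_settings are hypothetical.

```python
import re

# msg field of the controller log line above, with its escape sequences kept
# literally (\n for newlines, \" for quotes), exactly as logged.
LOGGED = (
    r'Configuration:\nartifactRepository: {}\nmetricsConfig: {}\nnodeEvents:\n'
    r'  enabled: false\nparallelism: 100\npodSpecLogStrategy: {}\nsso:\n'
    r'  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n'
    r'  issuer: \"\"\n  redirectUrl: \"\"\ntelemetryConfig: {}\n'
    r'workflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n'
    r'    arguments: {}\n    parallelism: 20\n    ttlStrategy:\n'
    r'      secondsAfterSuccess: 3600\n  status:\n    finishedAt: null\n'
    r'    startedAt: null\n'
)

KEY_LINE = re.compile(r"^( *)(\w+):\s*(.*)$")


def decode(logged):
    """Turn the logged escapes back into real newlines and quotes."""
    return logged.replace("\\n", "\n").replace('\\"', '"')


def parallelism_settings(text):
    """Map the dotted path of every `parallelism` key to its integer value."""
    found = {}
    parents = []  # (indent, key) for each enclosing mapping
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if not match:
            continue
        indent, key, value = len(match.group(1)), match.group(2), match.group(3)
        # Leaving a block: drop parents at the same or deeper indentation.
        while parents and parents[-1][0] >= indent:
            parents.pop()
        if key == "parallelism":
            path = ".".join([name for _, name in parents] + [key])
            found[path] = int(value)
        elif value == "":
            # A key with no inline value opens a nested mapping.
            parents.append((indent, key))
    return found


settings = parallelism_settings(decode(LOGGED))
print(settings)  # {'parallelism': 100, 'workflowDefaults.spec.parallelism': 20}
```

Read this way, the log confirms a controller-wide limit of 100 workflows and a per-workflow default of 20 pods. This matches the reporter's statement that the restarted controller had loaded parallelism=100.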

So I suspect the restart bug you are talking about is separate from what I am facing.