argo-workflows: Step hangs forever w/ retryStrategy added to templates (possibly related to Lifecycle hooks)

Summary

I compose my DAG from reusable templates. The workflow runs successfully without retryStrategy on the templates, but adding the lines

        retryStrategy:
          retryPolicy: OnError
          limit: "3"

causes the DAG to spin forever.

See images:

Without retryStrategy: [image]

With retryStrategy: [image]

Diagnostics

Sample WF spec

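        # Reusable template, with the retryStrategy that triggers the hang
        - name: LinuxJobBase
          container:
            command: ["bash"]
            args:
            - -c
            - >-
                ls
          retryStrategy:
            limit: '3'
            retryPolicy: OnError
        # DAG template whose tasks reference the reusable template
        - name: DAG
          dag:
            tasks:
            - name: Python2Compile
              template: LinuxJobBase
            .......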

Note that an exit handler is also attached to each of the DAG tasks via hooks:

        hooks:
          exit:
            arguments:
              parameters:
              - name: POD_NAME
                value: '{{tasks.Python3UnitTests.outputs.parameters.podname}}'
              - name: NODE_NAME
                value: '{{tasks.Python3UnitTests.outputs.parameters.nodename}}'
              - name: POD_NAMESPACE
                value: '{{tasks.Python3UnitTests.outputs.parameters.podnamespace}}'
              - name: POD_UID
                value: '{{tasks.Python3UnitTests.outputs.parameters.poduid}}'
              - name: IMAGE
                value: '{{tasks.Python3UnitTests.outputs.parameters.image}}'
            template: LinuxExitHandler
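
For anyone trying to reproduce this, the fragments above could be assembled into a minimal standalone Workflow roughly as follows. This is only a sketch and not the reporter's actual spec: the template and task names, the `alpine:3.14` image, and the `echo` command are assumptions made for illustration, and the hook's output-parameter arguments are left out to keep it short.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-hook-repro-   # hypothetical name
spec:
  entrypoint: main
  templates:
  # DAG with a single task that has an exit hook attached
  - name: main
    dag:
      tasks:
      - name: compile
        template: linux-job-base
        hooks:
          exit:
            template: linux-exit-handler
  # Reusable template carrying the retryStrategy from the report
  - name: linux-job-base
    retryStrategy:
      limit: "3"
      retryPolicy: OnError
    container:
      image: alpine:3.14            # assumed image, not from the report
      command: ["sh", "-c"]
      args: ["ls"]
  # Stand-in for LinuxExitHandler
  - name: linux-exit-handler
    container:
      image: alpine:3.14            # assumed image, not from the report
      command: ["sh", "-c"]
      args: ["echo exit handler ran"]
```

Submitting this once as written and once with the `retryStrategy` block removed should show whether the hang depends on the combination of retries and exit hooks.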

What Kubernetes provider are you using? Bare Metal

What version of Argo Workflows are you running? 3.1.0

What executor are you running (Docker/K8SAPI/Kubelet/PNS/Emissary)? Emissary


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 17 (17 by maintainers)

Most upvoted comments

@ad22 Thanks for the info and sorry for the delay… I’ll get a chance to look at this soon