argo-workflows: Multi-step workflow does not terminate (wait container does not exit with Docker executor in v3.0)
Summary
A multi-step workflow neither terminates nor proceeds to the next step, even after the step's pod has terminated.
This was first reported at https://github.com/hyfen-nl/PIVT/issues/106, where the developer identified it as a potential Argo issue. The logs below are based on the example at https://argoproj.github.io/argo-workflows/examples/#steps.
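For what it's worth, the stuck run below came from submitting that steps example essentially unchanged, saved locally as steps.yaml (the file name is just my choice):
# Submit the docs' steps example and watch it hang after the first step.
argo submit --watch steps.yaml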
Diagnostics
What Kubernetes provider are you using? Digital Ocean
What version of Argo Workflows are you running? v3.0.7
Paste a workflow that reproduces the bug, including status:
kubectl get wf -o yaml ${workflow}
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  creationTimestamp: "2021-06-02T07:19:32Z"
  generateName: steps-
  generation: 3
  labels:
    workflows.argoproj.io/phase: Running
  name: steps-5xt4d
  namespace: default
  resourceVersion: "3619084"
  uid: bc955ea0-7a0c-4a36-8a37-d3fb10e27615
spec:
  arguments: {}
  entrypoint: hello-hello-hello
  templates:
  - inputs: {}
    metadata: {}
    name: hello-hello-hello
    outputs: {}
    steps:
    - - arguments:
          parameters:
          - name: message
            value: hello1
        name: hello1
        template: whalesay
    - - arguments:
          parameters:
          - name: message
            value: hello2a
        name: hello2a
        template: whalesay
      - arguments:
          parameters:
          - name: message
            value: hello2b
        name: hello2b
        template: whalesay
  - container:
      args:
      - '{{inputs.parameters.message}}'
      command:
      - cowsay
      image: docker/whalesay
      name: ""
      resources: {}
    inputs:
      parameters:
      - name: message
    metadata: {}
    name: whalesay
    outputs: {}
status:
  artifactRepositoryRef:
    default: true
  conditions:
  - status: "True"
    type: PodRunning
  finishedAt: null
  nodes:
    steps-5xt4d:
      children:
      - steps-5xt4d-3743377224
      displayName: steps-5xt4d
      finishedAt: null
      id: steps-5xt4d
      name: steps-5xt4d
      phase: Running
      progress: 0/1
      startedAt: "2021-06-02T07:19:32Z"
      templateName: hello-hello-hello
      templateScope: local/steps-5xt4d
      type: Steps
    steps-5xt4d-293443185:
      boundaryID: steps-5xt4d
      displayName: hello1
      finishedAt: null
      hostNodeName: hlf-pool1-8rnem
      id: steps-5xt4d-293443185
      inputs:
        parameters:
        - name: message
          value: hello1
      name: steps-5xt4d[0].hello1
      phase: Running
      progress: 0/1
      startedAt: "2021-06-02T07:19:32Z"
      templateName: whalesay
      templateScope: local/steps-5xt4d
      type: Pod
    steps-5xt4d-3743377224:
      boundaryID: steps-5xt4d
      children:
      - steps-5xt4d-293443185
      displayName: '[0]'
      finishedAt: null
      id: steps-5xt4d-3743377224
      name: steps-5xt4d[0]
      phase: Running
      progress: 0/1
      startedAt: "2021-06-02T07:19:32Z"
      templateScope: local/steps-5xt4d
      type: StepGroup
  phase: Running
  progress: 0/1
  startedAt: "2021-06-02T07:19:32Z"
Paste the logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
time="2021-06-02T07:19:32.754Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.769Z" level=info msg="Updated phase -> Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.780Z" level=info msg="Steps node steps-5xt4d initialized Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.786Z" level=info msg="StepGroup node steps-5xt4d-3743377224 initialized Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.790Z" level=info msg="Pod node steps-5xt4d-293443185 initialized Pending" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.808Z" level=info msg="Created pod: steps-5xt4d[0].hello1 (steps-5xt4d-293443185)" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.808Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.852Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3619053 workflow=steps-5xt4d
time="2021-06-02T07:19:42.869Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:42.874Z" level=info msg="Updating node steps-5xt4d-293443185 status Pending -> Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:42.882Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:42.930Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3619084 workflow=steps-5xt4d
time="2021-06-02T07:19:52.913Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:52.916Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:39:52.914Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:39:52.915Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:59:52.914Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:59:52.915Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:19:52.919Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:19:52.919Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:39:52.919Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:39:52.920Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:59:52.920Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:59:52.921Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T09:19:52.920Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T09:19:52.921Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
Paste the logs from your workflow's wait container:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow}
time="2021-06-02T09:24:46.518Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:47.681Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:48.847Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:50.027Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:51.191Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:52.360Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:53.565Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:54.767Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:55.891Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:56.933Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 22 (11 by maintainers)
Commits related to this issue
- fix(executor): Fix docker not terminating. Fixes #6064 Signed-off-by: Alex Collins <alex_collins@intuit.com> — committed to argoproj/argo-workflows by alexec 3 years ago
- fix(executor): Fix docker not terminating. Fixes #6064 (#6083) — committed to argoproj/argo-workflows by alexec 3 years ago
- fix(executor): Fix docker not terminating. Fixes #6064 (#6083) Signed-off-by: Alex Collins <alex_collins@intuit.com> — committed to argoproj/argo-workflows by alexec 3 years ago
- fix(executor): Fix docker not terminating. Fixes #6064 (#6083) — committed to argoproj/argo-workflows by alexec 3 years ago
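The fix above landed in the executor, so upgrading to a release that contains it is the proper solution. Until then, a workaround commonly suggested for docker-executor hangs is to switch the controller to a different executor via the workflow-controller ConfigMap; a sketch, assuming a default install and that emissary (alpha in 3.0) or pns is available in your version:
# Workaround sketch, not the upstream fix: move off the docker executor,
# then restart the workflow-controller so the change is picked up.
kubectl -n argo patch configmap workflow-controller-configmap \
  --type merge -p '{"data":{"containerRuntimeExecutor":"emissary"}}'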
I am using version 3.2.6 and I see the same behavior. I opened a discussion about it: https://github.com/argoproj/argo-workflows/discussions/7480
Maybe it's because it's simply executing a hello-world statement. I'm using this example to test because it's much simpler than my original workflow, which hit the same issue: the workflow would stop at a step and never proceed to the next.