argo-workflows: Serialization issue in withItems leading to duplicate tasks
Summary
With argo version 3.2.6, when submitting workflows that use withItems where a value has double quotes inside of single quotes (i.e. value: '"foo"'), each task is duplicated, and the duplicate task contains a different quote-escaping of the parameters.
Note that this single-double quoting is a result of hera-workflows sending JSON blobs back and forth.
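As a rough sketch of where that quoting pattern comes from (an assumption about the client's behavior, not taken from hera's source): serializing a plain string to JSON wraps it in double quotes, and writing that JSON scalar into YAML with single quotes yields exactly the value: '"foo"' shape described above.

```python
import json

# Hypothetical sketch: a client that JSON-serializes parameter values
# before writing them into the workflow spec turns the plain string
# "foo" into a double-quoted JSON scalar.
payload = json.dumps("foo")
print(payload)  # -> "foo"  (five characters, quotes included)

# Single-quoting that scalar in YAML produces the pattern from the summary:
yaml_value = "value: '%s'" % payload
print(yaml_value)  # -> value: '"foo"'
```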
Name: test9ckrq
Namespace: etl
ServiceAccount: default
Status: Running
Created: Wed Feb 16 19:15:48 +0100 (1 second ago)
Started: Wed Feb 16 19:15:48 +0100 (1 second ago)
Duration: 1 second
Progress: 0/1
STEP TEMPLATE PODNAME DURATION MESSAGE
● test9ckrq test-workflow
├─◷ test-task(0:a:\"test\") test-task test9ckrq-2427085523 1s
└─◷ test-task(0:a:"test") test-task test9ckrq-3392308087 1s
Interestingly, once the jobs complete, argo watch doesn't show the duplicate for successful tasks, but if we look at the logs we can see both runs.
$ argo -n etl watch @latest
Name: test9ckrq
Namespace: etl
ServiceAccount: default
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Feb 16 19:15:48 +0100 (16 seconds ago)
Started: Wed Feb 16 19:15:48 +0100 (16 seconds ago)
Finished: Wed Feb 16 19:15:58 +0100 (6 seconds ago)
Duration: 10 seconds
Progress: 1/1
ResourcesDuration: 1s*(1 cpu),1s*(100Mi memory)
STEP TEMPLATE PODNAME DURATION MESSAGE
✔ test9ckrq test-workflow
└─✔ test-task(0:a:\"test\") test-task test9ckrq-2427085523 4s
$ argo -n etl logs @latest
test9ckrq-3392308087: test
test9ckrq-2427085523: test
Diagnostics
This is the simplest workflow that reproduces the issue. Simply changing a: '"test"' to a: "test" solves the duplicate-task issue; however, this isn't a reasonable solution since it means we can't send JSON in this field.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test
spec:
  entrypoint: test-workflow
  templates:
    - name: test-workflow
      dag:
        tasks:
          - name: test-task
            template: test-task
            arguments:
              parameters:
                - name: a
                  value: '{{item.a}}'
            withItems:
              - a: '"test"'
    - name: test-task
      inputs:
        parameters:
          - name: a
            value: '{{item.a}}'
      script:
        name: test-task
        image: 'python:3.7'
        command:
          - python
        source: |
          import json
          a = json.loads('{{inputs.parameters.a}}')
          print(a)
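One plausible reading of the duplicated node names (an assumption, not confirmed from the controller source) is that the same item value is rendered once as-is and once through an extra round of JSON escaping, which matches the difference between the two entries shown by argo get above:

```python
import json

item = '"test"'  # the item value from withItems: a: '"test"'

# Rendered as-is, as in test-task(0:a:"test"):
rendered_once = item

# Re-serialized to JSON, which backslash-escapes the inner quotes,
# resembling the test-task(0:a:\"test\") node (hypothetical mechanism):
rendered_twice = json.dumps(item)

print(rendered_once)   # "test"
print(rendered_twice)  # "\"test\""
assert rendered_once != rendered_twice
```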
Interestingly, the duplicate task doesn’t show up in the controller logs.
$ kubectl logs -n etl-argo-v3 deploy/argo-argo-workflows-workflow-controller | grep test9ckrq
time="2022-02-16T18:15:48.788Z" level=info msg="Processing workflow" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="Updated phase -> Running" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="DAG node test9ckrq initialized Running" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="TaskGroup node test9ckrq-4247033023 initialized Running (message: )" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="All of node test9ckrq.test-task(0:a:\\\"test\\\") dependencies [] completed" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.794Z" level=info msg="Pod node test9ckrq-2427085523 initialized Pending" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.820Z" level=info msg="Created pod: test9ckrq.test-task(0:a:\\\"test\\\") (test9ckrq-2427085523)" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.821Z" level=info msg="TaskSet Reconciliation" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.821Z" level=info msg=reconcileAgentPod namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.832Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"test9ckrq\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.832Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.846Z" level=info msg="Update retry attempt 1 successful" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.846Z" level=info msg="Workflow update successful" namespace=etl-argo-v3 phase=Running resourceVersion=997292078 workflow=test9ckrq
time="2022-02-16T18:15:58.817Z" level=info msg="Processing workflow" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.817Z" level=info msg="Updating node test9ckrq-2427085523 exit code 0" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="Updating node test9ckrq-2427085523 status Pending -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq-4247033023 phase Running -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq-4247033023 finished: 2022-02-16 18:15:58.818693885 +0000 UTC" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="Outbound nodes of test9ckrq set to [test9ckrq-2427085523]" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq phase Running -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq finished: 2022-02-16 18:15:58.81895713 +0000 UTC" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Checking daemoned children of test9ckrq" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="TaskSet Reconciliation" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg=reconcileAgentPod namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Updated phase Running -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Marking workflow completed" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Checking daemoned children of " namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.831Z" level=info msg="Workflow update successful" namespace=etl-argo-v3 phase=Succeeded resourceVersion=997292181 workflow=test9ckrq
time="2022-02-16T18:15:58.838Z" level=info msg="cleaning up pod" action=labelPodCompleted key=etl-argo-v3/test9ckrq-2427085523/labelPodCompleted
Some information about my argo install:
$ kubectl -n etl get configmap argo-argo-workflows-workflow-controller-configmap -o yaml
apiVersion: v1
data:
  config: |
    containerRuntimeExecutor: docker
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: argo
    meta.helm.sh/release-namespace: etl
  creationTimestamp: "2022-02-09T05:56:51Z"
  labels:
    app.kubernetes.io/component: workflow-controller
    app.kubernetes.io/instance: argo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argo-workflows-cm
    app.kubernetes.io/part-of: argo-workflows
    helm.sh/chart: argo-workflows-0.10.0
  name: argo-argo-workflows-workflow-controller-configmap
  namespace: etl-argo-v3
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 21 (10 by maintainers)
Commits related to this issue
- Better workaround for argoproj/argo-workflows#7895 — committed to mynameisfiber/hera-workflows by mynameisfiber 2 years ago
I knew that code would cause problems. I'm not sure this is fixable due to inherent problems with the design of Item. Instead, I think we need #7801