argo-workflows: Serialization issue in withItems leading to duplicate tasks

Summary

With Argo Workflows 3.2.6, submitting a workflow that uses withItems where a value has double quotes inside single quotes (e.g. value: '"foo"') runs each task twice. The duplicate task carries a different quote-escaping of the parameters.

Note that this single-double quoting is a result of hera-workflows sending JSON blobs back and forth.
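
To illustrate where the single-double quoting comes from, here is a minimal sketch. It assumes the client JSON-encodes each parameter value before writing it into the manifest; the actual hera-workflows code path is not shown in this report, and the variable names are made up for the example.

```python
import json

# Hypothetical parameter value passed through a client library.
value = "test"

# JSON-encoding a Python string wraps it in double quotes.
encoded = json.dumps(value)
print(encoded)  # "test"

# Written as a YAML single-quoted scalar, the JSON text keeps its
# double quotes (single quotes inside would be doubled).
yaml_scalar = "'" + encoded.replace("'", "''") + "'"
print(yaml_scalar)  # '"test"'
```

The last line printed is the same spelling that appears in withItems in the reproduction further down.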

Name:                test9ckrq
Namespace:           etl
ServiceAccount:      default
Status:              Running
Created:             Wed Feb 16 19:15:48 +0100 (1 second ago)
Started:             Wed Feb 16 19:15:48 +0100 (1 second ago)
Duration:            1 second
Progress:            0/1

STEP                          TEMPLATE       PODNAME               DURATION  MESSAGE
 ● test9ckrq                  test-workflow                                    
 ├─◷ test-task(0:a:\"test\")  test-task      test9ckrq-2427085523  1s          
 └─◷ test-task(0:a:"test")    test-task      test9ckrq-3392308087  1s          
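
The two step names in the listing above differ by exactly one round of JSON string escaping. The sketch below only demonstrates that observation; it is not taken from the Argo source and makes no claim about where in the controller the two spellings are produced.

```python
import json

item_value = '"test"'  # the string that YAML a: '"test"' yields

# Name built from the raw value.
raw_name = f"test-task(0:a:{item_value})"

# Name built from the value after one round of JSON string escaping
# (json.dumps adds outer quotes, which are stripped here).
escaped_name = f"test-task(0:a:{json.dumps(item_value)[1:-1]})"

print(raw_name)                  # test-task(0:a:"test")
print(escaped_name)              # test-task(0:a:\"test\")
print(raw_name == escaped_name)  # False
```

If two code paths ever derived a node name in these two ways, they would disagree on whether the node already exists, which would be consistent with the duplicate seen here.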

Interestingly, once the jobs complete, argo watch no longer shows the duplicate for successful tasks, but the logs show that both pods ran.

$ argo -n etl watch @latest
Name:                test9ckrq
Namespace:           etl
ServiceAccount:      default
Status:              Succeeded
Conditions:          
 PodRunning          False
 Completed           True
Created:             Wed Feb 16 19:15:48 +0100 (16 seconds ago)
Started:             Wed Feb 16 19:15:48 +0100 (16 seconds ago)
Finished:            Wed Feb 16 19:15:58 +0100 (6 seconds ago)
Duration:            10 seconds
Progress:            1/1
ResourcesDuration:   1s*(1 cpu),1s*(100Mi memory)

STEP                          TEMPLATE       PODNAME               DURATION  MESSAGE
 ✔ test9ckrq                  test-workflow                                    
 └─✔ test-task(0:a:\"test\")  test-task      test9ckrq-2427085523  4s          
$ argo -n etl logs @latest
test9ckrq-3392308087: test
test9ckrq-2427085523: test

Diagnostics

This is the simplest workflow that reproduces the issue. Changing a: '"test"' to a: "test" removes the duplicate task, but this isn’t a reasonable workaround because it means we can’t send JSON in this field.
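
A short sketch of why dropping the inner quotes is not a workaround: the two YAML spellings produce different strings, and only one of them is valid JSON. The strings are written out by hand here rather than parsed from YAML, to keep the example stdlib-only.

```python
import json

quoted = '"test"'  # what YAML a: '"test"' yields (quotes are part of the value)
bare = 'test'      # what YAML a: "test" yields (quotes are YAML syntax only)

# The quoted form is a valid JSON document: a JSON string.
print(json.loads(quoted))  # test

# The bare form is not valid JSON, so a json.loads call on it fails.
try:
    json.loads(bare)
    bare_is_json = True
except json.JSONDecodeError:
    bare_is_json = False
print(bare_is_json)  # False
```

The script template in the reproduction calls json.loads on the parameter, so it depends on the quoted form.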

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test
spec:
  entrypoint: test-workflow
  templates:
    - name: test-workflow
      dag:
        tasks:
          - name: test-task
            template: test-task
            arguments:
              parameters:
                - name: a
                  value: '{{item.a}}'
            withItems:
              - a: '"test"'
    - name: test-task
      inputs:
        parameters:
          - name: a
            value: '{{item.a}}'
      script:
        name: test-task
        image: 'python:3.7'
        command:
          - python
        source: |
          import json
          a = json.loads('{{inputs.parameters.a}}')
    
          print(a)

Interestingly, the duplicate task (pod test9ckrq-3392308087) doesn’t show up in the controller logs; only test9ckrq-2427085523 does.

$ kubectl logs -n etl-argo-v3 deploy/argo-argo-workflows-workflow-controller | grep test9ckrq
time="2022-02-16T18:15:48.788Z" level=info msg="Processing workflow" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="Updated phase  -> Running" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="DAG node test9ckrq initialized Running" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="TaskGroup node test9ckrq-4247033023 initialized Running (message: )" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.793Z" level=info msg="All of node test9ckrq.test-task(0:a:\\\"test\\\") dependencies [] completed" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.794Z" level=info msg="Pod node test9ckrq-2427085523 initialized Pending" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.820Z" level=info msg="Created pod: test9ckrq.test-task(0:a:\\\"test\\\") (test9ckrq-2427085523)" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.821Z" level=info msg="TaskSet Reconciliation" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.821Z" level=info msg=reconcileAgentPod namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.832Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"test9ckrq\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.832Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.846Z" level=info msg="Update retry attempt 1 successful" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:48.846Z" level=info msg="Workflow update successful" namespace=etl-argo-v3 phase=Running resourceVersion=997292078 workflow=test9ckrq
time="2022-02-16T18:15:58.817Z" level=info msg="Processing workflow" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.817Z" level=info msg="Updating node test9ckrq-2427085523 exit code 0" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="Updating node test9ckrq-2427085523 status Pending -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq-4247033023 phase Running -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq-4247033023 finished: 2022-02-16 18:15:58.818693885 +0000 UTC" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="Outbound nodes of test9ckrq set to [test9ckrq-2427085523]" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq phase Running -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.818Z" level=info msg="node test9ckrq finished: 2022-02-16 18:15:58.81895713 +0000 UTC" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Checking daemoned children of test9ckrq" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="TaskSet Reconciliation" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg=reconcileAgentPod namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Updated phase Running -> Succeeded" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Marking workflow completed" namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.819Z" level=info msg="Checking daemoned children of " namespace=etl-argo-v3 workflow=test9ckrq
time="2022-02-16T18:15:58.831Z" level=info msg="Workflow update successful" namespace=etl-argo-v3 phase=Succeeded resourceVersion=997292181 workflow=test9ckrq
time="2022-02-16T18:15:58.838Z" level=info msg="cleaning up pod" action=labelPodCompleted key=etl-argo-v3/test9ckrq-2427085523/labelPodCompleted

Some information about my Argo install:

$ kubectl -n etl get configmap argo-argo-workflows-workflow-controller-configmap -o yaml
apiVersion: v1
data:
  config: |
    containerRuntimeExecutor: docker
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: argo
    meta.helm.sh/release-namespace: etl
  creationTimestamp: "2022-02-09T05:56:51Z"
  labels:
    app.kubernetes.io/component: workflow-controller
    app.kubernetes.io/instance: argo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argo-workflows-cm
    app.kubernetes.io/part-of: argo-workflows
    helm.sh/chart: argo-workflows-0.10.0
  name: argo-argo-workflows-workflow-controller-configmap
  namespace: etl-argo-v3

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 21 (10 by maintainers)

Most upvoted comments

I knew that code would cause problems. I’m not sure this is fixable, due to inherent problems with the design of Item.

Instead, I think we need #7801