argo-workflows: Successfully finished pods shown as stuck in Pending phase indefinitely in GKE + Kubernetes v1.20
Summary
We had been running reasonably complex Argo workflows without issues for a long time. However, around the time we updated the Kubernetes version to 1.19.10-gke.1000 (running Argo in GKE) we started experiencing frequent problems with workflows getting stuck: a pod that was successfully started by Argo and finished is shown as stuck in the Pending state in Argo, even though we can see from the logs that the main container finished successfully. We have tried the PNS and k8sapi executors, but that did not fix the issue. We have also deleted the Argo namespace and recreated it, and the issue still happens. We updated from Argo 2.x to 3.0.8, to 3.1.1, and to stable 3.0.3, and it still occurred. Currently we are at the latest tag (argoproj/workflow-controller:v3.0.3).
Diagnostics
What Kubernetes provider are you using? We are using GKE with Kubernetes version 1.19.10-gke.1000
What version of Argo Workflows are you running? Tested on multiple versions. Currently running 3.0.3, but it started in a 2.x version and also happened in 3.1.1 and 3.0.8.
What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary. It failed with the default (I assume Docker), K8SAPI, and PNS.
Did this work in a previous version? I.e. is it a regression? We are not sure if that was the cause, but 2.6 worked without issues on GKE before we updated Kubernetes to 1.19.
# Simplified failing part of workflow.
Parent's parent
our-workflow-1625018400-118234800:
  boundaryID: our-workflow-1625018400-2940179300
  children:
  - our-workflow-1625018400-1820417972
  - our-workflow-1625018400-1463043834
  displayName: '[2]'
  finishedAt: null
  id: our-workflow-1625018400-118234800
  name: our-workflow-1625018400[2].fill-supervised-product-clusters[0].fill-supervised-product-cluster(19:configs:["things"],products:["params"],supervised-configs:[])[2]
  phase: Running
  progress: 0/2
  startedAt: "2021-06-30T09:34:44Z"
  templateScope: local/our-workflow-1625018400
  type: StepGroup
Parent stuck running
our-workflow-1625018400-1820417972:
  boundaryID: our-workflow-1625018400-2940179300
  children:
  - our-workflow-1625018400-655392735
  displayName: fill-products(0:params)
  finishedAt: null
  id: our-workflow-1625018400-1820417972
  inputs:
    parameters:
    - name: products
      value: send_val
  name: our-workflow-1625018400[2].fill-supervised-product-clusters[0].fill-supervised-product-cluster(19:configs:["things"],products:["params"],supervised-configs:[])[2].fill-products(0:params)
  phase: Running
  progress: 0/1
  startedAt: "2021-06-30T09:34:44Z"
  templateName: fill-products
  templateScope: local/our-workflow-1625018400
  type: Retry
Finished pod stuck pending
our-workflow-1625018400-655392735:
  boundaryID: our-workflow-1625018400-2940179300
  displayName: fill-products(0:params)(0)
  finishedAt: null
  id: our-workflow-1625018400-655392735
  inputs:
    parameters:
    - name: products
      value: send_val
  name: our-workflow-1625018400[2].fill-supervised-product-clusters[0].fill-supervised-product-cluster(19:configs:["things"],products:["params"],supervised-configs:[])[2].fill-products(0:params)(0)
  phase: Pending
  progress: 0/1
  startedAt: "2021-06-30T09:34:44Z"
  templateName: fill-products
  templateScope: local/our-workflow-1625018400
  type: Pod
# Logs from the workflow controller:
# The workflow-controller container occasionally restarts; logs from one of those failures:
# Edit: NOTE: Pods get stuck even if the workflow controller doesn't restart.
time="2021-06-30T08:03:26.894Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=349445621 workflow=workflow-working-fine-1625040000
time="2021-06-30T08:03:26.898Z" level=info msg="SG Outbound nodes of affected-workflow-1625018400-3050482061 are [affected-workflow-1625018400-3894231772]" namespace=default workflow=affected-workflow-1625018400
time="2021-06-30T08:03:26.898Z" level=info msg="SG Outbound nodes of affected-workflow-1625018400-483040858 are [affected-workflow-1625018400-3675982209]" namespace=default workflow=affected-workflow-1625018400
time="2021-06-30T08:03:26.899Z" level=info msg="SG Outbound nodes of affected-workflow-1625018400-744350190 are [affected-workflow-1625018400-952860445]" namespace=default workflow=affected-workflow-1625018400
E0630 08:03:27.386046 1 leaderelection.go:361] Failed to update lock: Put "https://10.47.240.1:443/apis/coordination.k8s.io/v1/namespaces/argo/leases/workflow-controller": context deadline exceeded
I0630 08:03:27.681192 1 leaderelection.go:278] failed to renew lease argo/workflow-controller: timed out waiting for the condition
time="2021-06-30T08:03:28.071Z" level=info msg="Update leases 409"
time="2021-06-30T08:03:29.474Z" level=info msg="Create events 201"
time="2021-06-30T08:03:31.003Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/workflow-working-fine-1625040000-957740190/labelPodCompleted
E0630 08:03:31.031922 1 leaderelection.go:301] Failed to release lock: Operation cannot be fulfilled on leases.coordination.k8s.io "workflow-controller": the object has been modified; please apply your changes to the latest version and try again
time="2021-06-30T08:03:31.477Z" level=info msg="stopped leading" id=workflow-controller-7c568955d7-bxtdj
E0630 08:03:31.640872 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 78 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1b6eb00, 0x2c9aa80)
/go/pkg/mod/k8s.io/apimachinery@v0.19.6/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.19.6/pkg/util/runtime/runtime.go:48 +0x89
panic(0x1b6eb00, 0x2c9aa80)
/usr/local/go/src/runtime/panic.go:969 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.func2()
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:256 +0x7c
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc0006547e0)
/go/pkg/mod/k8s.io/client-go@v0.19.6/tools/leaderelection/leaderelection.go:200 +0x29
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0006547e0, 0x206c3e0, 0xc0002a48c0)
/go/pkg/mod/k8s.io/client-go@v0.19.6/tools/leaderelection/leaderelection.go:210 +0x15d
k8s.io/client-go/tools/leaderelection.RunOrDie(0x206c3e0, 0xc000522d40, 0x2078780, 0xc00085e000, 0x37e11d600, 0x2540be400, 0x12a05f200, 0xc000cb9c00, 0xc000a6a700, 0xc001096060, ...)
/go/pkg/mod/k8s.io/client-go@v0.19.6/tools/leaderelection/leaderelection.go:222 +0x9c
created by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:241 +0xfb8
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19cf4fc]
goroutine 78 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.19.6/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x1b6eb00, 0x2c9aa80)
/usr/local/go/src/runtime/panic.go:969 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.func2()
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:256 +0x7c
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc0006547e0)
/go/pkg/mod/k8s.io/client-go@v0.19.6/tools/leaderelection/leaderelection.go:200 +0x29
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0006547e0, 0x206c3e0, 0xc0002a48c0)
/go/pkg/mod/k8s.io/client-go@v0.19.6/tools/leaderelection/leaderelection.go:210 +0x15d
k8s.io/client-go/tools/leaderelection.RunOrDie(0x206c3e0, 0xc000522d40, 0x2078780, 0xc00085e000, 0x37e11d600, 0x2540be400, 0x12a05f200, 0xc000cb9c00, 0xc000a6a700, 0xc001096060, ...)
/go/pkg/mod/k8s.io/client-go@v0.19.6/tools/leaderelection/leaderelection.go:222 +0x9c
created by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:241 +0xfb8
# Other than that, it is acting like this:
time="2021-06-30T11:59:37.960Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.960Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.961Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.962Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.963Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.963Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.964Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.965Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.965Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.966Z" level=info msg="Patch events 200"
time="2021-06-30T11:59:37.966Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.968Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.968Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.969Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.970Z" level=info msg="template (node failing-workflow-1625018400-2046183375) active children parallelism reached 3/3" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.970Z" level=info msg="Workflow step group node failing-workflow-1625018400-1253377659 not yet completed" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.970Z" level=info msg="Workflow step group node failing-workflow-1625018400-1369709231 not yet completed" namespace=default workflow=failing-workflow-1625018400
time="2021-06-30T11:59:37.972Z" level=info msg="Patch events 200"
time="2021-06-30T11:59:37.977Z" level=info msg="Patch events 200"
time="2021-06-30T11:59:39.413Z" level=info msg="Get leases 200"
time="2021-06-30T11:59:39.420Z" level=info msg="Update leases 200"
time="2021-06-30T11:59:43.564Z" level=info msg="Enforcing history limit
# Pending pod main logs:
The main container logs everything as expected and finishes.
# Logs from pending workflow's wait container, something like:
time="2021-06-30T09:34:45.973Z" level=info msg="Starting Workflow Executor" version="{v3.0.3 2021-05-11T21:14:20Z 02071057c082cf295ab8da68f1b2027ff8762b5a v3.0.3 clean go1.
15.7 gc linux/amd64}"
time="2021-06-30T09:34:45.979Z" level=info msg="Creating a docker executor"
time="2021-06-30T09:34:45.979Z" level=info msg="Executor (version: v3.0.3, build_date: 2021-05-11T21:14:20Z) initialized (pod: default/failing-workflow-workfl
ow-1625018400-531201185) with template:\n{\"name\":\"fill-products\",\"inputs\":{\"parameters\":[{\"name\":\"products\",\"value\":\"products\"}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"i
mage\":\"us.gcr.io/spaceknow-backend/failing-workflow-workflow-base-delivery:287\",\"command\":[\"python3\"],\"args\":[\"deliveries/datacube-fill-products.py\
",\"--products\",\"products\"],\"envF
rom\":[{\"secretRef\":{\"name\":\"failing-workflow-workflow\"}}],\"resources\":{\"limits\":{\"memory\":\"2Gi\"},\"requests\":{\"cpu\":\"2\",\"memory\":\"2Gi\"
}}},\"retryStrategy\":{\"limit\":2,\"retryPolicy\":\"Always\"}}"
time="2021-06-30T09:34:45.979Z" level=info msg="Starting annotations monitor"
time="2021-06-30T09:34:45.979Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:34:45.980Z" level=info msg="Starting deadline monitor"
time="2021-06-30T09:34:46.039Z" level=info msg="mapped container name \"main\" to container ID \"bcd7bcab1bd8ab36553933c1b41cb81deacfbab26a9a440f36360aecef06af6f\" (created
at 2021-06-30 09:34:45 +0000 UTC, status Created)"
time="2021-06-30T09:34:46.039Z" level=info msg="mapped container name \"wait\" to container ID \"34831534a3aefb25a5744dfd102c8060dc2369767cab729b343f3b18d375828e\" (created
at 2021-06-30 09:34:45 +0000 UTC, status Up)"
time="2021-06-30T09:34:46.980Z" level=info msg="docker wait bcd7bcab1bd8ab36553933c1b41cb81deacfbab26a9a440f36360aecef06af6f"
time="2021-06-30T09:35:15.026Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:35:16.066Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:35:26.101Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:35:36.135Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:35:46.168Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:35:56.200Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:36:06.234Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:36:16.266Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:36:26.299Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:36:36.333Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:36:46.372Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:36:56.406Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=
label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=failing-workflow-workflow-1625018400-531201185"
time="2021-06-30T09:37:04.406Z" level=info msg="Main container completed"
time="2021-06-30T09:37:04.406Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2021-06-30T09:37:04.406Z" level=info msg="Capturing script exit code"
time="2021-06-30T09:37:04.438Z" level=info msg="No output parameters"
time="2021-06-30T09:37:04.438Z" level=info msg="No output artifacts"
time="2021-06-30T09:37:04.438Z" level=info msg="Annotating pod with output"
time="2021-06-30T09:37:04.454Z" level=info msg="Patch pods 200"
time="2021-06-30T09:37:04.464Z" level=info msg="Killing sidecars []"
time="2021-06-30T09:37:04.464Z" level=info msg="Alloc=5435 TotalAlloc=11825 Sys=73041 NumGC=4 Goroutines=10"
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 85
- Comments: 55 (32 by maintainers)
Hello,
We are on AKS 1.20.5 with containerd and the PNS executor, running Argo 3.1, and we experience the same problem. Updating the workflow's status via kubectl also times out with the same message. We hotfixed it by splitting the workflow in two and having a step submit both workflows.
I have updated K8s 1.19 -> 1.20 (the fix doesn't work in 1.19 yet) and have been able to successfully test the atomic map-type fix! It seems to work well with the test example, and the Argo workflow successfully finishes with 1000 nodes. I will have to do further testing in our production run, but I believe this will fix the issue. It might take a while until the proper fix propagates to the Kubernetes versions offered by cloud providers, but this atomic workaround seems to be working as expected.
For anyone experiencing this issue, you can try simply editing the existing Argo Workflow CustomResourceDefinition and adding
x-kubernetes-map-type: atomic
to the spec and status properties in the openAPIV3Schema. @alexec
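To make that concrete, here is a minimal sketch of what the relevant part of the workflows.argoproj.io CRD might look like after the edit. This assumes the stock CRD layout shipped with the Argo manifests; verify the field paths against your installed CRD (for example with kubectl get crd workflows.argoproj.io -o yaml) before applying anything.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: workflows.argoproj.io
spec:
  # ...group, names, scope left unchanged...
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            x-kubernetes-preserve-unknown-fields: true
            x-kubernetes-map-type: atomic   # workaround described above
          status:
            type: object
            x-kubernetes-preserve-unknown-fields: true
            x-kubernetes-map-type: atomic   # workaround described above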
We have a very similar situation that we also started encountering when we went from Kubernetes 1.19 to Kubernetes 1.20.
After an upgrade to Kubernetes 1.20, we hit an issue where Argo workflows with a node fan-out of more than about 100 were not getting their status updated properly. For example, multiple Argo nodes would have the erroneous status of "Pending" in the UI or CLI while they were actively working and logging content, and therefore should have been in a "Running" state. Upon completion, the nodes did not register the correct "Completed" state either, in the UI or in the argo CLI.
We saw this error in the logs for the workflow controller (see below):
time="2021-06-23T14:18:15.798Z" level=warning msg="Error updating workflow: Timeout: request did not complete within requested timeout 34s Timeout" namespace=my-namespace workflow=my-workflow-v4f8t
Large workflows require a large amount of status information to be sent to the Kubernetes API server, to be stored in its datastore, etcd. The Kubernetes endpoint would error on this transaction and fail to record the node statuses, and the Argo workflow controller would then not have the right information on subsequent requests. Without proper node status information, the workflow controller does not advance the workflow to completion.
A workaround has been found, at least temporarily: start the Argo workflow controller with the env var ALWAYS_OFFLOAD_NODE_STATUS set to true. This lets the workflow controller store node status information directly in a Postgres database instead of in etcd via the Kubernetes API.
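For anyone wanting to try the same thing, a minimal sketch of setting that env var on the workflow-controller Deployment. The Deployment and namespace names below are the defaults from the Argo install manifests and may differ in your setup; offloading also requires persistence to be configured in the workflow-controller ConfigMap (e.g. the Postgres settings mentioned above).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workflow-controller
  namespace: argo
spec:
  template:
    spec:
      containers:
      - name: workflow-controller
        env:
        # Offload large node status to the configured persistence DB
        # instead of storing it all on the Workflow object in etcd.
        - name: ALWAYS_OFFLOAD_NODE_STATUS
          value: "true"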
However, this change does require that the argo CLI now connect directly to the argo server to get information about workflows in the cluster. But even with a port forward to the argo server and the necessary env vars set, we are getting inconsistent results on the workflow status from the UI and the argo CLI. The argo CLI returns the following error on an argo get:
argo get my-workflow-wt967 -n my-namespace
FATA[0001] rpc error: code = Unknown desc = upper: no more rows in this result set
The workflows do complete now, however, but we can't view them in the UI or get information on the nodes using the CLI.
Aside: We also suspect the activation of FlowSchema in Kubernetes 1.20 as another contributor to the Kube API erroring - https://kubernetes.io/docs/concepts/cluster-administration/flow-control/ - since the "API Priority and Fairness" feature is now always on. Because we are on GKE, we cannot turn that feature back off. Creating a FlowSchema that exempts the Argo service account from throttling seemed to have some positive impact (fewer Kube API server errors with large workflows) but did not fully resolve the issue.
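For reference, a rough sketch of the kind of FlowSchema meant here. The service account name and namespace are placeholders for whatever your workflow controller actually runs as, and you may prefer a high (non-exempt) priority level such as workload-high instead of exempt.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
kind: FlowSchema
metadata:
  name: argo-workflow-controller
spec:
  # Route requests from the Argo controller's service account to a priority
  # level that is not throttled by API Priority and Fairness.
  priorityLevelConfiguration:
    name: exempt            # or e.g. workload-high
  matchingPrecedence: 500
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: argo          # placeholder: controller's service account
        namespace: argo     # placeholder: namespace where Argo is installed
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      clusterScope: true
      namespaces: ["*"]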
UPDATE: More information on inconsistent error
FATA[0001] rpc error: code = Unknown desc = upper: no more rows in this result set
This seems to only happen while the workflow is not complete. Once complete, we are able to view the workflow in the UI and run
argo get
against the workflow using the argo CLI (with the appropriate env vars set and port forwarding).
yes. Same same.
I'd expect we'd back-port the work-around to v3.1.3.
You'll also need to update your argoexec image. That will fix the error.