argo-workflows: containers in containersets not appropriately reporting status if terminated

Checklist

  • Double-checked my configuration.
  • Tested using the latest version.
  • Used the Emissary executor.

Summary

What happened/what you expected to happen? When a workflow is terminated because it exceeded its deadline, I expect the workflow to finish terminating. Instead, the workflow keeps waiting for the containers in its containerSet to terminate, even though Kubernetes already reports them as finished with an error.
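
For orientation, the scenario can be sketched roughly as below. This is a hypothetical minimal manifest and not the attached reproducible workflow: the names `containerset-deadline-`, `shard-a`, and `shard-b`, the `busybox` image, and the 30-second deadline are all assumptions chosen for illustration. It only shows the shape of the setup, a `containerSet` template whose containers outlive the workflow-level `activeDeadlineSeconds`.

```yaml
# Hypothetical sketch, NOT the workflow attached to this issue.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: containerset-deadline-   # assumed name
spec:
  entrypoint: main
  # Workflow-level deadline; the containers below run longer than this.
  activeDeadlineSeconds: 30
  templates:
    - name: main
      containerSet:
        containers:
          # Two independent containers in the same pod (assumed names).
          - name: shard-a
            image: busybox
            command: [sh, -c]
            args: ["sleep 300"]
          - name: shard-b
            image: busybox
            command: [sh, -c]
            args: ["sleep 300"]
```

If the behavior reported here applies, the containers would show as terminated in Kubernetes after the deadline while the workflow still lists them as pending termination.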

An image of the workflow after timeout termination: Screen Shot 2022-04-29 at 7 05 21 PM

An image of the containers for shard-13 in k8s: Screen Shot 2022-04-29 at 7 07 00 PM

What version are you running? 3.3.2

Reproducible Workflow

reproducible-workflow.txt

Logs from the workflow controller:

controller-logs.txt

The workflow’s pods that are problematic:

items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Logs from your workflow’s wait container:

wait-logs.txt


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 21 (10 by maintainers)

Most upvoted comments

@the1schwartz @alexec I can also still reproduce this in 3.4.0

I can still reproduce this in 3.3.9.

We built Argo locally off the master branch and ran the attached workflow, and we are still seeing the same errors. [Screenshot 2022-05-03 at 10 27 45]

I can double check but it won’t be until next week, sorry! I have attached the workflow that reproduces the error if you’d like to run it beforehand though.