argo-workflows: containers in containersets not appropriately reporting status if terminated
Checklist
- Double-checked my configuration.
- Tested using the latest version.
- Used the Emissary executor.
Summary
What happened/what you expected to happen? When a workflow is terminated because it has exceeded its deadline, I expect the workflow to end up terminated. Instead, it keeps waiting for the containerset containers to terminate, even though Kubernetes already reports them as finished with an error.
[Screenshot: the workflow after timeout termination]
[Screenshot: the containers for shard-13 in Kubernetes]
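To check what Kubernetes itself reports for a containerset pod, such as the shard-13 pod above, you can read the container statuses from the pod object returned by `kubectl get pod <name> -o json`. Below is a minimal Python sketch. It assumes that JSON has already been loaded into a dict. The helpers `summarize_containers` and `all_terminated` are hypothetical and are not part of Argo or any Kubernetes client. They use only the standard pod fields `status.containerStatuses[].name` and `state.terminated.exitCode`.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical helpers (not part of Argo or any Kubernetes client library).
# They only read the standard pod fields status.containerStatuses[].state.


def summarize_containers(pod: Dict[str, Any]) -> List[Tuple[str, str, Optional[int]]]:
    """Return (container name, state, exit code) for each container in the pod."""
    summary = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        state = status.get("state", {})
        if "terminated" in state:
            # Kubernetes reports an exitCode for terminated containers.
            summary.append((status["name"], "terminated", state["terminated"].get("exitCode")))
        elif "running" in state:
            summary.append((status["name"], "running", None))
        else:
            # No running/terminated entry: treat as waiting.
            summary.append((status["name"], "waiting", None))
    return summary


def all_terminated(pod: Dict[str, Any]) -> bool:
    """True when the pod has containers and every one of them is terminated."""
    summary = summarize_containers(pod)
    return bool(summary) and all(state == "terminated" for _, state, _ in summary)


if __name__ == "__main__":
    # Illustrative input only; not taken from the issue.
    pod = json.loads("""
    {"status": {"containerStatuses": [
      {"name": "wait", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
      {"name": "a", "state": {"terminated": {"exitCode": 1, "reason": "Error"}}}
    ]}}
    """)
    print(summarize_containers(pod))
    print(all_terminated(pod))
```

If `all_terminated` returns `True` for the pod while the workflow still shows its containerset nodes as running, that matches the mismatch described in this report.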
What version are you running? 3.3.2
Reproducible Workflow
Logs from the workflow controller:
The workflow’s pods that are problematic:
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
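The pod listing above is a Kubernetes `List` with an empty `items` array, which means the query matched no pods when it was run. You can filter a listing like this by the `workflows.argoproj.io/workflow` label, which Argo sets on the pods it creates for a workflow, to find pods that still have a failed container. The sketch below makes some assumptions. `failed_pods` is a hypothetical helper, not an Argo or Kubernetes API. The listing is assumed to be saved as JSON (for example with `-o json` instead of `-o yaml`) so that only the standard library is needed. The workflow name `my-workflow` is a placeholder.

```python
import json
from typing import Any, Dict, List

# Label that Argo sets on the pods it creates for a workflow.
WORKFLOW_LABEL = "workflows.argoproj.io/workflow"


def failed_pods(pod_list: Dict[str, Any], workflow: str) -> List[str]:
    """Hypothetical helper: names of the workflow's pods with a container that exited non-zero."""
    names = []
    for pod in pod_list.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        if labels.get(WORKFLOW_LABEL) != workflow:
            continue  # pod belongs to a different workflow (or none)
        for status in pod.get("status", {}).get("containerStatuses", []):
            terminated = status.get("state", {}).get("terminated")
            if terminated and terminated.get("exitCode", 0) != 0:
                names.append(pod["metadata"]["name"])
                break  # one failed container is enough to report the pod
    return names


if __name__ == "__main__":
    # The same empty List as in the report, expressed as JSON.
    empty = json.loads('{"items": [], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}')
    print(failed_pods(empty, "my-workflow"))  # -> []
```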
Logs from your workflow’s wait container:
About this issue
- State: closed
- Created 2 years ago
- Reactions: 5
- Comments: 21 (10 by maintainers)
Commits related to this issue
- fix: Terminate rather than delete deadlined pods. Fixes #8545 Signed-off-by: Alex Collins <alex_collins@intuit.com> — committed to alexec/argo-workflows by alexec 2 years ago
- fix: Terminate, rather than delete, deadlined pods. Fixes #8545 (#8620) Signed-off-by: Alex Collins <alex_collins@intuit.com> — committed to argoproj/argo-workflows by alexec 2 years ago
@the1schwartz @alexec I can also still reproduce this in 3.4.0.
I can still reproduce this in 3.3.9.
We built Argo locally from the master branch and ran the attached workflow, and we are still seeing the same errors.
I can double-check, but it won’t be until next week, sorry! I have attached the workflow that reproduces the error if you’d like to run it before then.