argo-workflows: Workflow Failed: wait container is forcefully stopped
Checklist
- Double-checked my configuration.
- Tested using the latest version.
- Used the Emissary executor.
Summary
We have updated Argo Workflows to version 3.3.8, and lately we have noticed that some Workflows are failing because their Pods are being terminated.
When I inspected the Pod, I found the following container status:
- containerID: containerd://d10fcac735c905dc65717148e04437e32148bea560bb66e9ed35d718471f1bf8
  image: argoproj/workflow-controller:v3.3.8-linux-amd64
  imageID: argoproj/workflow-controller:v3.3.8-linux-amd64@sha256:1ecc5b305fb784271797c14436c30b4b6d23204d603d416fdb0382f62604325f
  lastState: {}
  name: wait
  ready: false
  restartCount: 0
  started: false
  state:
    terminated:
      containerID: containerd://d10fcac735c905dc65717148e04437e32148bea560bb66e9ed35d718471f1bf8
      exitCode: 2
      finishedAt: "2022-08-10T07:06:08Z"
      reason: Error
      startedAt: "2022-08-10T07:06:07Z"
Also, when I check the logs of the wait container, I see nothing other than the following two lines:
fatal: bad g in signal handler
fatal: bad g in signal handler
The error is not consistent: it happens in roughly 1 out of 12 runs. It is strange, and I don’t see a way to fix it.
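For anyone trying to spot the same failure across many Pods, here is a minimal Python sketch that looks for a terminated wait container with a non-zero exit code. The function name failed_wait_containers and the sample_pod dictionary are hypothetical; the sample only mirrors the fields from the status shown above and is not taken from a real API response.

```python
from typing import Any


def failed_wait_containers(pod: dict[str, Any]) -> list[dict[str, Any]]:
    """Return terminated `wait` container states that exited non-zero."""
    failures = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for status in statuses:
        if status.get("name") != "wait":
            continue
        terminated = (status.get("state") or {}).get("terminated")
        if terminated and terminated.get("exitCode", 0) != 0:
            failures.append(
                {
                    "exitCode": terminated["exitCode"],
                    "reason": terminated.get("reason"),
                    "finishedAt": terminated.get("finishedAt"),
                }
            )
    return failures


# Hypothetical sample, trimmed to the fields shown in the status above.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "wait",
                "ready": False,
                "restartCount": 0,
                "state": {
                    "terminated": {
                        "exitCode": 2,
                        "reason": "Error",
                        "startedAt": "2022-08-10T07:06:07Z",
                        "finishedAt": "2022-08-10T07:06:08Z",
                    }
                },
            }
        ]
    }
}

print(failed_wait_containers(sample_pod))
```

In practice the dictionary could come from parsing the output of kubectl get pod <name> -o json with json.load. This only detects the symptom; it does not explain the "bad g in signal handler" message.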
What version are you running?
I’m running Argo Workflows 3.3.8, deployed by Helm in our cluster.
Diagnostics
Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.
Any Workflow can reproduce this bug on our cluster at some point, even the ones in the examples.
# Logs from the workflow controller:
time="2022-08-10T08:51:07.741Z" level=info msg="Queueing Failed workflow metadata/metadata-process-6phwc for delete in 2m57s due to TTL"
# Logs from your workflow's wait container:
fatal: bad g in signal handler
fatal: bad g in signal handler
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (13 by maintainers)
@isubasinghe it was a problem with the 3.3.x version.
I don’t know what changed, but once I upgraded to 3.4.x, the error stopped happening.
BTW, I’m closing this, as the upgrade to the latest version solved the weird issue.
No, I mean the :latest image tag.