pipeline: Allow sidecars to terminate gracefully
Expected Behavior
A sidecar might need to complete any pending work before being torn down (eg, uploading metadata).
Actual Behavior
Sidecars are replaced with a nop image as soon as a TaskRun completes preventing any cleanup work.
Steps to Reproduce the Problem
- Create a TaskRun
- Using an AdmissionsController to inject a sidecar that streams output from the Task to another service.
We would expect all of the output to be uploaded as the task runs but whatever chunk isn’t uploaded before the sidecar is torn down is presumably lost.
Additional Info
There are ways around this but it seems to me that the very least, a sidecar should receive the kill signal so it can handle shutdown however it needs to. A timeout can always been inserted to ensure it doesn’t hang forever.
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 3
- Comments: 24 (6 by maintainers)
Hey, no apology necessary, it sometimes takes us a while to give new designs a proper look.
I am not opposed to the idea of introducing a mechanism to disable the “nop image swap” and let sidecars terminate themselves.
There is no workaround that I can think of which would work today without code changes. The reason this is so tricky is because of the requirement for the Sidecar to run to completion even if a Step failed. If this wasn’t a constraint then you could program your last Step to wait for a signal from the Sidecar before the Step and Task end. But unfortunately Steps after the failed one will simply not run so the Task will end immediately, killing the Sidecar no matter what.
Looking at the code as it currently stands
Sidecaris an alias ofStep, so we’d need to makeSidecarits own struct to add this new Sidecar-specific field. I like the idea that this is configured on a per-sidecar basis, at least in the initial version. Keeps the scope of change relatively small I think.So the best next step would be to write a short design doc and present it to the API Working Group to propose the idea. If you can’t make it to a WG feel free to write the doc and I’d be happy to communicate it out.
Currently, the sidecar is not behaving like a sidecar should. The sidecar should be able to choose when to kill itself. In my use case, I’m trying to process (conversion of format etc) and upload some test results even if the test step failed.
So, if there was no Nop image hack in place, my sidecar would just complete its task and the container status will be terminated. Isn’t it better to leave it up to the user on how and when he wants to terminate his sidecar?
If it’s absolutely necessary for Tekton to take this responsibility to kill the sidecar, can we at least have an option to disable it?
Like inside the sidecar definition, a flag like:
forceTermination: false?Which will disable the replacing mechanism with Nop image.
Some more insight why I think the newly introduced “Finally” wasn’t good enough for me to do the same: Finally is a separate task. We’re trying to avoid using PVC and moved to a single task design for our pipeline since PVC creation can take time. With Finally, if I want to share disk, I must use shared workspace.
Which forces us to modify certain dockerfiles
WORKDIRand evenCOPYcommands to make sure they are within the/workspace. It can also get messy since the workspace already have the original repo code cloned to it and now we’re copying something unrelated inside.Thanks!
Another alternative here would be to introduce an optional contract for sidecars: If you want to receive a SIGTERM / shutdown warning then your sidecar must contain a kill binary on
$PATH. If a sidecar provides this then Tekton can execute a similar container shutdown procedure to Kubernetes’ before switching out the sidecar container’s image tonop. Sidecars that don’t implement the contract will simply be stopped without warning.This would provide enough of a stopgap until the sidecars KEP lands I think.