kubernetes: Terminate a Job from an external controller

What would you like to be added?

/sig apps
/wg batch
/area batch

A mechanism to terminate a Job from an external controller, ensuring all the running pods are deleted.

Potentially, we could simply use a terminal Job condition as the signal. This could be implemented by enhancing the orphan pod worker to issue Pod deletions when the owner Job is marked as finished. This behavior would be in addition to removing the finalizers when the Pods terminate.
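The proposed orphan pod worker behavior can be sketched as a pure decision function. This is a minimal illustration, not the Job controller's actual code: the `Job`, `Pod`, and `Condition` types are simplified stand-ins for the real API objects, and `orphanPodAction` and the `Action` values are hypothetical names introduced here. The finalizer string is the one the Job controller uses for Pod tracking.

```go
package main

import "fmt"

// Simplified stand-ins for the real API types. Only the fields this
// sketch needs are modeled.
type Condition struct {
	Type   string // e.g. "Complete" or "Failed"
	Status string // "True", "False" or "Unknown"
}

type Job struct {
	Conditions []Condition
}

type Pod struct {
	Phase      string // "Pending", "Running", "Succeeded" or "Failed"
	Deleting   bool   // stands in for a non-nil deletionTimestamp
	Finalizers []string
}

// Finalizer the Job controller places on the Pods it tracks.
const jobTrackingFinalizer = "batch.kubernetes.io/job-tracking"

// Action is a hypothetical result type for this sketch.
type Action string

const (
	ActionNone            Action = "none"
	ActionDeletePod       Action = "delete-pod"
	ActionRemoveFinalizer Action = "remove-finalizer"
)

// jobFinished reports whether the Job carries a terminal condition
// (Complete or Failed) whose status is "True".
func jobFinished(j Job) bool {
	for _, c := range j.Conditions {
		if (c.Type == "Complete" || c.Type == "Failed") && c.Status == "True" {
			return true
		}
	}
	return false
}

func podTerminated(p Pod) bool {
	return p.Phase == "Succeeded" || p.Phase == "Failed"
}

func hasFinalizer(p Pod, name string) bool {
	for _, f := range p.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// orphanPodAction (hypothetical) decides what the worker does for one Pod
// whose owner Job is j: delete Pods that are still running once the Job is
// finished, and keep removing the tracking finalizer from terminated Pods.
func orphanPodAction(j Job, p Pod) Action {
	if !jobFinished(j) {
		return ActionNone
	}
	if podTerminated(p) {
		if hasFinalizer(p, jobTrackingFinalizer) {
			return ActionRemoveFinalizer
		}
		return ActionNone
	}
	if p.Deleting {
		return ActionNone // a deletion is already in flight
	}
	return ActionDeletePod
}

func main() {
	finished := Job{Conditions: []Condition{{Type: "Failed", Status: "True"}}}
	running := Pod{Phase: "Running", Finalizers: []string{jobTrackingFinalizer}}
	done := Pod{Phase: "Succeeded", Finalizers: []string{jobTrackingFinalizer}}
	fmt.Println(orphanPodAction(finished, running))
	fmt.Println(orphanPodAction(finished, done))
}
```

Keeping the decision separate from the API calls means an external controller only has to set the terminal condition. The worker then converges each Pod independently, even if it observes the Pods in different states.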

Why is this needed?

Today, simply setting a terminal Job condition (Complete or Failed) can leave Pods running indefinitely, or until ttlSecondsAfterFinished elapses if it is set, when the controller doesn’t get a chance to issue Pod deletions first.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (19 by maintainers)

Most upvoted comments

Thinking about this more, maybe it’s simpler than I thought. Kueue can use its own finalizer, one that the Job controller isn’t aware of. The Job controller then watches for a deletionTimestamp on the Job. Once the deletion timestamp is set, the Job controller terminates any Pods still running for the Job. Kueue can watch the Job status and remove its finalizer once the Job has either a "Failed" condition that is true or a "Complete" condition that is true.

The Job remains in the API until the last finalizer is removed.

Would that work?
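The finalizer-based flow proposed in this comment can be sketched as two cooperating sync steps. This is a simplified model under stated assumptions, not real controller code. The `Job` struct is a stand-in for the API object, and the finalizer name `example.com/external-termination` and all function names are hypothetical. The sketch also assumes the Job controller records a "Failed" condition after terminating the Pods, which the comment does not specify.

```go
package main

import "fmt"

type Condition struct {
	Type   string
	Status string
}

// Job is a simplified stand-in for the real API object.
type Job struct {
	DeletionRequested bool // stands in for a non-nil metadata.deletionTimestamp
	Finalizers        []string
	Conditions        []Condition
	ActivePods        int
}

// Hypothetical finalizer owned by the external controller (e.g. Kueue).
const externalFinalizer = "example.com/external-termination"

func isFinished(j *Job) bool {
	for _, c := range j.Conditions {
		if (c.Type == "Complete" || c.Type == "Failed") && c.Status == "True" {
			return true
		}
	}
	return false
}

func hasFinalizer(j *Job, name string) bool {
	for _, f := range j.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

func removeFinalizer(j *Job, name string) {
	kept := make([]string, 0, len(j.Finalizers))
	for _, f := range j.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	j.Finalizers = kept
}

// jobControllerSync models the Job controller reacting to a deletion
// timestamp: it terminates the remaining Pods and, as an assumption of this
// sketch, records a terminal "Failed" condition.
func jobControllerSync(j *Job) {
	if !j.DeletionRequested || isFinished(j) {
		return
	}
	j.ActivePods = 0 // stands in for deleting every running Pod
	j.Conditions = append(j.Conditions, Condition{Type: "Failed", Status: "True"})
}

// externalControllerSync models the external controller releasing its
// finalizer once the Job reports a terminal condition.
func externalControllerSync(j *Job) {
	if isFinished(j) && hasFinalizer(j, externalFinalizer) {
		removeFinalizer(j, externalFinalizer)
	}
}

// removableFromAPI mirrors the rule that an object with a deletion timestamp
// stays in the API until its last finalizer is removed.
func removableFromAPI(j *Job) bool {
	return j.DeletionRequested && len(j.Finalizers) == 0
}

func main() {
	job := &Job{Finalizers: []string{externalFinalizer}, ActivePods: 3}
	job.DeletionRequested = true // the external controller issues a delete
	jobControllerSync(job)
	externalControllerSync(job)
	fmt.Println("active pods:", job.ActivePods, "removable:", removableFromAPI(job))
}
```

The ordering is the point of the design: the external finalizer keeps the Job visible in the API while its Pods are cleaned up, so the external controller can observe the terminal condition before the object disappears.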