kubernetes: Job being constantly recreated despite restartPolicy: Never
I have a Job with the spec shown below. During testing we force a non-zero exit code and would expect the Job not to be recreated. However, the Job controller keeps creating new pods again and again, while after reading the docs I would expect it to run once and, if it fails, to simply fail. Are our assumptions incorrect, and if so, is there some way we can run an image on the K8s cluster and get the return value back without it being scheduled to run again on failure?
apiVersion: extensions/v1beta1
kind: Job
metadata:
  name: app-deploy
spec:
  selector:
    matchLabels:
      app: app-deploy
  template:
    metadata:
      name: app-deploy
      labels:
        app: app-deploy
    spec:
      containers:
      - name: db-upgrade
        image: XXXX/XXX
      restartPolicy: Never
About this issue
- State: closed
- Created 8 years ago
- Reactions: 20
- Comments: 15 (8 by maintainers)
Commits related to this issue
- Add tag to `bosh-task` for test roles A `bosh-task` with the tag `stop-on-failure` is a basic Pod that won't restart, even if it fails. This can be used to run tests where you need it to fail, and no... — committed to cloudfoundry-incubator/fissile by deleted user 7 years ago
- becuase https://github.com/kubernetes/kubernetes/issues/20255 — committed to stephenlacy/kubermaster by stephenlacy 6 years ago
For any future readers: setting restartPolicy: OnFailure will prevent the never-ending creation of pods, because the failing pod is simply restarted in place. If you want new pods to be created on failure with restartPolicy: Never, you can limit them by setting activeDeadlineSeconds. Upon reaching the deadline without success, the job will have a status with reason: DeadlineExceeded; no more pods will be created, and existing pods will be deleted.
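A minimal sketch of that combination, assuming the batch/v1 API; the 300-second deadline is an arbitrary example value, and the name and image are the placeholders from the question:

apiVersion: batch/v1
kind: Job
metadata:
  name: app-deploy              # placeholder name from the question
spec:
  activeDeadlineSeconds: 300    # stop creating replacement pods after 5 minutes
  template:
    spec:
      containers:
      - name: db-upgrade
        image: XXXX/XXX         # placeholder image
      restartPolicy: Never      # each failure yields a new pod, until the deadline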
Trying to parse: “Generally speaking, the restart policy is applied to a pod, not to a job, as you might have understood initially. K8s’ role is to actually run a pod to successful completion; it cannot judge based on the exit code whether that failure was expected and should re-run, or it was not and should stop retrying. It’s the user’s job to make that distinction based on the output from k8s.” @soltysh
Does the restartPolicy of a job have any meaning whatsoever? If jobs communicate their failure as an exit code, how can the scheduler draw any conclusion about what has failed? And if jobs are not supposed to communicate their result as an exit code, how are they expected to do so instead?
In essence, I would like to trigger a job once, communicate failure/success somehow (I do this with an exit code now), and inspect the log to see what might have gone wrong or right.
Quite a bit late, but if anyone else is looking for it, it’s https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy
@soltysh: the biggest problem with “try to reach completions” is that it creates many dead/failed pod objects in etcd, which significantly slows down all API calls (and of course deletion takes forever). It looks like users need to implement a watchdog for all their jobs :(
@innovia
.spec.backoffLimit, introduced in https://github.com/kubernetes/kubernetes/pull/48075 and https://github.com/kubernetes/kubernetes/pull/51153, is what you’ll be looking for.
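A minimal sketch of a Job that gives up after a bounded number of retries, assuming the batch/v1 API; the limit of 3 is an example value, and the name and image are the placeholders from the question:

apiVersion: batch/v1
kind: Job
metadata:
  name: app-deploy              # placeholder name from the question
spec:
  backoffLimit: 3               # after 3 retries the Job is marked Failed
  template:
    spec:
      containers:
      - name: db-upgrade
        image: XXXX/XXX         # placeholder image
      restartPolicy: Never      # each retry runs in a freshly created pod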
I was still confused about why restartPolicy: Never restarts the job, and I wanted to know the difference between OnFailure and Never. I found an OpenShift documentation page which helped me understand it better; hopefully this helps someone. Lastly, as the Kubernetes documentation states: when in doubt, use Never.
Sounds like this is resolved.
Also, StackOverflow is the best place to ask questions like this, since it makes the answers much more findable by people who have similar questions in the future.
This is by design, see here. Generally speaking, the restart policy is applied to a pod, not to a job, as you might have understood initially. K8s’ role is to actually run a pod to successful completion; it cannot judge based on the exit code whether that failure was expected and should re-run, or it was not and should stop retrying. It’s the user’s job to make that distinction based on the output from k8s.
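One way to make that distinction yourself, if all you need is to run an image once and read back its result, is to skip the Job controller entirely and create a bare Pod, which no controller will recreate. A minimal sketch, reusing the placeholder image from the question:

apiVersion: v1
kind: Pod
metadata:
  name: db-upgrade              # placeholder name
spec:
  containers:
  - name: db-upgrade
    image: XXXX/XXX             # placeholder image
  restartPolicy: Never          # no controller owns this pod, so nothing recreates it

The container’s exit code is then recorded in the pod’s terminated container status, and its logs stay available for inspection as long as the pod object exists.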
@soltysh: I appreciate your effort, but I’m still not convinced that I need to write my own controller just for restart behavior (of course we can use a pod instead of a job, but it would be nice to have a consistent API/behavior between jobs and scheduled jobs). Any thoughts about adding a restartPolicy at the Job level?