kubernetes: Job being constantly recreated despite restartPolicy: Never

I have a Job with the spec shown below. During testing we force a non-zero exit code and would expect the Job not to be recreated. However, the Job controller keeps creating new pods again and again, whereas after reading the docs I would expect it to run once and, if it fails, simply fail. Are our assumptions incorrect? If so, is there some way we can run an image on the Kubernetes cluster and get the return value back without it being scheduled to run again on failure?

apiVersion: extensions/v1beta1
kind: Job
metadata:
  name: app-deploy
spec:
  selector:
    matchLabels:
      app: app-deploy
  template:
    metadata:
      name: app-deploy
      labels:
        app: app-deploy
    spec:
      containers:
      - name: db-upgrade
        image: XXXX/XXX
      restartPolicy: Never

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 20
  • Comments: 15 (8 by maintainers)

Most upvoted comments

For any future readers, setting restartPolicy: OnFailure will prevent the never-ending creation of pods, because it will just restart the failing one.

If you want to create new pods on failure with restartPolicy: Never, you can limit them by setting activeDeadlineSeconds. Upon reaching the deadline without success, the Job's status will have reason: DeadlineExceeded; no more pods will be created, and existing pods will be deleted.
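
As a rough sketch of that suggestion (assuming the batch/v1 API group, which superseded the extensions/v1beta1 used in the original spec; the names and image are placeholders carried over from the question, and the deadline value is an illustrative assumption):

apiVersion: batch/v1
kind: Job
metadata:
  name: app-deploy
spec:
  # Stop creating replacement pods after 5 minutes without a successful run.
  activeDeadlineSeconds: 300
  template:
    spec:
      containers:
      - name: db-upgrade
        image: XXXX/XXX
      restartPolicy: Never

Once the deadline passes, the Job's status should carry a Failed condition with reason: DeadlineExceeded, no further pods are created, and the remaining pods are cleaned up, as described above.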

Trying to parse “Generally speaking, the restart policy is applied to a pod, not to a job as you might have initially understood. Kubernetes’ role is to actually run a pod to successful completion; it cannot judge from the exit code whether a failure was expected and should be re-run, or whether it was not and retrying should stop. It’s the user’s job to make that distinction based on the output from Kubernetes.” @soltysh

Does the restartPolicy of a Job have any meaning at all? If Jobs communicate their failure via an exit code, how is the scheduler supposed to conclude what has failed? And if Jobs are not meant to communicate their result via an exit code, how are they expected to do so instead?

In essence, I would like to trigger a Job once, have it communicate failure/success somehow (I do this with an exit code now), and inspect the log to see what went wrong or right.
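
For what it's worth, the Job object itself already exposes the outcome: its status block counts succeeded and failed pods and, in terminal cases, carries a condition. A hedged illustration of what kubectl get job app-deploy -o yaml might show after a run that gave up at the deadline (all values here are assumptions for the example):

status:
  failed: 3                  # pods that exited non-zero before the deadline
  conditions:
  - type: Failed             # a successful run would instead set status.succeeded
    status: "True"
    reason: DeadlineExceeded
    message: Job was active longer than specified deadline

Watching status.succeeded / status.failed (or reading the pod logs with kubectl logs) is one way to get the "return value" back without relying on the controller to stop retrying on its own.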

@soltysh: the biggest problem with “try to reach completions” is that it creates many dead/errored pod objects in etcd, which significantly slows down all API calls (and of course deletion takes forever). It looks like users need to implement a watchdog for all Jobs :(

I was still confused about why restartPolicy: Never restarts the Job, and I wanted to know the difference between OnFailure and Never. I found an OpenShift documentation page which helped me understand it better. Hopefully this helps someone.

The Job specification's restart policy applies only to the pods, not to the Job controller. However, the Job controller is hard-coded to keep retrying the Job to completion.

As such, restartPolicy: Never or --restart=Never results in the same behavior as restartPolicy: OnFailure or --restart=OnFailure. That is, when a Job fails it is restarted automatically until it succeeds (or is manually discarded). The policy only sets which subsystem performs the restart.

With the Never policy, the Job controller performs the restart. With each attempt, the Job controller increments the number of failures in the Job status and creates new pods. This means that with each failed attempt, the number of pods increases.

With the OnFailure policy, the kubelet performs the restart. Attempts do not increment the number of failures in the Job status. In addition, the kubelet retries failed Jobs by starting pods on the same node.

Lastly, as the Kubernetes documentation states, when in doubt use Never.
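
For contrast, here is a minimal sketch of the OnFailure variant described above (again assuming batch/v1 and reusing the placeholder names and image from the original spec):

apiVersion: batch/v1
kind: Job
metadata:
  name: app-deploy
spec:
  template:
    spec:
      containers:
      - name: db-upgrade
        image: XXXX/XXX
      # The kubelet restarts the same pod in place after a non-zero exit,
      # so kubectl get pods shows one pod with a growing RESTARTS count
      # rather than a new failed pod per attempt.
      restartPolicy: OnFailure

The Job still retries until it succeeds; only the subsystem doing the restarting (and the pod bookkeeping) differs.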

Sounds like this is resolved.

Also, Stack Overflow is the best place to ask questions like this, since it makes the answers much easier to find for people who have similar questions in the future.

This is by design, see here. Generally speaking, the restart policy is applied to a pod, not to a job as you might have initially understood. Kubernetes’ role is to actually run a pod to successful completion; it cannot judge from the exit code whether a failure was expected and should be re-run, or whether it was not and retrying should stop. It’s the user’s job to make that distinction based on the output from Kubernetes.
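
In practice, the "output from k8s" mentioned here includes the exit code recorded on the finished pod, which an external tool can inspect before deciding whether to resubmit. An illustrative fragment of a failed pod's status (values are assumed for the example):

status:
  phase: Failed
  containerStatuses:
  - name: db-upgrade
    state:
      terminated:
        exitCode: 1          # the non-zero exit code forced during testing
        reason: Error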

@soltysh: I appreciate your effort, but I'm still not convinced that I need to write my own controller just to handle restarts (of course we can use a Pod instead of a Job, but it would be nice to have consistent API/behavior between Jobs and scheduled Jobs). Any thoughts on adding a restartPolicy at the Job level?