kubernetes: Cronjobs - failedJobsHistoryLimit not reaping state `Error`

/kind bug /sig apps

CronJob history limits were defined in #52390; however, it doesn’t appear that failedJobsHistoryLimit reaps cronjob pods that end up in a state of Error.

 kubectl get pods --show-all | grep cronjob | grep Error | wc -l
 566

The CronJob had failedJobsHistoryLimit set to 2.
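For context, the spec looked roughly like the sketch below (name, image, and schedule are illustrative, not the real workload; on a 1.7-era cluster the apiVersion would be batch/v2alpha1 behind the CronJob feature gate rather than batch/v1beta1):

# A minimal sketch, not the exact manifest from the affected cluster.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1beta1            # batch/v2alpha1 on 1.7-era clusters
kind: CronJob
metadata:
  name: example-cronjob              # illustrative name
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: fail
            image: busybox
            command: ["sh", "-c", "exit 1"]   # always fails, to exercise failedJobsHistoryLimit
EOF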

Environment:

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T08:51:09Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):

CentOS 7.3

  • Kernel (e.g. uname -a):

4.4.83-1.el7.elrepo.x86_64 #1 SMP Thu Aug 17 09:03:51 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 26
  • Comments: 50 (27 by maintainers)

Most upvoted comments

It was probably not fixed, people just ghost on their own issues 😕

I’m seeing this happen as well (1.7.3): successfulJobsHistoryLimit (set to 2) works fine, but failedJobsHistoryLimit (set to 5) ends up with hundreds of pods in CrashLoopBackOff until it eventually hits my nodes’ resource limits, and then they just stack up in Pending.

Gonna freeze this until someone volunteers to work on it. /lifecycle frozen

Reopening this because I see a lot of attempts to do so (only org members can use prow commands). /reopen

@soltysh Right now I don’t have any, I’ve been manually clearing them with a little bash one-liner for a while. Next time I experience it, though, I’ll grab the YAML and paste it here. Thanks!
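In the meantime, something along these lines works as a stopgap (a rough sketch; it assumes the stuck pods show a STATUS of Error, and --show-all is only needed on older kubectl versions, where it still exists):

# Delete every pod in the current namespace whose STATUS column reads "Error".
# -r (skip empty input) is GNU xargs; drop it on macOS/BSD.
kubectl get pods --show-all --no-headers \
  | awk '$3 == "Error" {print $1}' \
  | xargs -r kubectl delete pod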

Same problem for me: I’ve got ~8000 pods in state “Error” even though failedJobsHistoryLimit was set to 5. The cronjob had a wrong environment variable, so the containers failed at startup at the application level. From the K8s side the configuration was fine, but the internal application error led to this situation.