kubernetes: Cronjobs - failedJobsHistoryLimit not reaping state `Error`

/kind bug /sig apps

CronJob history limits were defined in #52390; however, it doesn’t appear that failedJobsHistoryLimit reaps cronjob pods that end up in a state of Error.

 kubectl get pods --show-all | grep cronjob | grep Error | wc -l
 566

The CronJob had failedJobsHistoryLimit set to 2.
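For context, the spec looked roughly like the sketch below (name, image, and schedule are illustrative, not the real workload; on a 1.7-era cluster the apiVersion would be batch/v2alpha1 behind the CronJob feature gate rather than batch/v1beta1):

# A minimal sketch, not the exact manifest from the affected cluster.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1beta1            # batch/v2alpha1 on 1.7-era clusters
kind: CronJob
metadata:
  name: example-cronjob              # illustrative name
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: fail
            image: busybox
            command: ["sh", "-c", "exit 1"]   # always fails, to exercise failedJobsHistoryLimit
EOF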

Environment:

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T08:51:09Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):

CentOS 7.3

  • Kernel (e.g. uname -a):

4.4.83-1.el7.elrepo.x86_64 #1 SMP Thu Aug 17 09:03:51 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 26
  • Comments: 50 (27 by maintainers)

Most upvoted comments

It was probably not fixed, people just ghost on their own issues 😕

I’m seeing this happen as well (1.7.3): successfulJobsHistoryLimit (set to 2) works fine, but failedJobsHistoryLimit (set to 5) ends up with hundreds of pods in CrashLoopBackOff until it eventually hits my nodes’ resource limits, and then they just stack up in Pending.

Gonna freeze this until someone volunteers to work on it. /lifecycle frozen

Reopening this because I see a lot of attempts to do so (only org members can use prow commands). /reopen

@soltysh Right now I don’t have any, I’ve been manually clearing them with a little bash one-liner for a while. Next time I experience it, though, I’ll grab the YAML and paste it here. Thanks!
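In the meantime, something along these lines works as a stopgap (a rough sketch; it assumes the stuck pods show a STATUS of Error, and --show-all is only needed on older kubectl versions, where it still exists):

# Delete every pod in the current namespace whose STATUS column reads "Error".
# -r (skip empty input) is GNU xargs; drop it on macOS/BSD.
kubectl get pods --show-all --no-headers \
  | awk '$3 == "Error" {print $1}' \
  | xargs -r kubectl delete pod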

Same problem for me: I’ve got ~8000 pods in state “Error” even though failedJobsHistoryLimit was set to 5. The cronjob had a wrong environment variable, so the containers failed at startup at the application level. From the K8s side the configuration was fine, but the internal application error led to this situation.