kubernetes: Cronjobs - failedJobsHistoryLimit not reaping state `Error`
/kind bug
/sig apps
CronJob history limits were defined in #52390; however, it does not appear that failedJobsHistoryLimit reaps CronJob pods that end up in a state of Error.
```
kubectl get pods --show-all | grep cronjob | grep Error | wc -l
566
```
The CronJob had failedJobsHistoryLimit set to 2.
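For reference, the spec in question was roughly of the shape sketched below (illustrative only, not the reporter's actual manifest; the name, schedule, and image are made up, and the CronJob apiVersion depends on cluster version):

```yaml
apiVersion: batch/v2alpha1             # CronJob API group on 1.6/1.7 clusters; newer clusters use batch/v1beta1 or batch/v1
kind: CronJob
metadata:
  name: example-cronjob                # hypothetical name
spec:
  schedule: "*/5 * * * *"              # hypothetical schedule
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2            # expected to cap failed history, yet Error pods pile up
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: task
            image: example/task:latest # hypothetical image
```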
Environment:
- Kubernetes version (use `kubectl version`):
  Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T08:51:09Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release):
CentOS 7.3
- Kernel (e.g. `uname -a`):
4.4.83-1.el7.elrepo.x86_64 #1 SMP Thu Aug 17 09:03:51 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: open
- Created 7 years ago
- Reactions: 26
- Comments: 50 (27 by maintainers)
It was probably not fixed; people just ghost on their own issues 😕
I'm seeing this happen as well (1.7.3) - `successfulJobsHistoryLimit` (set to 2) works fine, but `failedJobsHistoryLimit` (set to 5) will end up with hundreds of pods in `CrashLoopBackOff` until eventually it hits my nodes' resource limits and then they just stack up in `Pending`.

Gonna freeze this until someone wants to volunteer to work on this.
/lifecycle frozen
Reopening this because I see a lot of attempts to do so (only org members can use prow commands).
/reopen
@soltysh Right now I don’t have any, I’ve been manually clearing them with a little bash one-liner for a while. Next time I experience it, though, I’ll grab the YAML and paste it here. Thanks!
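A cleanup of that sort could look something like the sketch below - my own guess, not the commenter's actual one-liner - assuming, as in the original report, that the pod names contain "cronjob":

```bash
# Sketch only: reuse the listing from the original report, keep the pod-name
# column, and delete those pods one by one.
kubectl get pods --show-all | grep cronjob | grep Error | awk '{print $1}' | xargs kubectl delete pod
```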
Same problem for me: I've got ~8000 pods in state "Error" when failedJobsHistoryLimit was set to 5. The CronJob had a wrong environment variable, so the containers failed at startup at the application level. From the Kubernetes side the configuration was OK, but the internal application error led to this situation.
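In the job template, that failure mode looks roughly like the fragment below (invented for illustration; the variable name, value, and image are placeholders, not the actual spec):

```yaml
# Hypothetical fragment of the CronJob's jobTemplate pod spec
containers:
- name: app
  image: example/app:latest                 # placeholder image
  env:
  - name: DATABASE_URL                      # placeholder variable name
    value: "postgres://wrong-host:5432/app" # bad value: app exits non-zero, each run leaves an Error pod
```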