kubernetes: Kubernetes CronJob pods are not cleaned up when the Job is completed
sig/apps
What happened: I’ve created a CronJob with failedJobsHistoryLimit set to 1 and successfulJobsHistoryLimit set to 3. The pods spawned by the Jobs finish with status ‘Completed’ but are not deleted after each scheduled run, so the number of pods in the k8s cluster keeps increasing.
What you expected to happen: The pods spawned by the CronJob should be cleaned up after the Job has completed.
How to reproduce it (as minimally and precisely as possible): Deploy elasticsearch curator 5.5.4 (stable/elasticsearch-curator); a reconstructed install command is sketched below.
Anything else we need to know?:
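The chart labels in the CronJob manifest below (chart: elasticsearch-curator-1.0.1, heritage: Tiller) suggest it was installed with Helm 2; a rough reconstruction of that install step follows. This is an assumption for illustration, not the exact command that was used, and the curator configuration and action file still need to be supplied through chart values.
# Helm 2 install reconstructed from the chart labels (hypothetical; adjust values as needed)
helm install stable/elasticsearch-curator --version 1.0.1 --name curator --namespace elk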
# kubectl get cronjob -o yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app: elasticsearch-curator
    chart: elasticsearch-curator-1.0.1
    heritage: Tiller
    release: curator
  name: curator-elasticsearch-curator
  namespace: elk
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch-curator
        release: curator
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: elasticsearch-curator
            release: curator
        spec:
          containers:
          - args:
            - --config
            - /etc/es-curator/config.yml
            - /etc/es-curator/action_file.yml
            command:
            - curator
            image: quay.io/pires/docker-elasticsearch-curator:5.5.4
            imagePullPolicy: IfNotPresent
            name: elasticsearch-curator
            resources:
              limits:
                cpu: 300m
                memory: 256Mi
              requests:
                cpu: 100m
                memory: 128Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/es-curator
              name: config-volume
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              name: curator-elasticsearch-curator-config
            name: config-volume
  schedule: '*/59 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
# kubectl get pod -n elk
NAMESPACE NAME READY STATUS RESTARTS AGE
elk curator-elasticsearch-curator-1550807940-94vcr 0/1 Completed 0 12h
elk curator-elasticsearch-curator-1550808000-7dj8l 0/1 Completed 0 12h
elk curator-elasticsearch-curator-1550811540-7kpnt 0/1 Completed 0 11h
elk curator-elasticsearch-curator-1550811600-r4sql 0/1 Completed 0 11h
elk curator-elasticsearch-curator-1550815140-rrmdv 0/1 Completed 0 10h
elk curator-elasticsearch-curator-1550815200-h6v47 0/1 Completed 0 10h
elk curator-elasticsearch-curator-1550818740-wgm8n 0/1 Completed 0 9h
elk curator-elasticsearch-curator-1550818800-6pr9f 0/1 Completed 0 9h
elk curator-elasticsearch-curator-1550822340-qg7qg 0/1 Completed 0 8h
elk curator-elasticsearch-curator-1550822400-dgh5v 0/1 Completed 0 8h
elk curator-elasticsearch-curator-1550825940-mxf4p 0/1 Completed 0 7h47m
elk curator-elasticsearch-curator-1550826000-8wxt9 0/1 Completed 0 7h46m
elk curator-elasticsearch-curator-1550829540-2bjfq 0/1 Completed 0 6h47m
elk curator-elasticsearch-curator-1550829600-tg8qj 0/1 Completed 0 6h46m
elk curator-elasticsearch-curator-1550833140-xt5vp 0/1 Completed 0 5h47m
elk curator-elasticsearch-curator-1550833200-wz996 0/1 Completed 0 5h46m
elk curator-elasticsearch-curator-1550836740-tdhg2 0/1 Completed 0 4h47m
elk curator-elasticsearch-curator-1550836800-96mz6 0/1 Completed 0 4h46m
elk curator-elasticsearch-curator-1550840340-ljz2c 0/1 Completed 0 3h47m
elk curator-elasticsearch-curator-1550840400-pftl4 0/1 Completed 0 3h46m
elk curator-elasticsearch-curator-1550843940-27pdn 0/1 Completed 0 167m
elk curator-elasticsearch-curator-1550844000-vl452 0/1 Completed 0 166m
elk curator-elasticsearch-curator-1550847540-mrfbp 0/1 Completed 0 107m
elk curator-elasticsearch-curator-1550847600-tp2fv 0/1 Completed 0 106m
elk curator-elasticsearch-curator-1550851140-d4ggz 0/1 Completed 0 47m
elk curator-elasticsearch-curator-1550851200-cgr2m 0/1 Completed 0 46m
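One way to see what is going wrong is to compare the leftover pods with their owning Jobs: each pod carries an ownerReference pointing at a Job, and when a Job is removed by the history limits its pods should be garbage-collected with it. A hedged diagnostic, not part of the original report:
# Print each pod in the elk namespace together with the Job that owns it (if any)
kubectl get pods -n elk -o custom-columns='NAME:.metadata.name,OWNER-KIND:.metadata.ownerReferences[0].kind,OWNER:.metadata.ownerReferences[0].name'
# Compare against the Jobs that still exist
kubectl get jobs -n elk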
Environment:
- Kubernetes version (use kubectl version): v1.13
- Cloud provider or hardware configuration: Azure
- OS (e.g. cat /etc/os-release): RHEL 7.6
- Kernel (e.g. uname -a): 3.10.0-957.1.3.el7.x86_64
- Install tools: kubespray
- Others: curator 5.5.4
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 15
- Comments: 49 (15 by maintainers)
Sorry, but why is it so difficult to fix this?! This is a very annoying problem. If a job starts every minute, the GKE quota is reached within a few days. Why is this not prioritised? Why does no one look at this problem?
It looks like the referenced fix was merged into 1.20 – is there any known solution for 1.18 users? Will the fix be backported to 1.18.x?
We’re currently running k8s 1.19 and we’re running into this issue. Is there a known solution for 1.19.x users?
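One mitigation worth sketching here (an illustration, not a confirmed backport): the Job spec has a ttlSecondsAfterFinished field, and the TTL-after-finished controller deletes a finished Job, together with the pods it owns, once that TTL expires. The feature is alpha behind the TTLAfterFinished feature gate on 1.18/1.19, beta in 1.21 and GA in 1.23, so whether it is usable depends on the cluster.
# Hypothetical CronJob fragment; requires the TTL-after-finished controller to be enabled
spec:
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600  # delete finished Jobs (and their pods) after one hour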
Now running K8s 1.13.10 and it seems the completed (CronJob) pods are not automatically removed. I thought this was resolved in 1.13.3.
I tried this on 1.27.x and the issue still exists.
Pretty sure this is also the case in 1.14.x.
We’re facing the same issue on 1.27 (AWS EKS). The Jobs are cleaned up correctly, but the pods are kept forever. As a workaround I’ve built a cleanup utility that removes any pods whose parent Job no longer exists: https://github.com/davidgiga1993/cronjob-pod-cleaner
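For a one-off cleanup without running an extra controller, deleting finished pods by phase also works; a minimal example, scoped to the affected namespace:
# Remove all pods in the elk namespace that have completed successfully
kubectl delete pods -n elk --field-selector=status.phase=Succeeded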
Still a problem on 1.27 AWS EKS: failedJobsHistoryLimit is set to 1 and we have 7 pods in an error state. I could solve this by setting successfulJobsHistoryLimit: 0 and failedJobsHistoryLimit: 0.
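For reference, a sketch of that workaround applied to the CronJob spec above. With both limits at 0 the Job objects are removed as soon as they finish and their pods are deleted with them, at the cost of losing the history (and logs) of past runs.
spec:
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0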
On 1.22 and we are still seeing this as well.