kubernetes: Kubernetes CronJob pods are not getting cleaned up when the Job is completed

sig/apps

What happened: I’ve created a CronJob with failedJobsHistoryLimit set to 1 and successfulJobsHistoryLimit set to 3. The pods spawned by the Jobs finish with status ‘Completed’ but are not deleted after each scheduled run, so the number of pods in the k8s cluster keeps increasing.

What you expected to happen: The pods spawned by the CronJob should be cleaned up after the Job has completed.

How to reproduce it (as minimally and precisely as possible): Deploy Elasticsearch Curator 5.5.4 (stable/elasticsearch-curator); a sketch of the install command is shown below.

Anything else we need to know?:

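For reference, an install along these lines should reproduce the setup. This is only a sketch: it assumes Helm 2 (the chart labels below show heritage: Tiller) and takes the chart version from those labels.

    # sketch: install the curator chart as release "curator" into the "elk" namespace (Helm 2 syntax)
    helm install stable/elasticsearch-curator --name curator --namespace elk --version 1.0.1
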
   # kubectl get cronjob -o yaml

		apiVersion: batch/v1beta1
		kind: CronJob
		metadata:
		  labels:
		    app: elasticsearch-curator
		    chart: elasticsearch-curator-1.0.1
		    heritage: Tiller
		    release: curator
		  name: curator-elasticsearch-curator
		  namespace: elk
		spec:
		  concurrencyPolicy: Allow
		  failedJobsHistoryLimit: 1
		  jobTemplate:
		    metadata:
		      creationTimestamp: null
		      labels:
		        app: elasticsearch-curator
		        release: curator
		    spec:
		      template:
		        metadata:
		          creationTimestamp: null
		          labels:
		            app: elasticsearch-curator
		            release: curator
		        spec:
		          containers:
		          - args:
		            - --config
		            - /etc/es-curator/config.yml
		            - /etc/es-curator/action_file.yml
		            command:
		            - curator
		            image: quay.io/pires/docker-elasticsearch-curator:5.5.4
		            imagePullPolicy: IfNotPresent
		            name: elasticsearch-curator
		            resources:
		              limits:
		                cpu: 300m
		                memory: 256Mi
		              requests:
		                cpu: 100m
		                memory: 128Mi
		            terminationMessagePath: /dev/termination-log
		            terminationMessagePolicy: File
		            volumeMounts:
		            - mountPath: /etc/es-curator
		              name: config-volume
		          dnsPolicy: ClusterFirst
		          restartPolicy: Never
		          schedulerName: default-scheduler
		          securityContext: {}
		          terminationGracePeriodSeconds: 30
		          volumes:
		          - configMap:
		              defaultMode: 420
		              name: curator-elasticsearch-curator-config
		            name: config-volume
		  schedule: '*/59 * * * *'
		  successfulJobsHistoryLimit: 3
		  suspend: false



   # kubectl get pod -n elk

   NAMESPACE   NAME                                              READY   STATUS      RESTARTS   AGE
   elk         curator-elasticsearch-curator-1550807940-94vcr    0/1     Completed   0          12h
   elk         curator-elasticsearch-curator-1550808000-7dj8l    0/1     Completed   0          12h
   elk         curator-elasticsearch-curator-1550811540-7kpnt    0/1     Completed   0          11h
   elk         curator-elasticsearch-curator-1550811600-r4sql    0/1     Completed   0          11h
   elk         curator-elasticsearch-curator-1550815140-rrmdv    0/1     Completed   0          10h
   elk         curator-elasticsearch-curator-1550815200-h6v47    0/1     Completed   0          10h
   elk         curator-elasticsearch-curator-1550818740-wgm8n    0/1     Completed   0          9h
   elk         curator-elasticsearch-curator-1550818800-6pr9f    0/1     Completed   0          9h
   elk         curator-elasticsearch-curator-1550822340-qg7qg    0/1     Completed   0          8h
   elk         curator-elasticsearch-curator-1550822400-dgh5v    0/1     Completed   0          8h
   elk         curator-elasticsearch-curator-1550825940-mxf4p    0/1     Completed   0          7h47m
   elk         curator-elasticsearch-curator-1550826000-8wxt9    0/1     Completed   0          7h46m
   elk         curator-elasticsearch-curator-1550829540-2bjfq    0/1     Completed   0          6h47m
   elk         curator-elasticsearch-curator-1550829600-tg8qj    0/1     Completed   0          6h46m
   elk         curator-elasticsearch-curator-1550833140-xt5vp    0/1     Completed   0          5h47m
   elk         curator-elasticsearch-curator-1550833200-wz996    0/1     Completed   0          5h46m
   elk         curator-elasticsearch-curator-1550836740-tdhg2    0/1     Completed   0          4h47m
   elk         curator-elasticsearch-curator-1550836800-96mz6    0/1     Completed   0          4h46m
   elk         curator-elasticsearch-curator-1550840340-ljz2c    0/1     Completed   0          3h47m
   elk         curator-elasticsearch-curator-1550840400-pftl4    0/1     Completed   0          3h46m
   elk         curator-elasticsearch-curator-1550843940-27pdn    0/1     Completed   0          167m
   elk         curator-elasticsearch-curator-1550844000-vl452    0/1     Completed   0          166m
   elk         curator-elasticsearch-curator-1550847540-mrfbp    0/1     Completed   0          107m
   elk         curator-elasticsearch-curator-1550847600-tp2fv    0/1     Completed   0          106m
   elk         curator-elasticsearch-curator-1550851140-d4ggz    0/1     Completed   0          47m
   elk         curator-elasticsearch-curator-1550851200-cgr2m    0/1     Completed   0          46m
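
The comments below suggest that the Job objects themselves are being pruned by the history limits while their pods survive. A quick way to confirm that on an affected cluster (a sketch, assuming the same elk namespace) is to compare the two listings:

    # Jobs are what successfulJobsHistoryLimit/failedJobsHistoryLimit actually prune
    kubectl get jobs -n elk
    # finished pods left behind after their parent Jobs were pruned
    kubectl get pods -n elk --field-selector=status.phase==Succeeded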

Environment:

  • Kubernetes version (use kubectl version): v1.13
  • Cloud provider or hardware configuration: Azure
  • OS (e.g: cat /etc/os-release): RHEL 7.6
  • Kernel (e.g. uname -a): 3.10.0-957.1.3.el7.x86_64
  • Install tools: kubespray
  • Others: curator 5.5.4

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 15
  • Comments: 49 (15 by maintainers)

Most upvoted comments

Sorry, but why is it so difficult to fix this?! This is a very annoying problem. If a job starts every minute, then within a few days the GKE quota is reached. Why is this not prioritised? Why is no one looking at this problem?

It looks like the referenced fix was merged into 1.20. Is there any known solution for 1.18 users? Will the fix be backported into 1.18.x?

We’re currently running k8s 1.19 and we’re running into this issue. Is there a known solution for 1.19.x users?

Now running K8s 1.13.10, and it seems the completed (CronJob) pods are not automatically removed. I thought this was resolved in 1.13.3.

I tried on 1.27.x and this issue still exists.

Pretty sure this is also the case in 1.14.x.

We’re facing the same issue on 1.27 (AWS EKS). The Jobs are correctly cleaned up, but the pods are kept forever. As a workaround I’ve built a cleanup utility that removes any pods whose parent Job no longer exists: https://github.com/davidgiga1993/cronjob-pod-cleaner
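
A cruder version of the same idea can be done with plain kubectl. This is only a sketch (the elk namespace and the loop are assumptions, and it is not the linked cronjob-pod-cleaner):

    # delete Completed pods in the elk namespace whose owning Job no longer exists
    for entry in $(kubectl get pods -n elk --field-selector=status.phase==Succeeded \
        -o jsonpath='{range .items[*]}{.metadata.name}{","}{.metadata.ownerReferences[0].name}{"\n"}{end}'); do
      pod=${entry%%,*}; job=${entry#*,}
      kubectl get job -n elk "$job" >/dev/null 2>&1 || kubectl delete pod -n elk "$pod"
    done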

Still a problem on 1.27 (AWS EKS): failedJobsHistoryLimit is set to 1 and we have 7 in an error state.

I could work around this by setting successfulJobsHistoryLimit: 0 and failedJobsHistoryLimit: 0.
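
Applying that workaround to the CronJob above could look roughly like this (a sketch; note that it also discards the Job history, so the record of past runs is lost):

    # sketch: set both history limits to 0 on the existing CronJob
    kubectl patch cronjob curator-elasticsearch-curator -n elk --type=merge \
      -p '{"spec":{"successfulJobsHistoryLimit":0,"failedJobsHistoryLimit":0}}'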

We are on 1.22 and still seeing this as well.