kubernetes: Kubelet does not delete evicted pods
/kind feature
What happened:
Kubelet has evicted pods due to disk pressure. Eventually, the disk pressure went away and the pods were scheduled and started again, but the evicted pods remained in the list of pods (`kubectl get pod --show-all`).
What you expected to happen: Wouldn't it be better if the kubelet deleted those evicted pods? The expected behaviour is therefore to not see the evicted pods anymore, i.e. that they get deleted.
How to reproduce it (as minimally and precisely as possible):
Start kubelet with `--eviction-hard` and `--eviction-soft` set to high thresholds, or fill up the disk of a worker node.
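For example, one way to make eviction trigger easily is to use aggressive thresholds. The flag names and the `nodefs.available` signal are real kubelet options; the threshold values below are illustrative only and not taken from the original report:

```sh
# Illustrative values only: thresholds this high mean the node reports
# DiskPressure under almost any disk usage, so evictions start quickly.
# --eviction-soft must be paired with --eviction-soft-grace-period.
kubelet \
  --eviction-hard='nodefs.available<90%' \
  --eviction-soft='nodefs.available<95%' \
  --eviction-soft-grace-period='nodefs.available=30s'
  # ...plus the node's usual kubelet flags
```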
Environment:
- Kubernetes version (use `kubectl version`): 1.8.2
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): Container Linux 1465.7.0 (Ladybug)
- Kernel (e.g. `uname -a`): 4.12.10-coreos
About this issue
- State: closed
- Created 7 years ago
- Reactions: 21
- Comments: 17 (8 by maintainers)
A quick workaround we use is to delete all evicted pods manually after an incident:
kubectl get pods --all-namespaces -ojson | jq -r '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | .metadata.name + " " + .metadata.namespace' | xargs -n2 -l bash -c 'kubectl delete pods $0 --namespace=$1'
Not as nice as an automatic delete, but it works. (Tested with 1.6.7; I heard that in 1.7 you need to add `--show-all`.)

I suppose this issue can be closed, because deletion of evicted pods can be controlled through settings in kube-controller-manager.
For those k8s users who hit kube-apiserver or etcd performance issues due to too many evicted pods, I would recommend updating the kube-controller-manager config to set `--terminated-pod-gc-threshold 100` or a similarly small value (see the sketch after this comment). The default GC threshold is 12500, which is too high for most etcd installations; reading 12500 pod records from etcd takes seconds to complete.

Also ask yourself why there are so many evicted pods. Maybe your kube-scheduler keeps scheduling pods on a node which already reports DiskPressure or MemoryPressure? This could be the case if the kube-scheduler is configured with a custom `--policy-config-file` which has no CheckNodeMemoryPressure or CheckNodeDiskPressure in the list of policy predicates.

Why does Kubernetes keep evicted pods, and what is the purpose of this design?
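A minimal sketch of the controller-manager setting suggested above. `--terminated-pod-gc-threshold` is the real kube-controller-manager flag; the rest of the invocation is illustrative and would normally live in your static pod manifest or systemd unit:

```sh
# Illustrative invocation: only --terminated-pod-gc-threshold is the point here.
# Once more than 100 terminated (Failed/Succeeded) pods exist, the pod GC
# controller starts deleting the oldest ones.
kube-controller-manager \
  --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig \
  --terminated-pod-gc-threshold=100
  # ...plus your cluster's usual controller-manager flags
```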
@so0k I created a cron job using a YAML file with this config (see https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/):
    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: delete-failed-pods
    spec:
      schedule: "*/30 * * * *"
      failedJobsHistoryLimit: 1
      successfulJobsHistoryLimit: 1
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: kubectl-runner
                image: wernight/kubectl
                command: ["sh", "-c", "kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -"]
              restartPolicy: OnFailure
Create the task with `kubectl create -f "PATH_TO_cronjob.yaml"`.
Check the status of the task with `kubectl get cronjob delete-failed-pods`.
Delete the task with `kubectl delete cronjob delete-failed-pods`.
@kabakaev - wouldn't pod GC cover all terminated pods (including pods terminated for other reasons)? What if we just want evicted pods to be cleaned up periodically?
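One way to clean up only evicted pods and leave other terminated pods to the normal pod GC is to filter on `status.reason` explicitly, along the lines of the jq one-liner earlier in the thread. This is a sketch of that approach, not a built-in controller behaviour:

```sh
# Sketch: delete only pods whose status.reason is "Evicted".
# Other Failed/Succeeded pods are left for the regular pod GC to handle.
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(.status.reason == "Evicted") | "\(.metadata.namespace) \(.metadata.name)"' \
  | xargs -n2 sh -c 'kubectl delete pod "$1" --namespace="$0"'
```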
The StatefulSet controller will auto-delete Failed pods: https://github.com/kubernetes/kubernetes/blob/52eea971c57580c6b1b74f0a12bf9cc6083a4d6b/pkg/controller/statefulset/stateful_set_control.go#L386-L393. For now, Deployment and DaemonSet do not do this.