kubernetes: kubelet is not able to delete pod with mounted secret/configmap after restart
From https://github.com/kubernetes/kubernetes/issues/96038#issuecomment-728928671
What happened: In https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1328563866136743936 one of the nodes (gce-scale-cluster-minion-group-bddx) restarted for some reason (some kernel panic).
The last kubelet log entry is at 08:13:05.362859 and the first one after the restart is at 08:15:40.361826.
In the meantime (at 08:13:17.496033), one of the pods (small-deployment-167-56c965c4cf-9pw8k) running on that node was deleted by the generic-garbage-collector (i.e. its deletionTimestamp was set).
After the restart, kubelet was never able to mark this pod as deleted (i.e. the object was never actually deleted).
What you expected to happen: After kubelet’s restart, the pod object would be deleted from etcd.
How to reproduce it (as minimally and precisely as possible): Based on our logs, stopping kubelet for a while, deleting a pod running on it, and then restarting kubelet should trigger this issue. Potentially, it may be important that the pod was using a configmap that had already been deleted. A hedged sketch of these steps is below.
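For concreteness, here is a sketch of that flow using client-go. The namespace and object names (repro-pod, repro-cm) are placeholders I made up, the kubeconfig is assumed to be at the default location, and the kubelet stop/start steps have to be done out of band on the node itself.

```go
// Sketch of the repro flow (placeholder names; node-level steps are out of band).
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()
	ns, podName, cmName := "default", "repro-pod", "repro-cm" // placeholders

	// Precondition (not shown): a pod named repro-pod that mounts the configmap
	// repro-cm is already running on some node.

	// 1. Out of band: stop kubelet on that node.

	// 2. Delete the configmap and then the pod while kubelet is down. The API
	//    server only sets deletionTimestamp; the object stays around until
	//    kubelet confirms the pod is gone.
	if err := client.CoreV1().ConfigMaps(ns).Delete(ctx, cmName, metav1.DeleteOptions{}); err != nil {
		fmt.Println("configmap delete:", err)
	}
	if err := client.CoreV1().Pods(ns).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		fmt.Println("pod delete:", err)
	}

	// 3. Out of band: start kubelet again.

	// 4. Watch whether the pod object ever disappears; in the buggy case it
	//    stays with deletionTimestamp set indefinitely.
	for {
		_, err := client.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			fmt.Println("pod object removed -- not reproduced")
			return
		}
		fmt.Println("pod object still present (deletionTimestamp set)")
		time.Sleep(10 * time.Second)
	}
}
```

On a systemd-based node the out-of-band steps would typically be `sudo systemctl stop kubelet` / `sudo systemctl start kubelet`, but the exact mechanism depends on how the node was provisioned.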
Anything else we need to know?:
Link to the test run: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1328563866136743936
Kubelet’s logs: http://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1328563866136743936/artifacts/gce-scale-cluster-minion-group-bddx/kubelet.log
Pod name: test-sk0eco-5/small-deployment-167-56c965c4cf-9pw8k
More logs (like all kube-apiserver logs for that pod) can be found here: https://github.com/kubernetes/kubernetes/issues/96038#issuecomment-728939343
Environment:
- Kubernetes version (use `kubectl version`): v1.20.0-beta.1.663+147a120948482e
- Cloud provider or hardware configuration:
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 4 years ago
- Comments: 24 (17 by maintainers)
Commits related to this issue
- Merge pull request #1822 from mborsz/miti Mitigate https://github.com/kubernetes/kubernetes/issues/96635 in load test — committed to kubernetes/perf-tests by k8s-ci-robot 3 years ago
I was taking a look at reproducing and fixing this issue and wanted to post my findings. Basically, what is happening is that after the restart the deleted pod’s volumes are never cleaned up, so kubelet never finishes terminating the pod and the API object is never removed.
The problem is that we simply can’t choose to skip adding volumes to the DSOW (desired state of world) when a pod has a deletionTimestamp, because that would result in the volume never getting cleaned up. So the fix proposed in https://github.com/kubernetes/kubernetes/pull/96790 is not foolproof.
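To make the concern concrete, here is a toy model in Go. None of this is the real kubelet code; the types, functions, and the deliberately simplified reconstruction rule are invented purely to show the dependency being described: if a pod with a deletionTimestamp never contributes its volumes to the desired state, a mount that survived the kubelet restart is never tracked, so nothing ever unmounts it and the pod can never finish terminating.

```go
// Toy model only: invented types and a simplified reconstruction rule,
// not the real kubelet volume manager.
package main

import (
	"fmt"
	"time"
)

type toyPod struct {
	name              string
	deletionTimestamp *time.Time
	volumes           []string
}

type toyManager struct {
	desired map[string]bool // volumes the populator says should be mounted (DSOW)
	tracked map[string]bool // mounts the manager knows about and can tear down
}

// populate mimics the proposed fix: pods that already carry a deletionTimestamp
// are skipped, so their volumes never enter the desired state.
func (m *toyManager) populate(pods []toyPod) {
	for _, p := range pods {
		if p.deletionTimestamp != nil {
			continue // the problematic skip
		}
		for _, v := range p.volumes {
			m.desired[v] = true
		}
	}
}

// reconstruct stands in for rebuilding state after a kubelet restart. In this
// simplified model the manager only starts tracking mounts that are also
// desired; anything else on disk stays invisible to it.
func (m *toyManager) reconstruct(mountsOnDisk []string) {
	for _, v := range mountsOnDisk {
		if m.desired[v] {
			m.tracked[v] = true
		}
	}
}

// cleanup unmounts tracked volumes that are no longer desired.
func (m *toyManager) cleanup() []string {
	var unmounted []string
	for v := range m.tracked {
		if !m.desired[v] {
			unmounted = append(unmounted, v)
			delete(m.tracked, v)
		}
	}
	return unmounted
}

func main() {
	now := time.Now()
	deletedPod := toyPod{
		name:              "small-deployment-167-56c965c4cf-9pw8k",
		deletionTimestamp: &now,
		volumes:           []string{"configmap-vol"},
	}

	m := &toyManager{desired: map[string]bool{}, tracked: map[string]bool{}}

	// Kubelet restarts: the configmap volume is still mounted on disk, but the
	// pod already has a deletionTimestamp, so populate() skips it entirely.
	m.populate([]toyPod{deletedPod})
	m.reconstruct([]string{"configmap-vol"})

	// Nothing is tracked, so nothing is ever unmounted and the pod stays stuck.
	fmt.Println("unmounted:", m.cleanup()) // unmounted: []
}
```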
A real solution IMO is to add all pods+volumes in an uncertain state during reconstruction, so that volumes can be removed from the DSOW while still being required to be cleaned up before the pod can be terminated. @jsafrane has a PR that implements part of this solution: https://github.com/kubernetes/kubernetes/pull/108180. I am looking into using it to fix this bug.
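To illustrate that direction, here is a second toy sketch continuing the invented model above (again, not the real kubelet code): reconstruction records every mount it finds on disk, marking it uncertain when no pod in the desired state claims it, so the reconciler can still tear it down and let the deleted pod terminate.

```go
// Toy sketch of the "reconstruct as uncertain" idea. Types and names are
// invented for illustration; this is not the real kubelet volume manager.
package main

import "fmt"

type mountState int

const (
	mountedDesired mountState = iota // mount backed by a pod in the desired state
	uncertain                        // found on disk during reconstruction, origin unknown
)

type toyManager struct {
	desired map[string]bool       // volumes some non-deleted pod still wants mounted
	tracked map[string]mountState // volumes the manager knows are mounted on disk
}

// reconstruct records every mount found on disk after a restart, even when no
// pod in the desired state claims it, instead of ignoring such mounts.
func (m *toyManager) reconstruct(mountsOnDisk []string) {
	for _, v := range mountsOnDisk {
		if _, known := m.tracked[v]; known {
			continue
		}
		if m.desired[v] {
			m.tracked[v] = mountedDesired
		} else {
			m.tracked[v] = uncertain
		}
	}
}

// cleanup tears down every tracked mount that is no longer desired, including
// the uncertain ones -- which is what allows the deleted pod to finish
// terminating.
func (m *toyManager) cleanup() []string {
	var unmounted []string
	for v := range m.tracked {
		if !m.desired[v] {
			unmounted = append(unmounted, v)
			delete(m.tracked, v)
		}
	}
	return unmounted
}

func main() {
	m := &toyManager{desired: map[string]bool{}, tracked: map[string]mountState{}}
	// The pod was deleted while kubelet was down, so nothing desires the volume,
	// but its configmap mount is still present on disk.
	m.reconstruct([]string{"configmap-vol"})
	fmt.Println("unmounted:", m.cleanup()) // unmounted: [configmap-vol]
}
```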