kubernetes: Pods stuck in Terminating status (UnmountVolume.TearDown failed)
/kind bug
What happened:
Many pods, after being deleted, get stuck indefinitely in Terminating status due to an inability to unmount volumes (`UnmountVolume.TearDown failed`; mostly default secrets, but occasionally it would also include openebs volumes). It may be connected with `/var/lib/kubelet` being symlinked to `/opt/kubelet`, but that's a guess based on hints in #65110. Unlike in the mentioned issue, my problem persisted after updating to 1.10.5.
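A quick way to gauge how widespread the problem is (a minimal sketch, not from the original report; `jq` is assumed to be installed):

```sh
# List pods stuck in Terminating across all namespaces (rough check based on
# the human-readable STATUS column).
kubectl get pods --all-namespaces | grep Terminating

# More precisely: pods whose deletion has started but never finished
# (deletionTimestamp is set when deletion starts and stays set while the pod hangs).
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
           | select(.metadata.deletionTimestamp != null)
           | "\(.metadata.namespace)/\(.metadata.name) \(.metadata.deletionTimestamp)"'
```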
What you expected to happen: Pod gets removed and secret unmounted.
How to reproduce it (as minimally and precisely as possible): Remove some pods.
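Concretely, a minimal sketch (the pod name is hypothetical; any pod on an affected node will do):

```sh
# Delete a pod and watch it hang instead of disappearing.
kubectl delete pod my-app-pod
kubectl get pod my-app-pod -w   # STATUS stays "Terminating" indefinitely
```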
Anything else we need to know?:
Symptoms so far:
- `mount`, `cat /proc/mounts`, and `/etc/mtab` all list the volume as mounted, sometimes even multiple times! The entry for a sample problem is: `tmpfs on /path-to-kubelet/pods/$POD_UID/volumes/kubernetes.io~secret/default-token-xxxxx type tmpfs (rw,relatime)`
- `umount` on the directory listed in `/proc/mounts` replies `umount: /path-to-kubelet/pods/$POD_UID/volumes/kubernetes.io~secret/default-token-xxxxx: not mounted`
- `rm -rf` fails with `Device or resource busy`
- `lsof | grep $POD_UID` doesn't show any process using this path
- `docker ps -qa | xargs docker container inspect -f '{{ .Name }} {{ json .Mounts }}' | grep $POD_UID` doesn't show any container using the old pod path
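The checks above can be run in one pass on the affected node (a sketch assembled from the commands in this list; `$POD_UID` and the kubelet root are placeholders to fill in):

```sh
#!/bin/sh
POD_UID="<uid-of-a-stuck-pod>"          # placeholder
KUBELET_ROOT="/var/lib/kubelet"         # or wherever your root-dir points

# 1. Is the volume still listed as mounted (possibly multiple times)?
grep "$POD_UID" /proc/mounts

# 2. Does umount agree? "not mounted" here, despite a /proc/mounts entry,
#    suggests the mount only exists in another mount namespace.
umount "$KUBELET_ROOT/pods/$POD_UID/volumes/kubernetes.io~secret/"* 2>&1

# 3. Any process holding the path open?
lsof | grep "$POD_UID"

# 4. Any container still referencing the old pod path?
docker ps -qa \
  | xargs docker container inspect -f '{{ .Name }} {{ json .Mounts }}' \
  | grep "$POD_UID"
```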
Other stuff:
- Kubernetes 1.9.2, (updated to 1.10.5; issue persisted)
- Ubuntu 16.04.4 LTS
- uname -a: Linux 4.4.0-121-generic
- Docker 17.03.1-ce (updated to 17.03.2-ce; issue persisted)
About this issue
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 21 (5 by maintainers)
This issue is affecting our installations as well. Setting the kubelet flag `--root-dir`, as mentioned in #65110, to the value referenced by our symlink from `/var/lib/kubelet` seems to resolve the issue. We noticed this specifically on pods which use volume subPath mounts; the source was either a configMap or a secret in most cases.
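For reference, a minimal sketch of applying that workaround on a systemd-managed kubelet (the drop-in path and the `/opt/kubelet` target are assumptions based on the symlink described in this issue, and `$KUBELET_EXTRA_ARGS` must already be expanded by your kubelet unit, as in kubeadm setups):

```sh
# Point kubelet at the real directory instead of the symlinked /var/lib/kubelet.
sudo mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/20-root-dir.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--root-dir=/opt/kubelet"
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```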
The example recurring error in kubelet logs manifests as a `device or resource busy`. After changing the `--root-dir` of kubelet, these entries appear once, and then the volume is removed and the pod completes termination.
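A quick way to spot those entries (a sketch, assuming kubelet runs under systemd and logs to the journal):

```sh
# Scan recent kubelet logs for the unmount failure.
journalctl -u kubelet --since "1 hour ago" | grep -i "device or resource busy"
```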
Versions:
We're running into the same issue on 1.13.5; would be interested in that hotfix command… regards, strowi
We currently have it working on v1.9 to v1.12 using the following parameters on docker 18.06 (latest coreos stable):
We are observing this problem on k8s 1.11.3; similar to @chrischdi, the kubelet also runs in a container.
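When kubelet runs in a container, stuck unmounts of this kind can also come from missing mount propagation on the kubelet root. A minimal sketch of the relevant flag (the image name and trailing kubelet flags are placeholders, not from this thread):

```sh
# Bind-mount the kubelet root with shared propagation so mounts and unmounts
# performed inside the kubelet container propagate back to the host.
docker run -d --name kubelet --privileged --net=host --pid=host \
  -v /var/lib/kubelet:/var/lib/kubelet:shared \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-kubelet-image \
  kubelet --root-dir=/var/lib/kubelet   # remaining kubelet flags omitted
```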