kubernetes: Pods stuck in Terminating status (UnmountVolume.TearDown failed)

/kind bug

What happened: Many pods stick indefinitely in Terminating status after deletion, because their volumes cannot be unmounted (UnmountVolume.TearDown failed; mostly default secrets, but occasionally OpenEBS volumes as well). It may be connected with /var/lib/kubelet being symlinked to /opt/kubelet, but that's a guess based on hints in #65110. Unlike in the mentioned issue, my problem persisted after updating to 1.10.5.

What you expected to happen: Pod gets removed and secret unmounted.

How to reproduce it (as minimally and precisely as possible): Remove some pods.

Anything else we need to know?:

Symptoms so far:

  • mount, cat /proc/mounts and /etc/mtab all list the volume as mounted - sometimes even multiple times! A sample entry: tmpfs on /path-to-kubelet/pods/$POD_UID/volumes/kubernetes.io~secret/default-token-xxxxx type tmpfs (rw,relatime)
  • umount on the directory listed in /proc/mounts replies umount: /path-to-kubelet/pods/$POD_UID/volumes/kubernetes.io~secret/default-token-xxxxx: not mounted
  • rm -rf fails with Device or resource busy
  • lsof | grep $POD_UID doesn’t show any process using this path
  • docker ps -qa | xargs docker container inspect -f '{{ .Name }} {{ json .Mounts }}' | grep $POD_UID doesn’t show any container using old pod path
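The checks above can be collected into one script per stuck pod. This is a sketch, not part of the original report: check_stuck_pod is a hypothetical helper name, and the pod UID is taken from the Terminating pod's path under the kubelet root.

```shell
#!/bin/sh
# Consolidated diagnostics for one stuck pod (hypothetical helper).
check_stuck_pod() {
    pod_uid="$1"

    echo "== mount table entries =="
    grep "$pod_uid" /proc/mounts || echo "no mount entries for pod $pod_uid"

    echo "== open file handles =="
    lsof 2>/dev/null | grep "$pod_uid" || echo "no open files under pod path"

    echo "== containers still referencing the path =="
    docker ps -qa 2>/dev/null | xargs -r docker container inspect \
        -f '{{ .Name }} {{ json .Mounts }}' 2>/dev/null \
        | grep "$pod_uid" || echo "no containers reference pod $pod_uid"
}

check_stuck_pod "3a92a5bf-8422-11e8-830b-066c762652cc"
```

If all three checks come back empty while /proc/mounts still lists the volume, the mount exists only in a mount namespace the host tools cannot see, which is what the symlinked-root-dir theory would predict.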

Other stuff:

  • Kubernetes 1.9.2 (updated to 1.10.5; issue persisted)
  • Ubuntu 16.04.4 LTS
  • uname -a: Linux 4.4.0-121-generic
  • Docker 17.03.1-ce (updated to 17.03.2-ce; issue persisted)

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 7
  • Comments: 21 (5 by maintainers)

Most upvoted comments

This issue is affecting our installations as well. Setting the kubelet flag --root-dir, as mentioned in #65110, to the value referenced by our symlink from /var/lib/kubelet seems to resolve the issue.
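A minimal sketch of that workaround, assuming the same layout as in the original report (/var/lib/kubelet symlinked to /opt/kubelet; the paths are illustrative):

```shell
# /var/lib/kubelet is a symlink; resolve it to the real directory first.
readlink -f /var/lib/kubelet    # e.g. prints /opt/kubelet

# Then point kubelet at the resolved path so its volume bookkeeping
# matches the paths that actually appear in /proc/mounts:
kubelet --root-dir=/opt/kubelet ...
```

With the symlink in place but --root-dir unset, kubelet tracks mounts under /var/lib/kubelet/... while the kernel records them under the resolved /opt/kubelet/... path, so teardown looks for the wrong entries.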

We noticed this specifically on pods which use volume subPath mounts. The source was either a configMap or secret in most cases.

A recurring error in the kubelet logs manifests as device or resource busy:

nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/3a92a5bf-8422-11e8-830b-066c762652cc-configmap-XXXXX\" (\"3a92a5bf-8422-11e8-830b-066c762652cc\")" failed. No retries permitted until 2018-07-13 16:03:57.636104303 +0000 UTC m=+67.631270721 (durationBeforeRetry 32s). Error: "error cleaning subPath mounts for volume \"configmap-XXXXXX\" (UniqueName: \"kubernetes.io/configmap/3a92a5bf-8422-11e8-830b-066c762652cc-configmap-XXXXXX\") pod \"3a92a5bf-8422-11e8-830b-066c762652cc\" (UID: \"3a92a5bf-8422-11e8-830b-066c762652cc\") : error deleting /var/lib/kubelet/pods/3a92a5bf-8422-11e8-830b-066c762652cc/volume-subpaths/configmap-XXXXX/xxxxx/1: remove /var/lib/kubelet/pods/3a92a5bf-8422-11e8-830b-066c762652cc/volume-subpaths/configmap-XXXXX/xxxxxx/1: device or resource busy"

After changing the kubelet root-dir, these entries appear once, then the volume is removed and the pod completes termination.

Versions:

  • Kubernetes 1.10.5
  • Ubuntu 16.04.4 LTS
  • uname -a: 4.4.0-1057-aws
  • docker: 1.13.1

We’re running into the same issue on 1.13.5, would be interested in that hotfix command… regards, strowi

We currently have it working on v1.9 to v1.12 using the following parameters on docker 18.06 (latest coreos stable):

ExecStart=/usr/bin/docker run --rm --name %n \
    --net=host --pid=host --privileged \
    -v /:/rootfs:ro \
    -v /etc/kubernetes:/etc/kubernetes:ro \
    -v /etc/kubernetes/ssl/kubelet:/etc/kubernetes/ssl/kubelet \
    -v /etc/cni:/etc/cni \
    -v /opt/cni/bin:/opt/cni/host-bin \
    -v /var/lib/cni:/var/lib/cni \
    --mount type=bind,src="/var/lib/kubelet/",dst="/var/lib/kubelet",bind-propagation=shared \
    -v /var/run:/var/run:rw \
    -v /dev:/dev \
    -v /sys:/sys:ro \
    -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
    -v /var/lib/docker/:/var/lib/docker:rw \
    -v /var/log:/var/log:shared \
    -v /etc/cloud.conf:/etc/cloud.conf:ro \
    -v /etc/ssl/certs/:/etc/ssl/certs/:ro \
    {{ $registry }}{{ $image }} \
...
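The important detail in the unit above is the --mount line with bind-propagation=shared on /var/lib/kubelet: without shared propagation, mounts created inside the kubelet container never become visible on the host for teardown. Whether a mount point actually has shared propagation can be read from /proc/self/mountinfo; a small sketch (propagation_of is a hypothetical helper, not from the report):

```shell
#!/bin/sh
# Report whether a mount point has shared propagation by inspecting
# the optional "shared:N" field in /proc/self/mountinfo.
propagation_of() {
    # $1: mount point to inspect; field 5 of mountinfo is the mount point,
    # optional fields start at field 7 and end at the "-" separator.
    awk -v target="$1" '$5 == target {
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) { print "shared"; exit }
        }
        print "private"; exit
    }' /proc/self/mountinfo
}

propagation_of /var/lib/kubelet
```

On systems with a recent util-linux, `findmnt -o TARGET,PROPAGATION /var/lib/kubelet` gives the same answer without the awk.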

We are observing this problem on k8s 1.11.3; similar to @chrischdi's setup, the kubelet also runs in a container.