kubernetes: Pods do not get cleaned up
https://github.com/kubernetes/kubernetes/issues/28750 describes the problem for a much older Kubernetes version and is marked as fixed.
Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug Report
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1+coreos.0", GitCommit:"9212f77ed8c169a0afa02e58dce87913c6387b3e", GitTreeState:"clean", BuildDate:"2017-04-04T00:32:53Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- 3 Physical Linux servers
- CoreOS 1353.7.0 (latest stable)
- 4.9.24-coreos
- Set up from scratch, custom OSPF networking using CNI
- Hyperkube using kubelet-wrapper
What happened: Some terminated pods are not cleaned up and stay in the Terminating state for a long time (maybe indefinitely?) because their secret volumes cannot be deleted.
Log excerpt:
May 11 20:56:07 yellow kubelet-wrapper[7595]: E0511 20:56:07.540894 7595 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/91eabb5e-336f-11e7-927a-d43d7e00dee7-default-token-86xhs\" (\"91eabb5e-336f-11e7-927a-d43d7e00dee7\")" failed. No retries permitted until 2017-05-11 20:58:07.540871959 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/91eabb5e-336f-11e7-927a-d43d7e00dee7-default-token-86xhs" (volume.spec.Name: "default-token-86xhs") pod "91eabb5e-336f-11e7-927a-d43d7e00dee7" (UID: "91eabb5e-336f-11e7-927a-d43d7e00dee7") with: rename /var/lib/kubelet/pods/91eabb5e-336f-11e7-927a-d43d7e00dee7/volumes/kubernetes.io~secret/default-token-86xhs /var/lib/kubelet/pods/91eabb5e-336f-11e7-927a-d43d7e00dee7/volumes/kubernetes.io~secret/wrapped_default-token-86xhs.deleting~739204662: device or resource busy
May 11 20:56:07 yellow kubelet-wrapper[7595]: E0511 20:56:07.540858 7595 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/6d50be26-3371-11e7-a5bf-74d435166e57-default-token-86xhs\" (\"6d50be26-3371-11e7-a5bf-74d435166e57\")" failed. No retries permitted until 2017-05-11 20:58:07.540839041 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/6d50be26-3371-11e7-a5bf-74d435166e57-default-token-86xhs" (volume.spec.Name: "default-token-86xhs") pod "6d50be26-3371-11e7-a5bf-74d435166e57" (UID: "6d50be26-3371-11e7-a5bf-74d435166e57") with: rename /var/lib/kubelet/pods/6d50be26-3371-11e7-a5bf-74d435166e57/volumes/kubernetes.io~secret/default-token-86xhs /var/lib/kubelet/pods/6d50be26-3371-11e7-a5bf-74d435166e57/volumes/kubernetes.io~secret/wrapped_default-token-86xhs.deleting~556013427: device or resource busy
May 11 20:56:07 yellow kubelet-wrapper[7595]: I0511 20:56:07.540561 7595 reconciler.go:190] UnmountVolume operation started for volume "kubernetes.io/secret/91eabb5e-336f-11e7-927a-d43d7e00dee7-default-token-86xhs" (spec.Name: "default-token-86xhs") from pod "91eabb5e-336f-11e7-927a-d43d7e00dee7" (UID: "91eabb5e-336f-11e7-927a-d43d7e00dee7").
May 11 20:56:07 yellow kubelet-wrapper[7595]: I0511 20:56:07.540462 7595 reconciler.go:190] UnmountVolume operation started for volume "kubernetes.io/secret/6d50be26-3371-11e7-a5bf-74d435166e57-default-token-86xhs" (spec.Name: "default-token-86xhs") from pod "6d50be26-3371-11e7-a5bf-74d435166e57" (UID: "6d50be26-3371-11e7-a5bf-74d435166e57").
Excerpt from `mount`:
tmpfs on /var/lib/kubelet/pods/91eabb5e-336f-11e7-927a-d43d7e00dee7/volumes/kubernetes.io~secret/default-token-86xhs type tmpfs (rw,relatime,seclabel)
Output from `fuser -vm`:
USER PID ACCESS COMMAND
/var/lib/kubelet/pods/91eabb5e-336f-11e7-927a-d43d7e00dee7/volumes/kubernetes.io~secret/default-token-86xhs:
root kernel mount /var/lib/kubelet/pods/91eabb5e-336f-11e7-927a-d43d7e00dee7/volumes/kubernetes.io~secret/default-token-86xhs
The reason they are not cleaned up is that the volume is not unmounted before being moved to the deletion area, and a directory that is a mountpoint cannot be moved with the rename() syscall, which is what Go uses internally for os.Rename.
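To make the failure mode concrete, here is a minimal standalone sketch (not kubelet code; the `/tmp/secret-volume` path is a placeholder, run as root on Linux) showing that os.Rename on a directory that is still a mountpoint fails with EBUSY and only succeeds once the mount is torn down first:

```go
// Minimal sketch, not kubelet code: renaming a directory that is still a
// mountpoint fails with "device or resource busy"; it succeeds once the
// tmpfs is unmounted first. Run as root; the paths are placeholders.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Placeholder directory standing in for a pod's secret volume, e.g.
	// /var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~secret/default-token-86xhs
	src := "/tmp/secret-volume"
	dst := src + ".deleting~123456"

	if err := os.MkdirAll(src, 0700); err != nil {
		panic(err)
	}
	// Mount a tmpfs at src, like the kubelet does for secret volumes.
	if err := syscall.Mount("tmpfs", src, "tmpfs", 0, ""); err != nil {
		panic(err)
	}

	// While the tmpfs is mounted, rename(2) fails with EBUSY -- the same
	// "device or resource busy" error seen in the kubelet log above.
	if err := os.Rename(src, dst); err != nil {
		fmt.Println("rename while mounted:", err)
	}

	// Tearing the mount down first is what makes the rename succeed.
	if err := syscall.Unmount(src, 0); err != nil {
		panic(err)
	}
	if err := os.Rename(src, dst); err != nil {
		panic(err)
	}
	fmt.Println("rename after unmount: ok")
}
```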
What you expected to happen: The Pods should be cleaned up.
How to reproduce it (as minimally and precisely as possible): It happens on all three machines, so a plain CoreOS + kubelet-wrapper setup should reproduce it.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 4
- Comments: 16 (9 by maintainers)
@lorenz I had the same issue, but it's now resolved. Maybe you could try `grep -l container_id /proc/*/mountinfo` to check who's preventing your pod from terminating.
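A hedged Go equivalent of that grep (the search string and paths are assumptions; substitute the container ID or the stuck secret path from your own logs), listing the PIDs whose mount namespace still contains the mount:

```go
// Hypothetical helper: list PIDs whose /proc/<pid>/mountinfo still mentions
// a given string, equivalent to "grep -l <string> /proc/*/mountinfo".
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Assumption: search for the stuck secret volume by name; a container ID
	// works just as well.
	target := "default-token-86xhs"

	files, err := filepath.Glob("/proc/[0-9]*/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, mi := range files {
		data, err := os.ReadFile(mi)
		if err != nil {
			continue // process may have exited, or permission denied
		}
		if strings.Contains(string(data), target) {
			// /proc/<pid>/mountinfo -> <pid>
			pid := filepath.Base(filepath.Dir(mi))
			fmt.Println("mount still visible to PID", pid)
		}
	}
}
```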
@rambo45 Seeing exactly the same. Luckily CoreOS still ships Docker 1.12 if you enable it, so I'm currently running that everywhere. I have a lot of container churn, so staying on Docker 17.09 (what CoreOS ships by default) was not an option: within 24 hours I accumulated a few hundred pods stuck in Terminating. Still waiting for a proper cri-containerd so I can get rid of Docker for good.