kubernetes: Kubelet is not cleaning up pod volumes, leaving StatefulSet pods stuck in Pending
Bug report:
/kind bug
/sig openstack
/sig storage
What happened: I created a StatefulSet for job runners, each pod with some mounted volumes for scratch space. After a few recreations the pods were stuck in Pending. Reading the kubelet logs, it looks like the pods are never marked as fully removed because their volume directories are not disappearing.
What you expected to happen: Scaling the StatefulSet down and back up should have cleaned up whatever was left on the kube node and made the pod eligible to be rescheduled.
How to reproduce it (as minimally and precisely as possible): Create a StatefulSet with mounted volumes (OpenStack Cinder volumes in my case) and cycle it up and down a few times; a sketch of the cycle follows below.
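A minimal reproduction sketch with kubectl, assuming a StatefulSet named job-runner whose pod template mounts Cinder-backed scratch volumes (the manifest file name, label selector, and replica count are placeholders, not from an actual manifest in this report):

# Create the StatefulSet (manifest assumed to request Cinder scratch volumes).
kubectl apply -f job-runner-statefulset.yaml

# Cycle it: scale to zero, wait for the pods to terminate, scale back up.
kubectl scale statefulset job-runner --replicas=0
kubectl get pods -l app=job-runner        # repeat until no pods are listed
kubectl scale statefulset job-runner --replicas=3

# After a few of these cycles, replacement pods stay stuck in Pending.
kubectl get pods -l app=job-runner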
Anything else we need to know?:
Sample kubelet errors:
E0710 17:05:18.472909 5658 kubelet_volumes.go:110] Orphaned pod "352193f6-5871-11e7-a01f-fa163ee1f782" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:18.473150 5658 kubelet_volumes.go:110] Orphaned pod "6ff3b92c-587b-11e7-8b7e-fa163e7835c1" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:18.473421 5658 kubelet_volumes.go:110] Orphaned pod "82a19d7f-3505-11e7-bb41-fa163e11ae1a" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:20.476006 5658 kubelet_volumes.go:110] Orphaned pod "352193f6-5871-11e7-a01f-fa163ee1f782" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:20.476240 5658 kubelet_volumes.go:110] Orphaned pod "6ff3b92c-587b-11e7-8b7e-fa163e7835c1" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:20.476448 5658 kubelet_volumes.go:110] Orphaned pod "82a19d7f-3505-11e7-bb41-fa163e11ae1a" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:22.483983 5658 kubelet_volumes.go:110] Orphaned pod "352193f6-5871-11e7-a01f-fa163ee1f782" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:22.484162 5658 kubelet_volumes.go:110] Orphaned pod "6ff3b92c-587b-11e7-8b7e-fa163e7835c1" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:22.484264 5658 kubelet_volumes.go:110] Orphaned pod "82a19d7f-3505-11e7-bb41-fa163e11ae1a" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:24.472593 5658 kubelet_volumes.go:110] Orphaned pod "352193f6-5871-11e7-a01f-fa163ee1f782" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:24.472978 5658 kubelet_volumes.go:110] Orphaned pod "6ff3b92c-587b-11e7-8b7e-fa163e7835c1" found, but error <nil> occured during reading volume dir from disk
E0710 17:05:24.473200 5658 kubelet_volumes.go:110] Orphaned pod "82a19d7f-3505-11e7-bb41-fa163e11ae1a" found, but error <nil> occured during reading volume dir from disk
Here’s what’s inside one of the kubelet directories from that error log:
kube-node-03 core # cd /var/lib/kubelet/pods/352193f6-5871-11e7-a01f-fa163ee1f782
kube-node-03 352193f6-5871-11e7-a01f-fa163ee1f782 # ls
containers etc-hosts plugins volumes
kube-node-03 352193f6-5871-11e7-a01f-fa163ee1f782 # ls *
etc-hosts
containers:
job-runner
plugins:
kubernetes.io~empty-dir
volumes:
kubernetes.io~cinder kubernetes.io~secret
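The volumes directory still holds the kubernetes.io~cinder and kubernetes.io~secret plugin subdirectories, which appears to be why the kubelet keeps logging the pod as orphaned instead of deleting the directory. A quick check on the node (ordinary shell, not from this report; the pod UID is the one from the logs above) is to confirm nothing is still mounted under that path and to see exactly what is left under volumes/:

POD_DIR=/var/lib/kubelet/pods/352193f6-5871-11e7-a01f-fa163ee1f782

# Is anything still mounted under the pod directory? If so, unmounting has
# not finished and the directory should be left alone.
mount | grep "${POD_DIR}"

# List everything remaining under volumes/; leftover plugin directories here
# are enough to keep the orphaned-pod cleanup from removing the pod dir.
find "${POD_DIR}/volumes"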
Environment:
- Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T22:51:55Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5+", GitVersion:"1.5-cinder_v2_api-os-verbose-b7-2-g41406bdbbc3c2e-dirty", GitCommit:"41406bdbbc3c2e9fd83c7b80825f39ec4008b699", GitTreeState:"dirty", BuildDate:"2017-07-08T00:04:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
I run a custom build with some OpenStack API fixes.
- Cloud provider or hardware configuration:
- OpenStack
- Kubespray distribution
- OS (e.g. from /etc/os-release):
kube-node-03 352193f6-5871-11e7-a01f-fa163ee1f782 # cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.7.0
VERSION_ID=1298.7.0
BUILD_ID=2017-03-31-0215
PRETTY_NAME="Container Linux by CoreOS 1298.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
- Kernel (e.g. uname -a):
kube-node-03 352193f6-5871-11e7-a01f-fa163ee1f782 # uname -a
Linux kube-node-03.openstacklocal 4.9.16-coreos-r1 #1 SMP Fri Mar 31 02:07:42 UTC 2017 x86_64 Intel Xeon E312xx (Sandy Bridge) GenuineIntel GNU/Linux
- Install tools:
- Others:
About this issue
- State: closed
- Created 7 years ago
- Comments: 21 (12 by maintainers)
@yastij I think there is some confusion. The problem has nothing to do with the state of the PVCs or their backends. The problem is that the Pod could not be removed because there was leftover data in a kubelet directory on the kube node.
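For anyone hitting the same thing: a manual workaround (not an official fix, and only safe after confirming nothing is mounted under the pod directory) is to remove the empty leftover plugin directories so the kubelet's orphan cleanup can delete the pod directory on its next pass. A cautious sketch, run as a script on the affected node with the pod UID taken from the kubelet log:

POD_DIR=/var/lib/kubelet/pods/352193f6-5871-11e7-a01f-fa163ee1f782

# Bail out if anything is still mounted under the pod directory.
if mount | grep -q "${POD_DIR}"; then
  echo "still mounted under ${POD_DIR}, not touching it" >&2
  exit 1
fi

# Remove only empty leftover directories under volumes/; rmdir refuses to
# delete non-empty directories, so real data is never removed.
find "${POD_DIR}/volumes" -mindepth 1 -depth -type d -exec rmdir {} \;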
Yes, ultimately resulting in https://github.com/openshift/origin/issues/15252. I tagged some of our people to help track it.
@sttts how serendipitous, the title looks a lot like our rebase failure.