kubernetes: Orphaned pods fail to get cleaned up

Kubernetes version

Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"clean", BuildDate:"2016-11-12T05:16:27Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.1 LTS
  • Kernel (e.g. uname -a): Linux 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

What happened: syslogs are getting spammed every 2 seconds with these kubelet errors:

Dec  9 13:14:02 ip-10-50-242-179 start-kubelet.sh[31129]: E1209 13:14:02.300355   31129 kubelet_volumes.go:159] Orphaned pod "ff614192-bcc4-11e6-a20e-0a591a8e83d7" found, but error open /var/lib/kubelet/pods/ff614192-bcc4-11e6-a20e-0a591a8e83d7/volumes: no such file or directory occured during reading volume dir from disk
Dec  9 13:14:02 ip-10-50-242-179 start-kubelet.sh[31129]: E1209 13:14:02.300373   31129 kubelet_getters.go:249] Could not read directory /var/lib/kubelet/pods/ff769116-bcf4-11e6-a20e-0a591a8e83d7/volumes: open /var/lib/kubelet/pods/ff769116-bcf4-11e6-a20e-0a591a8e83d7/volumes: no such file or directory

We get the above two log entries for every non-running pod (2150 of them) every 2 seconds, so our logs grow into the GB range pretty quickly.
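
As a rough back-of-the-envelope check (my own estimate, not from the issue): 2150 pods × 2 lines every 2 seconds is about 2150 lines per second, which at a few hundred bytes per line is on the order of tens of GB per day. To measure the actual rate on a node (assuming the errors land in /var/log/syslog):

# How many of the spam lines are already in the current syslog.
grep -c 'Orphaned pod' /var/log/syslog
# Rough growth rate: sample the log size twice, 60 seconds apart.
s1=$(stat -c%s /var/log/syslog); sleep 60; s2=$(stat -c%s /var/log/syslog)
echo "$(( (s2 - s1) / 1024 )) KiB/min"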

There are 2160 pod directories in /var/lib/kubelet/pods/

~# ls /var/lib/kubelet/pods/ | wc -l
2160

But only 10 of them are actually running and have volumes attached:

~# df -h | grep kubelet
/dev/xvdf                  256G  232M  256G   1% /var/lib/kubelet
tmpfs                      7.4G  8.0K  7.4G   1% /var/lib/kubelet/pods/5b884f1f-bbcd-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/secrets
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/5b884f1f-bbcd-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-lfu24
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/15302286-bbaa-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-m0h9s
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/b0395433-a546-11e6-9670-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-n79fe
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/1198c11a-bd25-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-np531
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/473d7d51-bd25-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-smuz3
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/e17b1a95-bd36-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-1xs9g
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/2a36441b-bd57-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-qbw68
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/cf6c04f4-bd64-11e6-a20e-0a591a8e83d7/volumes/kubernetes.io~secret/default-token-n79fe
tmpfs                      7.4G  8.0K  7.4G   1% /var/lib/kubelet/pods/24130c15-bdf5-11e6-98c0-0615e1fbbfc7/volumes/kubernetes.io~secret/secrets
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/24130c15-bdf5-11e6-98c0-0615e1fbbfc7/volumes/kubernetes.io~secret/default-token-9ksrm
tmpfs                      7.4G   12K  7.4G   1% /var/lib/kubelet/pods/a271290c-bdf6-11e6-98c0-0615e1fbbfc7/volumes/kubernetes.io~secret/default-token-n79fe
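
For what it's worth, one way to confirm which of those directories are truly orphaned is to diff them against the pod UIDs the API server still knows about (my own sketch, assuming kubectl access and a current kubeconfig on the node):

# Pod UIDs the API server still knows about, across all namespaces.
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}' | sort > /tmp/api-uids
# Pod directories present on this node's disk.
ls /var/lib/kubelet/pods/ | sort > /tmp/disk-uids
# UIDs that exist on disk but are unknown to the API server (the orphans).
comm -13 /tmp/api-uids /tmp/disk-uids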

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 59 (48 by maintainers)

Most upvoted comments

TL;DR: It seems to be a problem when running kubelet in rkt fly on CoreOS. I opened an issue with CoreOS (https://github.com/coreos/bugs/issues/1831).

This is currently happening on the system:

Feb 25 14:22:54 kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1.novalocal kubelet-wrapper[12194]: E0225 14:22:54.170472   12194 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/d2ddd075-f2b2-11e6-808d-fa163eac0cd0-heapster-token-f1g9p\" (\"d2ddd075-f2b2-11e6-808d-fa163eac0cd0\")" failed. No retries permitted until 2017-02-25 14:24:54.170358703 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/d2ddd075-f2b2-11e6-808d-fa163eac0cd0-heapster-token-f1g9p" (volume.spec.Name: "heapster-token-f1g9p") pod "d2ddd075-f2b2-11e6-808d-fa163eac0cd0" (UID: "d2ddd075-f2b2-11e6-808d-fa163eac0cd0") with: rename /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/wrapped_heapster-token-f1g9p.deleting~394175716: device or resource busy
Feb 25 14:22:54 kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1.novalocal kubelet-wrapper[12194]: E0225 14:22:54.170592   12194 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/b539bd4a-f2b2-11e6-808d-fa163eac0cd0-default-token-q6jpp\" (\"b539bd4a-f2b2-11e6-808d-fa163eac0cd0\")" failed. No retries permitted until 2017-02-25 14:24:54.170569069 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/b539bd4a-f2b2-11e6-808d-fa163eac0cd0-default-token-q6jpp" (volume.spec.Name: "default-token-q6jpp") pod "b539bd4a-f2b2-11e6-808d-fa163eac0cd0" (UID: "b539bd4a-f2b2-11e6-808d-fa163eac0cd0") with: rename /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/wrapped_default-token-q6jpp.deleting~172107507: device or resource busy

But it can't be renamed, since it is still mounted. So, as you mentioned, kubelet does not consider the volume to be a tmpfs.

# lsof -n | grep "token-" || echo "Nothing"
Nothing
# mount | grep "token-"
tmpfs on /var/lib/rkt/pods/run/20bc48e1-cf9a-4dae-9c33-c89dd4e2cfc3/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/20bc48e1-cf9a-4dae-9c33-c89dd4e2cfc3/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p type tmpfs (rw,relatime,seclabel)

Hmm. There was a crashed kubelet (out of space on this node)

# rkt list
UUID            APP             IMAGE NAME                                      STATE   CREATED         STARTED         NETWORKS
20bc48e1        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        exited  1 day ago       1 day ago
73a545fc        flannel         quay.io/coreos/flannel:v0.6.2                   running 1 day ago       1 day ago
e67f6189        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        running 25 minutes ago  25 minutes ago
kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1 containers # rkt rm 20bc48e1
"20bc48e1-cf9a-4dae-9c33-c89dd4e2cfc3"
# mount | grep token-
tmpfs on /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p type tmpfs (rw,relatime,seclabel)

Some mounts are gone.
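
For reference, a generic form of that cleanup (my own sketch, not from the thread; destructive, and it assumes the stale kubelet is the only exited hyperkube rkt pod):

# Garbage-collect exited hyperkube (kubelet) rkt pods; in the transcript
# above this is what cleared the stage1 rootfs copies of the secret mounts.
rkt list | awk '$2 == "hyperkube" && $4 == "exited" {print $1}' | xargs -r rkt rm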

Turning on more logging:

 14:47:16.475315   13150 empty_dir_linux.go:38] Determining mount medium of /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp
 14:47:16.476148   13150 empty_dir_linux.go:48] Statfs_t of /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp: {Type:61267 Bsize:4096 Blocks:4474386 Bfree:3517504 Bavail:3312627 Files:4625792 Ffree:4471664 Fsid:{X__val:[-2141875238 -1373838413]}

Okay, let's drill down on this one. According to https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/empty_dir/empty_dir_linux.go#L37, and just to be sure this works as expected:

package main

import (
        "flag"
        "fmt"
        "os"
        "syscall"
)

// TMPFS_MAGIC: the f_type value statfs(2) reports for tmpfs mounts.
const linuxTmpfsMagic = 0x01021994

func main() {

        path := ""
        flag.StringVar(&path, "path", "", "Path of file")
        flag.Parse()
        if path == "" {
                fmt.Println("Provide a path")
                os.Exit(1)
        }

        buf := syscall.Statfs_t{}
        if err := syscall.Statfs(path, &buf); err != nil {
                fmt.Printf("statfs(%q): %v\n", path, err)
                os.Exit(1)
        }

        fmt.Printf("Statfs_t of %q: %+v\n", path, buf)
        if buf.Type == linuxTmpfsMagic {
                fmt.Printf("%q is a tmpfs\n", path)
        } else {
                fmt.Printf("%q NOT a tmpfs\n", path)
        }
}

This works:

# ./statfs -path /dev
Statfs_t of "/dev": {Type:16914836 Bsize:4096 Blocks:502378 Bfree:502378 Bavail:502378 Files:502378 Ffree:502048 Fsid:{X__val:[0 0]} Namelen:255 Frsize:4096 Flags:34 Spare:[0 0 0 0]}
"/dev" is a tmpfs

Trying this on the affected node, with the real file

# ./statfs -path /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp
Statfs_t of "/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp": {Type:16914836 Bsize:4096 Blocks:506397 Bfree:506394 Bavail:506394 Files:506397 Ffree:506388 Fsid:{X__val:[0 0]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
"/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp" is a tmpfs

Surprise: this is a tmpfs, as expected. So what else could this be? I noticed that Type:61267 tells us we are looking at an ext4 mount point, so the kubelet is most likely hitting /.
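
To spell that out (an aside of mine): 61267 is 0xEF53, the ext2/3/4 superblock magic, while the working runs above returned 16914836, i.e. 0x01021994, which is TMPFS_MAGIC:

# Decode the filesystem magic numbers from the Statfs_t output above.
printf '0x%X\n' 61267      # 0xEF53    -> ext2/3/4 superblock magic
printf '0x%X\n' 16914836   # 0x1021994 -> TMPFS_MAGIC (0x01021994)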

Sure enough, kubelet is running as a rkt fly container.

# chroot /proc/$(pgrep kubelet)/root
# /run/statfs -path /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp
Statfs_t of "/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp": {Type:61267 Bsize:4096 Blocks:4474386 Bfree:3510932 Bavail:3306055 Files:4625792 Ffree:4471667 Fsid:{X__val:[-2141875238 -1373838413]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
"/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp" NOT a tmpfs
# mount | grep "token-" || echo "Nothing"
Nothing
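
An equivalent check without building a chroot, for what it's worth (my own sketch, assuming util-linux nsenter and a single kubelet process):

# Look for the secret mounts from inside the kubelet's mount namespace only.
nsenter -t "$(pgrep -o kubelet)" -m -- mount | grep "token-" || echo "Nothing"
# Or read its mount table straight out of /proc.
grep "token-" "/proc/$(pgrep -o kubelet)/mountinfo" || echo "Nothing"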

Well, this would have been discoverable without writing any code, but anyway.

It is quite simple to reproduce this behavior. Just start a pod with a Secret. Stop kubelet on the node; I left it off until the API server noticed the node was gone. Then I started it again (via kubelet-wrapper, of course) and waited until the API server showed the node as Ready. I made sure the pod was still running on this node. After that, I simply deleted it with kubectl. Voila: one more orphaned pod with the same symptoms.
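
Roughly the same sequence in commands (a sketch of mine; names, image, and the systemd unit are placeholders, and on CoreOS the restart would go through kubelet-wrapper as noted above):

kubectl create secret generic repro-secret --from-literal=key=value
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: repro
spec:
  # optionally pin the pod to the node under test via spec.nodeName
  containers:
  - name: main
    image: nginx
    volumeMounts:
    - name: secret-vol
      mountPath: /etc/repro
  volumes:
  - name: secret-vol
    secret:
      secretName: repro-secret
EOF

# On the node: stop the kubelet until the API server marks the node NotReady,
# then start it again and wait for Ready (unit name may differ per setup).
systemctl stop kubelet.service
systemctl start kubelet.service

# With the node Ready again, delete the pod; its /var/lib/kubelet/pods/<uid>
# directory is left behind with the secret tmpfs still mounted.
kubectl delete pod repro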

I'm trying to gather the logs now, but FYI, after logging into the machine I saw this:

Failed Units: 42
  var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2d312e2db9.mount
  var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2daa2e2d22.mount
  var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2db82e2d30.mount
  var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2dc12e2d49.mount
  var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2dcf2e2d47.mount
  var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2dda2e2d52.mount
  var-lib-kubelet-pods-0137d42f\x2dd393\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2de5aaae97\x2dd38d\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-024ab4d9\x2dd373\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-07b65c30\x2dd399\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2de59d2808\x2dd38d\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-11bc149c\x2dd357\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-1832020d\x2dd394\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d614fdad9\x2dd38c\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-212d6a28\x2dd33b\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-23ecb8ce\x2dd38e\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2de59d2808\x2dd38d\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-2f509e87\x2dd37e\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-3ec1f209\x2dd362\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-4e333c7f\x2dd346\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-5c5690d9\x2dd389\x2d11e6\x2dbcc3\x2d0288d7377d03-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-5da48138\x2dd32a\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-6bfd9cac\x2dd36d\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-746099a1\x2dd399\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d35417245\x2dd399\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-7b391804\x2dd351\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-7c93f28c\x2dd390\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d7c926248\x2dd390\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-895e126f\x2dd394\x2d11e6\x2dbcc3\x2d0288d7377d03-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-8aaa786c\x2dd335\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-912dcd7a\x2dd390\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2de59d2808\x2dd38d\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-98cdb61d\x2dd378\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-9a1c07b4\x2dd319\x2d11e6\x2d98cc\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-a83f0b24\x2dd35c\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-b7b0558c\x2dd340\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-c4a706ab\x2dd396\x2d11e6\x2dae49\x2d02575f5c41f7-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2de59d2808\x2dd38d\x2d11e6\x2dae49\x2d02575f5c41f7.mount
  var-lib-kubelet-pods-c5d3884d\x2dd383\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-c721737a\x2dd324\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-d544dd02\x2dd367\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-e4b629d7\x2dd34b\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-f2d9ca2a\x2dd38e\x2d11e6\x2dbcc3\x2d0288d7377d03-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-kubelet-pods-f4277caf\x2dd32f\x2d11e6\x2da64e\x2d02c67b45e563-volumes-kubernetes.io\x7eaws\x2debs-pvc\x2d6937252b\x2dd315\x2d11e6\x2d98cc\x2d02c67b45e563.mount
  var-lib-rkt-pods-run-772ff08e\x2d3567\x2d4cab\x2db9d0\x2d6f599245431f-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2d312e2db9.mount
  var-lib-rkt-pods-run-772ff08e\x2d3567\x2d4cab\x2db9d0\x2d6f599245431f-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2daa2e2d22.mount
  var-lib-rkt-pods-run-772ff08e\x2d3567\x2d4cab\x2db9d0\x2d6f599245431f-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2db82e2d30.mount
  var-lib-rkt-pods-run-772ff08e\x2d3567\x2d4cab\x2db9d0\x2d6f599245431f-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2dc12e2d49.mount
  var-lib-rkt-pods-run-772ff08e\x2d3567\x2d4cab\x2db9d0\x2d6f599245431f-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2dcf2e2d47.mount
  var-lib-rkt-pods-run-772ff08e\x2d3567\x2d4cab\x2db9d0\x2d6f599245431f-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-us\x2dwest\x2d2a-vol\x2dda2e2d52.mount

@jingxu97 we consider the logs a bit sensitive; can I pass them to you on the Kubernetes Slack server?

In our case, we have a new cluster (no upgrade) running v1.5.1 and still see a lot of these errors:

Jan 05 04:41:15 ip-x-x-x-x kubelet-wrapper[2331]: E0105 04:41:15.266077    2331 kubelet_volumes.go:110] Orphaned pod "01cb497c-d221-11e6-98cc-02c67b45e563" found, but error <nil> occured during reading volume dir from disk
Jan 05 04:41:15 ip-x-x-x-x kubelet-wrapper[2331]: E0105 04:41:15.266166    2331 kubelet_volumes.go:110] Orphaned pod "022b28b0-d2d2-11e6-9038-0288d7377d03" found, but error <nil> occured during reading volume dir from disk

I’m able to reproduce it as follows:

  • create a PVC
  • create a pod that refers to the PVC
  • delete the pod
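
A minimal version of those three steps (a sketch of mine; how the claim gets bound, default storage class versus a pre-created EBS PV, depends on the cluster setup):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
# Then create a pod that mounts claimName: repro-pvc (same pattern as the
# secret pod sketch above), delete the pod, and watch the node's journal
# for the "Orphaned pod ... found" errors.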

We are running kubelet as a rkt (v1.20.0) container on CoreOS:

Environment="RKT_RUN_ARGS=--uuid-file-save=/var/run/kubelet-pod.uuid \
  --volume dns,kind=host,source=/etc/resolv.conf \
  --mount volume=dns,target=/etc/resolv.conf \
  --volume rkt,kind=host,source=/opt/bin/host-rkt \
  --mount volume=rkt,target=/usr/bin/rkt \
  --volume var-lib-rkt,kind=host,source=/var/lib/rkt \
  --mount volume=var-lib-rkt,target=/var/lib/rkt \
  --volume stage,kind=host,source=/tmp \
  --mount volume=stage,target=/tmp \
  --volume var-log,kind=host,source=/var/log \
  --mount volume=var-log,target=/var/log"
ExecStartPre=/usr/bin/mkdir -p /var/log/containers
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --rkt-path=/usr/bin/rkt \
  --rkt-stage1-image=coreos.com/rkt/stage1-coreos \
  --kubeconfig=/etc/kubernetes/kubeconfig \
  --require-kubeconfig \
  --register-node=true \
  --allow-privileged=true \
  --config=/etc/kubernetes/manifests \
  --cluster-dns=25.0.0.10 \
  --cluster-domain=cluster.local \
  --cloud-provider=aws \
  --cadvisor-port=4194 \
  --image-gc-high-threshold=80 \
  --image-gc-low-threshold=70 \
  --kube-reserved=cpu=200m,memory=500Mi \
  --system-reserved=cpu=150m,memory=250Mi
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

I experience the same issue when a k8s node is rebooted. After the restart, kubelet is unable to clean up the crashed Docker containers. The k8s version is 1.4.

As a quick-and-dirty workaround, container cleanup continues if the missing "volumes" directories are created manually:

tail -n 100 /var/log/kubernetes/kubelet.log | grep "found, but error open /var/lib/kubelet/pods" | perl -pe 's#^.*(/var/lib/kubelet/pods/.+/volumes).+$#$1#' | sort -u | xargs mkdir
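
A variant that works from the pod directories themselves instead of the log file (my own sketch; it just recreates any missing volumes directory so the kubelet's cleanup loop can proceed):

# Recreate the missing "volumes" subdirectory for every pod dir that lacks one.
for d in /var/lib/kubelet/pods/*/; do
  [ -d "${d}volumes" ] || mkdir "${d}volumes"
done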