ceph-csi: CephFS PVs cannot be unmounted when the same volume is mounted twice on the same node.
Describe the bug
When two (or possibly more) ReadWriteMany PVs are used to attach the same CephFS volume or subvolume on the same node, these volumes get stuck and cannot be unmounted, failing with:
3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-25 01:12:35.184818115 +0000 UTC m=+773997.882937495 (durationBeforeRetry 2m2s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/89b628b6-b78a-4cb3-bea4-ada4ce6d0ea2/volumes/kubernetes.io~csi/offsite-backup/mount]\"
In my case, here are the two mounts:
10.10.0.1:6789,10.10.0.2:6789,10.10.0.3:6789:/ on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount type ceph (rw,relatime,name=k8s-static,secret=<hidden>,acl,mds_namespace=clusterfs-capacity,_netdev)
10.10.0.1:6789,10.10.0.2:6789,10.10.0.3:6789:/ on /var/lib/kubelet/pods/89b628b6-b78a-4cb3-bea4-ada4ce6d0ea2/volumes/kubernetes.io~csi/cephfs-snapshot/globalmount type ceph (rw,relatime,name=k8s-static,secret=<hidden>,acl,mds_namespace=clusterfs-capacity,_netdev)
This also happens with subvolumes: when I mount /volumes/shared twice on the same node, the same problem occurs as soon as one of those workloads stops. It is most obvious with CronJobs, but replicated workloads hit it too the moment the pod using the PV is stopped.
Environment details
- Image/version of Ceph CSI driver: quay.io/cephcsi/cephcsi:v3.3.1
- Helm chart version: rook-ceph:1.6.5
- Kernel version: Linux k8s-worker-5 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd): kernel
- Kubernetes cluster version: Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
- Ceph cluster version: ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)
Steps to reproduce
Steps to reproduce the behavior:
- Create a CephFS Volume
- Create two ReadWriteMany PVs and associated PVCs and workloads such that they both schedule on the same node (a minimal sketch follows this list).
- Delete the workloads. You shouldn’t have to delete the PVCs and PVs, and deleting them does not fix the issue.
- Check the mounts on the node and notice that they’re still there. Look in the kubelet log and notice the “The device mount path X is still mounted by other references” errors.
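For reference, here is a minimal sketch of the PVCs and workloads from the second step; the PVs themselves look like the one shown under Additional Context below. Everything in it is a placeholder (names, namespace, image, node name): the only parts that matter are that both static PVs expose the same rootPath and that both pods land on the same node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-a
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
  storageClassName: ""
  volumeName: shared-a            # first static CephFS PV exposing rootPath: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-b
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
  storageClassName: ""
  volumeName: shared-b            # second static CephFS PV exposing the same rootPath: /
---
apiVersion: v1
kind: Pod
metadata:
  name: consumer-a
  namespace: default
spec:
  nodeName: k8s-worker-5          # pin both pods to the same node
  containers:
  - name: sleeper
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-a
---
apiVersion: v1
kind: Pod
metadata:
  name: consumer-b
  namespace: default
spec:
  nodeName: k8s-worker-5          # same node as consumer-a
  containers:
  - name: sleeper
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-b
Deleting either pod then leaves that PV’s globalmount stuck on the node with the GetDeviceMountRefs error shown above.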
Actual results
The CephFS mounts on the node remain indefinitely, and the error appears continually as the kubelet retries the unmount for each volume. Notably, as far as I’ve seen, this does not prevent starting these workloads again on the same node. The VolumeAttachment objects the PVs were using remain indefinitely as well.
Expected behavior
The CephFS mounts should be unmounted from the node when they are no longer needed, and the VolumeAttachment objects should be removed. Errors should not keep appearing in the logs indefinitely.
Logs
Unfortunately I don’t have the CSI driver logs at the moment, but I’ll try to update with those when I have a chance to reproduce this. Here are some relevant kubelet logs:
{"log":"I0624 23:00:00.377256 3568 topology_manager.go:233] [topologymanager] Topology Admit Handler\n","stream":"stderr","time":"2021-06-24T23:00:00.377925868Z"}
{"log":"I0624 23:00:00.454567 3568 kubelet.go:1830] SyncLoop (ADD, \"api\"): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\"\n","stream":"stderr","time":"2021-06-24T23:00:00.454795089Z"}
{"log":"I0624 23:00:00.454724 3568 topology_manager.go:233] [topologymanager] Topology Admit Handler\n","stream":"stderr","time":"2021-06-24T23:00:00.455041084Z"}
{"log":"I0624 23:00:00.454763 3568 reflector.go:207] Starting reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"local\"\n","stream":"stderr","time":"2021-06-24T23:00:00.45506415Z"}
{"log":"I0624 23:00:00.454884 3568 reflector.go:207] Starting reflector *v1.ConfigMap (0s) from object-\"rook-ceph\"/\"ceph-snapshot-schedule\"\n","stream":"stderr","time":"2021-06-24T23:00:00.455257507Z"}
{"log":"I0624 23:00:00.454777 3568 reflector.go:207] Starting reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"default-token-75rjf\"\n","stream":"stderr","time":"2021-06-24T23:00:00.455272909Z"}
{"log":"I0624 23:00:01.078168 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.078440245Z"}
{"log":"I0624 23:00:01.078404 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.078512328Z"}
{"log":"E0624 23:00:01.078569 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:01.578480045 +0000 UTC m=+766044.276599415 (durationBeforeRetry 500ms). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:01.078742012Z"}
{"log":"I0624 23:00:01.178864 3568 reconciler.go:269] operationExecutor.MountVolume started for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.179141074Z"}
{"log":"I0624 23:00:01.181734 3568 operation_generator.go:663] MountVolume.SetUp succeeded for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.181873756Z"}
{"log":"I0624 23:00:01.581055 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.581311826Z"}
{"log":"E0624 23:00:01.581300 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:02.581228609 +0000 UTC m=+766045.279347994 (durationBeforeRetry 1s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:01.581403135Z"}
{"log":"E0624 23:00:01.643637 3568 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.1.42.11 10.1.42.12 10.10.0.7\n","stream":"stderr","time":"2021-06-24T23:00:01.643878511Z"}
{"log":"I0624 23:00:02.586441 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:02.586703322Z"}
{"log":"E0624 23:00:02.586745 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:04.586676834 +0000 UTC m=+766047.284796230 (durationBeforeRetry 2s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:02.586899124Z"}
{"log":"I0624 23:00:03.207012 3568 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF\n","stream":"stderr","time":"2021-06-24T23:00:03.207553539Z"}
{"log":"I0624 23:00:03.590178 3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 18896cc9a298dcfe65634509eb6deb59923f0484b5097b115aa6ba43a3052df3\n","stream":"stderr","time":"2021-06-24T23:00:03.59032184Z"}
{"log":"I0624 23:00:04.594904 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:04.595143553Z"}
{"log":"E0624 23:00:04.595066 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:08.595004779 +0000 UTC m=+766051.293124152 (durationBeforeRetry 4s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:04.595195735Z"}
{"log":"I0624 23:00:05.656736 3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 18896cc9a298dcfe65634509eb6deb59923f0484b5097b115aa6ba43a3052df3\n","stream":"stderr","time":"2021-06-24T23:00:05.656820926Z"}
{"log":"I0624 23:00:05.656907 3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: cb709eb507aa01e55c9d02395f209322dff81b73d9d54ba2778b246c2f67033e\n","stream":"stderr","time":"2021-06-24T23:00:05.657041529Z"}
{"log":"I0624 23:00:05.801135 3568 reconciler.go:196] operationExecutor.UnmountVolume started for volume \"default-token-8c5pm\" (UniqueName: \"kubernetes.io/secret/e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3-default-token-8c5pm\") pod \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\" (UID: \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\") \n","stream":"stderr","time":"2021-06-24T23:00:05.8013881Z"}
{"log":"I0624 23:00:05.804318 3568 operation_generator.go:788] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/secret/e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3-default-token-8c5pm\" (OuterVolumeSpecName: \"default-token-8c5pm\") pod \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\" (UID: \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\"). InnerVolumeSpecName \"default-token-8c5pm\". PluginName \"kubernetes.io/secret\", VolumeGidValue \"\"\n","stream":"stderr","time":"2021-06-24T23:00:05.804518659Z"}
{"log":"I0624 23:00:05.902020 3568 reconciler.go:319] Volume detached for volume \"default-token-8c5pm\" (UniqueName: \"kubernetes.io/secret/e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3-default-token-8c5pm\") on node \"k8s-worker-5\" DevicePath \"\"\n","stream":"stderr","time":"2021-06-24T23:00:05.902186937Z"}
{"log":"I0624 23:00:06.648903 3568 kuberuntime_container.go:635] Killing container \"docker://cb709eb507aa01e55c9d02395f209322dff81b73d9d54ba2778b246c2f67033e\" with a 30 second grace period\n","stream":"stderr","time":"2021-06-24T23:00:06.649028417Z"}
{"log":"W0624 23:00:06.683357 3568 pod_container_deletor.go:79] Container \"72f62eaeaa73627aa0b59a2bcf063da9e9e59b917f06939ca7508a4b24fab131\" not found in pod's containers\n","stream":"stderr","time":"2021-06-24T23:00:06.683536023Z"}
{"log":"I0624 23:00:08.611624 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:08.61186974Z"}
{"log":"E0624 23:00:08.611764 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:16.611705925 +0000 UTC m=+766059.309825395 (durationBeforeRetry 8s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:08.61189224Z"}
{"log":"I0624 23:00:15.872682 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:15.873009605Z"}
{"log":"I0624 23:00:15.872704 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:15.873049345Z"}
{"log":"I0624 23:00:15.872743 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.873057553Z"}
{"log":"I0624 23:00:15.872753 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:15.873064906Z"}
{"log":"I0624 23:00:15.872794 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:15.873069483Z"}
{"log":"I0624 23:00:15.872812 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002328c30, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.873073407Z"}
{"log":"I0624 23:00:15.873155 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002328c30, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.873318882Z"}
{"log":"I0624 23:00:15.873887 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:15.874025185Z"}
{"log":"I0624 23:00:15.873907 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:15.874048718Z"}
{"log":"I0624 23:00:15.873924 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.874056883Z"}
{"log":"I0624 23:00:15.873932 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:15.874064326Z"}
{"log":"I0624 23:00:15.873955 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:15.874070659Z"}
{"log":"I0624 23:00:15.873893 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:15.874081657Z"}
{"log":"I0624 23:00:15.874100 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0023290b0, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.874218667Z"}
{"log":"I0624 23:00:15.874274 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0023290b0, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.874396242Z"}
{"log":"I0624 23:00:15.882636 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:15.882778379Z"}
{"log":"I0624 23:00:16.644814 3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:16.645100842Z"}
{"log":"I0624 23:00:16.649652 3568 operation_generator.go:1348] Controller attach succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") device path: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.649792261Z"}
{"log":"I0624 23:00:16.745284 3568 reconciler.go:269] operationExecutor.MountVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:16.745544683Z"}
{"log":"I0624 23:00:16.745626 3568 operation_generator.go:558] MountVolume.WaitForAttach entering for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") DevicePath \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.745794718Z"}
{"log":"I0624 23:00:16.750017 3568 operation_generator.go:567] MountVolume.WaitForAttach succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") DevicePath \"csi-225af1b82c928c791922af9189702d68cbd12844b13a32dd8458a635aff88dff\"\n","stream":"stderr","time":"2021-06-24T23:00:16.750220225Z"}
{"log":"I0624 23:00:16.750272 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.750467649Z"}
{"log":"I0624 23:00:16.750303 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.750485603Z"}
{"log":"I0624 23:00:16.750320 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.750492187Z"}
{"log":"I0624 23:00:16.750330 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.750513769Z"}
{"log":"I0624 23:00:16.750374 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.750519618Z"}
{"log":"I0624 23:00:16.750412 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002666a40, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.750524834Z"}
{"log":"I0624 23:00:16.750983 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002666a40, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.751150652Z"}
{"log":"I0624 23:00:16.751918 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.752029379Z"}
{"log":"I0624 23:00:16.759246 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.759427933Z"}
{"log":"I0624 23:00:16.759279 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.75944273Z"}
{"log":"I0624 23:00:16.759294 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.759446981Z"}
{"log":"I0624 23:00:16.759300 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.759451276Z"}
{"log":"I0624 23:00:16.759356 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.759467142Z"}
{"log":"I0624 23:00:16.759410 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0026670b0, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.759481715Z"}
{"log":"I0624 23:00:16.759773 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0026670b0, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.759883849Z"}
{"log":"I0624 23:00:16.789406 3568 operation_generator.go:596] MountVolume.MountDevice succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") device mount path \"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\"\n","stream":"stderr","time":"2021-06-24T23:00:16.789641707Z"}
{"log":"I0624 23:00:16.789516 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.789681224Z"}
{"log":"I0624 23:00:16.794440 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.794656349Z"}
{"log":"I0624 23:00:16.794463 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.794680954Z"}
{"log":"I0624 23:00:16.794497 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.794685778Z"}
{"log":"I0624 23:00:16.794506 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.794690818Z"}
{"log":"I0624 23:00:16.794576 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.79476636Z"}
{"log":"I0624 23:00:16.794774 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0031e3440, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.794804153Z"}
{"log":"I0624 23:00:16.795164 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0031e3440, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.795427525Z"}
{"log":"I0624 23:00:16.795808 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.795985696Z"}
{"log":"I0624 23:00:16.799595 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.799801855Z"}
{"log":"I0624 23:00:16.799630 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.799829354Z"}
{"log":"I0624 23:00:16.799647 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.799833817Z"}
{"log":"I0624 23:00:16.799654 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.799838692Z"}
{"log":"I0624 23:00:16.799720 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.799842903Z"}
{"log":"I0624 23:00:16.799758 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002667630, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.799860207Z"}
{"log":"I0624 23:00:16.800061 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002667630, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.80016665Z"}
{"log":"I0624 23:00:16.809482 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.810761097Z"}
{"log":"I0624 23:00:16.811821 3568 operation_generator.go:663] MountVolume.SetUp succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:16.812026872Z"}
{"log":"I0624 23:00:17.009534 3568 kuberuntime_manager.go:425] No sandbox for pod \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\" can be found. Need to start a new one\n","stream":"stderr","time":"2021-06-24T23:00:17.009715553Z"}
{"log":"I0624 23:00:17.501848 3568 kubelet.go:1837] SyncLoop (UPDATE, \"api\"): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\"\n","stream":"stderr","time":"2021-06-24T23:00:17.502200708Z"}
{"log":"I0624 23:00:17.941428 3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerStarted\", Data:\"ba3366af8e95d11262c370dc1171ca1b92b46aacffad57716073777a9a5c1032\"}\n","stream":"stderr","time":"2021-06-24T23:00:17.941719981Z"}
{"log":"I0624 23:00:17.941497 3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerStarted\", Data:\"c1243bc4df3c2ca2fdc6f65dc808c2a54e2ed34bc3960c3bc8da0704045ade04\"}\n","stream":"stderr","time":"2021-06-24T23:00:17.941745252Z"}
{"log":"E0624 23:00:18.643466 3568 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.1.42.11 10.1.42.12 10.10.0.7\n","stream":"stderr","time":"2021-06-24T23:00:18.643920992Z"}
{"log":"I0624 23:00:23.072398 3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerDied\", Data:\"ba3366af8e95d11262c370dc1171ca1b92b46aacffad57716073777a9a5c1032\"}\n","stream":"stderr","time":"2021-06-24T23:00:23.072547469Z"}
{"log":"I0624 23:00:23.072655 3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: ba3366af8e95d11262c370dc1171ca1b92b46aacffad57716073777a9a5c1032\n","stream":"stderr","time":"2021-06-24T23:00:23.07273424Z"}
{"log":"I0624 23:00:23.093082 3568 reflector.go:213] Stopping reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"local\"\n","stream":"stderr","time":"2021-06-24T23:00:23.093263612Z"}
{"log":"I0624 23:00:23.093132 3568 reflector.go:213] Stopping reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"default-token-75rjf\"\n","stream":"stderr","time":"2021-06-24T23:00:23.093286641Z"}
{"log":"I0624 23:00:23.093162 3568 reflector.go:213] Stopping reflector *v1.ConfigMap (0s) from object-\"rook-ceph\"/\"ceph-snapshot-schedule\"\n","stream":"stderr","time":"2021-06-24T23:00:23.093291584Z"}
{"log":"I0624 23:00:23.169781 3568 reconciler.go:196] operationExecutor.UnmountVolume started for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:23.170047527Z"}
{"log":"I0624 23:00:23.170028 3568 reconciler.go:196] operationExecutor.UnmountVolume started for volume \"cephfs-volume\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:23.170136534Z"}
{"log":"I0624 23:00:23.170368 3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.170507608Z"}
{"log":"I0624 23:00:23.170472 3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:23.170515955Z"}
{"log":"W0624 23:00:23.170545 3568 empty_dir.go:469] Warning: Failed to clear quota on /var/lib/kubelet/pods/18e0a947-1b51-4b6b-81f5-25ce75a276ed/volumes/kubernetes.io~configmap/schedule: ClearQuota called, but quotas disabled\n","stream":"stderr","time":"2021-06-24T23:00:23.170759433Z"}
{"log":"I0624 23:00:23.170690 3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:23.170848643Z"}
{"log":"I0624 23:00:23.170748 3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:23.17086775Z"}
{"log":"I0624 23:00:23.170829 3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:23.171009697Z"}
{"log":"I0624 23:00:23.170987 3568 kubelet.go:1837] SyncLoop (UPDATE, \"api\"): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\"\n","stream":"stderr","time":"2021-06-24T23:00:23.171080255Z"}
{"log":"I0624 23:00:23.171048 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00086f080, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:23.171172188Z"}
{"log":"I0624 23:00:23.172231 3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00086f080, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:23.172374644Z"}
{"log":"I0624 23:00:23.173657 3568 operation_generator.go:788] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\" (OuterVolumeSpecName: \"default-token-75rjf\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\"). InnerVolumeSpecName \"default-token-75rjf\". PluginName \"kubernetes.io/secret\", VolumeGidValue \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.173775218Z"}
{"log":"I0624 23:00:23.177282 3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:23.177436718Z"}
{"log":"I0624 23:00:23.177338 3568 operation_generator.go:788] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\" (OuterVolumeSpecName: \"cephfs-volume\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\"). InnerVolumeSpecName \"cephfs-snapshot\". PluginName \"kubernetes.io/csi\", VolumeGidValue \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.177463897Z"}
{"log":"I0624 23:00:23.270789 3568 reconciler.go:319] Volume detached for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") on node \"k8s-worker-5\" DevicePath \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.270918375Z"}
{"log":"I0624 23:00:23.270847 3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:23.271024161Z"}
{"log":"E0624 23:00:23.273372 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:23.773314049 +0000 UTC m=+766066.471433424 (durationBeforeRetry 500ms). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:23.273554569Z"}
{"log":"I0624 23:00:23.873342 3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:23.873669334Z"}
{"log":"E0624 23:00:23.876231 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:24.876203861 +0000 UTC m=+766067.574323244 (durationBeforeRetry 1s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:23.876403236Z"}
{"log":"I0624 23:00:24.106389 3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerDied\", Data:\"c1243bc4df3c2ca2fdc6f65dc808c2a54e2ed34bc3960c3bc8da0704045ade04\"}\n","stream":"stderr","time":"2021-06-24T23:00:24.106657417Z"}
{"log":"W0624 23:00:24.106499 3568 pod_container_deletor.go:79] Container \"c1243bc4df3c2ca2fdc6f65dc808c2a54e2ed34bc3960c3bc8da0704045ade04\" not found in pod's containers\n","stream":"stderr","time":"2021-06-24T23:00:24.10670478Z"}
{"log":"I0624 23:00:24.877177 3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:24.877464859Z"}
{"log":"E0624 23:00:24.879452 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:26.879426251 +0000 UTC m=+766069.577545621 (durationBeforeRetry 2s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:24.879652128Z"}
{"log":"I0624 23:00:25.678865 3568 log.go:181] http: superfluous response.WriteHeader call from k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader (httplog.go:217)\n","stream":"stderr","time":"2021-06-24T23:00:25.679154228Z"}
{"log":"I0624 23:00:26.884764 3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:26.88499332Z"}
{"log":"E0624 23:00:26.887115 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:30.887062205 +0000 UTC m=+766073.585181621 (durationBeforeRetry 4s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:26.887284993Z"}
{"log":"I0624 23:00:27.754074 3568 kubelet.go:1846] SyncLoop (DELETE, \"api\"): \"offsite-backup-t8fwm_local-system(7a6446c6-23ff-4fab-83ec-82ca8734212e)\"\n","stream":"stderr","time":"2021-06-24T23:00:27.754365613Z"}
{"log":"I0624 23:00:27.754397 3568 kuberuntime_container.go:635] Killing container \"docker://80aa928fd4a820b22de51a5b4aeeeb2e26fe116b6d85e992930b7d6cb47f58a9\" with a 30 second grace period\n","stream":"stderr","time":"2021-06-24T23:00:27.754525375Z"}
{"log":"I0624 23:00:30.834883 3568 kubelet.go:1846] SyncLoop (DELETE, \"api\"): \"cephfs-snapshot-1624564800-nr5dc_rook-ceph(fcf13dcb-d099-4112-a816-6ebe50910070)\"\n","stream":"stderr","time":"2021-06-24T23:00:30.835180164Z"}
{"log":"I0624 23:00:30.845859 3568 kubelet.go:1840] SyncLoop (REMOVE, \"api\"): \"cephfs-snapshot-1624564800-nr5dc_rook-ceph(fcf13dcb-d099-4112-a816-6ebe50910070)\"\n","stream":"stderr","time":"2021-06-24T23:00:30.84602745Z"}
{"log":"I0624 23:00:30.845914 3568 kubelet.go:2034] Failed to delete pod \"cephfs-snapshot-1624564800-nr5dc_rook-ceph(fcf13dcb-d099-4112-a816-6ebe50910070)\", err: pod not found\n","stream":"stderr","time":"2021-06-24T23:00:30.846045057Z"}
{"log":"I0624 23:00:30.899760 3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:30.899891514Z"}
{"log":"E0624 23:00:30.901883 3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:38.901854069 +0000 UTC m=+766081.599973434 (durationBeforeRetry 8s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:30.902065359Z"}
Additional Context
The Ceph cluster is deployed and managed by Proxmox and configured in k8s by Rook. Here’s the cluster CRD’s status:
status:
  ceph:
    capacity:
      bytesAvailable: 42977436372992
      bytesTotal: 76110897348608
      bytesUsed: 33133460975616
      lastUpdated: "2021-07-01T15:16:19Z"
    health: HEALTH_OK
    lastChanged: "2021-07-01T08:47:01Z"
    lastChecked: "2021-07-01T15:16:19Z"
    previousHealth: HEALTH_WARN
    versions:
      mds:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 5
      mgr:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 5
      mon:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 3
      osd:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 16
      overall:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 29
  conditions:
  - lastHeartbeatTime: "2021-07-01T07:41:52Z"
    lastTransitionTime: "2021-07-01T07:41:52Z"
    message: Attempting to connect to an external Ceph cluster
    reason: ClusterConnecting
    status: "True"
    type: Connecting
  - lastHeartbeatTime: "2021-07-01T15:16:20Z"
    lastTransitionTime: "2021-04-26T03:27:20Z"
    message: Cluster connected successfully
    reason: ClusterConnected
    status: "True"
    type: Connected
  message: Cluster connected successfully
  phase: Connected
  state: Connected
  version:
    image: ceph/ceph:v15.2.13
    version: 15.2.13-0
While Rook itself is using an admin secret, I’ve added another secret for lower-privilege access for CephFS CSI mounting. Here’s one of the volumes from the logs above; the other is nearly identical except for the name, uid, volumeHandle, and claimRef:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    app: cephfs-snapshot
  name: cephfs-snapshot
  resourceVersion: "194940716"
  selfLink: /api/v1/persistentvolumes/cephfs-snapshot
  uid: 7cc3698d-f4b4-4d05-8259-476bf47ffb22
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Ti
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: cephfs-snapshot
    namespace: rook-ceph
    resourceVersion: "189756705"
    uid: 056edc02-c1f5-44fe-b3f7-f416139eb638
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-node-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: clusterfs-capacity
      rootPath: /
      staticVolume: "true"
    volumeHandle: cephfs-snapshot
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
status:
  phase: Bound
And here is the scrubbed node secret:
apiVersion: v1
data:
  userID: rook-k8s
  userKey: foo
kind: Secret
metadata:
  creationTimestamp: "2021-06-06T04:49:56Z"
  name: rook-csi-cephfs-node-static
  namespace: rook-ceph
  resourceVersion: "182934314"
  selfLink: /api/v1/namespaces/rook-ceph/secrets/rook-csi-cephfs-node-static
  uid: 823a7b2f-9ea6-4a99-ad89-dc052ddc4910
type: kubernetes.io/rook
And finally, the cephx permissions:
client.rook-k8s
key: foo
caps: [mds] allow rws
caps: [mon] allow r
caps: [osd] allow rw tag cephfs data=clusterfs-capacity
For now, I’ve worked around this by using pod anti-affinity to ensure the jobs don’t run on the same node, but I’ll start running out of nodes eventually. A rough sketch of that workaround follows.
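That workaround looks roughly like the following on one of the CronJobs. This is only a sketch: the name, schedule, image, and the app: cephfs-consumer label are placeholders I’ve made up for illustration; only the claimName and namespace match the PV and PVC shown above.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cephfs-consumer
  namespace: rook-ceph
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: cephfs-consumer          # shared by every workload that mounts this CephFS volume
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: cephfs-consumer  # keep these pods off each other's nodes
                topologyKey: kubernetes.io/hostname
          restartPolicy: Never
          containers:
          - name: job
            image: busybox
            command: ["sleep", "10"]
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: cephfs-snapshot
That keeps any two pods carrying the label off the same node, which avoids the duplicate globalmount, but it obviously stops working once there are more such workloads than nodes.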
About this issue
- State: closed
- Created 3 years ago
- Comments: 32 (2 by maintainers)
Commits related to this issue
- Call MountDevice only once: call MountDevice only when the volume was not device-mounted before. — committed to kubernetes/kubernetes by jsafrane 4 years ago
@ikogan I am planning to work on this one. Please provide the details below for now.
While I can reuse the same PVC if my workloads are in the same namespace, a PVC is a namespaced object, so I cannot use it across namespaces, and only one PVC can be bound to a single PV. The only way to have two different namespaces use the same underlying filesystem path is to have two different PVs.
I tested this with the shallow-support PR and I don’t see any issue. It would be good to get some confirmation on this one.
As the staging paths are different, per the CSI standard we need to make sure the volume is mounted at each given staging path. If there were a single staging path, we wouldn’t have this problem.
Yes, I opened an issue in Kubernetes to check whether this is expected or not, but it looks like there has been no response.
@ikogan Thanks for getting back to us here, much appreciated! I wanted to ask: is this cluster running in production? Recently I have been revisiting the locking operations we do in various code paths and working on some improvements in the same area, tracked under #2149.
I have a feeling that this bug fix could take care of this scenario, so I wanted to ask whether it is possible to try a test image (quay.io/humble/cephcsi:publish) with the fix at your end.
Meanwhile, I am taking some action items here to reproduce it on my end and root-cause it. Please reply with your thoughts.