ceph-csi: CephFS PVs cannot be unmounted when the same volume is mounted twice on the same node.

Describe the bug

When two (or possibly more) ReadWriteMany PVs are used to attach the same CephFS volume or subvolume on the same node, the volumes get stuck and cannot be unmounted, failing with:

3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-25 01:12:35.184818115 +0000 UTC m=+773997.882937495 (durationBeforeRetry 2m2s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/89b628b6-b78a-4cb3-bea4-ada4ce6d0ea2/volumes/kubernetes.io~csi/offsite-backup/mount

In my case, here are the two mounts:

10.10.0.1:6789,10.10.0.2:6789,10.10.0.3:6789:/ on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount type ceph (rw,relatime,name=k8s-static,secret=<hidden>,acl,mds_namespace=clusterfs-capacity,_netdev)
10.10.0.1:6789,10.10.0.2:6789,10.10.0.3:6789:/ on /var/lib/kubelet/pods/89b628b6-b78a-4cb3-bea4-ada4ce6d0ea2/volumes/kubernetes.io~csi/cephfs-snapshot/globalmount type ceph (rw,relatime,name=k8s-static,secret=<hidden>,acl,mds_namespace=clusterfs-capacity,_netdev)

This also happens with subvolumes: when I mount /volumes/shared twice on the same node, the same problem occurs as soon as one of those workloads stops. It is most obvious with CronJobs, but replicated workloads do this too the moment the pod using the PV is stopped.
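
As a rough illustration, assuming static PVs like the one shown under Additional Context below, the subvolume case differs only in the rootPath of the csi volumeAttributes (the path here is just the example from above):

    volumeAttributes:
      clusterID: rook-ceph
      fsName: clusterfs-capacity
      rootPath: /volumes/shared   # both PVs point at the same subvolume path
      staticVolume: "true"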

Environment details

  • Image/version of Ceph CSI driver : quay.io/cephcsi/cephcsi:v3.3.1
  • Helm chart version : rook-ceph:1.6.5
  • Kernel version : Linux k8s-worker-5 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) : kernel
  • Kubernetes cluster version : Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Ceph cluster version : ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)

Steps to reproduce

Steps to reproduce the behavior:

  1. Create a CephFS Volume
  2. Create two ReadWriteMany PVs that both point at the same volume (same rootPath), plus associated PVCs and workloads, such that both workloads schedule on the same node.
  3. Delete the workloads. You shouldn’t have to delete the PVCs and PVs, but deleting them does not fix the issue either.
  4. Check the mounts on the node and notice that they’re still there. Look in the kubelet log and notice the repeating “The device mount path X is still mounted by other references” errors (example commands below).
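
To confirm the state after step 4, commands along these lines show the leftover mounts and the repeating kubelet errors (adjust for how kubelet logs are collected on your nodes; in this cluster they end up in the Docker JSON logs shown below):

  # On the affected node, the globalmount and pod mounts are still present
  mount | grep ceph | grep -E 'globalmount|kubernetes.io~csi'

  # The kubelet keeps retrying the unmount with increasing backoff
  journalctl -u kubelet | grep 'is still mounted by other references'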

Actual results

The CephFS mounts remain on the node indefinitely, and the error message repeats as the kubelet retries the unmount of each volume. Notably, as far as I’ve seen, this does not prevent starting these workloads again on the same node. The VolumeAttachment objects the PVs were using remain indefinitely as well.
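
The stale attachments are also visible from the API; for example (the driver name is the one from this cluster):

  kubectl get volumeattachment -o wide | grep rook-ceph.cephfs.csi.ceph.com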

Expected behavior

The CephFS mounts should be unmounted from the node when they are no longer needed, and the VolumeAttachment objects should be removed. The errors should not keep repeating in the logs indefinitely.

Logs

Unfortunately I don’t have the CSI logs at the moment, but I’ll try to update with those when I have a chance to replicate this. Here are some relevant kubelet logs:

{"log":"I0624 23:00:00.377256    3568 topology_manager.go:233] [topologymanager] Topology Admit Handler\n","stream":"stderr","time":"2021-06-24T23:00:00.377925868Z"}
{"log":"I0624 23:00:00.454567    3568 kubelet.go:1830] SyncLoop (ADD, \"api\"): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\"\n","stream":"stderr","time":"2021-06-24T23:00:00.454795089Z"}
{"log":"I0624 23:00:00.454724    3568 topology_manager.go:233] [topologymanager] Topology Admit Handler\n","stream":"stderr","time":"2021-06-24T23:00:00.455041084Z"}
{"log":"I0624 23:00:00.454763    3568 reflector.go:207] Starting reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"local\"\n","stream":"stderr","time":"2021-06-24T23:00:00.45506415Z"}
{"log":"I0624 23:00:00.454884    3568 reflector.go:207] Starting reflector *v1.ConfigMap (0s) from object-\"rook-ceph\"/\"ceph-snapshot-schedule\"\n","stream":"stderr","time":"2021-06-24T23:00:00.455257507Z"}
{"log":"I0624 23:00:00.454777    3568 reflector.go:207] Starting reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"default-token-75rjf\"\n","stream":"stderr","time":"2021-06-24T23:00:00.455272909Z"}
{"log":"I0624 23:00:01.078168    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.078440245Z"}
{"log":"I0624 23:00:01.078404    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.078512328Z"}
{"log":"E0624 23:00:01.078569    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:01.578480045 +0000 UTC m=+766044.276599415 (durationBeforeRetry 500ms). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:01.078742012Z"}
{"log":"I0624 23:00:01.178864    3568 reconciler.go:269] operationExecutor.MountVolume started for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.179141074Z"}
{"log":"I0624 23:00:01.181734    3568 operation_generator.go:663] MountVolume.SetUp succeeded for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.181873756Z"}
{"log":"I0624 23:00:01.581055    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:01.581311826Z"}
{"log":"E0624 23:00:01.581300    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:02.581228609 +0000 UTC m=+766045.279347994 (durationBeforeRetry 1s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:01.581403135Z"}
{"log":"E0624 23:00:01.643637    3568 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.1.42.11 10.1.42.12 10.10.0.7\n","stream":"stderr","time":"2021-06-24T23:00:01.643878511Z"}
{"log":"I0624 23:00:02.586441    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:02.586703322Z"}
{"log":"E0624 23:00:02.586745    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:04.586676834 +0000 UTC m=+766047.284796230 (durationBeforeRetry 2s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:02.586899124Z"}
{"log":"I0624 23:00:03.207012    3568 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF\n","stream":"stderr","time":"2021-06-24T23:00:03.207553539Z"}
{"log":"I0624 23:00:03.590178    3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 18896cc9a298dcfe65634509eb6deb59923f0484b5097b115aa6ba43a3052df3\n","stream":"stderr","time":"2021-06-24T23:00:03.59032184Z"}
{"log":"I0624 23:00:04.594904    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:04.595143553Z"}
{"log":"E0624 23:00:04.595066    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:08.595004779 +0000 UTC m=+766051.293124152 (durationBeforeRetry 4s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:04.595195735Z"}
{"log":"I0624 23:00:05.656736    3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 18896cc9a298dcfe65634509eb6deb59923f0484b5097b115aa6ba43a3052df3\n","stream":"stderr","time":"2021-06-24T23:00:05.656820926Z"}
{"log":"I0624 23:00:05.656907    3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: cb709eb507aa01e55c9d02395f209322dff81b73d9d54ba2778b246c2f67033e\n","stream":"stderr","time":"2021-06-24T23:00:05.657041529Z"}
{"log":"I0624 23:00:05.801135    3568 reconciler.go:196] operationExecutor.UnmountVolume started for volume \"default-token-8c5pm\" (UniqueName: \"kubernetes.io/secret/e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3-default-token-8c5pm\") pod \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\" (UID: \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\") \n","stream":"stderr","time":"2021-06-24T23:00:05.8013881Z"}
{"log":"I0624 23:00:05.804318    3568 operation_generator.go:788] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/secret/e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3-default-token-8c5pm\" (OuterVolumeSpecName: \"default-token-8c5pm\") pod \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\" (UID: \"e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3\"). InnerVolumeSpecName \"default-token-8c5pm\". PluginName \"kubernetes.io/secret\", VolumeGidValue \"\"\n","stream":"stderr","time":"2021-06-24T23:00:05.804518659Z"}
{"log":"I0624 23:00:05.902020    3568 reconciler.go:319] Volume detached for volume \"default-token-8c5pm\" (UniqueName: \"kubernetes.io/secret/e8b91d55-3f78-4cbc-85ff-be24eb5ed6a3-default-token-8c5pm\") on node \"k8s-worker-5\" DevicePath \"\"\n","stream":"stderr","time":"2021-06-24T23:00:05.902186937Z"}
{"log":"I0624 23:00:06.648903    3568 kuberuntime_container.go:635] Killing container \"docker://cb709eb507aa01e55c9d02395f209322dff81b73d9d54ba2778b246c2f67033e\" with a 30 second grace period\n","stream":"stderr","time":"2021-06-24T23:00:06.649028417Z"}
{"log":"W0624 23:00:06.683357    3568 pod_container_deletor.go:79] Container \"72f62eaeaa73627aa0b59a2bcf063da9e9e59b917f06939ca7508a4b24fab131\" not found in pod's containers\n","stream":"stderr","time":"2021-06-24T23:00:06.683536023Z"}
{"log":"I0624 23:00:08.611624    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:08.61186974Z"}
{"log":"E0624 23:00:08.611764    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:16.611705925 +0000 UTC m=+766059.309825395 (durationBeforeRetry 8s). Error: \"Volume has not been added to the list of VolumesInUse in the node's volume status for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") pod \\\"cephfs-snapshot-1624575600-v8ntn\\\" (UID: \\\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\\\") \"\n","stream":"stderr","time":"2021-06-24T23:00:08.61189224Z"}
{"log":"I0624 23:00:15.872682    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:15.873009605Z"}
{"log":"I0624 23:00:15.872704    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:15.873049345Z"}
{"log":"I0624 23:00:15.872743    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.873057553Z"}
{"log":"I0624 23:00:15.872753    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:15.873064906Z"}
{"log":"I0624 23:00:15.872794    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:15.873069483Z"}
{"log":"I0624 23:00:15.872812    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002328c30, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.873073407Z"}
{"log":"I0624 23:00:15.873155    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002328c30, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.873318882Z"}
{"log":"I0624 23:00:15.873887    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:15.874025185Z"}
{"log":"I0624 23:00:15.873907    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:15.874048718Z"}
{"log":"I0624 23:00:15.873924    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.874056883Z"}
{"log":"I0624 23:00:15.873932    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:15.874064326Z"}
{"log":"I0624 23:00:15.873955    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:15.874070659Z"}
{"log":"I0624 23:00:15.873893    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:15.874081657Z"}
{"log":"I0624 23:00:15.874100    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0023290b0, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.874218667Z"}
{"log":"I0624 23:00:15.874274    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0023290b0, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:15.874396242Z"}
{"log":"I0624 23:00:15.882636    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:15.882778379Z"}
{"log":"I0624 23:00:16.644814    3568 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:16.645100842Z"}
{"log":"I0624 23:00:16.649652    3568 operation_generator.go:1348] Controller attach succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") device path: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.649792261Z"}
{"log":"I0624 23:00:16.745284    3568 reconciler.go:269] operationExecutor.MountVolume started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:16.745544683Z"}
{"log":"I0624 23:00:16.745626    3568 operation_generator.go:558] MountVolume.WaitForAttach entering for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") DevicePath \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.745794718Z"}
{"log":"I0624 23:00:16.750017    3568 operation_generator.go:567] MountVolume.WaitForAttach succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") DevicePath \"csi-225af1b82c928c791922af9189702d68cbd12844b13a32dd8458a635aff88dff\"\n","stream":"stderr","time":"2021-06-24T23:00:16.750220225Z"}
{"log":"I0624 23:00:16.750272    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.750467649Z"}
{"log":"I0624 23:00:16.750303    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.750485603Z"}
{"log":"I0624 23:00:16.750320    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.750492187Z"}
{"log":"I0624 23:00:16.750330    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.750513769Z"}
{"log":"I0624 23:00:16.750374    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.750519618Z"}
{"log":"I0624 23:00:16.750412    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002666a40, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.750524834Z"}
{"log":"I0624 23:00:16.750983    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002666a40, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.751150652Z"}
{"log":"I0624 23:00:16.751918    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.752029379Z"}
{"log":"I0624 23:00:16.759246    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.759427933Z"}
{"log":"I0624 23:00:16.759279    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.75944273Z"}
{"log":"I0624 23:00:16.759294    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.759446981Z"}
{"log":"I0624 23:00:16.759300    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.759451276Z"}
{"log":"I0624 23:00:16.759356    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.759467142Z"}
{"log":"I0624 23:00:16.759410    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0026670b0, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.759481715Z"}
{"log":"I0624 23:00:16.759773    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0026670b0, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.759883849Z"}
{"log":"I0624 23:00:16.789406    3568 operation_generator.go:596] MountVolume.MountDevice succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") device mount path \"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\"\n","stream":"stderr","time":"2021-06-24T23:00:16.789641707Z"}
{"log":"I0624 23:00:16.789516    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.789681224Z"}
{"log":"I0624 23:00:16.794440    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.794656349Z"}
{"log":"I0624 23:00:16.794463    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.794680954Z"}
{"log":"I0624 23:00:16.794497    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.794685778Z"}
{"log":"I0624 23:00:16.794506    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.794690818Z"}
{"log":"I0624 23:00:16.794576    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.79476636Z"}
{"log":"I0624 23:00:16.794774    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0031e3440, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.794804153Z"}
{"log":"I0624 23:00:16.795164    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0031e3440, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.795427525Z"}
{"log":"I0624 23:00:16.795808    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.795985696Z"}
{"log":"I0624 23:00:16.799595    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:16.799801855Z"}
{"log":"I0624 23:00:16.799630    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:16.799829354Z"}
{"log":"I0624 23:00:16.799647    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.799833817Z"}
{"log":"I0624 23:00:16.799654    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:16.799838692Z"}
{"log":"I0624 23:00:16.799720    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:16.799842903Z"}
{"log":"I0624 23:00:16.799758    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002667630, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.799860207Z"}
{"log":"I0624 23:00:16.800061    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002667630, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:16.80016665Z"}
{"log":"I0624 23:00:16.809482    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:16.810761097Z"}
{"log":"I0624 23:00:16.811821    3568 operation_generator.go:663] MountVolume.SetUp succeeded for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"cephfs-snapshot-1624575600-v8ntn\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:16.812026872Z"}
{"log":"I0624 23:00:17.009534    3568 kuberuntime_manager.go:425] No sandbox for pod \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\" can be found. Need to start a new one\n","stream":"stderr","time":"2021-06-24T23:00:17.009715553Z"}
{"log":"I0624 23:00:17.501848    3568 kubelet.go:1837] SyncLoop (UPDATE, \"api\"): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\"\n","stream":"stderr","time":"2021-06-24T23:00:17.502200708Z"}
{"log":"I0624 23:00:17.941428    3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerStarted\", Data:\"ba3366af8e95d11262c370dc1171ca1b92b46aacffad57716073777a9a5c1032\"}\n","stream":"stderr","time":"2021-06-24T23:00:17.941719981Z"}
{"log":"I0624 23:00:17.941497    3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerStarted\", Data:\"c1243bc4df3c2ca2fdc6f65dc808c2a54e2ed34bc3960c3bc8da0704045ade04\"}\n","stream":"stderr","time":"2021-06-24T23:00:17.941745252Z"}
{"log":"E0624 23:00:18.643466    3568 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.1.42.11 10.1.42.12 10.10.0.7\n","stream":"stderr","time":"2021-06-24T23:00:18.643920992Z"}
{"log":"I0624 23:00:23.072398    3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerDied\", Data:\"ba3366af8e95d11262c370dc1171ca1b92b46aacffad57716073777a9a5c1032\"}\n","stream":"stderr","time":"2021-06-24T23:00:23.072547469Z"}
{"log":"I0624 23:00:23.072655    3568 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: ba3366af8e95d11262c370dc1171ca1b92b46aacffad57716073777a9a5c1032\n","stream":"stderr","time":"2021-06-24T23:00:23.07273424Z"}
{"log":"I0624 23:00:23.093082    3568 reflector.go:213] Stopping reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"local\"\n","stream":"stderr","time":"2021-06-24T23:00:23.093263612Z"}
{"log":"I0624 23:00:23.093132    3568 reflector.go:213] Stopping reflector *v1.Secret (0s) from object-\"rook-ceph\"/\"default-token-75rjf\"\n","stream":"stderr","time":"2021-06-24T23:00:23.093286641Z"}
{"log":"I0624 23:00:23.093162    3568 reflector.go:213] Stopping reflector *v1.ConfigMap (0s) from object-\"rook-ceph\"/\"ceph-snapshot-schedule\"\n","stream":"stderr","time":"2021-06-24T23:00:23.093291584Z"}
{"log":"I0624 23:00:23.169781    3568 reconciler.go:196] operationExecutor.UnmountVolume started for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:23.170047527Z"}
{"log":"I0624 23:00:23.170028    3568 reconciler.go:196] operationExecutor.UnmountVolume started for volume \"cephfs-volume\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\") \n","stream":"stderr","time":"2021-06-24T23:00:23.170136534Z"}
{"log":"I0624 23:00:23.170368    3568 clientconn.go:106] parsed scheme: \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.170507608Z"}
{"log":"I0624 23:00:23.170472    3568 clientconn.go:106] scheme \"\" not registered, fallback to default scheme\n","stream":"stderr","time":"2021-06-24T23:00:23.170515955Z"}
{"log":"W0624 23:00:23.170545    3568 empty_dir.go:469] Warning: Failed to clear quota on /var/lib/kubelet/pods/18e0a947-1b51-4b6b-81f5-25ce75a276ed/volumes/kubernetes.io~configmap/schedule: ClearQuota called, but quotas disabled\n","stream":"stderr","time":"2021-06-24T23:00:23.170759433Z"}
{"log":"I0624 23:00:23.170690    3568 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rook-ceph.cephfs.csi.ceph.com/csi.sock  \u003cnil\u003e 0 \u003cnil\u003e}] \u003cnil\u003e \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:23.170848643Z"}
{"log":"I0624 23:00:23.170748    3568 clientconn.go:948] ClientConn switching balancer to \"pick_first\"\n","stream":"stderr","time":"2021-06-24T23:00:23.17086775Z"}
{"log":"I0624 23:00:23.170829    3568 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick\n","stream":"stderr","time":"2021-06-24T23:00:23.171009697Z"}
{"log":"I0624 23:00:23.170987    3568 kubelet.go:1837] SyncLoop (UPDATE, \"api\"): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\"\n","stream":"stderr","time":"2021-06-24T23:00:23.171080255Z"}
{"log":"I0624 23:00:23.171048    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00086f080, {CONNECTING \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:23.171172188Z"}
{"log":"I0624 23:00:23.172231    3568 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00086f080, {READY \u003cnil\u003e}\n","stream":"stderr","time":"2021-06-24T23:00:23.172374644Z"}
{"log":"I0624 23:00:23.173657    3568 operation_generator.go:788] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\" (OuterVolumeSpecName: \"default-token-75rjf\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\"). InnerVolumeSpecName \"default-token-75rjf\". PluginName \"kubernetes.io/secret\", VolumeGidValue \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.173775218Z"}
{"log":"I0624 23:00:23.177282    3568 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"\n","stream":"stderr","time":"2021-06-24T23:00:23.177436718Z"}
{"log":"I0624 23:00:23.177338    3568 operation_generator.go:788] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\" (OuterVolumeSpecName: \"cephfs-volume\") pod \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\" (UID: \"18e0a947-1b51-4b6b-81f5-25ce75a276ed\"). InnerVolumeSpecName \"cephfs-snapshot\". PluginName \"kubernetes.io/csi\", VolumeGidValue \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.177463897Z"}
{"log":"I0624 23:00:23.270789    3568 reconciler.go:319] Volume detached for volume \"default-token-75rjf\" (UniqueName: \"kubernetes.io/secret/18e0a947-1b51-4b6b-81f5-25ce75a276ed-default-token-75rjf\") on node \"k8s-worker-5\" DevicePath \"\"\n","stream":"stderr","time":"2021-06-24T23:00:23.270918375Z"}
{"log":"I0624 23:00:23.270847    3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:23.271024161Z"}
{"log":"E0624 23:00:23.273372    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:23.773314049 +0000 UTC m=+766066.471433424 (durationBeforeRetry 500ms). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:23.273554569Z"}
{"log":"I0624 23:00:23.873342    3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:23.873669334Z"}
{"log":"E0624 23:00:23.876231    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:24.876203861 +0000 UTC m=+766067.574323244 (durationBeforeRetry 1s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:23.876403236Z"}
{"log":"I0624 23:00:24.106389    3568 kubelet.go:1868] SyncLoop (PLEG): \"cephfs-snapshot-1624575600-v8ntn_rook-ceph(18e0a947-1b51-4b6b-81f5-25ce75a276ed)\", event: \u0026pleg.PodLifecycleEvent{ID:\"18e0a947-1b51-4b6b-81f5-25ce75a276ed\", Type:\"ContainerDied\", Data:\"c1243bc4df3c2ca2fdc6f65dc808c2a54e2ed34bc3960c3bc8da0704045ade04\"}\n","stream":"stderr","time":"2021-06-24T23:00:24.106657417Z"}
{"log":"W0624 23:00:24.106499    3568 pod_container_deletor.go:79] Container \"c1243bc4df3c2ca2fdc6f65dc808c2a54e2ed34bc3960c3bc8da0704045ade04\" not found in pod's containers\n","stream":"stderr","time":"2021-06-24T23:00:24.10670478Z"}
{"log":"I0624 23:00:24.877177    3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:24.877464859Z"}
{"log":"E0624 23:00:24.879452    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:26.879426251 +0000 UTC m=+766069.577545621 (durationBeforeRetry 2s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:24.879652128Z"}
{"log":"I0624 23:00:25.678865    3568 log.go:181] http: superfluous response.WriteHeader call from k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader (httplog.go:217)\n","stream":"stderr","time":"2021-06-24T23:00:25.679154228Z"}
{"log":"I0624 23:00:26.884764    3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:26.88499332Z"}
{"log":"E0624 23:00:26.887115    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:30.887062205 +0000 UTC m=+766073.585181621 (durationBeforeRetry 4s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:26.887284993Z"}
{"log":"I0624 23:00:27.754074    3568 kubelet.go:1846] SyncLoop (DELETE, \"api\"): \"offsite-backup-t8fwm_local-system(7a6446c6-23ff-4fab-83ec-82ca8734212e)\"\n","stream":"stderr","time":"2021-06-24T23:00:27.754365613Z"}
{"log":"I0624 23:00:27.754397    3568 kuberuntime_container.go:635] Killing container \"docker://80aa928fd4a820b22de51a5b4aeeeb2e26fe116b6d85e992930b7d6cb47f58a9\" with a 30 second grace period\n","stream":"stderr","time":"2021-06-24T23:00:27.754525375Z"}
{"log":"I0624 23:00:30.834883    3568 kubelet.go:1846] SyncLoop (DELETE, \"api\"): \"cephfs-snapshot-1624564800-nr5dc_rook-ceph(fcf13dcb-d099-4112-a816-6ebe50910070)\"\n","stream":"stderr","time":"2021-06-24T23:00:30.835180164Z"}
{"log":"I0624 23:00:30.845859    3568 kubelet.go:1840] SyncLoop (REMOVE, \"api\"): \"cephfs-snapshot-1624564800-nr5dc_rook-ceph(fcf13dcb-d099-4112-a816-6ebe50910070)\"\n","stream":"stderr","time":"2021-06-24T23:00:30.84602745Z"}
{"log":"I0624 23:00:30.845914    3568 kubelet.go:2034] Failed to delete pod \"cephfs-snapshot-1624564800-nr5dc_rook-ceph(fcf13dcb-d099-4112-a816-6ebe50910070)\", err: pod not found\n","stream":"stderr","time":"2021-06-24T23:00:30.846045057Z"}
{"log":"I0624 23:00:30.899760    3568 reconciler.go:312] operationExecutor.UnmountDevice started for volume \"cephfs-snapshot\" (UniqueName: \"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\") on node \"k8s-worker-5\" \n","stream":"stderr","time":"2021-06-24T23:00:30.899891514Z"}
{"log":"E0624 23:00:30.901883    3568 nestedpendingoperations.go:301] Operation for \"{volumeName:kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot podName: nodeName:}\" failed. No retries permitted until 2021-06-24 23:00:38.901854069 +0000 UTC m=+766081.599973434 (durationBeforeRetry 8s). Error: \"GetDeviceMountRefs check failed for volume \\\"cephfs-snapshot\\\" (UniqueName: \\\"kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^cephfs-snapshot\\\") on node \\\"k8s-worker-5\\\" : The device mount path \\\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-snapshot/globalmount\\\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/offsite-backup/globalmount /var/lib/kubelet/pods/7a6446c6-23ff-4fab-83ec-82ca8734212e/volumes/kubernetes.io~csi/offsite-backup/mount]\"\n","stream":"stderr","time":"2021-06-24T23:00:30.902065359Z"}

Additional Context

The Ceph cluster is deployed and managed by Proxmox and configured in k8s by Rook. Here’s the cluster CRD’s status:

status:
  ceph:
    capacity:
      bytesAvailable: 42977436372992
      bytesTotal: 76110897348608
      bytesUsed: 33133460975616
      lastUpdated: "2021-07-01T15:16:19Z"
    health: HEALTH_OK
    lastChanged: "2021-07-01T08:47:01Z"
    lastChecked: "2021-07-01T15:16:19Z"
    previousHealth: HEALTH_WARN
    versions:
      mds:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 5
      mgr:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 5
      mon:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 3
      osd:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 16
      overall:
        ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable): 29
  conditions:
  - lastHeartbeatTime: "2021-07-01T07:41:52Z"
    lastTransitionTime: "2021-07-01T07:41:52Z"
    message: Attempting to connect to an external Ceph cluster
    reason: ClusterConnecting
    status: "True"
    type: Connecting
  - lastHeartbeatTime: "2021-07-01T15:16:20Z"
    lastTransitionTime: "2021-04-26T03:27:20Z"
    message: Cluster connected successfully
    reason: ClusterConnected
    status: "True"
    type: Connected
  message: Cluster connected successfully
  phase: Connected
  state: Connected
  version:
    image: ceph/ceph:v15.2.13
    version: 15.2.13-0 

While Rook is using an admin secret, I’ve added another secret for lower-privilege access for CephFS CSI mounting. Here’s one of the volumes from the logs above; the other is nearly identical except for the name, uid, volumeHandle, and claimRef:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    app: cephfs-snapshot
  name: cephfs-snapshot
  resourceVersion: "194940716"
  selfLink: /api/v1/persistentvolumes/cephfs-snapshot
  uid: 7cc3698d-f4b4-4d05-8259-476bf47ffb22
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Ti
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: cephfs-snapshot
    namespace: rook-ceph
    resourceVersion: "189756705"
    uid: 056edc02-c1f5-44fe-b3f7-f416139eb638
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-node-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: clusterfs-capacity
      rootPath: /
      staticVolume: "true"
    volumeHandle: cephfs-snapshot
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
status:
  phase: Bound

And here is the scrubbed node secret:

apiVersion: v1
data:
  userID: rook-k8s
  userKey: foo
kind: Secret
metadata:
  creationTimestamp: "2021-06-06T04:49:56Z"
  name: rook-csi-cephfs-node-static
  namespace: rook-ceph
  resourceVersion: "182934314"
  selfLink: /api/v1/namespaces/rook-ceph/secrets/rook-csi-cephfs-node-static
  uid: 823a7b2f-9ea6-4a99-ad89-dc052ddc4910
type: kubernetes.io/rook

And finally, the cephx permissions:

client.rook-k8s
	key: foo
	caps: [mds] allow rws
	caps: [mon] allow r
	caps: [osd] allow rw tag cephfs data=clusterfs-capacity
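
For reference, a client with equivalent caps could be created with something along these lines (a sketch using the entity name above; adjust to your environment):

  ceph auth get-or-create client.rook-k8s mon 'allow r' mds 'allow rws' osd 'allow rw tag cephfs data=clusterfs-capacity'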

For now, I’ve worked around this by using pod anti-affinity to ensure the jobs don’t run on the same node, but I’ll start running out of nodes eventually.
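
A minimal sketch of that anti-affinity workaround, assuming the affected workloads share a label such as app: cephfs-static-consumer (the label name is an assumption, not from this cluster):

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cephfs-static-consumer   # assumed label shared by the workloads mounting the same volume
        topologyKey: kubernetes.io/hostname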

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 32 (2 by maintainers)

Most upvoted comments

@ikogan I am planning to work on this one. Please provide the details below for now:

  • Is the issue only with static PVCs, or does it also happen with dynamic PVCs?
  • If the issue exists for dynamic PVCs, what are the reproducer steps?
  • Did you try to reproduce it with the latest cephcsi v3.4.0 release?

Okay, one thing I am failing to understand is why you need multiple PVs with the same rootPath: /. As CephFS supports RWX volumes, you only need one PVC/PV with rootPath: /; if the rootPath points to different paths, then you need different PVCs/PVs. Can you be more specific about why you are creating multiple PVs with the same rootPath?

While I can reuse the same PVC if my workloads are in the same namespace, a PVC is a namespaced object, so I cannot use it across namespaces. A PV can also only be bound to a single PVC. The only way for two different namespaces to use the same underlying filesystem path is to have two different PVs.
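
For illustration, here is roughly what the second static PV looks like (reconstructed from the description above, with the claimRef namespace taken from the logs): it deliberately shares rootPath: / with the first PV and differs only in name, volumeHandle, and claimRef.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: offsite-backup
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Ti
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: offsite-backup
    namespace: local-system
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-node-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: clusterfs-capacity
      rootPath: /
      staticVolume: "true"
    volumeHandle: offsite-backup
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem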

@Madhu-1 sure, I’ll give it a go tomorrow.

I tested this one with the shallow support PR and I don’t see any issue. It would be good to get some confirmation on this one.

I think it makes sense to close this if we think it must be fixed in the kubelet. I was wondering why Ceph doesn’t choose to stage a CephFS mount once on a given node rather than mounting it multiple times; that seems a bit more logical given what it is. Wouldn’t that also work around this issue?

As the stagingPaths are different, per the CSI standard we need to make sure the volume is mounted at each given staging path. If there were a single stagingPath, we wouldn’t have this problem.

Looks like this is being talked about on the k8s GitHub here: kubernetes/kubernetes#105323.

Yes, I opened an issue in Kubernetes to check whether this is expected or not, but it looks like there has been no response.

@ikogan Thanks for getting back to us here, much appreciated! I wanted to ask: is this cluster running in production? Recently I have been revisiting the ‘locking operation’ we do in various code paths and working on some improvements in the same area, tracked under #2149.

I have a feeling that this bug fix could take care of this scenario, so I just wanted to ask if it’s possible to try the test image (quay.io/humble/cephcsi:publish) with the fix at your end.

Meanwhile I am taking some action items here to reproduce it on my end and do a root-cause analysis. Please reply with your thoughts.