kubernetes: Kubelet cannot finish terminating a pod that uses a PVC with `volumeMode: Block` if the underlying block device is accidentally disconnected from the node

What happened?

A pod uses a PVC with volumeMode: Block. The block volume is backed by a block device on the node (created by some CSI storage providers). If the block device corresponding to the volume is accidentally disconnected from the node (due to a network or hardware problem, etc.), the user cannot delete the pod: it is stuck in Terminating forever because kubelet refuses to unmap the block volume. This prevents the user from cleaning up the pod and spinning up a replacement, leading to prolonged service degradation. For example, if the pod is part of a StatefulSet, the replacement pod cannot come up because the old pod is stuck terminating.
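
For illustration, the stuck state looks roughly like this from the client side (a sketch only; the pod name matches the reproduction steps below, and how kubelet logs are exposed depends on the node setup):

    # The delete never completes and the pod stays in Terminating.
    kubectl delete pod test-bv-0
    kubectl get pod test-bv-0        # STATUS remains "Terminating"

    # Meanwhile kubelet on the node keeps retrying the unmap and logs
    # "UnmapVolume.UnmapBlockVolume failed ... target is busy" (full logs in step 5 below).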

What did you expect to happen?

Kubelet should be able to finish terminating a pod when the corresponding block device is already removed from the node.

How can we reproduce it (as minimally and precisely as possible)?

  1. Install a CSI storage provider that supports volumeMode: Block (for example, Longhorn).

  2. Deploy a StatefulSet that has volumeClaimTemplates with volumeMode: Block. For example:

    apiVersion: v1
    kind: Service
    metadata:
      name: test-bv
      labels:
        app: test-bv
    spec:
      ports:
      - port: 80
        name: test-bv
      selector:
        app: test-bv
      type: NodePort
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: test-bv
      namespace: default
    spec:
      selector:
        matchLabels:
          app: test-bv
      serviceName: "test-bv"
      template:
        metadata:
          labels:
            app: test-bv
        spec:
          restartPolicy: Always
          terminationGracePeriodSeconds: 10
          containers:
          - name: test-bv
            image: k8s.gcr.io/nginx-slim:0.8
            command:
            - sleep
            - infinity
            ports:
            - containerPort: 80
              name: test-bv
            volumeDevices:
            - devicePath: /dev/longhorn/test-blk
              name: test-bv-volume
      volumeClaimTemplates:
      - metadata:
          name: test-bv-volume
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "longhorn"
          volumeMode: Block
          resources:
            requests:
              storage: 1Gi
    
  3. Simulate a network/hardware problem that unexpectedly removes the backing block device from the host. In our example, we delete a Longhorn engine instance manager, which causes the underlying block device to disappear from the host.

  4. Now that the pod cannot access the block volume, try to delete the pod.

  5. The pod is stuck in the Terminating state forever. Kubelet logs:

    I0324 04:21:58.069151    1232 kubelet.go:1948] "SyncLoop DELETE" source="api" pods=[default/test-bv-0]
    2022-03-24 04:21:58.068 [INFO][4186656] k8s.go 563: Cleaning up netns ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.068 [INFO][4186656] dataplane_linux.go 473: Calico CNI deleting device in netns /proc/4183752/ns/net ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.090 [INFO][4186656] dataplane_linux.go 490: Calico CNI deleted device in netns /proc/4183752/ns/net ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.090 [INFO][4186656] k8s.go 570: Releasing IP address(es) ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.090 [INFO][4186656] utils.go 196: Calico CNI releasing IP address ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.090 [INFO][4186656] utils.go 212: Using a dummy podCidr to release the IP ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba" podCidr="0.0.0.0/0"
    2022-03-24 04:21:58.090 [INFO][4186656] utils.go 331: Calico CNI fetching podCidr from Kubernetes ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.090 [INFO][4186656] utils.go 337: Fetched podCidr ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba" podCidr="0.0.0.0/0"
    2022-03-24 04:21:58.090 [INFO][4186656] utils.go 340: Calico CNI passing podCidr to host-local IPAM: 0.0.0.0/0 ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    2022-03-24 04:21:58.098 [INFO][4186656] k8s.go 576: Teardown processing complete. ContainerID="98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba"
    I0324 04:21:58.763182    1232 kubelet.go:1970] "SyncLoop (PLEG): event for pod" pod="default/test-bv-0" event=&{ID:7c5b2ecb-cda5-4e72-95db-4e133ba3a563 Type:ContainerDied Data:a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b}
    I0324 04:21:58.763283    1232 kubelet.go:1970] "SyncLoop (PLEG): event for pod" pod="default/test-bv-0" event=&{ID:7c5b2ecb-cda5-4e72-95db-4e133ba3a563 Type:ContainerDied Data:98992de02709e5051a3c4f503f5e7ae797143218b286f89edd3d1fb3b03156ba}
    I0324 04:21:58.763325    1232 scope.go:111] "RemoveContainer" containerID="a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b"
    I0324 04:21:58.786653    1232 scope.go:111] "RemoveContainer" containerID="a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b"
    E0324 04:21:58.787657    1232 remote_runtime.go:334] "ContainerStatus from runtime service failed" err="rpc error: code = Unknown desc = Error: No such container: a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b" containerID="a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b"
    I0324 04:21:58.787730    1232 pod_container_deletor.go:52] "DeleteContainer returned error" containerID={Type:docker ID:a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b} err="failed to get container status \"a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b\": rpc error: code = Unknown desc = Error: No such container: a698192f7a83a0abf1dfc8f8f863134d3eb8a732ea9952f82bcf35316622292b"
    I0324 04:21:58.921232    1232 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"kube-api-access-8mctj\" (UniqueName: \"kubernetes.io/projected/7c5b2ecb-cda5-4e72-95db-4e133ba3a563-kube-api-access-8mctj\") pod \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\" (UID: \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\") "
    I0324 04:21:58.921523    1232 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"test-bv-volume\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-417a59df-2c2b-44a1-9350-07b108bf96ec\") pod \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\" (UID: \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\") "
    I0324 04:21:58.925924    1232 operation_generator.go:829] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/7c5b2ecb-cda5-4e72-95db-4e133ba3a563-kube-api-access-8mctj" (OuterVolumeSpecName: "kube-api-access-8mctj") pod "7c5b2ecb-cda5-4e72-95db-4e133ba3a563" (UID: "7c5b2ecb-cda5-4e72-95db-4e133ba3a563"). InnerVolumeSpecName "kube-api-access-8mctj". PluginName "kubernetes.io/projected", VolumeGidValue ""
    W0324 04:21:58.982588    1232 volume_path_handler_linux.go:61] couldn't find loopback device which takes file descriptor lock. Skip detaching device. device path: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563"
    I0324 04:21:58.982793    1232 mount_linux.go:197] Detected OS without systemd
    E0324 04:21:58.987894    1232 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-417a59df-2c2b-44a1-9350-07b108bf96ec podName:7c5b2ecb-cda5-4e72-95db-4e133ba3a563 nodeName:}" failed. No retries permitted until 2022-03-24 04:21:59.487840694 +0000 UTC m=+1184606.200514039 (durationBeforeRetry 500ms). Error: "UnmapVolume.UnmapBlockVolume failed for volume \"test-bv-volume\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-417a59df-2c2b-44a1-9350-07b108bf96ec\") pod \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\" (UID: \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\") : blkUtil.DetachFileDevice failed. globalUnmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev, podUID: 7c5b2ecb-cda5-4e72-95db-4e133ba3a563, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563: unmount failed: exit status 32\nUnmounting arguments: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563\nOutput: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563: target is busy.\n"
    I0324 04:21:59.038425    1232 reconciler.go:319] "Volume detached for volume \"kube-api-access-8mctj\" (UniqueName: \"kubernetes.io/projected/7c5b2ecb-cda5-4e72-95db-4e133ba3a563-kube-api-access-8mctj\") on node \"shuo-k8s-worker3\" DevicePath \"\""
    I0324 04:21:59.541916    1232 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"test-bv-volume\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-417a59df-2c2b-44a1-9350-07b108bf96ec\") pod \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\" (UID: \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\") "
    W0324 04:21:59.594618    1232 volume_path_handler_linux.go:61] couldn't find loopback device which takes file descriptor lock. Skip detaching device. device path: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563"
    W0324 04:21:59.594663    1232 volume_path_handler.go:219] Warning: Unmap skipped because symlink does not exist on the path: /var/lib/kubelet/pods/7c5b2ecb-cda5-4e72-95db-4e133ba3a563/volumeDevices/kubernetes.io~csi/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec
    I0324 04:21:59.594782    1232 mount_linux.go:197] Detected OS without systemd
    E0324 04:21:59.599253    1232 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-417a59df-2c2b-44a1-9350-07b108bf96ec podName:7c5b2ecb-cda5-4e72-95db-4e133ba3a563 nodeName:}" failed. No retries permitted until 2022-03-24 04:22:00.599185856 +0000 UTC m=+1184607.311859193 (durationBeforeRetry 1s). Error: "UnmapVolume.UnmapBlockVolume failed for volume \"test-bv-volume\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-417a59df-2c2b-44a1-9350-07b108bf96ec\") pod \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\" (UID: \"7c5b2ecb-cda5-4e72-95db-4e133ba3a563\") : blkUtil.DetachFileDevice failed. globalUnmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev, podUID: 7c5b2ecb-cda5-4e72-95db-4e133ba3a563, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563: unmount failed: exit status 32\nUnmounting arguments: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563\nOutput: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563: target is busy.\n"
    

    Please pay attention to the log line W0324 04:21:59.594618 1232 volume_path_handler_linux.go:61] couldn't find loopback device which takes file descriptor lock. Skip detaching device. device path: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/7c5b2ecb-cda5-4e72-95db-4e133ba3a563". Kubelet refuses to finish terminating the pod because the underlying block device has been removed from the host. If we run losetup on the host, we can see that the backing file of the corresponding loop device is marked as deleted:

    root@shuo-k8s-worker3:~# losetup
    NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                                                                                                                                            DIO LOG-SEC
    /dev/loop1         0      0         1  1 /var/lib/snapd/snaps/lxd_21835.snap                                                                                                                    0     512
    /dev/loop8         0      0         0  0 / (deleted)                                                                                                                                            0     512
    /dev/loop6         0      0         0  0 / (deleted)                                                                                                                                            0     512
    /dev/loop4         0      0         1  1 /var/lib/snapd/snaps/snapd_14978.snap                                                                                                                  0     512
    /dev/loop2         0      0         1  1 /var/lib/snapd/snaps/core20_1376.snap                                                                                                                  0     512
    /dev/loop0         0      0         1  1 /var/lib/snapd/snaps/core20_1361.snap                                                                                                                  0     512
    /dev/loop9         0      0         0  0 /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/3447f8b4-b54a-49df-b677-ffab62fc68be (deleted)   0     512
    /dev/loop5         0      0         1  1 /var/lib/snapd/snaps/snapd_15177.snap                                                                                                                  0     512
    /dev/loop3         0      0         1  1 /var/lib/snapd/snaps/lxd_22526.snap                                                                                                                    0     512
    
    root@shuo-k8s-worker3:~# cat /sys/block/loop9/loop/backing_file
    /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-417a59df-2c2b-44a1-9350-07b108bf96ec/dev/3447f8b4-b54a-49df-b677-ffab62fc68be (deleted)
    

    In our opinion, kubelet should go ahead and remove the loop device corresponding to the deleted block device so that the pod can finish terminating, as proposed in https://github.com/kubernetes/kubernetes/pull/109083.
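
    To make this concrete, here is a rough host-level sketch of what we are asking kubelet to do automatically (an illustration only, reusing the device names from the losetup output above; the real change lives in kubelet's volume path handler, see the PR referenced above):

    # Find the loop device whose backing file sits under the volume's global map
    # path and is marked "(deleted)".
    losetup | grep pvc-417a59df-2c2b-44a1-9350-07b108bf96ec

    # Confirm that the backing file is gone.
    cat /sys/block/loop9/loop/backing_file

    # Detach the stale loop device so that the bind mount is no longer busy and the
    # unmap can complete. Today kubelet skips this detach when it cannot find the
    # loop device by its (now deleted) backing file, which is what leaves the pod stuck.
    losetup -d /dev/loop9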

Anything else we need to know?

We have prepared a PR: https://github.com/kubernetes/kubernetes/pull/109083. We have tested it and can confirm that it fixes the issue.

Kubernetes version

v1.22.7

Cloud provider

Rancher RKE2

OS version

NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Install tools

Rancher RKE2

Container runtime (CRI) and version (if applicable)

Docker (per the kubelet logs above)

Related plugins (CNI, CSI, …) and versions (if applicable)

Calico (CNI), Longhorn (CSI)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

After a node power-off and restart, a pod using a block-mode volume is also stuck in Terminating. Kubelet log:

13:31:57 wanguo-72 kubelet[1131]: E0428 13:31:57.597111 1131 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/topolvm.cybozu.com^a27f2f53-54fa-4068-bbad-cfbe70c0815c podName:8519283b-9521-4aed-9bfe-8f6687f99691 nodeName:}" failed. No retries permitted until 2023-04-28 13:33:59.597061528 +0800 CST m=+8838.750882463 (durationBeforeRetry 2m2s). Error: UnmapVolume.UnmapBlockVolume failed for volume "pvc-e2bc365a-a4e4-48a7-9057-af87b303f8fe" (UniqueName: "kubernetes.io/csi/topolvm.cybozu.com^a27f2f53-54fa-4068-bbad-cfbe70c0815c") pod "8519283b-9521-4aed-9bfe-8f6687f99691" (UID: "8519283b-9521-4aed-9bfe-8f6687f99691") : blkUtil.DetachFileDevice failed. globalUnmapPath:, podUID: 8519283b-9521-4aed-9bfe-8f6687f99691, bindMount: true: failed to unmap device from map path. mapPath is empty