kubernetes: kubelet fails to unmount CSI volumes, floods the log with errors, and has to be restarted

What happened:

On pod termination, kubelet sometimes fails to unmount CSI volumes with the following error:

E0422 13:28:12.214727    1861 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm" (UniqueName: "kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-97172dfa-9e15-11eb-8ecd-26ee97ae76aa") pod "dc3f11b7-d671-464c-a734-65e1760ed7df" (UID: "dc3f11b7-d671-464c-a734-65e1760ed7df") : UnmountVolume.NewUnmounter failed for volume "persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm" (UniqueName: "kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-97172dfa-9e15-11eb-8ecd-26ee97ae76aa") pod "dc3f11b7-d671-464c-a734-65e1760ed7df" (UID: "dc3f11b7-d671-464c-a734-65e1760ed7df") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/vol_data.json]: open /var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/vol_data.json: no such file or directory

Once this bug is triggered, the Pod is stuck in the Terminating state and the above error message repeats hundreds of times per minute in the kubelet log. The OS-level mounts themselves are not removed, as shown by mount on the host:

10.115.38.19:6789,10.115.38.14:6789,10.115.38.17:6789:/volumes/csi/csi-vol-97172dfa-9e15-11eb-8ecd-26ee97ae76aa/44564d2b-e10b-4f25-a668-701552549ca0 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/globalmount type ceph (rw,relatime,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=storefs,_netdev)
10.115.38.19:6789,10.115.38.14:6789,10.115.38.17:6789:/volumes/csi/csi-vol-97172dfa-9e15-11eb-8ecd-26ee97ae76aa/44564d2b-e10b-4f25-a668-701552549ca0 on /var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/mount type ceph (rw,relatime,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=storefs,_netdev)
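
For reference, this is roughly how we confirm the broken state on the affected node; the pod UID and PV name below are taken from the log above, substitute your own:

# Diagnostic sketch, run on the affected node (pod UID and PV name from the error above).
POD_UID=dc3f11b7-d671-464c-a734-65e1760ed7df
PV=pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df

# The volume data file the unmounter tries to load is gone:
ls -l /var/lib/kubelet/pods/$POD_UID/volumes/kubernetes.io~csi/$PV/vol_data.json
# ls: cannot access '...': No such file or directory

# ...but both mounts shown in the mount output above are still present:
mount | grep $PV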

The only solution after this point seems to be to restart the kubelet and clean up by hand (force-deleting the Pod and unmounting the volume on the host), as sketched below.
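
A minimal sketch of that cleanup, assuming a systemd-managed kubelet; the pod name is a placeholder, the path is the one from the log above:

# 1. Force-delete the stuck Pod object (<stuck-pod-name> is a placeholder).
kubectl delete pod <stuck-pod-name> --grace-period=0 --force

# 2. On the node, unmount the leftover pod-level mount by hand.
umount /var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/mount

# 3. Restart the kubelet so it stops retrying the failed unmount.
systemctl restart kubelet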

PVCs are supplied by Rook Ceph, but see the “Anything else we need to know?” section.

What you expected to happen:

kubelet should be able to unmount CSI volumes on pod termination.

How to reproduce it (as minimally and precisely as possible):

We launch a couple of hundred Pods and then stop them. About 5-10% of them fail to terminate with this problem. Pods can have anywhere from 1 to 13 containers, with every container mounting the same PVC (“cephfs-config” in the example below). Pod declarations are all some variation of the following manifest (a scripted reproduction sketch follows it):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tracker-group
  namespace: default
spec:
  selector:
    matchLabels:
      app: tracker-group
  template:
    metadata:
      labels:
        app: tracker-group
    spec:
      containers:
      - command: [...] #redacted
        env: [...] #redacted
        image: "my-image" #redacted
        imagePullPolicy: Always
        name: tracker-crop-producer
        volumeMounts:
        - mountPath: /app/params
          name: tracker-crop-producer--params
          readOnly: true
        - mountPath: /app/config
          name: persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm
          readOnly: true
      - command: [...] #redacted
        env: [...] #redacted
        image: "my-image" #redacted
        imagePullPolicy: Always
        name: tracker-work-assigner
        volumeMounts:
        - mountPath: /app/params
          name: tracker-work-assigner--params
          readOnly: true
        - mountPath: /app/config
          name: persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm
          readOnly: true
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      volumes:
      - configMap:
          name: tracker-crop-producer--params
        name: tracker-crop-producer--params
      - configMap:
          name: tracker-work-assigner--params
        name: tracker-work-assigner--params
      - name: persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm
        persistentVolumeClaim:
          claimName: cephfs-config
          readOnly: true
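
We do not reproduce it with a single script, but the effect is equivalent to something like the following (replica count and label are illustrative):

# Scale the example Deployment up to a few hundred Pods, then delete them all at once.
kubectl scale deployment tracker-group --replicas=200
kubectl rollout status deployment tracker-group

kubectl delete deployment tracker-group

# After a few minutes, roughly 5-10% of the Pods are stuck in Terminating:
kubectl get pods -l app=tracker-group | grep Terminating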

Anything else we need to know?:

This may be the same bug as #85280, which was closed by the bot without resolution. (That issue does not describe how the problem might occur naturally, only how to trigger it by intentionally messing with the /var/lib/kubelet/pods/… directory.)
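
For illustration, that artificial trigger amounts to removing the volume data file out from under the kubelet before the Pod is torn down, roughly as follows (we have not verified this is exactly the sequence used in #85280; the names in angle brackets are placeholders):

# Hypothetical artificial trigger: delete vol_data.json while the Pod is still running,
# then delete the Pod.
rm /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/vol_data.json
kubectl delete pod <pod-name>
# kubelet then loops on "failed to open volume data file ... no such file or directory"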

While we are experiencing this problem with Rook Ceph, we found this bug filed in the Longhorn project as well, showing the exact same problem with their CSI driver.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:02:01Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration: Bare-metal 10-node cluster
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04.2 LTS
  • Kernel (e.g. uname -a): Linux appl-1-8 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug):
  • Others: Rook Ceph operator v1.5.9 (Ceph v15.2.9)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (4 by maintainers)

Most upvoted comments

Can we get more attention here please? It is hurting our production systems.

The fix for orphaned pod volume cleanup is in 1.22 and is being backported to 1.21, 1.20, and 1.19: #102576

@msau - Do you know which patch versions of 1.21 & 1.20 have the fix? I couldn’t find PR #102576 in the changelogs for 1.20 & 1.21.