democratic-csi: All Persistent Volumes fail permanently after NAS reboot

Whenever I reboot the OS on the NAS that hosts my iSCSI democratic-csi volumes, all containers that rely on those volumes consistently fail with the following error, even after the NAS comes back online:

  Warning  FailedMount  37s               kubelet            MountVolume.MountDevice failed for volume "pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount  34s               kubelet            Unable to attach or mount volumes: unmounted volumes=[config], unattached volumes=[config media transcode kube-api-access-2c2w7 backup]: timed out waiting for the condition
  Warning  FailedMount  5s (x6 over 37s)  kubelet            MountVolume.MountDevice failed for volume "pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e" : rpc error: code = Aborted desc = operation locked due to in progress operation(s): ["volume_id_pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e"]

I have tried suspending all of the pods with kubectl scale -n media deploy/plex --replicas 0 to ensure that nothing is using the volume during the reboot.
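Roughly, the full cycle I have been attempting looks like this (plex is just one example workload):

  # scale the workload down so nothing touches the iSCSI volume during the reboot
  kubectl scale -n media deploy/plex --replicas 0

  # ... reboot the NAS and wait for it to come back online ...

  # scale the workload back up once the NAS is reachable again
  kubectl scale -n media deploy/plex --replicas 1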

Unfortunately I know almost nothing about iSCSI, so it’s entirely possible this is 100% my fault. What is the proper iSCSI process for rebooting either the NAS or the nodes using PVs on the NAS, so that this lockup is avoided? Is there an iscsiadm command I can use to clear this deadlock and let the new container access the PV?

My democratic-csi config is:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: csi-iscsi
  namespace: storage
spec:
  interval: 5m
  chart:
    spec:
      chart: democratic-csi
      version: 0.13.4
      sourceRef:
        kind: HelmRepository
        name: democratic-csi-charts
        namespace: flux-system
      interval: 5m
  values:
    csiDriver:
      name: "org.democratic-csi.iscsi"

    storageClasses:
    - name: tank-iscsi-csi
      defaultClass: true
      reclaimPolicy: Delete
      ## For testing
      # reclaimPolicy: Retain
      volumeBindingMode: Immediate
      allowVolumeExpansion: true
      parameters:
        fsType: ext4

    driver:
      image: docker.io/democraticcsi/democratic-csi:v1.7.6
      imagePullPolicy: IfNotPresent
      config:
        driver: zfs-generic-iscsi
      existingConfigSecret: zfs-generic-iscsi-config

and the driver config is:

apiVersion: v1
kind: Secret
metadata:
    name: zfs-generic-iscsi-config
    namespace: storage
stringData:
    driver-config-file.yaml: |
        driver: zfs-generic-iscsi
        sshConnection:
            host: ${UIHARU_IP}
            port: 22
            username: root
            privateKey: |
                -----BEGIN OPENSSH PRIVATE KEY-----
                ...
                -----END OPENSSH PRIVATE KEY-----
        zfs:
            datasetParentName: sltank/k8s/iscsiv
            detachedSnapshotsDatasetParentName: sltank/k8s/iscsis
        iscsi:
            shareStrategy: "targetCli"
            shareStrategyTargetCli:
                basename: "iqn.2016-04.com.open-iscsi:a6b73d4196"
                tpg:
                    attributes:
                        authentication: 0
                        generate_node_acls: 1
                        cache_dynamic_acls: 1
                        demo_mode_write_protect: 0
            targetPortal: "${UIHARU_IP}"

Not sure what other info is important, but I’d be happy to provide anything else that might help troubleshoot the issue.

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Interesting, something to consider for sure. I think this could be handled by the health service endpoint. I am hesitant to get into such a thing, but it certainly merits some discussion.

Yeah, that’s a dangerous situation (which is why, when iSCSI goes down, the volumes go into read-only mode). Two nodes using the same block device simultaneously is not something you want happening. I would use something like kured (https://github.com/weaveworks/kured) or similar to simply trigger all of your nodes to cycle, so the workloads shift around and everything comes up clean.
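A minimal sketch of what I mean, assuming kured’s default sentinel path (configurable via its --reboot-sentinel flag):

  # run this on every node (via ssh or a privileged daemonset); kured notices the
  # sentinel file, then cordons, drains, and reboots the nodes one at a time, so
  # the workloads get rescheduled and come back with fresh iSCSI sessions
  sudo touch /var/run/reboot-required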

Ah, this is a tricky one and I’m glad you opened this. There are a couple of issues at play here:

  • democratic-csi ensures that no two (possibly conflicting) operations happen at the same time, and to do so it creates an in-memory lock
  • iSCSI as a protocol will generally not handle this situation well, and it actually requires all of your pods using iSCSI volumes to restart

The first can be remedied by deleting all of the democratic-csi pods and letting them restart. The latter requires you to handle each workload on a case-by-case basis.
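For the first point, something along these lines should work (the label selector is a guess based on common chart conventions, so check kubectl get pods -n storage --show-labels for the actual labels first):

  # recycle the democratic-csi controller/node pods to clear the in-memory locks;
  # their deployment/daemonset recreates them immediately
  kubectl delete pods -n storage -l app.kubernetes.io/name=democratic-csi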

Essentially if the nas goes down and comes back up the iscsi sessions on the node (assuming they recover) go to read-only. The only way to remedy that (via k8s) is to just restart the pods as appropriate…and even then in some cases that may not be enough and would require forcing the workload to a new node. I’ll do some research on possible ways to just go to the cli of the nodes directly and get them back into a rw state manually without any other intervention at the k8s layer.