longhorn: [BUG] filesystem corrupted after delete instance-manager-r for a locality best-effort volume

Describe the bug (🐛 if you encounter this issue)

The filesystem is corrupted after deleting instance-manager-r for a volume with data locality set to best-effort.

Trying to read a file from the recovered volume fails:

cat: read error: I/O error

To Reproduce

Steps to reproduce the behavior:

  1. From Longhorn UI, add tag node-1 to node-1
  2. From Longhorn UI, create a volume with 1 replica, data-locality set to best-effort, and tag set to node-1
  3. From Longhorn UI, create PV/PVC for the volume
  4. Create a deployment whose pod uses the PVC, but set the pod's node selector to node-2 so that it is scheduled to node-2; this makes scheduling to the local replica fail, e.g.:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    name: test-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      name: test-deployment
  template:
    metadata:
      labels:
        name: test-deployment
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-2
      containers:
        - name: test-deployment
          image: nginx:stable-alpine
          volumeMounts:
            - name: test-pod
              mountPath: /data
      volumes:
        - name: test-pod
          persistentVolumeClaim:
            claimName: test-1
  5. Wait for the pod to be in the running state, then write some data and sync the filesystem:
/ # dd if=/dev/urandom of=/data/test-1 bs=1M count=256
256+0 records in
256+0 records out
/ # sync
/ # sync /data/test-1
/ # cksum /data/test-1 
1372275509 268435456 /data/test-1
  6. Kill the instance-manager-r on node-1
  7. Wait for the volume to re-attach and the pod to restart and run
  8. Exec into the pod and try to read the data; it fails with the following error:
/ # cksum /data/test-1
cksum: /data/test-1: I/O error

Expected behavior

The data written in step 5 should still be readable after the volume re-attaches and the pod restarts, with cksum returning the same checksum and no I/O error.

Log or Support bundle

If applicable, add the Longhorn managers’ log or support bundle when the issue happens. You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version: master-head
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of management node in the cluster:
    • Number of worker node in the cluster:
  • Node config
    • OS type and version:
    • CPU per node:
    • Memory per node:
    • Disk type(e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

Add any other context about the problem here.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 16 (15 by maintainers)

Most upvoted comments

Verified passed on master-head (longhorn-manager a9d6289) and v1.5.x-head (longhorn-manager 60b5368). Ran the test case test_autosalvage_with_data_locality_enabled more than 20 times; no failure occurred.

Ref: https://github.com/longhorn/longhorn/issues/4814

We might be able to check the mount point status by creating/deleting a test file. This is what I am working on: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/4389/console
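For illustration, here is a minimal Go sketch of that idea (the helper name and probe file name are hypothetical; this is not the actual implementation being worked on). Writing, syncing, and removing a small file forces real I/O through the mount, so a stale mount point that still lists fine would still be caught:

```go
// Hypothetical sketch of a mount-point probe that creates and deletes a
// test file, as suggested above. Not the actual longhorn-manager code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// probeMountPoint returns nil when a small test file can be written, synced,
// and removed under the mount point; any error suggests the mount is broken.
func probeMountPoint(mountPath string) error {
	probe := filepath.Join(mountPath, ".longhorn-mount-probe") // hypothetical file name
	f, err := os.Create(probe)
	if err != nil {
		return err
	}
	if _, err := f.WriteString("probe"); err != nil {
		f.Close()
		os.Remove(probe)
		return err
	}
	if err := f.Sync(); err != nil { // force the write through to the device
		f.Close()
		os.Remove(probe)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(probe)
		return err
	}
	return os.Remove(probe)
}

func main() {
	// Example usage against an arbitrary directory.
	if err := probeMountPoint("/mnt/test"); err != nil {
		fmt.Println("mount point looks unhealthy:", err)
		return
	}
	fmt.Println("mount point looks healthy")
}
```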

Quick update on what I found: the problem occurred at Step 6 (kill the instance-manager-r on node-1) and Step 7 (wait for the volume to re-attach and the pod to restart and run).

  1. When we delete the instance-manager pod to crash the replica and start a rebuild, the volume is detached and re-attached, but on the CSI side the globalmount mount point is not unmounted/remounted (NodeUnstageVolume/NodeStageVolume). (At this time globalmount should already be an invalid mount point for the volume, but it is not yet.)
  2. Then, while the pod is restarting and coming back to running, the CSI plugin gets a NodePublishVolume request to bind-mount the globalmount mount point to the target path. (We can still read the globalmount directory with os.ReadDir, so NodeUnstageVolume/NodeStageVolume is not triggered, and soon it becomes an invalid mount point.)
  3. If the detaching/re-attaching procedure takes a long time for some reason (rebuilding or something else), then globalmount could be considered a corrupted mount point by the check statement. The check statement is here: https://github.com/longhorn/longhorn-manager/blob/master/csi/util.go#L253-L260
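To make the third point concrete, here is a simplified sketch of a ReadDir-style health check, similar in spirit to the check statement linked above (this is not the actual longhorn-manager code, and the path used in main is a hypothetical example): the mount point is only flagged as corrupted once listing its directory fails, so a recently invalidated globalmount can still pass.

```go
// Simplified sketch of a ReadDir-based mount-point check (not the actual
// longhorn-manager code linked above).
package main

import (
	"fmt"
	"os"
)

// isMountPointHealthy treats the mount point as corrupted only when listing
// its directory fails; a stale mount that is still cached can pass this check.
func isMountPointHealthy(path string) bool {
	if _, err := os.ReadDir(path); err != nil {
		// Listing failed: the caller would unmount and trigger
		// NodeUnstageVolume/NodeStageVolume to rebuild the mount.
		return false
	}
	// Listing succeeded, so the mount is assumed usable, even though the
	// underlying device may already have gone away.
	return true
}

func main() {
	// Example usage with a hypothetical globalmount path.
	fmt.Println(isMountPointHealthy("/mnt/globalmount"))
}
```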

Got it, I will handle it in this sprint.

@yangchiu Can you provide the support bundle?

supportbundle_24b8f4fd-d894-48e0-acd5-aef5b8eddb65_2023-04-24T11-19-57Z.zip

Can you help detach the problematic volume, attach again and check if the file content is expected? If yes, it’s probably due to the connection between the engine and the replicas.

Yes, detach and re-attach can make the file content normal again.

@yangchiu Can you provide the support bundle? Can you help detach the problematic volume, attach again and check if the file content is expected? If yes, it’s probably due to the connection between the engine and the replicas.

@innobead Wait for the update from @yangchiu. @ChanYiLin can help check this issue.