longhorn: [QUESTION] How to rescue Faulted volumes
Question
After a node crash, I ended up with two faulted volumes that cannot be attached; they can't be mounted under /dev/longhorn/, but the Longhorn pods work fine, and the image files look fine to me.
time="2022-07-22T09:01:17Z" level=info msg="All replicas are failed, set engine salvageRequested to true"
accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=node1 owner=node1
state=detached volume=pvc-3cc715b2-aaa2-4c1d-a788-ffc71905874c
time="2022-07-22T09:01:17Z" level=info msg="All replicas are failed, set engine salvageRequested to true"
accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=node1 owner=node1
shareEndpoint= shareState=stopped state=detached volume=pvc-04e953eb-5411-4433-82a4-e6e54aa7fb92
I wonder if I can run fsck or somehow repair the filesystem?
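For what it's worth, a minimal sketch of what such a repair attempt could look like, assuming the volume holds an ext4 filesystem and is attached (ideally in maintenance mode, with no workload using it) so the block device appears under /dev/longhorn/. The device name is taken from the log above; adjust for your volume.

```shell
# Hypothetical repair sketch; assumes an ext4 filesystem on the volume.
# Attach the volume first so the block device exists under /dev/longhorn/.
DEV=/dev/longhorn/pvc-3cc715b2-aaa2-4c1d-a788-ffc71905874c

if [ -b "$DEV" ]; then
  fsck.ext4 -n "$DEV"    # dry run: report problems, change nothing
  # fsck.ext4 -p "$DEV"  # uncomment to repair once the dry run looks sane
else
  echo "device not present: attach the volume first"
fi
```

Running the read-only pass (`-n`) first avoids making things worse if the replica data itself is damaged.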
Environment
- Longhorn version: 1.2.4 (recently upgraded from 1.2.3)
- Kubernetes version: v1.24.2+k3s2
- Node config
- OS type and version: Debian Bullseye
- CPU per node: 20
- Memory per node: 128G
- Disk type: RAID10 HDD
- Network bandwidth and latency between the nodes: single node
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): K3S on debian
Additional context
I have another volume that can be mounted without errors.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (7 by maintainers)
After lowering the minimal storage percentage, the node is back to schedulable and the faulted volumes are back to degraded but attachable. Now I can see the volume is attached under /dev/longhorn, but I cannot use it in my pods; the pod output is:

@PhanLe1010 Oh yes, I have a lot of other data using the local-path provisioner; I am evaluating switching to Longhorn.
The disk is not schedulable; it alerts with this message:
However, 322122547200 / 4531630899200 = 0.071, which should be enough to meet the minimum.
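That ratio can be double-checked quickly. Note the 5% threshold below is only an assumed, lowered value (the Longhorn default for the minimal available storage percentage is 25):

```shell
FREE=322122547200   # available bytes reported for the disk
CAP=4531630899200   # total disk capacity in bytes
MIN_PCT=5           # assumed lowered "minimal available storage percentage"

RATIO=$(awk -v f="$FREE" -v c="$CAP" 'BEGIN { printf "%.3f", f / c }')
echo "free ratio: $RATIO"   # prints: free ratio: 0.071

# Exit 0 (schedulable) only when free space meets the configured minimum
if awk -v f="$FREE" -v c="$CAP" -v m="$MIN_PCT" 'BEGIN { exit !(f / c * 100 >= m) }'; then
  echo "disk should be schedulable"
else
  echo "below minimum: disk not schedulable"
fi
```

With these numbers, about 7.1% of the disk is free, so the disk clears a 5% threshold but not the 25% default.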
When I try to salvage the replica, the output is:

And the Automatic salvage setting is ON. Thank you for your help.
Can you manually try to salvage the volume by making sure node1 and the disk /data/longhorn are schedulable?
On the other hand, can you check the setting Automatic salvage in Longhorn UI -> Setting -> General Setting to see if it is ON?
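The same setting can also be read from the CLI instead of the UI. A sketch, assuming kubectl access to the cluster, Longhorn installed in the default longhorn-system namespace, and `auto-salvage` as the resource name behind the "Automatic salvage" toggle:

```shell
# Assumes cluster access and Longhorn in the longhorn-system namespace.
if command -v kubectl >/dev/null 2>&1; then
  # Print the value of the Automatic salvage setting ("true" or "false")
  kubectl -n longhorn-system get settings.longhorn.io auto-salvage -o jsonpath='{.value}{"\n"}'
else
  echo "kubectl not found; run on a machine with cluster access"
fi
```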