longhorn: [BUG] Unable to export RAID1 bdev in degraded state
Describe the bug (🐛 if you encounter this issue)
Unable to export a RAID1 bdev in degraded state. The RAID1 bdev should be exportable as long as there is at least one healthy lvol.
To Reproduce
Steps to reproduce the behavior:
- Launch SPDK target
- Prepare a bdev lvol
- Create a bdev raid based on the newly created lvol and a non-existing lvol:
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_raid_create -n raid-degraded -r raid1 -b "<A Valid Lvol> <A Non-existing Lvol>"
- Create an nvmf subsystem and use the bdev raid as the namespace:
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py nvmf_create_subsystem nqn.2023-01.io.spdk:testvol -a -s SPDK00000000000020 -d SPDK_Controller
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py nvmf_subsystem_add_ns nqn.2023-01.io.spdk:testvol raid-degraded
The ns add cmd will error out.
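For the "Prepare a bdev lvol" step above, a minimal sketch using a malloc bdev as backing storage; the names malloc0, lvs0, and lvol0 (and the sizes) are placeholders, not something prescribed by the issue:

```
# hypothetical backing device and lvstore; sizes are arbitrary (MiB / block size)
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_malloc_create -b malloc0 256 4096
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_lvol_create_lvstore malloc0 lvs0
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_lvol_create -l lvs0 lvol0 128
```

The resulting lvol can then be passed to bdev_raid_create via its lvstore/lvol alias, e.g. lvs0/lvol0.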
Expected behavior
The degraded RAID bdev can be added as an nvmf subsystem namespace.
Log or Support bundle
If applicable, add the Longhorn managers’ log or support bundle when the issue happens. You can generate a Support Bundle using the link at the footer of the Longhorn UI.
Environment
- Longhorn version:
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of management node in the cluster:
- Number of worker node in the cluster:
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type (e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 30 (28 by maintainers)
If I call `bdev_nvme_attach_controller` with `ctrlr_loss_timeout_sec` and `reconnect_delay_sec` set, on the local node the remote controller is detached after a while.
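For reference, a sketch of such an attach call; the address, NQN, and controller name are placeholders, and the timeout values simply mirror the 10-second window mentioned later in the thread:

```
# hypothetical target address and NQN; retry every 2s, give up after 10s
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_nvme_attach_controller \
    -b Nvme2 -t tcp -f ipv4 -a 10.0.0.2 -s 4420 \
    -n nqn.2023-01.io.spdk:node1-lvol \
    --ctrlr-loss-timeout-sec 10 --reconnect-delay-sec 2
```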
In SPDK Gerrit there is this development under review: https://review.spdk.io/gerrit/c/spdk/spdk/+/16167 It is the last commit of the relation chain, so it contains all the latest development made on the RAID module. @shuo-wu If you want, you can start working with this version; until it is merged into the master branch, it will be our base version for the replica rebuilding development.
Suppose this is the situation before the failure: we have a remote lvol bdev `Nvme2n1` and the raid1 bdev built on top of it.
At some point the remote node goes down and the local NVMe controller, after 10 seconds of reconnection attempts, is deleted. When the remote node comes up again and exports its lvol via nvmf, if we attach to the remote lvol again on the local node, so that the local bdev `Nvme2n1` reappears, it is no longer handled by the raid. Once a base bdev has been removed from the raid, it is not automatically re-added. So we remain with only one base bdev even though bdev `Nvme2n1` is present.
@DamiaSan Just noticed: is this based on running IO?
We still need an extra mechanism, other than running IO, for monitoring the connectivity of lvols. If one lvol is somehow down, the rebuilding can be triggered.
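A naive polling sketch of that kind of monitoring, assuming the rpc.py path used above and that an unreachable controller is reported with state "failed" (as shown later in this thread); the interval and the grep are purely illustrative:

```
# poll controller health every 5s; a "failed" state would trigger rebuilding
while true; do
    if sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_nvme_get_controllers | grep -q '"failed"'; then
        echo "a remote lvol controller is down, trigger rebuilding"
    fi
    # also inspect how many base bdevs the raid still holds
    sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_raid_get_bdevs all
    sleep 5
done
```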
The remaining issue
When a remote base bdev is down, we cannot add this RAID bdev as an nvmf subsystem namespace. Besides, if we have already created a device (via the nvme-cli initiator) and are applying IO on top of a RAID bdev, suddenly shutting down a remote base bdev may be a problem as well.
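For the second scenario, a rough sketch of what "create a device and apply IO" could look like from the initiator side; the address, NQN, and the resulting /dev/nvme1n1 name are placeholders:

```
# hypothetical initiator-side connection followed by some direct IO
sudo nvme connect -t tcp -n nqn.2023-01.io.spdk:testvol -a 10.0.0.1 -s 4420
sudo dd if=/dev/zero of=/dev/nvme1n1 bs=1M count=128 oflag=direct
```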
The reproducing steps
Test 1 (Verified):
- Stop `spdk_tgt` on node 1, which leads to the remote lvol on node 0 being down, then try to expose the degraded RAID bdev
Test 2 (Unverified):
- Expose the RAID bdev first, then stop `spdk_tgt` on node 1, which leads to the remote lvol on node 0 being down
Note: "expose a bdev" means creating an nvmf subsystem, a namespace, and a listener for the bdev
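To spell the note out, a sketch of what "expose a bdev" maps to in rpc.py calls, reusing the subsystem parameters from the reproduction steps; the TCP transport and the listener address are assumptions (the transport may already exist on the target):

```
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py nvmf_create_transport -t TCP
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py nvmf_create_subsystem nqn.2023-01.io.spdk:testvol -a -s SPDK00000000000020 -d SPDK_Controller
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py nvmf_subsystem_add_ns nqn.2023-01.io.spdk:testvol raid-degraded
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py nvmf_subsystem_add_listener nqn.2023-01.io.spdk:testvol -t tcp -f ipv4 -a 10.0.0.1 -s 4420
```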
The related test result
For Test 1, after stopping `spdk_tgt` on the remote node, getting the raid does not tell users that there is an invalid base bdev, and trying to expose the RAID shows a rather cryptic error.
But trying to get the down remote lvol (which is seen as an NVMe bdev on node 0) with the API `bdev_nvme_get_controllers`, rather than `bdev_get_bdevs`, does show the state `failed`.
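For reference, the two queries in question, assuming the remote lvol was attached with a controller name like Nvme2 as in the earlier example:

```
# the failed remote base bdev may simply be missing from this list
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_get_bdevs
# whereas the controller view reports its state as "failed"
sudo ~/go/src/github.com/longhorn/spdk/scripts/rpc.py bdev_nvme_get_controllers -n Nvme2
```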
Expected behaviors
- Getting the raid should report the actual `num_base_bdevs_operational`.

@shuo-wu I have verified this issue with the version at https://review.spdk.io/gerrit/c/spdk/spdk/+/16167 and it still persists. The raid bdev is not registered, so nvmf cannot add it to the subsystem. Some work on operational base bdevs has been done in this commit; I left a comment in Gerrit because I think that `min_base_bdevs_operational` (which equals 1 for raid1) should be used to determine whether to configure the raid bdev.