longhorn: [BUG] Cannot detach volume

Describe the bug (🐛 if you encounter this issue)

No pod is using the volume, but it cannot be detached because it immediately gets reattached automatically.

From what I can see, the corresponding VolumeAttachment object has an attachment ticket that cannot be deleted (if deleted, it’s immediately recreated):

spec:
  attachmentTickets:
    volume-rebuilding-controller-pvc-150c0bbc-25d6-4fe2-b76b-6bce680d6975:
      generation: 0
      id: volume-rebuilding-controller-pvc-150c0bbc-25d6-4fe2-b76b-6bce680d6975
      nodeID: rkemetal1
      parameters:
        disableFrontend: 'true'
      type: volume-rebuilding-controller
  volume: pvc-150c0bbc-25d6-4fe2-b76b-6bce680d6975
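
For reference, the ticket shown above can be inspected directly on the Longhorn VolumeAttachment custom resource; a sketch, assuming (as is typical) that the VolumeAttachment CR is named after the volume:

# show the attachment tickets for this volume
kubectl get volumeattachments.longhorn.io pvc-150c0bbc-25d6-4fe2-b76b-6bce680d6975 -n longhorn-system -o yaml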

To Reproduce

Don’t know.

Expected behavior

The volume should stay in the “detached” state.

Support bundle for troubleshooting

Please remove the bundle once you’ve downloaded it.

Environment

Longhorn 1.5.1.

About this issue


  • State: closed
  • Created 10 months ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

In the meantime, I think we have all the data we can possibly get from the user, so we should proceed to get the user out of this stuck situation. You can try this workaround @h-e-l-o:

Workaround:

  1. Get all problematic volumes:

     kubectl get volumes.longhorn.io -o json -n longhorn-system | jq -r '.items[] | select(.status.offlineReplicaRebuildingRequired == true) | .metadata.name'

  2. Update each volume (see the combined sketch just below this list):

     kubectl patch volumes.longhorn.io <VOLUME-NAME> --type=merge --subresource status --patch 'status: {offlineReplicaRebuildingRequired: false}' -n longhorn-system
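
If many volumes are affected, the two steps can be combined into a single loop; this is only a sketch, assuming jq is available and kubectl is recent enough to support --subresource:

# patch every volume that still has offlineReplicaRebuildingRequired set to true
for v in $(kubectl get volumes.longhorn.io -n longhorn-system -o json | jq -r '.items[] | select(.status.offlineReplicaRebuildingRequired == true) | .metadata.name'); do
  kubectl patch volumes.longhorn.io "$v" -n longhorn-system --type=merge --subresource status --patch 'status: {offlineReplicaRebuildingRequired: false}'
done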

OK, I’ve found that using:

kubectl patch lhv -n longhorn-system pvc-150c0bbc-25d6-4fe2-b76b-6bce680d6975 --type=merge --subresource status --patch 'status: {offlineReplicaRebuildingRequired: false}'

allows me to set offlineReplicaRebuildingRequired to false, and I’m able to detach the volume afterwards.
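
Afterwards the result can be confirmed on the volume CR; a quick check, assuming the state is reported in status.state:

# should print "detached" once the volume has been released
kubectl get volumes.longhorn.io pvc-150c0bbc-25d6-4fe2-b76b-6bce680d6975 -n longhorn-system -o jsonpath='{.status.state}'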

We did a live code analysis with @ejweber and @james-munson and have a theory about what could have gone wrong. I am trying to reproduce the issue based on that theory.

Just in case, I’ve disabled the Offline Replica Rebuilding option, which for some reason was enabled; it seems to be relevant only to the v2 data engine.

Yeah, feel free to disable it.
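
For completeness, the option can also be checked and disabled from the CLI; a sketch, assuming the Longhorn setting is named offline-replica-rebuilding and stores its value in the top-level value field:

# check the current value, then disable the feature
kubectl get settings.longhorn.io offline-replica-rebuilding -n longhorn-system
kubectl patch settings.longhorn.io offline-replica-rebuilding -n longhorn-system --type=merge --patch '{"value": "disabled"}'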

It looks like 21 volumes in this cluster have volume.status.offlineReplicaRebuildingRequired == true, and that has caused most (all?) of them to gain similar attachmentTickets.

From a quick check of the code, we should never set volume.status.offlineReplicaRebuildingRequired = true unless a volume is using the v2 engine. There is no evidence that any volumes are using the v2 engine, so further investigation is required.
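
One way to double-check for v2 volumes from the CLI; a sketch, assuming the Longhorn v1.5 field name spec.backendStoreDriver identifies the data engine:

# list any volumes that declare the v2 data engine (expected to print nothing here)
kubectl get volumes.longhorn.io -n longhorn-system -o json | jq -r '.items[] | select(.spec.backendStoreDriver == "v2") | .metadata.name'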