longhorn: [BUG] Orphan snapshot attachment tickets prevent volume from detaching
Describe the bug (🐛 if you encounter this issue)
As described in https://github.com/longhorn/longhorn/discussions/6281, I have a few volumes that are shown as “Healthy”, even though they are not attached to any running pod. I believe this is one of the reasons for the problems I am describing in https://github.com/longhorn/longhorn/issues/6552 - the inability to properly drain a node.
I would like to understand the reason for the problem as I have about 8 such volumes (out of 42).
I’ve already tried a few things in the UI:
- Force Detach: that will detach and immediately reattach the volume
- Deleting all snapshots and backups. No observable difference.
- Deleting all snapshots and force detaching. Again, the volume gets reattached right away.
At this point in time, for the volume on which I did all of the above, there are 12 volumeattachment entries, all named `backup-controller-backup-*`.
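For anyone else debugging this, the ticket names can be listed with kubectl and jq. This is a sketch, not from the thread: it assumes the `lhva` short name for `volumeattachment.longhorn.io` and that tickets are stored as a map under `.spec.attachmentTickets` (verify against your cluster); the ticket names in the sample JSON are made up.

```shell
# jq filter: list the names of the attachment tickets on one Longhorn
# VolumeAttachment object (".spec.attachmentTickets" assumed to be a map
# keyed by ticket name; "// {}" guards against a missing/null map).
LIST_TICKETS='(.spec.attachmentTickets // {}) | keys[]'

# Against a live cluster you would run something like:
#   kubectl get -n longhorn-system lhva <volume-name> -o json | jq -r "$LIST_TICKETS"

# Offline demonstration on a hand-written object of the same shape:
echo '{"spec":{"attachmentTickets":{"backup-controller-backup-abc":{},"csi-123":{}}}}' \
  | jq -r "$LIST_TICKETS"
```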
To Reproduce
No idea.
Expected behavior
Volumes that are not attached to a pod eventually transition into the Detached state. More importantly, volumes that are not attached to a pod (and haven’t been in months) should not interfere with node draining.
Support bundle for troubleshooting
Please see https://github.com/longhorn/longhorn/issues/6552, though I’d be happy to provide new bundles when required.
Environment
- Longhorn version: 1.5.1
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): via ArgoCD based on Helm Chart.
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s 1.24.5
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 8
- Node config
- OS type and version: OpenSUSE MicroOS
- Kernel version: 6.4.12
- CPU per node: 4 Cores
- Memory per node: 8GB
- Disk type(e.g. SSD/NVMe/HDD): NVMe
- Network bandwidth between the nodes: 1GBit
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal and KVM
- Number of Longhorn volumes in the cluster: 42
- Impacted Longhorn resources: 8
Additional context
About this issue
- State: closed
- Created 10 months ago
- Comments: 37 (16 by maintainers)
Since this is in some ways an internal implementation detail and not widely documented, I want to give a brief explanation for why your volumes are not detaching. Each Longhorn volume (in v1.5.0+) has an associated `volumeattachment.longhorn.io` object. Included in this object is a list of `attachmentTickets`. Each one is essentially a reason for the Longhorn volume to be attached to a node. The most obvious (and generally highest priority) reason is a CSI attachment for a workload. If the CSI plugin wants to attach a volume to a particular node, it almost always gets its way. However, a volume may also be attached to a node so that it can take a snapshot or a backup, etc. If the CSI driver allows the volume to detach, it may remain attached for these other operations.

In your cluster, we see `volumeattachment.longhorn.io` objects with many `attachmentTickets`. Included in these are backup controller tickets that are never deleted because the backups are infinitely failing. So, when a volume is detached, longhorn-manager immediately reattaches it to try to complete these backups. #6358 should prevent this behavior, though understanding how the current recurring job configuration leads to the failed backups in the first place (so we don’t have a large number on the cluster) would be good.

The snapshots in question are system-generated ones created when a new replica comes online (probably during replica auto-balance). It appears that the snapshot controller creates an `attachmentTicket` in the VolumeAttachment CR, then deletes the Snapshot CR without cleaning up the `attachmentTicket`. The finalizing logic in the attachment controller looks sound, but there must be a race somewhere. I AM able to reproduce this rarely with this script:
Essentially the script deletes a replica every thirty seconds. Given enough time, it manages to orphan an attachment ticket.
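The actual script is not reproduced in this thread; the following is a hypothetical reconstruction based only on the description above (delete one replica of a volume every thirty seconds). The volume name is a placeholder, and the `longhornvolume` label selector on Replica CRs is an assumption to verify before use.

```shell
# Hypothetical repro sketch (NOT the original script from the thread).
# Repeatedly deletes one replica of the given volume so that replica
# rebuilds and system snapshots churn until a ticket is orphaned.
delete_replica_loop() {
  local volume="$1"
  local namespace="longhorn-system"
  while true; do
    local replica
    # Assumption: Replica CRs carry a "longhornvolume" label naming their volume.
    replica=$(kubectl -n "$namespace" get replicas.longhorn.io \
        -l "longhornvolume=$volume" -o jsonpath='{.items[0].metadata.name}')
    [ -n "$replica" ] && kubectl -n "$namespace" delete replicas.longhorn.io "$replica"
    sleep 30
  done
}

# Against a live cluster you would run, e.g.:
#   delete_replica_loop pvc-example
```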
OK, found the root cause by running the recreate script while using a longhorn-manager with a bunch of additional logging.
Our code assumes that once we remove the finalizer on a snapshot, the snapshot controller will never reconcile it again. Before removing the finalizer, we ensure any `attachmentTickets` we created have been deleted. However, it is possible that we DO reconcile again. In the example above, the first reconciliation creates an `attachmentTicket`, successfully cleans it up, and removes the finalizer. The second reconciliation (which we don’t expect to happen) is on the SAME resource version. It creates an `attachmentTicket`, fails to clean it up (usually no big deal, since we can just get it on the next reconciliation), and errors out. We don’t reconcile again (the snapshot object is gone), so the snapshot `attachmentTicket` remains.

It is known that we may reconcile an object that was previously deleted. We either need to:
@longhorn/dev-control-plane, any ideas on the first one? I am assuming this is just an aspect of the Kubernetes controller machinery that we have to live with, but I may be missing something.
@docbobo, I think this issue is generally more likely to occur when there is churn on the cluster, especially related to snapshots being created/deleted in quick succession. However, given the root cause, I do not think there is anything in particular you can do to avoid it for now. (After it occurs, you can clean up `attachmentTickets` as you have been doing.) I will backport the fix to v1.5.2 and add upgrade logic to remove orphaned tickets, so hopefully users won’t hit the issue after upgrading.

Watching both the Snapshot and VolumeAttachment APIs, we see the following string of events:
@ejweber after cleaning up the snapshots as suggested, I just had another node stuck. Looking at the volume attachments, I can see about a dozen `snapshot-controller-*` entries. Given that the last recurring job was running a few hours ago, I find that a little suspicious; it is my understanding that the bug leading to those is fixed in 1.4.1. Will send you a new support bundle.

One more thing @docbobo. @PhanLe1010 and I dove pretty deep into your support bundle, and we are also seeing:
We’ve been through the code with a fine-toothed comb, and can’t see how this can be the case except if you hit https://github.com/longhorn/longhorn/issues/5762 while you were still running Longhorn v1.4.1 or earlier. The most recent snapshot object we see like this was created on January 11, 2023, so it seems a likely scenario. We think you’ll continue to run into issues getting volumes to fully detach unless the attachmentTickets are cleaned up, and it’s probably a good idea to clean up the snapshot objects as well.
To clean up the attachmentTickets:
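One way to remove the offending entries non-interactively is sketched below. This is not from the thread: it assumes the `lhva` short name and that `.spec.attachmentTickets` is a map keyed by ticket name (verify against your CRD before running anything like this), and the ticket names in the sample JSON are made up.

```shell
# jq filter: drop every attachment ticket whose name starts with
# "snapshot-controller-", keeping all other tickets untouched.
STRIP_SNAP_TICKETS='.spec.attachmentTickets |= with_entries(select(.key | startswith("snapshot-controller-") | not))'

# Against a live cluster, for each affected VolumeAttachment, something like:
#   kubectl get -n longhorn-system lhva <name> -o json \
#     | jq "$STRIP_SNAP_TICKETS" | kubectl replace -f -

# Offline demonstration on a hand-written object of the same shape:
echo '{"spec":{"attachmentTickets":{"snapshot-controller-abc":{},"csi-123":{}}}}' \
  | jq -c "$STRIP_SNAP_TICKETS"
```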
Use `kubectl edit -n longhorn-system lhva` and remove the `spec.attachmentTickets` with a name like `snapshot-controller-*`.

To clean up snapshot objects:
Will do. Should be able to see if that makes a difference pretty soon.