longhorn: [BUG] Orphan snapshot attachment tickets prevent volume from detaching
Describe the bug (🐛 if you encounter this issue)
As described in https://github.com/longhorn/longhorn/discussions/6281, I have a few volumes that are shown as “Healthy”, even though they are not attached to any running pod. I believe this is one of the reasons for the problems I am describing in https://github.com/longhorn/longhorn/issues/6552 - the inability to properly drain a node.
I would like to understand the reason for the problem as I have about 8 such volumes (out of 42).
I’ve already tried a few things in the UI:
- Force Detach: that will detach and immediately reattach the volume
- Deleting all snapshots and backups. No observable difference.
- Deleting all snapshots and force detaching. Again, the volume gets reattached right away.
At this point in time, for the volume on which I did all of the above, there are 12 volumeattachment entries, all named `backup-controller-backup-*`.
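For anyone else debugging this, the ticket names can be listed with kubectl and jq. This is a sketch, not from the thread: it assumes the `lhva` short name for `volumeattachment.longhorn.io` and that tickets are stored as a map under `.spec.attachmentTickets` (verify against your cluster); the ticket names in the sample JSON are made up.

```shell
# jq filter: list the names of the attachment tickets on one Longhorn
# VolumeAttachment object (".spec.attachmentTickets" assumed to be a map
# keyed by ticket name; "// {}" guards against a missing/null map).
LIST_TICKETS='(.spec.attachmentTickets // {}) | keys[]'

# Against a live cluster you would run something like:
#   kubectl get -n longhorn-system lhva <volume-name> -o json | jq -r "$LIST_TICKETS"

# Offline demonstration on a hand-written object of the same shape:
echo '{"spec":{"attachmentTickets":{"backup-controller-backup-abc":{},"csi-123":{}}}}' \
  | jq -r "$LIST_TICKETS"
```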
To Reproduce
No idea.
Expected behavior
Volumes that are not attached to a pod eventually transition into the Detached state. More importantly, volumes that are not attached to a pod (and haven’t been in months) should not interfere with node draining.
Support bundle for troubleshooting
Please see https://github.com/longhorn/longhorn/issues/6552, though I’d be happy to provide new bundles when required.
Environment
- Longhorn version: 1.5.1
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): via ArgoCD based on Helm Chart.
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s 1.24.5
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 8
- Node config
- OS type and version: OpenSUSE MicroOS
- Kernel version: 6.4.12
- CPU per node: 4 Cores
- Memory per node: 8GB
- Disk type(e.g. SSD/NVMe/HDD): NVMe
- Network bandwidth between the nodes: 1GBit
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal and KVM
- Number of Longhorn volumes in the cluster: 42
- Impacted Longhorn resources: 8
Additional context
About this issue
- State: closed
- Created 10 months ago
- Comments: 37 (16 by maintainers)
Since this is in some ways an internal implementation detail and not widely documented, I want to give a brief explanation for why your volumes are not detaching. Each Longhorn volume (in v1.5.0+) has an associated `volumeattachment.longhorn.io` object. Included in this object is a list of `attachmentTickets`. Each one is essentially a reason for the Longhorn volume to be attached to a node. The most obvious (and generally highest priority) reason is a CSI attachment for a workload. If the CSI plugin wants to attach a volume to a particular node, it almost always gets its way. However, a volume may also be attached to a node so that it can take a snapshot or a backup, etc. If the CSI driver allows the volume to detach, it may remain attached for these other operations.

In your cluster, we see `volumeattachment.longhorn.io` objects with many `attachmentTickets`. Included in these are backup controller tickets that are never deleted because the backups are infinitely failing. So, when a volume is detached, longhorn-manager immediately reattaches it to try to complete these backups. #6358 should prevent this behavior, though understanding how the current recurring job configuration leads to the failed backups in the first place (so we don’t have a large number on the cluster) would be good.

The snapshots in question are system-generated ones created when a new replica comes online (probably during replica auto-balance). It appears that the snapshot controller creates an `attachmentTicket` in the VolumeAttachment CR, then deletes the Snapshot CR without cleaning up the `attachmentTicket`. The finalizing logic in the attachment controller looks sound, but there must be a race somewhere. I AM able to reproduce this rarely with this script:
Essentially the script deletes a replica every thirty seconds. Given enough time, it manages to orphan an attachment ticket.
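The actual script is not reproduced in this thread; the following is a hypothetical reconstruction based only on the description above (delete one replica of a volume every thirty seconds). The volume name is a placeholder, and the `longhornvolume` label selector on Replica CRs is an assumption to verify before use.

```shell
# Hypothetical repro sketch (NOT the original script from the thread).
# Repeatedly deletes one replica of the given volume so that replica
# rebuilds and system snapshots churn until a ticket is orphaned.
delete_replica_loop() {
  local volume="$1"
  local namespace="longhorn-system"
  while true; do
    local replica
    # Assumption: Replica CRs carry a "longhornvolume" label naming their volume.
    replica=$(kubectl -n "$namespace" get replicas.longhorn.io \
        -l "longhornvolume=$volume" -o jsonpath='{.items[0].metadata.name}')
    [ -n "$replica" ] && kubectl -n "$namespace" delete replicas.longhorn.io "$replica"
    sleep 30
  done
}

# Against a live cluster you would run, e.g.:
#   delete_replica_loop pvc-example
```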
OK, found the root cause by running the recreate script while using a longhorn-manager with a bunch of additional logging.
Our code assumes that once we remove the finalizer on a snapshot, the snapshot controller will never reconcile it again. Before removing the finalizer, we ensure any `attachmentTickets` we created have been deleted. However, it is possible that we DO reconcile again. In the example above, the first reconciliation creates an `attachmentTicket`, successfully cleans it up, and removes the finalizer. The second reconciliation (which we don’t expect to happen) is on the SAME resource version. It creates an `attachmentTicket`, fails to clean it up (usually no big deal, since we can just get it on the next reconciliation), and errors out. We don’t reconcile again (the snapshot object is gone), so the snapshot `attachmentTicket` remains.

It is known that we may reconcile an object that was previously deleted. We either need to:
@longhorn/dev-control-plane, any ideas on the first one? I am assuming this is just an aspect of the Kubernetes controller machinery that we have to live with, but I may be missing something.
@docbobo, I think this issue is generally more likely to occur when there is churn on the cluster, especially related to snapshots being created/deleted in quick succession. However, given the root cause, I do not think there is anything in particular you can do to avoid it for now. (After it occurs, you can clean up `attachmentTickets` as you have been doing.) I will backport the fix to v1.5.2 and add upgrade logic to remove orphaned tickets, so hopefully users won’t hit the issue after upgrading.

Watching both the Snapshot and VolumeAttachment APIs, we see the following string of events:
@ejweber after cleaning up the snapshots as suggested, I just had another node stuck. Looking at the volume attachments, I can see about a dozen `snapshot-controller-*` entries. Given that the last recurring job was running a few hours ago, I find that a little suspicious; it is my understanding that the bug leading to those is fixed in 1.4.1. Will send you a new support bundle.

One more thing @docbobo. @PhanLe1010 and I dove pretty deep into your support bundle, and we are also seeing:
We’ve been through the code with a fine-toothed comb, and can’t see how this can be the case except if you hit https://github.com/longhorn/longhorn/issues/5762 while you were still running Longhorn v1.4.1 or earlier. The most recent snapshot object we see like this was created on January 11, 2023, so it seems a likely scenario. We think you’ll continue to run into issues getting volumes to fully detach unless the attachmentTickets are cleaned up, and it’s probably a good idea to clean up the snapshot objects as well.
To clean up the attachmentTickets:
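One way to remove the offending entries non-interactively is sketched below. This is not from the thread: it assumes the `lhva` short name and that `.spec.attachmentTickets` is a map keyed by ticket name (verify against your CRD before running anything like this), and the ticket names in the sample JSON are made up.

```shell
# jq filter: drop every attachment ticket whose name starts with
# "snapshot-controller-", keeping all other tickets untouched.
STRIP_SNAP_TICKETS='.spec.attachmentTickets |= with_entries(select(.key | startswith("snapshot-controller-") | not))'

# Against a live cluster, for each affected VolumeAttachment, something like:
#   kubectl get -n longhorn-system lhva <name> -o json \
#     | jq "$STRIP_SNAP_TICKETS" | kubectl replace -f -

# Offline demonstration on a hand-written object of the same shape:
echo '{"spec":{"attachmentTickets":{"snapshot-controller-abc":{},"csi-123":{}}}}' \
  | jq -c "$STRIP_SNAP_TICKETS"
```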
Use `kubectl edit -n longhorn-system lhva` and remove the `spec.attachmentTickets` with a name like `snapshot-controller-*`.

To clean up snapshot objects:
Will do. Should be able to see if that makes a difference pretty soon.