longhorn: [BUG] Longhorn Snapshots are not deleted after expired Backups (Velero)
Describe the bug (🐛 if you encounter this issue)
We are using Velero to create backups of the Kubernetes manifests and the persistent volumes (in our example we back up Harbor).
When we create a backup, Velero saves the K8s manifests to an object storage (MinIO) and creates snapshot resources that trigger Longhorn backups via the velero-plugin-for-csi. Longhorn writes the backups to another MinIO bucket.
If we delete a Velero backup, or the backup expires, the corresponding snapshots (`snapshots.longhorn.io`) are not deleted.

We are using Velero v1.9.4 with the EnableCSI feature flag and the following plugins:
- velero/velero-plugin-for-csi:v0.4.0
- velero/velero-plugin-for-aws:v1.6.0
We have the same issue with Velero v1.11.0 with the EnableCSI feature flag and the following plugins:
- velero/velero-plugin-for-csi:v0.5.0
- velero/velero-plugin-for-aws:v1.6.0
To Reproduce
Steps to reproduce the behavior:
- Install the newest version of Velero and Rancher-Longhorn
- In Longhorn, configure an S3 backup target (we are using MinIO for this)
- Enable CSI Snapshot Support for Longhorn.
- Create a backup (for example from the `Schedule` below): `velero backup create --from-schedule harbor-daily-0200`
- Delete the backup: `velero backup delete <BACKUPNAME>`
- The snapshot (`snapshots.longhorn.io`) is not deleted.
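To illustrate, this is roughly how we check for the leftover objects after the Velero backup has been deleted (standard kubectl commands; resource names will differ per cluster):

```bash
# List the CSI snapshot objects and the Longhorn-native snapshots.
# After the Velero backup is deleted, the snapshots.longhorn.io
# entries created for that backup are still present.
kubectl get volumesnapshots.snapshot.storage.k8s.io -A
kubectl get volumesnapshotcontents.snapshot.storage.k8s.io
kubectl -n longhorn-system get snapshots.longhorn.io
```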
Expected behavior
The snapshot is deleted.
Environment
- Longhorn version: 102.2.0+up1.4.1
- Velero version: v1.9.4 and v1.11.0 (see above)
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher-Longhorn Helm Chart
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE2, v1.25.7+rke2r1
- Number of management nodes in the cluster: 1
- Number of worker nodes in the cluster: 3
- Node config
- OS type and version: Ubuntu
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): VMs on Proxmox
- Number of Longhorn volumes in the cluster: 17
Additional context
Velero Backup Schedule for Harbor
```yaml
---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: harbor-daily-0200
  namespace: velero # Must be the namespace of the Velero server
spec:
  schedule: 0 0 * * *
  template:
    includedNamespaces:
    - 'harbor'
    includedResources:
    - '*'
    snapshotVolumes: true
    storageLocation: minio
    volumeSnapshotLocations:
    - longhorn
    ttl: 168h0m0s # 7 days retention
    defaultVolumesToRestic: false
    hooks:
      resources:
      - name: postgresql
        includedNamespaces:
        - 'harbor'
        includedResources:
        - pods
        excludedResources: []
        labelSelector:
          matchLabels:
            statefulset.kubernetes.io/pod-name: harbor-database-0
        pre:
        - exec:
            container: database
            command:
            - /bin/bash
            - -c
            - "psql -U postgres -c \"CHECKPOINT\";"
            onError: Fail
            timeout: 30s
```
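For completeness, a sketch of how we apply the schedule and trigger an ad-hoc backup from it (the filename `harbor-schedule.yaml` is hypothetical):

```bash
# Create the schedule, then trigger a one-off backup from it.
kubectl apply -f harbor-schedule.yaml
velero backup create --from-schedule harbor-daily-0200

# Watch the backup and the CSI snapshot objects it creates.
velero backup get
kubectl get volumesnapshots.snapshot.storage.k8s.io -n harbor
```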
VolumeSnapshotClass
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete
```
VolumeSnapshotClass
In our second cluster, with Velero v1.11.0 installed, we created the following resource (but hit the same issue there):
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak
```
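For context (our understanding from the Longhorn CSI snapshot docs, not verified in this issue): the `type` parameter controls what a CSI `VolumeSnapshot` maps to in Longhorn, with `bak` creating a backup on the backup target and `snap` an in-cluster Longhorn snapshot. A sketch of the alternative:

```yaml
# Hypothetical variant: associate CSI VolumeSnapshots with in-cluster
# Longhorn snapshots instead of backups (per our reading of the docs).
parameters:
  type: snap
```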
VolumeSnapshotLocation
```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: longhorn
  namespace: velero
spec:
  provider: longhorn.io/longhorn
```
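To confirm that Velero actually picks up the configured locations, something like the following should work (standard Velero CLI subcommands):

```bash
# List the configured snapshot and backup storage locations.
velero snapshot-location get
velero backup-location get
```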
About this issue
- State: open
- Created a year ago
- Reactions: 7
- Comments: 24 (3 by maintainers)
Thanks for the valuable info.
We will improve this, as it’s quite important for space efficiency.
@R-Studio @innobead
I think this issue can be simplified to completely exclude Velero.
At the core, the issue here is that Longhorn does not delete snapshots or backups when the backing CSI `VolumeSnapshot` resource is deleted. As a user of Longhorn who is interfacing with CSI and not native Longhorn resources, I expect the state of Longhorn resources to reflect the state of my CSI resources:
- When I create a `VolumeSnapshot`, I expect Longhorn to create a snapshot/backup/bi. This works!
- When I delete a `VolumeSnapshot`, I expect Longhorn to delete the backing snapshot/backup/bi that it created. This doesn't work.

Therefore I think it's fair to state that Longhorn is currently only providing a partial implementation of the CSI interface/spec.
Velero is just using this common CSI interface as it is intended to be used and expecting it to have the desired effect. This is not a Velero issue.
Perhaps this should be opened as a new issue with a smaller scope (CSI spec conformance).
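A minimal Velero-free reproduction would look roughly like this (the PVC name `test-pvc` and the manifest filename are hypothetical; the `longhorn` VolumeSnapshotClass is the one above):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snap
  namespace: harbor
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: test-pvc # hypothetical PVC
```

```bash
kubectl apply -f test-snap.yaml                       # Longhorn creates the backing snapshot/backup
kubectl -n harbor delete volumesnapshot test-snap     # delete the CSI resource
kubectl -n longhorn-system get snapshots.longhorn.io  # backing object should be gone, but isn't
```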
@innobead thanks for your reply. What I want: I have a Velero schedule that creates/triggers backups of my persistent volumes with a retention period of e.g. 7 days. After this 7-day retention period, Velero deletes these backups, but the corresponding snapshots are not deleted and consume disk space that I don't want.
As a workaround, I have a recurring job that deletes these snapshots (retain 7), but it has two disadvantages.
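For reference, a sketch of such a cleanup job, assuming a Longhorn release that supports the `snapshot-delete` recurring task (field names per the Longhorn `RecurringJob` CRD; the cron and retain values are examples):

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-cleanup
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"      # run daily at 03:00
  task: snapshot-delete  # delete snapshots exceeding the retain count
  retain: 7              # keep the 7 newest snapshots per volume
  concurrency: 2
  groups:
  - default              # apply to all volumes in the default group
```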
I’m having almost the same setup and versions and the same issue! One interesting log line found on longhorn-csi-plugin:
```
longhorn-csi-plugin-5k8lg longhorn-csi-plugin time="2023-10-06T08:12:20Z" level=info msg="DeleteSnapshot: req: {\"snapshot_id\":\"bak://pvc-c57da450-ce82-44c8-ac83-0a039634a334/backup-04db0d0fe4ef49f1\"}"
longhorn-csi-plugin-5k8lg longhorn-csi-plugin time="2023-10-06T08:12:20Z" level=info msg="DeleteSnapshot: rsp: {}"
csi-snapshotter-5d899fdcfc-xv627 csi-snapshotter E1006 08:12:20.143392 1 snapshot_controller_base.go:265] could not sync content "snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e": snapshot controller failed to update snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshotcontents/snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d4fa9b1d-416e-4df5-ad74-d3ac6bec3b66, UID in object meta:
```
@tcoupin thanks, but this is not a solution: if I use `--snapshot-volumes=false`, then Velero does not trigger a backup for the persistent volumes at all. So Velero only backs up the manifests/YAMLs.
@weizhe0422 here is the support bundle. Thanks for any help.
supportbundle_a3236774-99ca-4ab5-a2a5-74c925273bb4_2023-05-01T07-20-00Z.zip