velero: Removal of expired backups does not work
What steps did you take and what happened:
In our AWS-based setup, when scheduled backups reach their TTL, the deletion process starts but gets stuck in status Deleting. The contents of the S3 bucket are properly deleted, while the volume snapshots remain (causing significant extra cost).
What did you expect to happen:
I expect backups to be cleanly removed when their TTL expires, including all backed up data, such as volume snapshots.
The output of the following commands will help us better understand what’s going on: (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
time="2020-11-18T08:23:47Z" level=info msg="Removing existing deletion requests for backup" backup=velero-nightly-backup-20201103034355 controller=backup-deletion logSource="pkg/controller/backup_deletion_controller.go:469" name=velero-nightly-backup-20201103034355-gt6b9 namespace=velero
time="2020-11-18T08:23:50Z" level=error msg="Error in syncHandler, re-adding item to queue" controller=backup-deletion error="error downloading backup: error copying Backup to temp file: rpc error: code = Unknown desc = error getting object backups/velero-nightly-backup-20201103034355/velero-nightly-backup-20201103034355.tar.gz: NoSuchKey: The specified key does not exist.\n\tstatus code: 404, request id: 01DBEB5FABBF40BD, host id: HKe3B0heM0NpUhxXbLEZp7THCXtsfDKJkYdR6Sg0bS3+j0ywshitElmEnG7mPdDNmq6ASEtKT6w=" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/restore_controller.go:558" error.function=github.com/vmware-tanzu/velero/pkg/controller.downloadToTempFile key=velero/velero-nightly-backup-20201103034355-gt6b9 logSource="pkg/controller/generic_controller.go:140"
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
Name: velero-nightly-backup-20201103034355
Namespace: velero
Labels: app.kubernetes.io/instance=velero
app.kubernetes.io/managed-by=Tiller
app.kubernetes.io/name=velero
helm.sh/chart=velero-2.0.3
velero.io/schedule-name=velero-nightly-backup
velero.io/storage-location=aws
Annotations: <none>
Phase: Deleting
Errors: 0
Warnings: 0
Namespaces:
Included: *
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Label selector: <none>
Storage Location: aws
Velero-Native Snapshot PVs: auto
TTL: 360h0m0s
Hooks: <none>
Backup Format Version:
Started: 2020-11-03 04:43:55 +0100 CET
Completed: 2020-11-03 04:52:17 +0100 CET
Expiration: 2020-11-18 04:43:55 +0100 CET
Velero-Native Snapshots: 2 of 2 snapshots completed successfully (specify --details for more information)
Deletion Attempts:
2020-11-18 06:30:25 +0100 CET: InProgress
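As a sanity check on the describe output above, the Expiration value should equal Started plus TTL. The sketch below redoes that arithmetic with the timestamps copied from the output; the fixed +01:00 offset for CET and the variable names are assumptions made for illustration, not anything Velero exposes.

```python
from datetime import datetime, timedelta, timezone

# CET modeled as a fixed UTC+1 offset (assumption; fine for these November dates).
CET = timezone(timedelta(hours=1))

# Values copied from the `velero backup describe` output above.
started = datetime(2020, 11, 3, 4, 43, 55, tzinfo=CET)
ttl = timedelta(hours=360)  # TTL: 360h0m0s, i.e. 15 days
first_deletion_attempt = datetime(2020, 11, 18, 6, 30, 25, tzinfo=CET)

# Expiration is the start time plus the TTL.
expiration = started + ttl
# How long after expiry the first deletion attempt was recorded.
delay = first_deletion_attempt - expiration

print(expiration.isoformat())  # 2020-11-18T04:43:55+01:00
print(delay)                   # 1:46:30
```

So the TTL was applied as expected, and the deletion attempt began about 1h46m after expiry; the problem is that the attempt never leaves InProgress.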
velero backup logs <backupname>
Logs for backup "velero-nightly-backup-20201103034355" are not available until it's finished processing. Please wait until the backup has a phase of Completed or Failed and try again.
Anything else you would like to add:
My guess is that this is at least loosely related to https://github.com/vmware-tanzu/velero/pull/2993.
Environment:
Velero version:
Client:
Version: v1.5.2
Git commit: -
Server:
Version: v1.5.2
Velero features:
features: <NOT SET>
Kubernetes version: 1.18.9
Kubernetes installer & version: kops 1.18.1
Cloud provider or hardware configuration: AWS (with aws plugin)
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project’s top voted issues listed here.
Use the “reaction smiley face” up to the right of this comment to vote.
- 👍 for “I would like to see this bug fixed as soon as possible”
- 👎 for “There are more important bugs to focus on right now”
About this issue
- State: closed
- Created 4 years ago
- Reactions: 9
- Comments: 18 (9 by maintainers)
Please add a velero backup delete --force parameter. I do not think kubectl delete backup is a good idea.

@billimek For me it was the only way to clean this up, and I have not encountered any problems so far. But still, it’s just guessing 😉

I think that I’m experiencing this as well. Is it ‘safe’ to manually delete the backups.velero.io objects that seem to be ‘stuck’ deleting (e.g. k delete backups.velero.io -n velero velero-daily-backup-20201212060042)?
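Before deleting anything by hand, it can help to list exactly which backups are affected. The following is a minimal sketch, assuming the output of kubectl get backups.velero.io -n velero -o json has been loaded into a dict, and that each item carries metadata.name, status.phase and an RFC 3339 status.expiration in UTC. The helper name stuck_backups and the sample data are hypothetical; the sample only mirrors the backup described in this issue. It does not answer whether manual deletion is safe, it only identifies candidates.

```python
from datetime import datetime, timezone


def stuck_backups(backup_list, now):
    """Return names of backups that are past expiration but still in phase Deleting.

    backup_list: dict parsed from `kubectl get backups.velero.io -n velero -o json`
    now: timezone-aware datetime used as the reference time
    """
    stuck = []
    for item in backup_list.get("items", []):
        status = item.get("status", {})
        if status.get("phase") != "Deleting":
            continue
        expiration = status.get("expiration")
        if expiration is None:
            continue
        # Assumes timestamps like 2020-11-18T03:43:55Z (UTC, whole seconds).
        expires_at = datetime.strptime(
            expiration, "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if expires_at <= now:
            stuck.append(item["metadata"]["name"])
    return stuck


# Hypothetical sample shaped like the kubectl JSON output.
sample = {
    "items": [
        {
            "metadata": {"name": "velero-nightly-backup-20201103034355"},
            "status": {"phase": "Deleting", "expiration": "2020-11-18T03:43:55Z"},
        },
        {
            "metadata": {"name": "velero-nightly-backup-20201117034355"},
            "status": {"phase": "Completed", "expiration": "2020-12-02T03:43:55Z"},
        },
    ]
}

now = datetime(2020, 11, 18, 8, 23, 50, tzinfo=timezone.utc)
print(stuck_backups(sample, now))  # ['velero-nightly-backup-20201103034355']
```

With real data, the dict would come from json.load on the saved kubectl output instead of the inline sample.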