velero: velero kopia backup failure

What steps did you take and what happened: I ran the velero backup command to backup one of the namespaces in my kubernetes cluster. however the backup fails with PartiallyFailed error. the error comes for some PVC’s kopia backup. the error is not consistent and it fails for some other PVC when we run the backup next time. the backup fails with this error: error creating uploader: failed to connect repository: error to get repo options: error to get storage variables: error get s3 bucket region: unable to determine bucket’s region

What did you expect to happen: the velero backup should complete sucessfully and it should take all the PVC backup using kopia.

The following information will help us better understand what’s going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle, and attach to this issue, more options please refer to velero debug --help bundle-2023-05-22-09-59-33.tar.gz

Anything else you would like to add: everytime we run the velero backup, its intermittently failing with different PVC’s every time. few times, it completes without any error also. we are not able to find the actual issue

Environment:

  • Velero version (use velero version): Client: v1.10.2 , Server v1.11.0
  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version): 1.22
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): linux

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project’s top voted issues listed here.
Use the “reaction smiley face” up to the right of this comment to vote.

  • 👍 for “I would like to see this bug fixed as soon as possible”
  • 👎 for “There are more important bugs to focus on right now”

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 5
  • Comments: 33 (19 by maintainers)

Most upvoted comments

@Lyndon-Li the issue we are seeing is intermittent in nature, every time we run the backup, it fails with different PVC’s, I have attached another bundle where the failure is for other PVC. so we are unable to determine why the backups are failing intermittently with different PVC’s each time. But in the log, we are seeing same error “unable to determine bucket’s region”. bundle-2023-05-22-10-26-45.tar.gz when i ran the get-bucket-location api, it gives the correct location from my local jumpbox. do I need to run this from node-agent pod? if yes, how can i run this? image

we have also created a schedule to backup the namespace every one hour and we are seeing few are completed without error, while some have errors on few PVC’s. if its failing to get bucket region, it should fail for all the PVC’s. image

we would like to understand the reason for this inconsistent behavior, as due to this we are not confidently able to use velero kopia for backup solution.

@amareshgreat 1.11.1 will be available in the next 1 or 2 weeks.

@danfengliu , The IAM role doesn’t have restrictions for region. also if you see bundle log, the backup works for most of the PVC, and fails intermittently for few PVC’s. i assume if there was a problem with permission or bucket region, it would have failed for all the PVC