velero: CSISnapshotTimeout in the backup spec is not honored by velero-plugin-for-csi, although it is honored by Velero core.

What steps did you take and what happened:

I set CSISnapshotTimeout to 1 minute in the Backup CR, but the backup still waited for the default CSI timeout (10 minutes).

What did you expect to happen: The CSISnapshotTimeout should have been honored.

The following information will help us better understand what’s going on: Code reference where the CSI timeout is set (it is hard-coded to 10 minutes): https://github.com/vmware-tanzu/velero-plugin-for-csi/blob/db448f3f3fb030735da686b489e1bf6e23b6ac1c/internal/util/util.go#L163

Code reference in the velero core code where CSISnapshotTimeout is set: https://github.com/vmware-tanzu/velero/blob/12a14d11e9996688aabd0f23848d8712cc73443b/pkg/controller/backup_controller.go#LL332C6-L332C6

Usage: https://github.com/vmware-tanzu/velero/blob/12a14d11e9996688aabd0f23848d8712cc73443b/pkg/controller/backup_controller.go#L657


About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

@sseago Thanks for a very clear explanation of the issue at hand.

Here is a summary of the issue as I see it after going through the comments a few times. Please feel free to correct me if I am wrong:

  1. There is consensus that the polling in the CSI plug-in that waits for the snapshot handle to be set should honor the “csiSnapshotTimeout” setting. This timeout is currently hard-coded to 10 minutes, and the PR from @dzaninovic addresses this. I think it should be merged.
  2. There is also polling in the backup controller that waits for the “readyToUse” flag to be set. Its purpose is to make sure that post-hooks can be run, so we need to be really sure that the snapshot is decoupled from the live volume at this point. I do think we need to wait for “readyToUse”, as there is no other field that tells us whether a snapshot is “decoupled” but still being uploaded. I will post in sig-storage to see if they have any ideas.
  3. The “readyToUse” polling will remain in place and will use “csiSnapshotTimeout” for now. Once BIA V2 is adopted by the CSI plug-in, this polling will move to the CSI plug-in and will honor the operation timeouts of BIA V2.

The code currently has the timeout hard-coded to 10 minutes as you pointed out. I will work on a fix and submit a pull request.