vitess: Bug Report: v18.0.0-rc1 S3 backup failed to upload
Overview of the Issue
After upgrading from v17.0.3 to v18.0.0-rc1, our backups fail to upload to an S3-like storage backend (not AWS, but something like MinIO).
E1005 07:20:00.774018 14 main.go:56] rpc error: code = Unknown desc = TabletManager.Backup on commerce-0000001010 error: MultipartUpload: upload multipart failed
upload id: ZGQ5OWUxNWYtY2IzNS00MDdjLWI1OGItMWNkZDA4M2ZlYzRk
caused by: RequestCanceled: request context canceled
caused by: context canceled: MultipartUpload: upload multipart failed
upload id: ZGQ5OWUxNWYtY2IzNS00MDdjLWI1OGItMWNkZDA4M2ZlYzRk
caused by: RequestCanceled: request context canceled
caused by: context canceled
Reproduction Steps
/vt/bin/vtctldclient --server=vtctld:15999 --logtostderr=true BackupShard --concurrency=1 "commerce/-"
vttablet and vtctld are running with the following flags:
--backup_storage_implementation=s3 \
--backup_engine_implementation=xtrabackup \
--s3_backup_aws_region=us-east-1 \
--s3_backup_storage_bucket=default \
--s3_backup_force_path_style=true \
--s3_backup_aws_endpoint=https://* \
--s3_backup_storage_root=*
Binary Version
vtctldclient version Version: 18.0.0-rc1 (Git revision 6ab165ade925b35a00cf447827d874eba13998b6 branch 'heads/v18.0.0-rc1') built on Tue Oct 3 15:00:58 UTC 2023 by vitess@buildkitsandbox using go1.21.1 linux/amd64
Operating System and Environment details
Kubernetes / official Vitess Docker image
Log Fragments
No response
About this issue
- State: open
- Created 9 months ago
- Comments: 30 (15 by maintainers)
This is a little bit of a gray area because on the one hand, we don’t “officially” support S3-like, non-S3 storage backends. On the other hand, it has worked for a long time and you and others are relying on it. So for now, I’m willing for us to revert the change that is causing this error and go back to using Upload instead of UploadWithContext, and then we can try and figure out why that is not working as expected. Will you be able to do a PR to make just that change? We don’t want to revert the stats changes from #12500. cc @maxenglander
@L3o-pold there’s one more thing you should do. Can you run this again with some logging added? Log the context and its deadline in s3.go just before we call Upload/UploadWithContext, and at the beginning of the rpc call in rpc_backup.go. That will tell us if we are somehow overriding the original context or its deadline.