vitess: Bug Report: v18.0.0-rc1 S3 backup failed to upload

Overview of the Issue

After upgrading from v17.0.3 to v18.0.0-rc1, our backups fail to upload to S3-like storage (not AWS, but something like MinIO).

E1005 07:20:00.774018      14 main.go:56] rpc error: code = Unknown desc = TabletManager.Backup on commerce-0000001010 error: MultipartUpload: upload multipart failed
	upload id: ZGQ5OWUxNWYtY2IzNS00MDdjLWI1OGItMWNkZDA4M2ZlYzRk
caused by: RequestCanceled: request context canceled
caused by: context canceled: MultipartUpload: upload multipart failed
	upload id: ZGQ5OWUxNWYtY2IzNS00MDdjLWI1OGItMWNkZDA4M2ZlYzRk
caused by: RequestCanceled: request context canceled
caused by: context canceled

Reproduction Steps

/vt/bin/vtctldclient --server=vtctld:15999 --logtostderr=true BackupShard --concurrency=1 "commerce/-"

vttablet and vtctld are running with the following flags:

--backup_storage_implementation=s3 \
--backup_engine_implementation=xtrabackup \
--s3_backup_aws_region=us-east-1 \
--s3_backup_storage_bucket=default \
--s3_backup_force_path_style=true \
--s3_backup_aws_endpoint=https://* \
--s3_backup_storage_root=*

Binary Version

vtctldclient version Version: 18.0.0-rc1 (Git revision 6ab165ade925b35a00cf447827d874eba13998b6 branch 'heads/v18.0.0-rc1') built on Tue Oct  3 15:00:58 UTC 2023 by vitess@buildkitsandbox using go1.21.1 linux/amd64

Operating System and Environment details

kubernetes / docker official vitess image

Log Fragments

No response

About this issue

  • State: open
  • Created 9 months ago
  • Comments: 30 (15 by maintainers)

Most upvoted comments

This is a bit of a gray area: on the one hand, we don’t “officially” support S3-like, non-S3 storage backends. On the other hand, this has worked for a long time, and you and others rely on it. So for now, I’m willing for us to revert the change that is causing this error and go back to using Upload instead of UploadWithContext; then we can try to figure out why UploadWithContext is not working as expected. Would you be able to open a PR that makes just that change? We don’t want to revert the stats changes from #12500.

cc @maxenglander

@L3o-pold there’s one more thing you should do. Can you run this again with some logging added? Log the context and its deadline in s3.go just before we call Upload/UploadWithContext, and at the beginning of the rpc call in rpc_backup.go. That will tell us if we are somehow overriding the original context or its deadline.

rpc_backup.go

func (tm *TabletManager) Backup(ctx context.Context, logger logutil.Logger, req *tabletmanagerdatapb.BackupRequest) error {