rclone: s3: Multipart files not checked or synchronized based on checksum (ETag, additional headers)
Multipart uploads to S3 do not have working checksum-based synchronization or validation. This could be implemented in two different ways:
- Set additional headers containing the known checksums of the file (as a bonus, multiple different checksums could be stored on a single file).
- If the chunk size is consistent, the local ETag can be calculated for comparison (see the sketch below). The chunk size can also be stored as a header so that later calculations can be repeated. This is possible as documented here: http://www.perlmonks.org/?node_id=1140988
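To illustrate the second option, here is a minimal sketch in Go (not rclone's actual code) of computing an S3-style multipart ETag locally: hash the file in fixed-size chunks, MD5 the concatenation of the per-part MD5s, and append the part count. The 5 MiB chunk size and the multipartETag helper name are assumptions for illustration; the file name test.mem matches the object in the listing below.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// Assumed upload chunk size; the real value would come from config or a header.
const chunkSize = 5 * 1024 * 1024

// multipartETag hashes the file in chunkSize pieces, concatenates the binary
// MD5 of each piece, MD5s that concatenation, and appends the part count.
// This is the scheme S3 uses for ETags of multipart uploads (single-part
// uploads just use the plain hex MD5 with no "-N" suffix).
func multipartETag(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var partSums []byte
	parts := 0
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			sum := md5.Sum(buf[:n])
			partSums = append(partSums, sum[:]...)
			parts++
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			return "", err
		}
	}
	final := md5.Sum(partSums)
	return fmt.Sprintf("%x-%d", final[:], parts), nil
}

func main() {
	etag, err := multipartETag("test.mem")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(etag) // compare against the ETag from the bucket listing
}
```

If the result matches the ETag returned in the bucket listing (e.g. "4388daee7926e0b260eaf1bb51bdcb35-7" below), the local and remote copies agree without downloading anything.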
What is your rclone version (eg output from rclone -V)
rclone v1.29

Which OS you are using and how many bits (eg Windows 7, 64 bit)
Gentoo Linux, 64-bit

Which cloud storage system are you using? (eg Google Drive)
S3 (Ceph Hammer)

The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone sync . $REMOTE_NAME:$BUCKET_NAME/ --checksum

A log from the command with the -v flag (eg output from rclone -v copy /tmp remote:tmp)

```
rclone sync . $REMOTE_NAME:$BUCKET_NAME/ --checksum --dump-headers --dump-bodies
2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 HTTP REQUEST
2016/06/14 17:09:27 HEAD /$BUCKET_NAME HTTP/1.1
Host: objects-us-west-1.dream.io
User-Agent: rclone/v1.29
Authorization: AWS XXXXXX:Ke4mVHLXl4WUSPoyfVYZ6LoPFiM=
Date: Wed, 15 Jun 2016 00:09:27 UTC

2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 HTTP RESPONSE
2016/06/14 17:09:27 HTTP/1.1 200 OK
Date: Wed, 15 Jun 2016 00:09:27 GMT
X-Amz-Request-Id: tx000000000000001bbb918-0057609cb7-d8d7311-default
X-Rgw-Bytes-Used: 44040192
X-Rgw-Object-Count: 3
Content-Length: 0

2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 S3 bucket robjoh84-congress-test2: Building file list
2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 HTTP REQUEST
2016/06/14 17:09:27 GET /$BUCKET_NAME?delimiter=&max-keys=1024&prefix= HTTP/1.1
Host: objects-us-west-1.dream.io
User-Agent: rclone/v1.29
Authorization: AWS XXXXXXXX:lEMCCAZstZ5rJbzqUt25R59v5j8=
Date: Wed, 15 Jun 2016 00:09:27 UTC
Accept-Encoding: gzip

2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 HTTP RESPONSE
2016/06/14 17:09:27 HTTP/1.1 200 OK
Content-Length: 525
Content-Type: application/xml
Date: Wed, 15 Jun 2016 00:09:27 GMT
X-Amz-Request-Id: tx000000000000001bbb91b-0057609cb7-d8d7311-default

<?xml version="1.0" encoding="UTF-8"?><ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>$BUCKET_NAME</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1024</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>test.mem</Key><LastModified>2016-06-15T00:01:37.000Z</LastModified><ETag>"4388daee7926e0b260eaf1bb51bdcb35-7"</ETag><Size>33554432</Size><StorageClass>STANDARD</StorageClass><Owner><ID>robjoh84</ID><DisplayName>$UID</DisplayName></Owner></Contents></ListBucketResult>
2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 S3 bucket $BUCKET_NAME: Waiting for checks to finish
2016/06/14 17:09:27 Waiting for deletions to finish
2016/06/14 17:09:27 S3 bucket $BUCKET_NAME: Waiting for transfers to finish
2016/06/14 17:09:27 S3 bucket $BUCKET_NAME: Waiting for deletes to finish (during+after)

Transferred:      0 Bytes (   0.00 kByte/s)
Errors:           0
Checks:           1
Transferred:      0
Elapsed time:     400ms
```
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 23 (21 by maintainers)
Commits related to this issue
- s3: set/get the hash for multipart files - #523 — committed to rclone/rclone by chris-redekop 6 years ago
I don’t understand this. Using the S3 ETag seems like the better option since it does not require adding a custom header. The S3 ETag is returned in listings, so you don’t even have to HEAD the object to validate its checksum against the local file. You certainly don’t need to download the file, which would defeat the purpose.
You don’t need to store the chunk size in a separate header since the number of parts is included in the ETag. The format is the hex MD5 of the concatenation of each part’s MD5, followed by a dash and the number of parts ("HEX MD5 OF THE CONCATENATED PART MD5S - NUMBER OF PARTS"). The chunk size is then simply the file size divided by the number of parts, with the remainder going into the last part.
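As a rough sketch of that derivation (not part of rclone; chunkSizeFromETag is a name made up for illustration), the part count can be parsed from the ETag suffix and a chunk size consistent with it recovered from the object size. Plain division only gives an approximation when the last part is short, so this sketch additionally assumes parts are aligned to 1 MiB and searches for the smallest such size that yields the right part count:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chunkSizeFromETag estimates the per-part size for a multipart ETag like
// "4388daee7926e0b260eaf1bb51bdcb35-7" given the object size. All parts
// except the last are assumed equal and aligned to 1 MiB (an assumption for
// illustration; real uploads may use other granularities).
func chunkSizeFromETag(etag string, size int64) (int64, error) {
	i := strings.LastIndex(etag, "-")
	if i < 0 {
		return 0, fmt.Errorf("not a multipart ETag: %q", etag)
	}
	parts, err := strconv.ParseInt(etag[i+1:], 10, 64)
	if err != nil || parts < 1 {
		return 0, fmt.Errorf("bad part count in ETag %q", etag)
	}
	const mib = int64(1024 * 1024)
	for chunk := mib; ; chunk += mib {
		if (size+chunk-1)/chunk == parts { // ceil(size/chunk) == parts
			return chunk, nil
		}
		if chunk > size {
			return 0, fmt.Errorf("no 1 MiB-aligned chunk size yields %d parts", parts)
		}
	}
}

func main() {
	// Values from the listing in the report: test.mem, 33554432 bytes, ETag "...-7".
	chunk, err := chunkSizeFromETag("4388daee7926e0b260eaf1bb51bdcb35-7", 33554432)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("estimated chunk size: %d bytes\n", chunk)
}
```

With the values from the listing above (33554432 bytes, ETag ending in -7) this recovers a 5 MiB chunk size, which could then be fed into a local multipart ETag calculation for comparison.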