rclone: s3: Multipart files not checked or synchronized based on checksum (ETag, additional headers)
Multipart uploads to S3 do not have working checksum-based synchronization or validation. This could be implemented in two different ways:
- Set additional headers containing the known checksums of the file (as a bonus, multiple different checksums could be stored on a single file).
- If the chunk size is consistent, the local ETag can be calculated for comparison (see the sketch below). The chunk size can also be stored as a header so that later calculations can be repeated. This is possible as documented here: http://www.perlmonks.org/?node_id=1140988
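To illustrate the second option, here is a minimal sketch in Go (not rclone's actual code) of computing an S3-style multipart ETag locally: hash the file in fixed-size chunks, MD5 the concatenation of the per-part MD5s, and append the part count. The 5 MiB chunk size and the multipartETag helper name are assumptions for illustration; the file name test.mem matches the object in the listing below.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// Assumed upload chunk size; the real value would come from config or a header.
const chunkSize = 5 * 1024 * 1024

// multipartETag hashes the file in chunkSize pieces, concatenates the binary
// MD5 of each piece, MD5s that concatenation, and appends the part count.
// This is the scheme S3 uses for ETags of multipart uploads (single-part
// uploads just use the plain hex MD5 with no "-N" suffix).
func multipartETag(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var partSums []byte
	parts := 0
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			sum := md5.Sum(buf[:n])
			partSums = append(partSums, sum[:]...)
			parts++
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			return "", err
		}
	}
	final := md5.Sum(partSums)
	return fmt.Sprintf("%x-%d", final[:], parts), nil
}

func main() {
	etag, err := multipartETag("test.mem")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(etag) // compare against the ETag from the bucket listing
}
```

If the result matches the ETag returned in the bucket listing (e.g. "4388daee7926e0b260eaf1bb51bdcb35-7" below), the local and remote copies agree without downloading anything.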
What is your rclone version (eg output from rclone -V)
rclone v1.29

Which OS you are using and how many bits (eg Windows 7, 64 bit)
Gentoo Linux, 64-bit

Which cloud storage system are you using? (eg Google Drive)
S3 (Ceph Hammer)

The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone sync . $REMOTE_NAME:$BUCKET_NAME/ --checksum

A log from the command with the -v flag (eg output from rclone -v copy /tmp remote:tmp)

```
rclone sync . $REMOTE_NAME:$BUCKET_NAME/ --checksum --dump-headers --dump-bodies
2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 HTTP REQUEST
2016/06/14 17:09:27 HEAD /$BUCKET_NAME HTTP/1.1
Host: objects-us-west-1.dream.io
User-Agent: rclone/v1.29
Authorization: AWS XXXXXX:Ke4mVHLXl4WUSPoyfVYZ6LoPFiM=
Date: Wed, 15 Jun 2016 00:09:27 UTC

2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 HTTP RESPONSE
2016/06/14 17:09:27 HTTP/1.1 200 OK
Date: Wed, 15 Jun 2016 00:09:27 GMT
X-Amz-Request-Id: tx000000000000001bbb918-0057609cb7-d8d7311-default
X-Rgw-Bytes-Used: 44040192
X-Rgw-Object-Count: 3
Content-Length: 0

2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 S3 bucket robjoh84-congress-test2: Building file list
2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 HTTP REQUEST
2016/06/14 17:09:27 GET /$BUCKET_NAME?delimiter=&max-keys=1024&prefix= HTTP/1.1
Host: objects-us-west-1.dream.io
User-Agent: rclone/v1.29
Authorization: AWS XXXXXXXX:lEMCCAZstZ5rJbzqUt25R59v5j8=
Date: Wed, 15 Jun 2016 00:09:27 UTC
Accept-Encoding: gzip

2016/06/14 17:09:27 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 HTTP RESPONSE
2016/06/14 17:09:27 HTTP/1.1 200 OK
Content-Length: 525
Content-Type: application/xml
Date: Wed, 15 Jun 2016 00:09:27 GMT
X-Amz-Request-Id: tx000000000000001bbb91b-0057609cb7-d8d7311-default

<?xml version="1.0" encoding="UTF-8"?><ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>$BUCKET_NAME</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1024</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>test.mem</Key><LastModified>2016-06-15T00:01:37.000Z</LastModified><ETag>"4388daee7926e0b260eaf1bb51bdcb35-7"</ETag><Size>33554432</Size><StorageClass>STANDARD</StorageClass><Owner><ID>robjoh84</ID><DisplayName>$UID</DisplayName></Owner></Contents></ListBucketResult>
2016/06/14 17:09:27 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2016/06/14 17:09:27 S3 bucket $BUCKET_NAME: Waiting for checks to finish
2016/06/14 17:09:27 Waiting for deletions to finish
2016/06/14 17:09:27 S3 bucket $BUCKET_NAME: Waiting for transfers to finish
2016/06/14 17:09:27 S3 bucket $BUCKET_NAME: Waiting for deletes to finish (during+after)

Transferred:      0 Bytes (   0.00 kByte/s)
Errors:           0
Checks:           1
Transferred:      0
Elapsed time:     400ms
```
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 23 (21 by maintainers)
Commits related to this issue
- s3: set/get the hash for multipart files - #523 — committed to rclone/rclone by chris-redekop 6 years ago
I don’t understand this. Using the S3 ETag seems like the better option since it does not require adding a custom header. The S3 ETag is returned in listings, so you don’t even have to HEAD the object to validate its checksum against the local file. You certainly don’t need to download the file, which would defeat the purpose.
You don’t need to store the chunk size in a separate header since the number of parts is included in the ETag. The format is the hex MD5 of the concatenation of each part’s MD5, followed by a dash and the number of parts ("HEX MD5 OF THE CONCATENATED PART MD5S - NUMBER OF PARTS"). The chunk size is then simply the file size divided by the number of parts, with the remainder going into the last part.
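As a rough sketch of that derivation (not part of rclone; chunkSizeFromETag is a name made up for illustration), the part count can be parsed from the ETag suffix and a chunk size consistent with it recovered from the object size. Plain division only gives an approximation when the last part is short, so this sketch additionally assumes parts are aligned to 1 MiB and searches for the smallest such size that yields the right part count:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chunkSizeFromETag estimates the per-part size for a multipart ETag like
// "4388daee7926e0b260eaf1bb51bdcb35-7" given the object size. All parts
// except the last are assumed equal and aligned to 1 MiB (an assumption for
// illustration; real uploads may use other granularities).
func chunkSizeFromETag(etag string, size int64) (int64, error) {
	i := strings.LastIndex(etag, "-")
	if i < 0 {
		return 0, fmt.Errorf("not a multipart ETag: %q", etag)
	}
	parts, err := strconv.ParseInt(etag[i+1:], 10, 64)
	if err != nil || parts < 1 {
		return 0, fmt.Errorf("bad part count in ETag %q", etag)
	}
	const mib = int64(1024 * 1024)
	for chunk := mib; ; chunk += mib {
		if (size+chunk-1)/chunk == parts { // ceil(size/chunk) == parts
			return chunk, nil
		}
		if chunk > size {
			return 0, fmt.Errorf("no 1 MiB-aligned chunk size yields %d parts", parts)
		}
	}
}

func main() {
	// Values from the listing in the report: test.mem, 33554432 bytes, ETag "...-7".
	chunk, err := chunkSizeFromETag("4388daee7926e0b260eaf1bb51bdcb35-7", 33554432)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("estimated chunk size: %d bytes\n", chunk)
}
```

With the values from the listing above (33554432 bytes, ETag ending in -7) this recovers a 5 MiB chunk size, which could then be fed into a local multipart ETag calculation for comparison.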