rclone: Google Cloud Storage: Can't download files with Content-Encoding: gzip
What is the problem you are having with rclone?
rclone is unable to download a file from Google Cloud Storage that has Content-Encoding: gzip, because of a size mismatch (or an MD5 mismatch when copying to Azure).
What is your rclone version (output from rclone version)
Reproduced with both the Debian-packaged version:
rclone v1.41
- os/arch: linux/amd64
- go version: go1.10.1
and built from current git master:
rclone v1.44-001-g67703a73-beta
- os/arch: linux/amd64
- go version: go1.10.4
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Debian GNU/Linux 4.18.0 x86_64
Which cloud storage system are you using? (eg Google Drive)
Google Cloud Storage
The command you were trying to run (eg rclone copy /tmp remote:tmp)
echo 'Example content.' > file.txt
gsutil cp -Z file.txt gs://$bucket/file.txt.gz
rclone -vv copy $gcs_remote:$bucket/file.txt.gz file.txt.gz
A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)
2018/10/15 15:46:26 DEBUG : rclone: Version "v1.44-001-g67703a73-beta" starting with parameters ["./rclone" "-vv" "copy" "gcs:9718a7ca-c0d4-41ac-a0dc-46922b9d541d/file.txt.gz" "file.txt.gz"]
2018/10/15 15:46:26 DEBUG : Using config file from "/home/kevin/.config/rclone/rclone.conf"
2018/10/15 15:46:27 DEBUG : file.txt.gz: Couldn't find file - need to transfer
2018/10/15 15:46:27 ERROR : file.txt.gz: corrupted on transfer: sizes differ 47 vs 17
2018/10/15 15:46:27 INFO : file.txt.gz: Removing failed copy
2018/10/15 15:46:27 ERROR : Attempt 1/3 failed with 1 errors and: corrupted on transfer: sizes differ 47 vs 17
2018/10/15 15:46:27 DEBUG : file.txt.gz: Couldn't find file - need to transfer
2018/10/15 15:46:28 ERROR : file.txt.gz: corrupted on transfer: sizes differ 47 vs 17
2018/10/15 15:46:28 INFO : file.txt.gz: Removing failed copy
2018/10/15 15:46:28 ERROR : Attempt 2/3 failed with 1 errors and: corrupted on transfer: sizes differ 47 vs 17
2018/10/15 15:46:28 DEBUG : file.txt.gz: Couldn't find file - need to transfer
2018/10/15 15:46:28 ERROR : file.txt.gz: corrupted on transfer: sizes differ 47 vs 17
2018/10/15 15:46:28 INFO : file.txt.gz: Removing failed copy
2018/10/15 15:46:28 ERROR : Attempt 3/3 failed with 1 errors and: corrupted on transfer: sizes differ 47 vs 17
2018/10/15 15:46:28 Failed to copy: corrupted on transfer: sizes differ 47 vs 17
I’m guessing that the problem is that GCS reports Content-Length: 47 (the length of the gzip-encoded content) while rclone receives 17 decompressed bytes, because the client library transparently decompresses the body. (Note: the compressed content is actually larger than the original here, because gzip's format overhead outweighs any savings on such a small file.) Perhaps a call to ReadCompressed(true) to disable decompression by the Google client library would be appropriate?
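For concreteness, ReadCompressed is a real method on ObjectHandle in cloud.google.com/go/storage. A minimal sketch of a download that bypasses the transparent decompression (bucket and object names taken from the log above, error handling kept short; this is not rclone's actual code):

package main

import (
	"context"
	"io"
	"log"
	"os"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ReadCompressed(true) asks for the object's bytes exactly as stored
	// (the 47-byte gzip stream) instead of the 17 decompressed bytes.
	obj := client.Bucket("9718a7ca-c0d4-41ac-a0dc-46922b9d541d").Object("file.txt.gz")
	r, err := obj.ReadCompressed(true).NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	n, err := io.Copy(os.Stdout, r)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes (should match the stored Content-Length)", n)
}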
Thanks, Kevin
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 23 (18 by maintainers)
Commits related to this issue
- gcs: Allow compressed files to be downloaded - fixes #2658 Before this change, the go runtime would automatically decompress compressed objects leading to length mismatches. After this change rclone... — committed to rclone/rclone by ncw 6 years ago
- gzip: new backend for wrapping Content-Encoding: gzip backends FIXME WIP See: #2658 — committed to rclone/rclone by ncw 4 years ago
- gcs: allow uncompressed downloads of Content-Type encoding gzip #2658 — committed to rclone/rclone by ncw 3 years ago
- gcs: Fix download of "Content-Encoding: gzip" compressed objects Before this change, if an object compressed with "Content-Encoding: gzip" was downloaded, a length and hash mismatch would occur since... — committed to rclone/rclone by ncw 2 years ago
- s3: add --s3-decompress flag to download gzip-encoded files Before this change, if an object compressed with "Content-Encoding: gzip" was downloaded, a length and hash mismatch would occur since the ... — committed to rclone/rclone by ncw 2 years ago
I’ve merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.59.
@panthony if you’d like to see this change for s3 and/or azureblob then please make a new issue - thank you.
That is a good workaround and is equivalent to the patch above.
I had another idea about this here
v1.56.0-beta.5387.42a7efc4a.fix-2658-gcs-gzip-unknown-size on branch fix-2658-gcs-gzip-unknown-size (uploaded in 15-30 mins)
This modifies the patch above so that, if a gzipped object is detected, its size is reported as unknown. This would let such objects be downloaded decompressed.
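A minimal sketch of that idea (illustrative names inferred from the branch name above, not the actual patch): wrap the raw stream in a gzip reader and report the size as unknown so the usual length check is skipped.

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// openDecompressed is a sketch only: it wraps the raw object stream in
// a gzip reader and reports the size as unknown (-1) so the usual
// length check is skipped and the data arrives decompressed.
func openDecompressed(raw io.Reader) (io.Reader, int64, error) {
	gz, err := gzip.NewReader(raw)
	if err != nil {
		return nil, 0, err
	}
	return gz, -1, nil // -1 means "size unknown" to the caller
}

func main() {
	// Simulate a stored object: the example content from above, gzipped.
	var stored bytes.Buffer
	zw := gzip.NewWriter(&stored)
	zw.Write([]byte("Example content.\n"))
	zw.Close()

	r, size, err := openDecompressed(&stored)
	if err != nil {
		log.Fatal(err)
	}
	out, _ := io.ReadAll(r)
	fmt.Printf("reported size: %d, decompressed bytes: %d\n", size, len(out))
}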
So there are two approaches: 1) download the object compressed, exactly as stored, or 2) decompress the object as it is downloaded.
Do you think rclone should engage 1) automatically, with 2) being an option?
😃
There are two sorts of metadata: general-purpose key/value storage, and what I’ll call HTTP metadata.
Rclone deals with Content-Type already, but it doesn’t deal with the other kinds of metadata. There is already an issue about custom metadata, #111. That is quite an old issue, but I think it would be much easier to implement nowadays. You’ll see various other issues linked from there.
It would be nice if rclone could copy metadata from cloud to cloud, and also set it on upload. This would be a reasonably big project though!
The way it would work is that I’d give each Object an optional interface, ReadMetadata, and that would be supplied. On upload this would be read and set on the object.
That would be useful - can you put it in #111? I’ll move that issue up the run queue since I think its time has come 😃
I think the only thing I’m not sure about is how to represent the HTTP metadata and the non-HTTP metadata in a cross-cloud sort of way. Perhaps ReadMetadata should return two dictionaries, or one dictionary and one http.Header. Thinking aloud, this could also subsume the Content-Type mechanism.
This could also (with a flag) be implemented as attributes on local files.
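To make the shape concrete, a minimal sketch of the proposed optional interface (the name and signature are illustrative assumptions, not an actual rclone API):

package fs

import (
	"context"
	"net/http"
)

// ReadMetadataer is an illustrative sketch of the proposed optional
// interface, not an actual rclone API. Backends whose objects can
// report metadata would implement it.
type ReadMetadataer interface {
	// ReadMetadata returns the object's custom key/value metadata and
	// its HTTP metadata (Content-Type, Content-Encoding, Cache-Control, ...).
	ReadMetadata(ctx context.Context) (meta map[string]string, hdr http.Header, err error)
}

On upload, the destination backend would call ReadMetadata on the source object and apply both; the existing Content-Type handling would then just be one entry in the returned http.Header.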