gsutil: `gsutil cp -Z` always force-adds `Cache-Control: no-transform` and `Content-Encoding: gzip`, breaking the HTTP protocol
A curl client should be able to receive the decompressed version, but GCS always returns `Content-Encoding: gzip`. This violates HTTP/1.1, since the client never sent an `Accept-Encoding: gzip, deflate, br` header.
```
$ gsutil -m -h "Cache-Control: public,max-age=31536000" cp -Z foo.txt gs://somebucket/foo.txt
Copying file://foo.txt [Content-Type=text/plain]...
- [1/1 files][ 42.0 B/ 12.0 B] 100% Done
Operation completed over 1 objects/12.0 B.
```
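Note the size mismatch in the transcript: the source file is 12 B, but the stored object is 42 B (see `x-goog-stored-content-length` below), because `-Z` gzips the file before upload and, for a tiny file, the gzip header and trailer outweigh any compression savings. A quick sketch illustrating this (the file contents are an assumption, not the real foo.txt):

```python
import gzip

# Illustrative 12-byte payload standing in for foo.txt.
small = b"hello world\n"
assert len(small) == 12

# gzip adds a fixed header (10 bytes) and trailer (8 bytes), so tiny
# files grow rather than shrink when compressed.
compressed = gzip.compress(small)
assert len(compressed) > len(small)
```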
```
$ curl -v somebucket.io/foo.txt
> GET /foo1.txt HTTP/1.1
> User-Agent: curl/7.37.0
> Host: somebucket.io
> Accept: */*
>
< HTTP/1.1 200 OK
< X-GUploader-UploadID: ...
< Date: Thu, 19 Oct 2017 18:04:05 GMT
< Expires: Fri, 19 Oct 2018 18:04:05 GMT
< Last-Modified: Thu, 19 Oct 2017 18:03:47 GMT
< ETag: "c35fdf2f0c2dcadc46333b0709c87e64"
< x-goog-generation: 1508436227151587
< x-goog-metageneration: 1
< x-goog-stored-content-encoding: gzip
< x-goog-stored-content-length: 42
< Content-Type: text/plain
< Content-Encoding: gzip
< x-goog-hash: crc32c=V/9tDw==
< x-goog-hash: md5=w1/fLwwtytxGMzsHCch+ZA==
< x-goog-storage-class: MULTI_REGIONAL
< Accept-Ranges: bytes
< Content-Length: 42
< Access-Control-Allow-Origin: *
* Server UploadServer is not blacklisted
< Server: UploadServer
< Age: 2681
< Cache-Control: public,max-age=31536000,no-transform
< [gzipped binary response body]
```
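The garbage at the end of the response is the raw gzip stream: because the server returned `Content-Encoding: gzip` even though the client never advertised gzip support, curl passes the compressed bytes straight through to the terminal. A minimal Python sketch (the payload is illustrative, not the real foo.txt) of what such a client would have to do to recover the text:

```python
import gzip

# Simulate the object as stored in GCS after `gsutil cp -Z`:
# the original text is gzip-compressed at upload time.
original = b"hello world\n"          # illustrative payload
stored_body = gzip.compress(original)

# Every gzip stream starts with the magic bytes 0x1f 0x8b -- this is
# what the binary garbage in the curl output begins with.
assert stored_body[:2] == b"\x1f\x8b"

# A client that receives `Content-Encoding: gzip` must decompress the
# body itself, even though it never sent an Accept-Encoding header.
assert gzip.decompress(stored_body) == original
```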
Seems to be happening here
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 1
- Comments: 36 (12 by maintainers)
@starsandskies thanks 😃
Yes, I see the problem on that nodejs-storage issue. I think, though, it breaks into two use cases:
- using client libraries / gsutil to download files that have already been uploaded, where I can see decompressive transcoding is a problem for validating the checksums. I appreciate that's probably blocked on a server-side fix.
- using gsutil to upload files to a one-way bucket used for e.g. static website / asset hosting, where end clients are accessing over HTTP, so checksum validation on download is not a problem, but the forced override of cache headers at upload time is.
As far as I can see, the second use case was working without any problems until the gsutil behaviour was changed to fix the first.
The key thing is that it's still perfectly valid to have gzipped files in the bucket with decompressive transcoding enabled: nothing stops you setting your own Cache-Control header after the initial upload. That obviously fixes use case 2 but breaks use case 1. Given that, I don't see a good reason for gsutil to silently prevent you from doing the same thing in a single call, even if the default behaviour stays as it is now.
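The checksum problem in the first use case can be illustrated with a short Python sketch (the payload is made up): the md5 GCS stores (the `x-goog-hash: md5=...` header above) is computed over the gzipped bytes as uploaded, so a client that receives the transcoded, decompressed body cannot validate against it.

```python
import gzip
import hashlib

payload = b"console.log('hi');\n"    # illustrative file contents
gzipped = gzip.compress(payload)

# GCS stores the hash of the object as uploaded, i.e. the gzipped bytes.
stored_md5 = hashlib.md5(gzipped).hexdigest()

# With decompressive transcoding, the client downloads the decompressed
# body, whose md5 no longer matches the stored one.
downloaded_md5 = hashlib.md5(payload).hexdigest()
assert stored_md5 != downloaded_md5
```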
The Object Transcoding documentation and the `gsutil cp` documentation should probably be modified to indicate that `gsutil cp -z` disables decompressive transcoding.
Thanks @dalbani - I think at this point the problem is well understood and we're waiting for the Cloud Storage team to prioritize a fix (but to my knowledge it's not currently prioritized).
@starsandskies thanks for the response - no, I can't reopen; only core contributors/admins can reopen on GitHub.
I couldn't see a branch / pull request relevant to "the underlying behaviour that necessitates the -z behaviour" - do you mean server-side on GCS, as mentioned up the thread, or is there an issue / pull request open for that elsewhere that I could reference / add to?
I'm happy to open a new issue, though I think the issue description and the first couple of comments here (e.g. https://github.com/GoogleCloudPlatform/gsutil/issues/480#issuecomment-338050378) capture the problem, and IMO there's an advantage to keeping this issue alive since people are already watching it. But if you'd prefer a new issue, I'll open one and reference this.
I am seeing buggy behaviour too, where `setmeta` with Cache-Control overrides the gzipping functionality:
```
gsutil cp -Z foo.min.js gs://cdn-bucket/foo.min.js
```
followed by:
```
gsutil setmeta -h "Cache-Control: public,max-age=31536000" gs://cdn-bucket/foo.min.js
```
So @thobrla, it seems your recommendation of running `setmeta` afterwards does not work.