nodejs-storage: CONTENT_DOWNLOAD_MISMATCH with successful file download
Environment details
- OS: Windows 10 Pro 1809
- Node.js version: v10.15.3 LTS
- npm version: 6.4.1
- @google-cloud/storage version: 2.5.0
Steps to reproduce
- Prepare a bucket with Cloud KMS Customer managed Key encryption. All files are private.
- Upload a JPEG image file using:
const {Storage} = require("@google-cloud/storage");
const storage = new Storage();

function UploadFile() {
  storage.bucket("somebucket").upload("somefolder/someimage.jpg", {
    gzip: true, // gzip-compress the file on upload
    destination: "somefolder/someimage.jpg",
  }).then(results => {
    console.log("upload OK");
  }).catch(err => {
    console.error("Error: ", err);
  });
}
UploadFile();
// Upload works fine.
- Try downloading the img file using:
function DownloadFile() {
  const file = storage.bucket("somebucket").file("somefolder/someimage.jpg");
  file.download({
    destination: "someimage.jpg",
    // validation: false, // Why even disable validation?
  }).then(res => {
    console.log("DL OK");
  }).catch(err => {
    console.error(err); // Throws CONTENT_DOWNLOAD_MISMATCH
  });
}
DownloadFile();
- The correct, decompressed file is downloaded AND an exception is thrown:
code=CONTENT_DOWNLOAD_MISMATCH message=The downloaded data did not match the data from the server. To be sure the content is the same, you should download the file again.
I’ve read through Issue 566, but it does not seem to offer a solution.
storage.bucket("somebucket").file("somefolder/someimage.jpg").download({validation: false});
works, but there should be no reason to disable validation.
To check whether the hashes actually mismatch, I ran a local md5sum check on the downloaded and original image files.
$ md5sum downloaded.jpg original.jpg
a045a2e8e6b8d84aa8a319bcdba05419 downloaded.jpg
a045a2e8e6b8d84aa8a319bcdba05419 original.jpg
Downloaded file is a match to the original file.
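The same comparison can be scripted in Node.js. This is only a minimal sketch, assuming both files are on local disk and small enough to read into memory; `md5OfFile` and `sameContent` are hypothetical helper names, not part of @google-cloud/storage.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: hex MD5 digest of a local file, in the form `md5sum` prints.
function md5OfFile(filePath) {
  return crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
}

// Hypothetical helper: true if both files have identical MD5 digests.
function sameContent(pathA, pathB) {
  return md5OfFile(pathA) === md5OfFile(pathB);
}
```

Calling `sameContent("downloaded.jpg", "original.jpg")` would correspond to the md5sum check above.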
BTW, this problem doesn’t happen if the image is not gzipped on upload.
Thanks
About this issue
- State: closed
- Created 5 years ago
- Reactions: 9
- Comments: 18 (13 by maintainers)
The validation error is caused by a crc32c check failure. https://github.com/googleapis/nodejs-storage/blob/61eeb64a6bca194361aacf2a312d27d0e6d63b35/src/file.ts#L1372
The x-goog-hash header has the crc32c value of the gzip-compressed data (the same value generated on upload). The crc32c values mismatch because the client calculates crc32c on the uncompressed data. The same error occurs if validation is set to md5.
So if the gzip response header is not found in the download response, we should skip the crc32c check on the uncompressed data.
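The mismatch described here can be reproduced locally without a bucket. The following is a rough sketch, not the library's code: `crc32c` and `toBase64Crc32c` are hypothetical helpers written for illustration, and the sample buffer merely stands in for the JPEG bytes. It hashes the gzip-compressed bytes (what the server-side hash covers, per the comment above) and the decompressed bytes (what the client hashes) and compares the two.

```javascript
const zlib = require('zlib');

// Lookup table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
const CRC32C_TABLE = new Uint32Array(256);
for (let i = 0; i < 256; i++) {
  let c = i;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0x82F63B78 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC32C_TABLE[i] = c >>> 0;
}

// Hypothetical helper: CRC32C of a Buffer as an unsigned 32-bit number.
function crc32c(buf) {
  let crc = 0xFFFFFFFF;
  for (const byte of buf) {
    crc = CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper: base64 of the big-endian 4-byte checksum.
function toBase64Crc32c(buf) {
  const out = Buffer.alloc(4);
  out.writeUInt32BE(crc32c(buf), 0);
  return out.toString('base64');
}

// Assumption: arbitrary sample data standing in for the image file.
const original = Buffer.from('stand-in for the JPEG bytes '.repeat(100));
const stored = zlib.gzipSync(original);    // bytes as stored after a gzip upload
const received = zlib.gunzipSync(stored);  // bytes the client ends up hashing

const serverHash = toBase64Crc32c(stored);
const clientHash = toBase64Crc32c(received);

console.log('content identical:', received.equals(original));
console.log('hashes identical:', serverHash === clientHash);
```

The content round-trips intact while the two checksums are computed over different byte sequences, which is consistent with the report: a correct file on disk together with a validation error.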
That sounds like a good plan in the meantime.
Please move forward with it and add a note that this is only temporary.
Here is the current condition. https://github.com/googleapis/nodejs-storage/blob/71a4f59343bbe9b0c00ebdf68f6fdf9d214727cc/src/file.ts#L1300-L1303
This condition can be changed depending on whether the data is compressed: validate CRC and MD5 only if the gzip header is present.
The same thing is implemented in the Go storage client.
https://github.com/googleapis/google-cloud-go/blob/71971b35976fc2f904ed2772536790a5458d9996/storage/reader.go#L205
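One possible reading of the proposed condition, as a minimal sketch rather than the change actually made in file.ts or the logic in the Go client: `shouldValidateHashes` is a hypothetical helper, and `storedAsGzip` is an assumption that the caller already knows the object was uploaded with gzip enabled (how it learns that is outside this sketch).

```javascript
// Hypothetical helper, not part of @google-cloud/storage.
// headers: response headers keyed by lower-cased name.
// storedAsGzip: assumed to be known by the caller (see lead-in).
function shouldValidateHashes(headers, storedAsGzip) {
  // Objects that were never gzipped are hashed over the same bytes
  // on both sides, so validation is safe.
  if (!storedAsGzip) {
    return true;
  }
  // For gzipped objects, validate only when the response still carries
  // the gzip encoding; otherwise the received bytes are already
  // decompressed and their hash will not match the stored one.
  const encoding = String(headers['content-encoding'] || '').toLowerCase();
  return encoding.split(',').map(part => part.trim()).includes('gzip');
}
```

When this returns true for a gzipped object, the hash would still have to be computed over the bytes as received, before decompression, for the comparison to be meaningful.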