dvc: ETag mismatch on MinIO external dependency add.
Assuming we have a MinIO instance set up with two buckets (dvc-cache, data) on localhost:9000, adding data from the data bucket as an external dependency produces an ETag mismatch error.
Example:
#!/bin/bash
rm -rf repo
mkdir repo
pushd repo
git init --quiet
dvc init -q
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"
dvc remote add s3cache s3://dvc-cache/cache
dvc config cache.s3 s3cache
dvc remote modify s3cache endpointurl http://localhost:9000
dvc remote modify s3cache use_ssl False
dvc remote add miniodata s3://data
dvc remote modify miniodata endpointurl http://localhost:9000
dvc remote modify miniodata use_ssl False
dvc add remote://miniodata/file
This results in:
ERROR: ETag mismatch detected when copying file to cache! (expected: '4e102ec8d6ab714aae04d9e3c7b4c190-1', actual: 'ca9e5ed43f3fbee6edecbb5ac6fba77e-1')
About this issue
- State: closed
- Created 4 years ago
- Comments: 29 (11 by maintainers)
Here’s what I tried:
docker run -v $PWD:/data -p 9000:9000 minio/minio:RELEASE.2020-03-06T22-23-56Z server --compat /data
I still got a similar error:
See https://github.com/minio/minio#caveats to understand more about why --compat might be needed here.

No, it is not. Parts can be uploaded in this manner (5MiB, 6MiB, then 1 byte).
This will result in an ETag of md5hex(md5(5MiB) + md5(6MiB) + md5(1byte))-3.

Now if you assume 3 parts and the content-length is 11MiB, you have no idea what length was used for the 1st part or the 2nd part. If you happen to choose 5MiB for both, you will end up with an incorrect ETag, which will mismatch. I can reproduce this right now with dvc using AWS S3. Of course, I assume that this is not handled because it is a rare corner case. I am clarifying this a bit just so that you are aware.

The multipart ETag is nothing but hexmd5(md5(part1) + md5(part2)...)-N. This is not documented in the AWS S3 docs but was found while talking to AWS support.

The server-side copy of parts is called CopyObjectPart(), which I see you are using when you see an ETag with a - at the end.

NOTE: This assumption will also fail for SSE-C encrypted objects, because AWS S3 doesn’t return a proper ETag for SSE-C encrypted objects, meaning an SSE-C object will change its ETag automatically upon an overwrite.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
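To make the part-size problem above concrete, here is a minimal Python sketch of the hexmd5(md5(part1) + md5(part2)...)-N formula. The function name multipart_etag and the sample payload are made up for illustration; this is not dvc's or MinIO's actual code, only the formula as described in this thread.

```python
import hashlib
from typing import Sequence


def multipart_etag(data: bytes, part_sizes: Sequence[int]) -> str:
    """Return hexmd5(md5(part1) + md5(part2) + ...)-N for the given split.

    Hypothetical helper: it follows the formula quoted in this thread.
    """
    if sum(part_sizes) != len(data):
        raise ValueError("part sizes must add up to the object length")
    digests = []
    offset = 0
    for size in part_sizes:
        # Hash each part separately and keep the raw 16-byte digest.
        digests.append(hashlib.md5(data[offset:offset + size]).digest())
        offset += size
    # Hash the concatenated digests, then append the part count.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


MiB = 1024 * 1024
# Sample payload of 11 MiB + 1 byte (assumption for the demo).
data = bytes(range(256)) * (11 * MiB // 256) + b"\x00"

# Split used by the uploader: 5MiB, 6MiB, 1 byte.
uploaded = multipart_etag(data, [5 * MiB, 6 * MiB, 1])
# Split guessed by a client that assumes 5MiB for the first two parts.
guessed = multipart_etag(data, [5 * MiB, 5 * MiB, MiB + 1])

print(uploaded)
print(guessed)
print(uploaded == guessed)
```

Both values end in -3, yet they differ, because the part boundaries differ even though the object bytes are identical. That is why an ETag cannot be recomputed from the content length and part count alone.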
@shcheklein, there’s an issue that was closed because this is how it is supposed to work on MinIO. See: https://github.com/minio/minio/issues/8012#issuecomment-519757286
We could, however, suggest that users use --compat, or find a better way than ETag.
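As one possible direction for "a better way than ETag", and only as a sketch that this thread does not settle on, the copy could be verified by hashing the object contents directly instead of trusting the server-reported ETag. The helper name content_md5 and the in-memory streams below are assumptions for illustration; in practice the streams would come from whatever client reads the source and cached objects, and reading the full object has an obvious bandwidth cost.

```python
import hashlib
import io
from typing import BinaryIO


def content_md5(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks; the result does not depend on chunk_size."""
    md5 = hashlib.md5()
    # Read until the stream returns an empty bytes object (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


# Stand-ins for the source object and its cached copy (hypothetical data).
source = io.BytesIO(b"example payload" * 1000)
cached = io.BytesIO(b"example payload" * 1000)

# Different read sizes still give the same digest, unlike multipart ETags.
same = content_md5(source, chunk_size=4096) == content_md5(cached, chunk_size=100)
print(same)
```

Unlike a multipart ETag, this digest depends only on the bytes, so it stays stable regardless of how the object was uploaded or copied.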