distribution-spec: Proposal: Add the ability to deduplicate uploads

Problem statement

We have many repositories, on the order of tens of thousands, all managed by independent internal teams. Some teams build from a common base layer like Ubuntu. When Ubuntu is revved, that layer has to be re-uploaded to every repository. The security requirement for a solution is that the distribution server must be able to prove that the user actually has the file they’re referring to. The server is NOT required to keep secret whether it has a given blob when a client attempts to prove possession of the file; it only has to protect access to the blob’s contents.

The current solution is to use “cross repository blob mounts”, but that’s non-trivial for many users to implement. Specifically, the client must:

  1. Keep track of which repo each layer originally came from.
  2. Check whether the blob from the relevant upstream repo exists in our private registry’s copy of that repo; if so, do a cross-repo mount.
  3. Otherwise, create the repo and upload the blob plus the manifests that were responsible for the existence of the previous blob.
  4. Cross-mount.

Step #3 is difficult in a couple of respects. (1) If a lineage of multiple images is used, say ubuntu -> titusoss/ubuntu -> titusoss/ubuntu-ping, it requires tracking the “original” repository each blob came from; otherwise a dangling blob uploaded to another repo may be garbage collected, having no associated manifest, if the upload isn’t completed quickly enough. This is easy if you know all the manifests that reference the blobs, but that information is not in the image. (2) It also requires that any user can create a repository and read from / write to any repo.

Proposal

When the user begins an upload session by calling POST /v2/<name>/blobs/uploads/, they can optionally supply an argument, digest, with the digest of the blob that will result at the end of the upload. If the registry would like to allow the user to prove that they already have this file, it begins a protocol that lets them do so. The client may choose not to opt into the deduplication mechanism.

The registry will respond with data that allows the user to deterministically generate a random string of bytes of a particular length and value. This length should be of a reasonable size. The user’s job is then to append that value to the blob, run the given digest algorithm over the combined value, and send the result to the server. This is practical because many hash functions have a relatively small internal state that can be checkpointed and stored alongside the blob on the server. Client-side, the user may prefer to spend CPU over bandwidth, or may choose to store similar checkpoint metadata themselves.
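As a rough sketch of the server side, the registry might mint the challenge like this (the JSON shape follows the example below; the 16-byte seed size and the function name are assumptions of the sketch):

import secrets

def make_dedupe_challenge(length: int = 50) -> dict:
    # Mint the challenge returned from POST /v2/<name>/blobs/uploads/?digest=...
    # The seed is drawn from a CSPRNG because it must be unpredictable
    # (see "Security implications" below).
    return {
        "deduplicateUpload": {
            "generator": "blake3",          # XOF used to expand the seed
            "length": length,               # bytes of XOF output the client must append
            "seed": secrets.token_hex(16),  # unpredictable per-session seed; 16 bytes is an assumed size
        }
    }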

Properties

  • generator string This must be an XOF (extendable-output function), used to generate the data appended to the original value. We can come up with the allowed values later.

  • length int An integer greater than 0 defining the number of bytes of output to generate from the given XOF.

  • seed string The input to the XOF.

Example:

POST /v2/titusoss/ubuntu/blobs/uploads/?digest=sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 0
Host: registry.us-east-1.streamingtest.titus.netflix.net:7002
User-Agent: HTTPie/2.3.0



HTTP/1.1 202 Accepted
Connection: keep-alive
Content-Length: ...
Location: https://registry:7002/v2/titusoss/ubuntu/blobs/uploads/af8e1d5c-33cb-4e6b-a8e3-1c00418f0cfe?_state=mystate
Range: 0-0
Content-type: application/json

{
  "deduplicateUpload": {
    "length": 57,
    "seed": "foo",
    "generator": "blake3"
  }
}

The response above asks the user to prove that they have the contents described by sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5. In this case, that digest describes the string Hello, world!\n.

The user needs to do the following:

  1. Run blake3 over the seed “foo” as an XOF.
  2. Generate 50 bytes of output. In this case, the hex digest of that output is 04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a75777702e9791074b7511b59d31c71c62f5a745689fa6c.
  3. Append those bytes to Hello, world!\n.
  4. Take a SHA-256 hash of the combined value.

That’s roughly described in the following Python session:

In [1]: import blake3, hashlib

In [2]: data = b'Hello, world!\n'

In [3]: hashlib.sha256(data).hexdigest()
Out[3]: 'd9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5'

In [4]: extra = blake3.blake3(b"foo").digest(length=50)

In [5]: extra
Out[5]: b'\x04\xe0\xbb9\xf3\x0b\x1a?\xeb\x89\xf56\xc9;\xe1PUH-\xf7HgK\x00\xd2nZuww\x02\xe9y\x10t\xb7Q\x1bY\xd3\x1cq\xc6/ZtV\x89\xfal'

In [6]: hashlib.sha256(data + extra).hexdigest()
Out[6]: '958864784c4661cd235c474a4105deedc00ba21ca372e39e03891a5c3d32696f'

This gives us the resultant digest 958864784c4661cd235c474a4105deedc00ba21ca372e39e03891a5c3d32696f. It must use the same digest algorithm as the original digest (here, SHA-256).

To complete the upload, issue a PUT with the dedupe query parameter:

PUT https://registry:7002/v2/titusoss/ubuntu/blobs/uploads/af8e1d5c-33cb-4e6b-a8e3-1c00418f0cfe?_state=mystate&dedupe=sha256:958864784c4661cd235c474a4105deedc00ba21ca372e39e03891a5c3d32696f
Content-Length: 0

HTTP/1.1 202 Accepted

At this point, the registry validates the proof and accepts the upload if it checks out; otherwise the client falls back to a normal upload.
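For illustration, the registry-side validation might look roughly like this (a sketch assuming the blake3 package and that the registry reads the blob’s contents directly; a real implementation would resume from a checkpointed hasher instead, as discussed in the FAQ below):

import blake3, hashlib, hmac

def verify_dedupe(blob: bytes, seed: str, length: int, claimed_hex: str) -> bool:
    # Recompute what an honest client holding the blob would have sent.
    extra = blake3.blake3(seed.encode()).digest(length=length)
    expected = hashlib.sha256(blob + extra).hexdigest()
    # Constant-time comparison so the digest check itself doesn't leak bytes.
    return hmac.compare_digest(expected, claimed_hex)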

Security implications

  • This will leak whether the registry has the blob. Even if the registry always issues a dedupe challenge, whether or not it holds the blob, timing attacks can be waged against the verification step.
  • The registry mustn’t make the length too large, otherwise this becomes a DoS vector for both the client and the server. We should consider setting an upper limit (see the sketch after this list).
  • The “seed” must not be predictable.
  • The security attributes of the XOF aren’t all that important, since we’re only using it as a deterministic random string generator.
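As a minimal sketch of the upper-limit idea from the second bullet, the server could clamp whatever length it would otherwise choose (the 1 MiB cap is an arbitrary placeholder):

MAX_DEDUPE_LENGTH = 1 << 20  # 1 MiB; an arbitrary illustrative cap

def choose_length(desired: int) -> int:
    # The server picks the challenge length, but never beyond the cap,
    # so neither side can be forced to hash unbounded extra data.
    return max(1, min(desired, MAX_DEDUPE_LENGTH))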

FAQ

Should the PUT carry a JSON document instead of the query parameter?

Maybe. I’m not sure.

Doesn’t this require that more state be kept on the server, and what’s the implication there?

This state can probably be passed back and forth in the _state query parameter, or embedded in the Location URL in another way. Alternatively, this information is pretty tiny (~4 bytes in the example above) and cheap to store server-side.
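For example, the challenge could ride inside the opaque _state token as an HMAC-signed payload, so the server stores nothing per session (a sketch; the key and token format are made up for illustration):

import base64, hashlib, hmac, json

STATE_KEY = b"registry-state-hmac-key"  # hypothetical server-side secret

def encode_state(challenge: dict) -> str:
    # Sign the challenge so the client can carry it but not tamper with it.
    payload = base64.urlsafe_b64encode(json.dumps(challenge).encode()).decode()
    sig = hmac.new(STATE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def decode_state(token: str) -> dict:
    payload, sig = token.rsplit(".", 1)
    want = hmac.new(STATE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(want, sig):
        raise ValueError("tampered _state token")
    return json.loads(base64.urlsafe_b64decode(payload))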

Won’t rehashing be horribly expensive?

The SHA-256 hasher can be “checkpointed”: in most implementations its internal state can be saved to disk alongside the blob.
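For illustration, Python’s hashlib supports in-memory checkpointing via copy(); persisting the state to disk requires an implementation that can serialize the hasher’s internal state, which stock hashlib does not expose. A sketch reusing the example values from above:

import blake3, hashlib

blob = b"Hello, world!\n"

# At upload time: hash the blob once and keep the unfinalized hasher around.
h = hashlib.sha256()
h.update(blob)

# At verification time: resume from a cheap copy instead of rehashing the blob.
extra = blake3.blake3(b"foo").digest(length=50)
resumed = h.copy()  # in-memory checkpoint of a few dozen bytes of state
resumed.update(extra)
print(resumed.hexdigest())  # 958864784c4661cd235c474a4105deedc00ba21ca372e39e03891a5c3d32696f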

How do we deal with time, and CPUs becoming faster or one of our hash functions being weakened? (“security”)

You can keep cranking up the output length of the XOF to force more hashing. The scheme also isn’t tied to any single hash function.


Most upvoted comments

This only requires that the client possess an unfinalized hash of the content, which is not really much different from possessing the content’s digest under a different hash function. Most proof-of-data-possession algorithms instead choose a random piece of the content (e.g. from a precomputed set), so the client has to actually have the whole content, or they require some other less deterministic computation.

@shizhMSFT Have you read the updated proposal in the PR?

  1. It addresses #1 since it uses the mount endpoint, and not the start upload endpoint.
  2. The security model is important, and thus the proposal only applies to registries that have per-repository authz and do not care about timing / disclosure attacks.
  3. The PR does not try to propose a secure upload algorithm.

To capture some of the discussion from the call…

There are two opportunities to do this without really changing the spec (much):

  1. Do this during HEAD blob requests.
  2. Do this as part of the cross-repo mount request.

Doing this as part of a HEAD request is problematic:

  1. HEAD should be side-effect free, so we would violate HTTP RFCs.
  2. Doing this would confuse or break HTTP caches in unpredictable ways (due to 1).
  3. There’s no way to distinguish between a client just checking to see if a blob exists and a client that is checking to see if a blob exists because it wants to upload it to the registry.

If we want to do this during cross-repo mounting, we can just make the from parameter optional. Registries that expect from should return a 202 anyway, so the fallback path is fine. If a from is supplied, the registry can take that as a hint of where to look, but has the liberty to mount from anywhere that satisfies the registry’s auth model. Clients can supply the from parameter if they know it, but they can also just omit it. This can be used to skip a blob existence check as well, if a client wants, eliminating an extra roundtrip.
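Concretely, a mount request with from omitted might look like this (a sketch reusing the digest from the example above; per the existing spec, a successful mount returns 201 Created, and on failure the registry falls back to 202 Accepted with an upload session):

POST /v2/titusoss/ubuntu/blobs/uploads/?mount=sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5 HTTP/1.1
Host: registry:7002
Content-Length: 0


HTTP/1.1 201 Created
Location: https://registry:7002/v2/titusoss/ubuntu/blobs/sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5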

One drawback of this approach is that it is susceptible to timing attacks. Assuming that an existence check is faster than an existence check plus an auth check, a client can (probabilistically) detect the presence of a blob in a registry, even one they don’t have access to, by comparing the latency of 202 responses to mount attempts. Certain registries will almost certainly not want to implement this, but fallback behavior is already well-specified.

Common use cases for this would be automatically mounting publicly available blobs (I took some liberties and implemented this with GCR a while ago, using our (partial) Docker Hub mirror if you want to test against a registry) or mounting across somewhat trusted boundaries (e.g. an org-wide registry where you have read access to everything).