distribution-spec: Allow registries to reject non-existent subjects in manifests
During conformance testing, it was found that registries which require strong references between manifests and blobs fail conformance due to MUST language in the spec requiring acceptance of a manifest referencing a non-existent subject manifest. While subject fields may be described as a weak reference, listing and querying them at large scale may require a strong reference (such as a foreign key in a database), or a registry may simply be inheriting the data model used in 1.0, which always had referenced objects (as viewed from the Merkle DAG) uploaded first.
The arguments for the MUST language were to (1) support registries which may have reference-only repositories, storing content elsewhere, and (2) ensure referrers exist at manifest pull time, since there is no atomic way to upload referrers with manifests.
For (1) the burden will be on the client to handle this case on upload, as a registry is not required to support such repositories.
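One way a client could carry that burden is to order its uploads so that every subject is pushed before the manifests that refer to it. The following is a minimal sketch, assuming manifests are available as raw JSON bytes; `digest_of` and `subject_first` are hypothetical helper names, not part of any spec or library.

```python
import hashlib
import json


def digest_of(manifest_bytes: bytes) -> str:
    """Content digest of the exact bytes that will be uploaded."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def subject_first(manifests: list[bytes]) -> list[bytes]:
    """Order manifests so each subject precedes the manifests referring to it.

    Subjects that are not in the input (for example, content stored in
    another repository) are skipped; a registry that enforces strong
    references may still reject those pushes.
    """
    by_digest = {digest_of(m): m for m in manifests}
    ordered: list[bytes] = []
    seen: set[str] = set()

    def visit(digest: str) -> None:
        if digest in seen or digest not in by_digest:
            return
        seen.add(digest)
        subject = json.loads(by_digest[digest]).get("subject")
        if subject:
            visit(subject["digest"])  # push the subject first
        ordered.append(by_digest[digest])

    for digest in by_digest:
        visit(digest)
    return ordered
```

Because digests are content hashes, a manifest cannot refer to itself through a chain of subjects, so the recursion terminates without explicit cycle handling.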
For (2) clients can retry or check for freshness when validation is a requirement, or clients can ensure tags are only updated once all content is available. Similar issues have occurred in the past with multi-platform images. If images were uploaded before all platforms were available, then clients could see a race condition between the platform they need being built and the image they pull having that platform available. The same solution could apply here: push by digest or use a temporary tag when pushing manifests that should not be considered fully available, and “tag” the manifest once complete (via upload of the manifest using the tag reference).
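The push-by-digest-then-tag ordering can be sketched as a plan of manifest uploads. This is an illustrative sketch only: it assumes the standard `PUT /v2/<name>/manifests/<reference>` endpoint, where the reference is either a digest or a tag, and the names `manifest_digest` and `plan_push` are hypothetical. It builds the request list without sending anything.

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """Return the sha256 digest string for the exact bytes being pushed."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def plan_push(repo: str, image: bytes, referrers: list[bytes], tag: str) -> list[tuple[str, str]]:
    """Plan manifest PUTs so the tag only appears after everything else."""
    steps: list[tuple[str, str]] = []
    # 1. Push the image manifest by digest; no tag points at it yet.
    steps.append(("PUT", f"/v2/{repo}/manifests/{manifest_digest(image)}"))
    # 2. Push each referrer (signature, SBOM, ...) by digest.
    for referrer in referrers:
        steps.append(("PUT", f"/v2/{repo}/manifests/{manifest_digest(referrer)}"))
    # 3. "Tag" by uploading the same image manifest under the tag reference.
    steps.append(("PUT", f"/v2/{repo}/manifests/{tag}"))
    return steps
```

Note that the last step uploads the full image manifest a second time, since tagging is a side effect of pushing by name rather than a separate operation.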
Changing the language from MUST to MAY makes the most sense here. Additionally, we can add guidance in the spec on how to perform manifest uploads more transactionally. In the future we could consider a more explicit way to create and manage transactions.
Related to https://github.com/opencontainers/distribution-spec/issues/340 and https://github.com/opencontainers/distribution-spec/pull/341
About this issue
- State: open
- Created 10 months ago
- Reactions: 3
- Comments: 52 (33 by maintainers)
I keep thinking about this problem, and I think we’re honestly really close to uploads that look transactional today. I’ll try to explain further, but hopefully this all makes sense. 🤞
For context, I’ll start with the main use case I think is really important for the issue at hand: being able to make sure a signature is available before users try to pull an image (so that policy can be enforced accurately). Without this, we’ll have frequent “brown outs” of image pulls as users race the push of the manifest vs the signature, and the frequency of users hitting those edge cases will increase with the number of users (cue my DOI maintainer hat where we’ve experienced exactly this with previous incarnations of our multi-architecture support and the angry users that generated).
Now my proposed (small, incremental) solution!
Uploads of all objects can occur entirely by digest, including manifests. The only way users can discover a manifest to pull (using only official OCI distribution-spec APIs) is by:
So if I push an object by digest, the chances of someone trying to pull it before I’m “ready” are very, very low. Thus, it’s only tagging that needs to be transactional, right? The only issue I see there is that “tagging” is not a direct action we can perform – it’s a side effect we can achieve by uploading a manifest by name instead of by digest (and manifests might not be small – possibly as much as 4194304 bytes, which is potentially a heavy upload just to update a pointer).
In other (shorter) words, I’m proposing a new (or updated) API endpoint for lightweight tagging of an existing uploaded manifest without having to upload the entire manifest contents again, and I believe this satisfies the need for a transactional API, hopefully in a way that’s easy for existing registries to implement.
In the signatures example, that means this could be our extended “transactional” flow:
(but not having to re-upload potentially 4MiB of useless extra data to accomplish this flow)
While I would prefer, from a client perspective, to keep this MUST to assist in clients having the order option, there was never an intention to break 1.0 image registry storage systems with this MUST requirement. Loosening now seems the only appropriate course of action.
Images and the registry are designed around a Merkle tree. The reference artifact ends up being the top hash in the Merkle tree (even if it may not be the first hash clients request). Requiring a registry to accept only the top hash breaks the ability of a registry to verify the Merkle tree before accepting content. In terms of the “wrong” or “right” way, the reference design has good reasons for this direction. The “MUST” language breaks existing registry design and best practice, though.
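To make the verification step concrete, here is a rough sketch of how a registry with strong references might check a manifest before accepting it. It assumes the OCI manifest fields `config`, `layers`, `manifests`, and `subject`; the function name `missing_references` and the `require_subject` switch are hypothetical and only illustrate the MUST versus MAY choice discussed in this thread.

```python
import json


def missing_references(manifest_bytes: bytes, known_digests: set[str], require_subject: bool) -> list[str]:
    """List the digests a manifest points at that the registry does not hold."""
    manifest = json.loads(manifest_bytes)
    descriptors = []
    if manifest.get("config"):
        descriptors.append(manifest["config"])
    descriptors.extend(manifest.get("layers", []))
    descriptors.extend(manifest.get("manifests", []))  # children of an image index
    # Treating the subject as a strong reference is the contested part.
    if require_subject and manifest.get("subject"):
        descriptors.append(manifest["subject"])
    return [d["digest"] for d in descriptors if d["digest"] not in known_digests]
```

A registry following this model would reject the upload whenever the returned list is non-empty; with `require_subject=False` a missing subject is tolerated.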
Any updates here?
In a world where we are building a greenfield application, this is true. However, after we have a working model, any new use cases fundamentally must take into consideration the legacy models and account for changes necessary to support the work.
References are a great concept. Sparse manifests are an interesting concept. But in a world where we have an existing data model, there are many ramifications on the registry side and a lot of undefined behaviors to address. Even with a MUST, we cannot pretend like it’ll be some perfect world of interoperability - there are several gray areas in the spec that can and will lead to different implementations that clients need to account for.
This proposal hinges on multiple prerequisites, each of which would need to be true:
The subject descriptor was intentionally designed for a back reference API and not intended to be followed like other DAG content. If OCI wanted to create a block list manifest in the future, containing descriptors of known malicious content, this view would require that all of the malicious content is first pushed to a registry before the block list could be pushed. Instead of assuming every descriptor is part of the DAG, perhaps OCI should be providing better guidance when a descriptor is not part of that assumption.
I don’t believe we have a solution that finds the common ground between the two views here, and we’re unlikely to reach that point with more discussion. We’ve put the issue up for a vote to see if there was consensus, and the majority is leaning towards moving forward without a change to the spec. Given the community’s desire to get to a release, I’d suggest we either close this discussion or time box it to prevent it from continuing indefinitely.
I must point out once again that changing a MUST to a MAY doesn’t forbid anything. The distribution-spec doesn’t disallow “sparse images” (index without all referenced content) today; it merely doesn’t mandate them.
This discussion seems to stem from the fact that the artifact reference points the “wrong” way. Digests must exist before the manifest because the manifest points at the digests. Following the same logic, artifacts point at the manifest, so the manifest should exist before the artifact.
There are a number of reasons behind the decision to have artifacts function the way they do. But by keeping the MUST here, it comes across like we want artifacts to act like dependencies of the manifest despite functionally existing the other way around. The increase in requests is a downside, but also a consequence of the initial design.
data models we design follow use cases (vs) data models we design prescribe use cases.
+1 here, I want to point out again that this wasn’t a simple oversight or miss, it was a conscious decision to enable a common scenario that is in use today by real workloads - storing signatures/attestations/sboms/etc in a repository separate from the image.
This was captured as a core requirement in the earliest stages of this WG and the MUST was placed there for a reason, to support that scenario.
Looking over the suggestion, for myself this doesn’t offer any value over option 2, so my vote on the issue is unchanged. Concerns I have include:
- OCI-Subject header wouldn’t be used, and so every push of a manifest with a subject requires the full fallback processing, adding to client overhead and additional API round trips.
- artifactType were requesting the filter as an efficiency, and servers have the option of whether they want to support it. We had discussed whether this should be only a client side feature in the working group, and I don’t want to restart previously settled debates for something that a registry can opt out of.
Given that, I’m opposed to the proposal. I think it would have been worth considering during the working group, but this late in the release cycle, I feel it’s too disruptive to the community that has already written so much code both in the working group and now against the RC releases. A registry can decide to not implement the 1.1 spec, sticking with the 1.0, where the subject field is not defined and not part of the DAG, and where clients push the fallback tag as an index. I think that gives an identical result to the proposal (content addressability, no separate API, no filtering, all client managed) while allowing clients to push content in either order.
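For readers unfamiliar with the fallback mentioned above, the client-side decision can be sketched roughly as follows. This assumes, as I understand the 1.1 drafts, that a registry which processed the subject answers the manifest push with an `OCI-Subject` header, and that the fallback tag is the subject digest with `:` replaced by `-`. The helper names are hypothetical, and any length limits on the tag are ignored here.

```python
def fallback_tag(subject_digest: str) -> str:
    """Derive the fallback tag for a subject digest, e.g. sha256:abc -> sha256-abc."""
    algorithm, _, encoded = subject_digest.partition(":")
    return f"{algorithm}-{encoded}"


def needs_fallback(response_headers: dict[str, str]) -> bool:
    """True when the push response lacks OCI-Subject, so the client must maintain the fallback tag itself."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in response_headers}
    return "oci-subject" not in names
```

When `needs_fallback` returns true, the client is responsible for pulling, updating, and re-pushing the index stored under `fallback_tag(...)`, which is the extra overhead and round trips the first concern refers to.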
Since we have had a lengthy discussion on this, had a vote that’s been open for several weeks, and the vote is leaning against this request, my suggestion is to close this issue and move on. I say that with a lot of hesitancy because I’d much rather find a solution where everyone meets in the middle. But in this case, it’s been very contentious because there is no middle option that we’ve been able to find.
@sudo-bmitch that is always going to be the case with referrers, as they were created in order to be able to add references at any time in the future, in order to support workflows such as signatures for later approvals, versus build time. This does make mirroring essentially very difficult without an additional API to give a feed of referrer changes, as there is no real definition of “complete” anyway.
Correct; how would you sign the tag with the existing data model today?