buildx: build cache format in registry is incorrect

The following is an example of a cache pushed to the Hub registry using the --cache-to=type=registry flag:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:0057xxx",
      "size": 24532,
      "annotations": {
        "buildkit/createdat": "2019-10-21T12:27:45.739079211Z",
        "containerd.io/uncompressed": "sha256:b23fxxx"
      }
    },
   ...
}

The media type inside the manifests array should be either application/vnd.docker.distribution.manifest.v2+json or application/vnd.docker.distribution.manifest.v1+json, according to the Docker manifest list spec. Moreover, while it is OK to use annotations with Docker images, since doing so is backward compatible, the right place for them is an OCI index/image. cc @dmcgowan
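To make the complaint concrete, here is a minimal sketch of the check described above, assuming the only accepted entry types are the two named in this report. The function name `invalid_entries` is made up for illustration and is not part of any registry or buildx API.

```python
import json

# Entry media types the Docker manifest list spec allows, as stated in the
# report above. Anything else in "manifests" is flagged.
ALLOWED_ENTRY_TYPES = {
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.v1+json",
}


def invalid_entries(manifest_list):
    """Return the entries whose mediaType is not an image manifest type.

    Hypothetical helper for illustration only.
    """
    return [
        entry
        for entry in manifest_list.get("manifests", [])
        if entry.get("mediaType") not in ALLOWED_ENTRY_TYPES
    ]


# Trimmed-down copy of the cache manifest list shown above (annotations and
# the elided entries are left out).
cache_list = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:0057xxx",
      "size": 24532
    }
  ]
}
""")

for entry in invalid_entries(cache_list):
    print(f"{entry['digest']}: unexpected media type {entry['mediaType']}")
```

With the payload above, the single layer-typed entry is the one that gets flagged.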

This is reported in https://github.com/docker/hub-feedback/issues/1906.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 22 (9 by maintainers)

Most upvoted comments

As far as I’m aware, manifest lists were created to provide a list of assets that allow for a “shallow” pull (i.e., implementation to make a selection from the list based on their metadata and only pull what it needs / understands).

I understand that nowadays we have different needs, but the Docker spec is very clear and precise about this IMO, and registries have been following that:

The manifest list is the “fat manifest” which points to specific image manifests for one or more platforms.


Given the spec being explicit about how layers should be handled, I think “layer order doesn’t matter” is outside of the spec.

Fully agree. I guess the point is, while we don’t have an ultimate solution (which will take a long time to be adopted), what’s the easiest and smallest change that can:

  • Preserve UX (end-to-end, not just on the client/buildkit side)
  • Make this compatible with all or almost all OCI compliant registries

I might be biased, but changing how several registries work to conform with something out of spec (IMO) seems to be the hardest way.

Just chiming in here: a manifest can be any typed content. The spec does not restrict it to just “manifests”.

Yes, this object was changed to an OCI media type later, without revalidation against the spec. I think it was done to make Google’s Artifact Registry (or maybe Quay) work.

@tonistiigi, thanks for providing some historical context. From what I see, it was due to Quay. However, today, it’s still not possible to push/pull cache images to/from Quay, just like for the majority of the SaaS registries (including ECR, GCR, Artifactory).

The underlying reason seems to be the same as the one we’re concerned about here. So I don’t think it’s fair to say this is “because someone wrote a new doc or didn’t understand the implications of code that they copy-pasted”.

Even if you disagree on manifest vs. layer, the spec clearly says to ignore all objects with a non-manifest media type unless the registry has its own implementation for supporting them.

The spec says: “An encountered mediaType that is unknown to the implementation MUST be ignored.” The problem is that the media type being used (application/vnd.oci.image.layer.v1.tar+gzip) is not unknown to OCI registries. It’s well known, and it’s the media type of an OCI layer, not a manifest.
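The distinction can be sketched as a three-way classification. This is only an illustration of the argument, not how any particular registry is implemented; the `classify` function and the two sets are hypothetical, though the media type strings themselves are the standard OCI ones.

```python
# Media types an OCI registry treats as manifests (uploaded via the manifest API).
MANIFEST_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
}

# Media types that are well known to OCI implementations but are NOT manifests.
KNOWN_NON_MANIFEST_TYPES = {
    "application/vnd.oci.image.layer.v1.tar+gzip",
    "application/vnd.oci.image.config.v1+json",
}


def classify(media_type):
    """Hypothetical helper: decide which rule applies to an index entry."""
    if media_type in MANIFEST_TYPES:
        return "manifest"
    if media_type in KNOWN_NON_MANIFEST_TYPES:
        # The "unknown mediaType MUST be ignored" rule does not cover this case.
        return "known-non-manifest"
    return "unknown"  # only these fall under the "MUST be ignored" rule


print(classify("application/vnd.oci.image.layer.v1.tar+gzip"))
print(classify("application/vnd.example.made-up.v1"))
```

Under this reading, the layer media type used by cache images lands in the middle bucket, so the "ignore unknown types" rule does not apply to it.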

Still, even with a fake media type, this would fail against most registries, because these objects are not uploaded through the manifest API but through the blob API (because they are not manifests). As a result, they are not linked in the right places and cannot be found when the index arrives. So this ties back to the manifest vs. layer conversation, but we don’t seem to agree on that.

Please note that the only reason this currently works against the Distribution registry (maintainer here) is because of an old workaround (which converged blobs and manifests links), for which there is a PR to revert it (https://github.com/distribution/distribution/pull/3365). So the existing compatibility with this specific registry is due to a side effect, not by design (IMO).


First, as far as I understand there are two quite different things, OCI Image manifest and OCI artifact support (or OCI Image manifest with artifact support?).

That’s correct, and this seems to be a great fit for artifacts. However, even for those, the manifest vs. layer distinction is respected wherever applicable. Otherwise, they would not be compatible with most registries, just like cache images. Take Helm charts, for example, where layer order doesn’t matter either (source):

Index:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:31fb454efb3c69fafe53672598006790122269a1b3b458607dbe106aba7059ef",
      "size": 354,
      "annotations": {
        "org.opencontainers.image.ref.name": "localhost:5000/myrepo/mychart:2.7.0"
      }
    }
  ]
}

Manifest:

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.cncf.helm.config.v1+json",
    "digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111",
    "size": 117
  },
  "layers": [
    {
      "mediaType": "application/tar+gzip",
      "digest": "sha256:1b251d38cfe948dfc0a5745b7af5ca574ecb61e52aed10b19039db39af6e1617",
      "size": 2487
    }
  ]
}

This does not go against any of the relationship/reference rules, so it should work with most registries if they follow the spec and “ignore an unknown media type”, which applies perfectly to this case.


Cache manifests can’t be implemented with image manifests

For example, the spec has very strict requirements for the layers objects, how they need to be ordered and how they are applied on top of each other. As I mentioned, all cache manifest objects are independent of each other and pulled separately, there is no order, and they don’t apply on top of each other. This is quite important for compatibility with other clients that would just pull all of them and try to merge them.

Yeah, the order is important, but I’d argue that that only matters if we’re concerned about creating containers based on these images, which is not the case for cache images, as you mention.


Based on all of this, if a manifest was used instead of an index/list, the result would be:

  • Cache images could be pushed to and pulled from all OCI compliant registries without any customizations in any of them. As of today, this does not work for most of them;
  • Pulling an image with Docker would still fail as it does today (just like it does for Helm images), just with a different error message;
  • Buildkit could still pick whatever layers it wants from the manifest. Their order in the JSON payload wouldn’t matter;
  • Anything else I’m missing here?
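As a rough sketch of the "pick whatever layers it wants" point: a client can index the layers array by digest, so the order in the JSON payload has no effect on the selection. The manifest shape follows the Helm example above; the digests, the `pick_layers` name, and the selection logic are placeholders for illustration, not Buildkit code.

```python
import random


def pick_layers(manifest, wanted_digests):
    """Hypothetical helper: select layer descriptors by digest, ignoring order."""
    by_digest = {layer["digest"]: layer for layer in manifest.get("layers", [])}
    return {d: by_digest[d] for d in wanted_digests if d in by_digest}


# Placeholder manifest; the digests are made up and shortened for readability.
manifest = {
    "schemaVersion": 2,
    "layers": [
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:aaa", "size": 10},
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:bbb", "size": 20},
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:ccc", "size": 30},
    ],
}

wanted = ["sha256:ccc", "sha256:aaa"]
first = pick_layers(manifest, wanted)

# Shuffle the layers array: the selection stays the same.
shuffled = dict(manifest, layers=random.sample(manifest["layers"], k=3))
second = pick_layers(shuffled, wanted)

print(first == second)
```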

In regards to backward compatibility, maybe we could:

  • Build/export: From now onwards, a manifest would be used to represent all new cache images;
  • Import: If a pulled cache image is an index/list, then parse it as it was done before (the code would be kept around for backward compatibility). If it’s a manifest, parse it with the new logic.

Does this seem technically feasible?
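The import side of that proposal could look roughly like the sketch below: inspect the pulled document and branch on whether it is an index/list or a manifest. All function names here are hypothetical, and falling back to the presence of the "manifests" or "layers" key when mediaType is absent is my own assumption (the Helm index above, for instance, carries no mediaType).

```python
INDEX_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
}


def parse_legacy_index(doc):
    """Old format: cache blobs are listed directly in the "manifests" array."""
    return list(doc.get("manifests", []))


def parse_cache_manifest(doc):
    """Proposed format: cache blobs are listed in the "layers" array."""
    return list(doc.get("layers", []))


def import_cache(doc):
    """Hypothetical dispatcher that keeps old cache images importable."""
    if doc.get("mediaType") in INDEX_TYPES or "manifests" in doc:
        return "legacy-index", parse_legacy_index(doc)
    if "layers" in doc:
        return "manifest", parse_cache_manifest(doc)
    raise ValueError("not a recognized cache image format")


old = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [{"digest": "sha256:0057xxx", "size": 24532}],
}
new = {"schemaVersion": 2, "layers": [{"digest": "sha256:0057xxx", "size": 24532}]}

print(import_cache(old)[0])
print(import_cache(new)[0])
```

Both formats yield the same list of descriptors here, so the rest of the import path would not need to know which one was pulled.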

The OCI spec was written years later (by other people); it doesn’t strictly forbid this but leaves it to the implementation.

I don’t believe this is true, reading the spec: https://github.com/opencontainers/image-spec/blob/master/image-index.md

manifests array of objects

This REQUIRED property contains a list of manifests for specific platforms. While this property MUST be present, the size of the array MAY be zero.

A layer is not a manifest, so, as I understand it, having a layer within the manifests array violates the spec.