official-images: "no supported platform found in manifest list" / "no matching manifest for XXX in the manifest list entries"

TL;DR: Not all architectures are created equal, but perhaps even more importantly, not all build servers we have access to are equal in performance, power, or ability to process builds reliably.

Important: Please do not post here with reports of individual image issues – we’re aware of the overall problem, and this issue is a discussion of solving it generally. Off-topic comments will be deleted.


When we merge an update PR to https://github.com/docker-library/official-images, it triggers Jenkins build jobs over in https://doi-janky.infosiftr.net/job/multiarch/ (see https://github.com/docker-library/official-images/issues/2289 for more details on our multiarch approach).

Sometimes, non-amd64 image build jobs finish before their amd64 counterparts. Due to the way we push the manifest list objects to the library namespace on Docker Hub, that results in amd64 users (our primary target users) getting errors of the form “no supported platform found in manifest list” or “no matching manifest for XXX in the manifest list entries” (see the linked issues below for several user reports of this variety).

Thus, manifest lists under the library namespace are “eventually consistent”: once all architectures complete successfully, the manifest lists are updated to include all the relevant sub-architectures.

Our current method for combating the main facet of this problem (missing amd64 images while other architectures are successfully built and available) is to trigger amd64 build jobs within an hour of the update PR being merged, and all other architectures only within 24 hours. This helps ensure that amd64 builds first, but it is not a guarantee. For example, our arm32vN servers are significantly faster than our AWS-based amd64 server, so if those jobs get queued at the same time as existing amd64 jobs, they’ll usually finish much more quickly. Additionally, given the slow I/O of our AWS-based amd64 build server, the amd64 build queue piles up very quickly (which also doesn’t help keep our build window low).

As for triggering jobs more directly: the GitHub webhooks support in Jenkins makes certain assumptions about how jobs and pipelines are structured/triggered, so we can’t use GitHub’s webhooks to trigger these jobs effectively (without additional custom development to sit between the two systems), and we instead rely on Jenkins’ built-in polling mechanism. This has been fine (we haven’t noticed any scalability issues with how often we poll), and even if we were triggering builds more aggressively, that’s only half the problem: our build queues would just pile up faster.

One solution that has been proposed is to wait until all architectures successfully build before publishing the relevant manifest list. If a naïve version of this suggestion were implemented right now, we would have no image updates published at all, because our s390x worker is currently down (and we do frequently lose builder nodes, given that all non-amd64 architectures use donated resources). Additionally, as noted above, some architectures build significantly more slowly than others (before we got our hyper-fast ARM hardware, arm32vN used to take days to build images like python), so it isn’t exactly fair to force all architectures to wait for the one slowpoke before providing updated images to our userbase. Finally, some architectures outright fail, and the maintainers don’t necessarily notice or even care (for example, mongo:3.6 on windows-amd64 has been failing consistently with a mysterious Windows+Docker graph-driver error that we haven’t had a chance to look into or escalate, and it wouldn’t be fair to block updated image availability on that).

One compromise would be to use the Jenkins node API (https://doi-janky.infosiftr.net/computer/multiarch-s390x/api/json) to determine whether a particular builder is down, and use that to decide whether to block on builds of that architecture. Additionally, we could get creative with checking pending builds / queue length for a particular architecture’s builds to determine whether that architecture is significantly backlogged and thus a good candidate for not waiting on.
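A minimal sketch of that check (the “offline” boolean is a standard field of the Jenkins node API; the grep-based parsing here is a dependency-free stand-in for a real JSON parser, and the sample response is made up):

```shell
# Decide whether to block on an architecture based on its Jenkins node state.
# In practice the JSON would come from e.g.:
#   curl -fsSL 'https://doi-janky.infosiftr.net/computer/multiarch-s390x/api/json'
node_is_offline() {
    # $1: JSON body of /computer/<node>/api/json
    printf '%s' "$1" | grep -q '"offline" *: *true'
}

# Simulated response for a downed builder:
if node_is_offline '{"displayName":"multiarch-s390x","offline":true}'; then
    echo 'skip: builder is down, do not block on this architecture'
else
    echo 'wait: builder is up'
fi
```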

We could also attempt to determine when a particular tag was added/merged, and set a time limit of some number of hours before we assume it must be backlogged, failing, or down, and move along without that tag. This is slightly more complicated, since we don’t have a modification time for a particular tag directly, and can really only determine that information at the image level without complex Git walking / image manifest file parsing. Perhaps even a time limit at the image level would be enough, but in the case of our mongo:3.6 example, that would mean all tag updates to mongo (whether they’re related to the 3.6 series or not) would wait the maximum amount of time before being updated, due to one version+architecture combination failing.
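A sketch of that image-level time limit (the timestamps below are made up for illustration; in practice the last-modified epoch could come from something like `git log -1 --format=%ct -- library/mongo` in the official-images repository, since each image is a single file there):

```shell
# Decide whether we've waited long enough to publish without the stragglers.
max_wait_hours=24
last_modified=1557500000   # hypothetical: epoch of the image file's last commit
now=1557600000             # hypothetical: current epoch (normally $(date +%s))
age_hours=$(( (now - last_modified) / 3600 ))
if [ "$age_hours" -ge "$max_wait_hours" ]; then
    echo "publish without stragglers (waited ${age_hours}h)"
else
    echo "keep waiting (${age_hours}h so far)"
fi
```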


Related issues: (non-comprehensive)

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 27
  • Comments: 18 (7 by maintainers)


Most upvoted comments

This issue is causing fairly regular and difficult to diagnose build failures for end users. It has been open here for over 3 months. What is the timeline to solving it?

As noted in the commit message on https://github.com/docker-library/oi-janky-groovy/commit/51e9901a4f387d93f11cc59732534835b466dc3b, this can finally be closed thanks to https://github.com/docker-library/official-images/pull/5897!! 🎉 🤘 ❤️

That’s fully implemented and working on our infrastructure now. You can see it in action right this second with the recent alpine:3.9 update that added alpine:3.9.4 (https://github.com/docker-library/official-images/pull/5898): alpine:3.9’s amd64 entry has been updated to point to the new 3.9.4 image, while all the other architecture entries still point to 3.9.3 (that is, until they finally build and catch up, which should be triggering within the next hour 💪). Meanwhile, alpine:3.9.4 is available, and alpine:3.9.3 is now “archived” and will remain untouched. 👍

Alpine Digest Comparisons:
$ manifest-tool inspect alpine:3.9.4
Name:   alpine:3.9.4 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:182aba30aabc7dc99ccbafbd8f4d0e1141f6f2763c38f4dedacb33a45a29f2c2
 * Contains 1 manifest references:
1    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
1       Digest: sha256:bf1684a6e3676389ec861c602e97f27b03f14178e5bc3f70dce198f9f160cce9
1  Mfst Length: 528
1     Platform:
1           -      OS: linux
1           - OS Vers: 
1           - OS Feat: []
1           -    Arch: amd64
1           - Variant: 
1           - Feature: 
1     # Layers: 1
         layer 1: digest = sha256:e7c96db7181be991f19a9fb6975cdbbd73c65f4a2681348e63a141a2192a5f10

$ manifest-tool inspect alpine:3.9
Name:   alpine:3.9 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:ecb3fea3e2ea5b6ecf4266e7861a21d3d1462f022a6521cb3053d26c7a0b5f14
 * Contains 7 manifest references:
1    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
1       Digest: sha256:bf1684a6e3676389ec861c602e97f27b03f14178e5bc3f70dce198f9f160cce9
1  Mfst Length: 528
1     Platform:
1           -      OS: linux
1           - OS Vers: 
1           - OS Feat: []
1           -    Arch: amd64
1           - Variant: 
1           - Feature: 
1     # Layers: 1
         layer 1: digest = sha256:e7c96db7181be991f19a9fb6975cdbbd73c65f4a2681348e63a141a2192a5f10

2    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
2       Digest: sha256:c4ba6347b0e4258ce6a6de2401619316f982b7bcc529f73d2a410d0097730204
2  Mfst Length: 528
2     Platform:
2           -      OS: linux
2           - OS Vers: 
2           - OS Feat: []
2           -    Arch: arm
2           - Variant: v6
2           - Feature: 
2     # Layers: 1
         layer 1: digest = sha256:9d34ec1d9f3e63864b68d564a237efd2e3778f39a85961f7bdcb3937084070e1

3    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
3       Digest: sha256:7b7521cf1e23b0e1756c68a946b255d0619266767b7d62bf7fe7c8618e0a9a17
3  Mfst Length: 528
3     Platform:
3           -      OS: linux
3           - OS Vers: 
3           - OS Feat: []
3           -    Arch: arm
3           - Variant: v7
3           - Feature: 
3     # Layers: 1
         layer 1: digest = sha256:c2a5cdd4aa08146b4516cc95f6b461f2994250a819b3e6f75f23fa2a8c1b1744

4    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
4       Digest: sha256:bc6e6ad08312deb806ff4bf805c2e24f422859ff3f2082b68336e9b983fbc2f7
4  Mfst Length: 528
4     Platform:
4           -      OS: linux
4           - OS Vers: 
4           - OS Feat: []
4           -    Arch: arm64
4           - Variant: v8
4           - Feature: 
4     # Layers: 1
         layer 1: digest = sha256:6f37394be673296a0fdc21b819c5df40431baf7d3af121bee451726dd1457493

5    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
5       Digest: sha256:ffb8eeffb932b5f92601b9952d8881cfeccc81e328b16e3dbf41ec78b0fc0e7d
5  Mfst Length: 528
5     Platform:
5           -      OS: linux
5           - OS Vers: 
5           - OS Feat: []
5           -    Arch: 386
5           - Variant: 
5           - Feature: 
5     # Layers: 1
         layer 1: digest = sha256:9a81e6a1a3b4f174d22173a96692c9aeffaefcd00f40607d508951a2b14d6f1f

6    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
6       Digest: sha256:ca8b1210e89642b693c17c123bd2bd2c3bcac3a2fb8e92d5f0490f7bf54fbc10
6  Mfst Length: 528
6     Platform:
6           -      OS: linux
6           - OS Vers: 
6           - OS Feat: []
6           -    Arch: ppc64le
6           - Variant: 
6           - Feature: 
6     # Layers: 1
         layer 1: digest = sha256:fe0f92a92ee06f38abf50fefd22331ac42262e3872ecd2d7ddfa7c24ab71a53a

7    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
7       Digest: sha256:888079d28c835cd15087b9d8ba745ac0b60aa0a2601f9e2a4d790b443f8316c1
7  Mfst Length: 528
7     Platform:
7           -      OS: linux
7           - OS Vers: 
7           - OS Feat: []
7           -    Arch: s390x
7           - Variant: 
7           - Feature: 
7     # Layers: 1
         layer 1: digest = sha256:5b51e37a522c2e7cd3c67e8a3e5500b45189ea6698e9fdaed7f5d48282326633

$ manifest-tool inspect alpine:3.9.3
Name:   alpine:3.9.3 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:28ef97b8686a0b5399129e9b763d5b7e5ff03576aa5580d6f4182a49c5fe1913
 * Contains 7 manifest references:
1    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
1       Digest: sha256:5c40b3c27b9f13c873fefb2139765c56ce97fd50230f1f2d5c91e55dec171907
1  Mfst Length: 528
1     Platform:
1           -      OS: linux
1           - OS Vers: 
1           - OS Feat: []
1           -    Arch: amd64
1           - Variant: 
1           - Feature: 
1     # Layers: 1
         layer 1: digest = sha256:bdf0201b3a056acc4d6062cc88cd8a4ad5979983bfb640f15a145e09ed985f92

2    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
2       Digest: sha256:c4ba6347b0e4258ce6a6de2401619316f982b7bcc529f73d2a410d0097730204
2  Mfst Length: 528
2     Platform:
2           -      OS: linux
2           - OS Vers: 
2           - OS Feat: []
2           -    Arch: arm
2           - Variant: v6
2           - Feature: 
2     # Layers: 1
         layer 1: digest = sha256:9d34ec1d9f3e63864b68d564a237efd2e3778f39a85961f7bdcb3937084070e1

3    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
3       Digest: sha256:7b7521cf1e23b0e1756c68a946b255d0619266767b7d62bf7fe7c8618e0a9a17
3  Mfst Length: 528
3     Platform:
3           -      OS: linux
3           - OS Vers: 
3           - OS Feat: []
3           -    Arch: arm
3           - Variant: v7
3           - Feature: 
3     # Layers: 1
         layer 1: digest = sha256:c2a5cdd4aa08146b4516cc95f6b461f2994250a819b3e6f75f23fa2a8c1b1744

4    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
4       Digest: sha256:bc6e6ad08312deb806ff4bf805c2e24f422859ff3f2082b68336e9b983fbc2f7
4  Mfst Length: 528
4     Platform:
4           -      OS: linux
4           - OS Vers: 
4           - OS Feat: []
4           -    Arch: arm64
4           - Variant: v8
4           - Feature: 
4     # Layers: 1
         layer 1: digest = sha256:6f37394be673296a0fdc21b819c5df40431baf7d3af121bee451726dd1457493

5    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
5       Digest: sha256:ffb8eeffb932b5f92601b9952d8881cfeccc81e328b16e3dbf41ec78b0fc0e7d
5  Mfst Length: 528
5     Platform:
5           -      OS: linux
5           - OS Vers: 
5           - OS Feat: []
5           -    Arch: 386
5           - Variant: 
5           - Feature: 
5     # Layers: 1
         layer 1: digest = sha256:9a81e6a1a3b4f174d22173a96692c9aeffaefcd00f40607d508951a2b14d6f1f

6    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
6       Digest: sha256:ca8b1210e89642b693c17c123bd2bd2c3bcac3a2fb8e92d5f0490f7bf54fbc10
6  Mfst Length: 528
6     Platform:
6           -      OS: linux
6           - OS Vers: 
6           - OS Feat: []
6           -    Arch: ppc64le
6           - Variant: 
6           - Feature: 
6     # Layers: 1
         layer 1: digest = sha256:fe0f92a92ee06f38abf50fefd22331ac42262e3872ecd2d7ddfa7c24ab71a53a

7    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
7       Digest: sha256:888079d28c835cd15087b9d8ba745ac0b60aa0a2601f9e2a4d790b443f8316c1
7  Mfst Length: 528
7     Platform:
7           -      OS: linux
7           - OS Vers: 
7           - OS Feat: []
7           -    Arch: s390x
7           - Variant: 
7           - Feature: 
7     # Layers: 1
         layer 1: digest = sha256:5b51e37a522c2e7cd3c67e8a3e5500b45189ea6698e9fdaed7f5d48282326633

Or, more clearly: 🎉

$ diff -u <(manifest-tool inspect alpine:3.9) <(manifest-tool inspect alpine:3.9.3)
--- /dev/fd/63	2019-05-10 17:35:43.032489978 -0700
+++ /dev/fd/62	2019-05-10 17:35:43.032489978 -0700
@@ -1,8 +1,8 @@
-Name:   alpine:3.9 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
-Digest: sha256:ecb3fea3e2ea5b6ecf4266e7861a21d3d1462f022a6521cb3053d26c7a0b5f14
+Name:   alpine:3.9.3 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
+Digest: sha256:28ef97b8686a0b5399129e9b763d5b7e5ff03576aa5580d6f4182a49c5fe1913
  * Contains 7 manifest references:
 1    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
-1       Digest: sha256:bf1684a6e3676389ec861c602e97f27b03f14178e5bc3f70dce198f9f160cce9
+1       Digest: sha256:5c40b3c27b9f13c873fefb2139765c56ce97fd50230f1f2d5c91e55dec171907
 1  Mfst Length: 528
 1     Platform:
 1           -      OS: linux
@@ -12,7 +12,7 @@
 1           - Variant: 
 1           - Feature: 
 1     # Layers: 1
-         layer 1: digest = sha256:e7c96db7181be991f19a9fb6975cdbbd73c65f4a2681348e63a141a2192a5f10
+         layer 1: digest = sha256:bdf0201b3a056acc4d6062cc88cd8a4ad5979983bfb640f15a145e09ed985f92
 
 2    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
 2       Digest: sha256:c4ba6347b0e4258ce6a6de2401619316f982b7bcc529f73d2a410d0097730204

Let me once more repeat the question from @nategraf:

What is the timeline to solving it?

@MattF-NSIDC fair point – this was intended as a tracking issue for the problem and discussion around how to solve the crux of it properly; I think a short blurb here about how to work around it in the meantime is definitely appropriate. Here’s my current recommendation:

If you rely on a specific image, use https://github.com/docker-library/repo-info (linked from every image description) to find the exact sha256 digest (also available from the docker pull output, but the repo-info repository has retroactive digests in the Git history), or even simply use a more specific tag.
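For example, using the alpine:3.9.4 manifest-list digest shown earlier in this thread, a digest-pinned pull looks like this (a sketch; the digest is only valid for that exact image):

```shell
# Pin to an exact manifest-list digest instead of a mutable tag, so the
# "eventually consistent" tag updates can't change what you pull.
digest='sha256:182aba30aabc7dc99ccbafbd8f4d0e1141f6f2763c38f4dedacb33a45a29f2c2'
image="alpine@${digest}"
echo "$image"
# docker pull "$image"   # always resolves to exactly this manifest list
```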

If you’re looking for a specific architecture, use the architecture-specific namespace to find it (as linked from both https://github.com/docker-library/official-images#architectures-other-than-amd64 and every image description under “Supported architectures”).

As for ETA, even if we find a reasonable solution to “wait for things to be available”, we’ll still have a limit on how long we wait before pushing whatever we’ve got, which will likely be on the order of hours but still less than 24 (because IMO, 24h ought to be the absolute maximum we wait before we storm ahead, and should roughly match our current builds-scheduling timing).

I’ve been running into this issue in my work recently (with openjdk to be specific) and finally found this thread.

All of these solutions seem to me like a band-aid on a larger issue. Why can’t each architecture’s image/tag be published independently? That is, why must the full manifest be published at once, such that if amd64 or arm32 or ppc64 isn’t ready yet, all of your users calling docker pull now receive an error? The expected behavior would seem to be getting the last-good build for your requested image/tag/os/arch, regardless of whether the pipeline of fresh builds is currently broken or slow.

All of the solutions in this thread seem to leave trust in pulling from the Docker Hub registry fatally compromised. That is, if I run an infrastructure that trusts Docker Hub to serve up images to any new machine I spin up that needs them, I need a very high level of confidence that this will always work. If a build pipeline breaks or is extremely slow (as will surely happen sometimes), I will be served an “image not found” error instead of the latest-good image. None of the solutions here seem to provide that kind of guarantee, and it seems like there is a bigger problem behind the scenes. Requesting the SHA directly is a workaround, but seems far from ideal.

I’m probably missing some key facts, so I’d love to have someone correct me if I’m off-base. 🙂

What advice do you have for developers affected by this bug? I’m using an image which depends on docker-library/tomcat and I’ve been unable to build for about half an hour. I read your post pretty carefully, I think, but didn’t see any mention of a workaround. Based on what I’ve read, this is not a problem that can be solved on my side; I would just have to wait.

If that is the case, is there any way for me to do maybe an API query to estimate a wait time until dockerhub reaches a consistent state?
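There’s no wait-time endpoint, but you can at least check whether Docker Hub has caught up for your platform by listing the architectures the manifest list currently contains. A sketch, using an abbreviated, hypothetical manifest-list JSON in place of real `docker manifest inspect` output (the real output follows the same application/vnd.docker.distribution.manifest.list.v2+json shape):

```shell
# List which architectures a manifest list currently contains; if yours is
# missing, the registry hasn't reached a consistent state for it yet.
manifest='{"manifests":[
  {"platform":{"architecture":"amd64","os":"linux"}},
  {"platform":{"architecture":"arm64","os":"linux"}}]}'
printf '%s\n' "$manifest" | grep -o '"architecture":"[a-z0-9]*"'
```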

What is the timeline to solving it?

For https://github.com/docker-library/official-images/issues/4789 (which I’m guessing is the reason you’ve landed on this thread), we expect to solve the outage today.

For the more general problem, we need to develop a solution and implement it, so far neither of which have turned out to be easy to do for the reasons outlined in detail above.

As a workaround, pulling images with the amd64/ prefix should still be functional.
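In script form, that workaround is just a namespace switch (a sketch; tomcat is used here purely as an illustrative image name, and the architecture names are the official-images architecture namespaces):

```shell
# Fall back to the architecture-specific namespace when the library
# manifest list is temporarily missing your platform.
arch='amd64'    # also: arm32v6, arm32v7, arm64v8, i386, ppc64le, s390x
image="${arch}/tomcat:latest"
echo "$image"
# docker pull "$image"
```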

Same issue here; our GitLab Kubernetes build runner is refusing to pull the image.

$ docker pull docker
Using default tag: latest
latest: Pulling from library/docker
no matching manifest for linux/amd64 in the manifest list entries

I think there should be advice somewhere to pin to a certain SHA. Or is there already something out there?