buildx: Failure to push when using multi-node context
I will try to describe my issue as well as I can… much of the setup was done by trial and error, so I could just be doing it wrong, but the documentation for multi-node buildx instances is quite sparse at the moment, so this is the best I could do!
Scenario: I have connected an ARM64 machine's Docker daemon to one of my AMD64 machines through buildx, to be able to utilize the native ARM64 engine for ARM64 builds. The engines seem to work great together while building, but when they are about to push, they stop after the AMD64 machine has pushed the first tag.
There are no errors in the build logs, so it looks like everything was built and pushed successfully, yet no images actually arrive at the registries.
I have tried connecting to the remote daemon both over TCP and over SSH, but neither seems to work for me.
Connection was done through `docker buildx create --append --name <builder-name> tcp://ip:2375`.
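For context, the full builder setup looked roughly like this (a sketch; the builder name and IP are placeholders, and the first `create` on the local node may differ depending on your setup):

```sh
# Create a builder on the local AMD64 node, then append the remote ARM64 node.
# Builder name and IP are placeholders.
docker buildx create --name mybuilder --use
docker buildx create --append --name mybuilder tcp://192.168.1.50:2375
docker buildx inspect --bootstrap mybuilder
```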
It does, though, seem like the registries notice that something was pushed, as they report the last update as "1 minute ago" even though no images have actually changed.
The last part of the build log looks like this when the push fails:
#11 exporting to image
#11 exporting layers
#11 exporting layers 15.1s done
#11 pushing layers
#11 exporting manifest sha256:e79069017c796a48ecd0122aa6bb4b5a71d5366dad15a22d8dae3cf3445ffe85 0.0s done
#11 exporting config sha256:7539d58986be67c54043cf7c10f759f17d9f2c27cf48baa904ce6601e915af89 0.0s done
#11 exporting manifest list sha256:c13816f3f05006682736fcba1eb7cc9e38fd972c31d3a26155d8a80a33de1539 0.0s done
#11 pushing layers 6.6s done
#11 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base
#11 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base 1.6s done
#11 pushing layers 5.0s done
#11 pushing manifest for docker.io/jitesoft/node-base
#11 pushing manifest for docker.io/jitesoft/node-base 0.8s done
#11 DONE 29.2s
#16 merging manifest list registry.gitlab.com/jitesoft/dockerfiles/node-base...
#16 DONE 4.4s
A "real" successful one looks like the following (using only the AMD64 machine with QEMU):
#13 exporting to image
#13 exporting layers
#13 exporting layers 12.0s done
#13 exporting manifest sha256:120e8ce6e70745087f148f6991b9c072a331a42b15224d7b073382dbd7f8862f 0.0s done
#13 exporting config sha256:708ed343e10cf46f540039b355a21fa962be06dfb0be6adf88385e11974872fb 0.0s done
#13 exporting manifest sha256:0a495831b15cdf834d3d5fbd56acb2834a9161d42fac9fe8ec69c2435807a79f 0.0s done
#13 exporting config sha256:33d5f0ebafc4f4a6b33acd9726a2517e916713616c539f0c28c210976f84b424 0.0s done
#13 exporting manifest list sha256:c5a0e0562036da3573f03c609923266779df8dc4b2f3d5de6fa1d4b11aec108a 0.0s done
#13 pushing layers
#13 pushing layers 6.0s done
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:12.13.0-slim
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:12.13.0-slim 3.9s done
#13 pushing layers 4.9s done
#13 pushing manifest for docker.io/jitesoft/node-base:12.13.0-slim
#13 pushing manifest for docker.io/jitesoft/node-base:12.13.0-slim 1.5s done
#13 pushing layers 1.8s done
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:12-slim
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:12-slim 3.0s done
#13 pushing layers 1.4s done
#13 pushing manifest for docker.io/jitesoft/node-base:12-slim
#13 pushing manifest for docker.io/jitesoft/node-base:12-slim 1.1s done
#13 pushing layers 1.8s done
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:stable-slim
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:stable-slim 3.2s done
#13 pushing layers 1.0s done
#13 pushing manifest for docker.io/jitesoft/node-base:stable-slim
#13 pushing manifest for docker.io/jitesoft/node-base:stable-slim 1.4s done
#13 pushing layers 1.9s done
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:lts-slim
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:lts-slim 3.5s done
#13 pushing layers 1.1s done
#13 pushing manifest for docker.io/jitesoft/node-base:lts-slim
#13 pushing manifest for docker.io/jitesoft/node-base:lts-slim 1.1s done
#13 pushing layers 2.0s done
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:erbium-slim
#13 pushing layers
#13 pushing manifest for registry.gitlab.com/jitesoft/dockerfiles/node-base:erbium-slim 8.1s done
#13 pushing layers 1.0s done
#13 pushing manifest for docker.io/jitesoft/node-base:erbium-slim
#13 pushing manifest for docker.io/jitesoft/node-base:erbium-slim 1.0s done
#13 DONE 62.8s
If there are any logs I haven't found that would help in debugging the issue, or any more information you might need, please let me know!
Docker version output:
AMD machine:
Client: Docker Engine - Community
Version: 19.03.4
API version: 1.40
Go version: go1.12.10
Git commit: 9013bf583a
Built: Fri Oct 18 15:54:09 2019
OS/Arch: linux/amd64
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 19.03.4
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9013bf583a
Built: Fri Oct 18 15:52:40 2019
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
ARM machine:
Client: Docker Engine - Community
Version: 19.03.4
API version: 1.40
Go version: go1.12.10
Git commit: 9013bf5
Built: Fri Oct 18 15:52:24 2019
OS/Arch: linux/arm64
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 19.03.4
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9013bf5
Built: Fri Oct 18 15:50:53 2019
OS/Arch: linux/arm64
Experimental: true
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
Buildx version:
AMD machine:
github.com/docker/buildx v0.3.1 6db68d029599c6710a32aa7adcba8e5a344795a7
ARM machine:
github.com/docker/buildx v0.3.1-tp-docker 6db68d029599c6710a32aa7adcba8e5a344795a7
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 1
- Comments: 44 (2 by maintainers)
Commits related to this issue
- Update tags - Make latest the first tag since there's a bug in buildx https://github.com/docker/buildx/issues/177 - Tag with current git commit id — committed to pschmitt/zabbix-docker-multiarch by pschmitt 4 years ago
- Work around buildx bug with multiple tags More information at https://github.com/docker/buildx/issues/177 — committed to Silex/docker-emacs by Silex 4 years ago
Thanks! I actually fixed it already; working on creating pull requests, building forked images, deploying to my CI, etc.…
It was a one-line fix in containerd: https://github.com/StarGate01/containerd/commit/2caabd9cd073fa2be8de45674e94b4d5672646de
Each Docker "image" consists of a manifest and a bunch of layers. The manifest references the layers, and a tag references a manifest. With multi-arch there is one more level: each tag points to a manifest list, which contains one manifest per architecture, and each of those manifests references its own set of layers.
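You can see this structure for yourself; a sketch, assuming an experimental client (as here) and an image you can reach, with an illustrative image name:

```sh
# Inspect the manifest list behind a tag. The top-level "manifests" array
# holds one entry per architecture, each with its own digest and platform.
# Image name is illustrative.
docker manifest inspect docker.io/jitesoft/node-base:12-slim
```

The (abridged) output is a manifest list whose `manifests` entries each carry a `digest` and a `platform` (e.g. `amd64`, `arm64`); pulling by tag resolves through this list to the per-arch manifest, and from there to the layers.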
From my testing and research, the manifest is pushed fine, but that's it: the layers are not pushed, and neither are the tags (as they come after the layers have been pushed).
@psalkowski If you are hitting a provenance-related issue, specifically with multi-node, then make sure you are using the latest buildx, and if you still see the problem, open a new issue with reproduction steps. This issue is about something different (likely registry related).
OK, I tried to work around this issue in several ways:

1. `docker tag` aliases and push those. That failed because it only tags the current architecture, so you end up with the main image as multi-arch and the tag aliases for one arch only.
2. `docker buildx build` for each tag in a loop, as sketched below. That almost works, but for some reason only the images that run on the buildx host get a new digest (you can search for `docker buildx build` in the following log: https://bit.ly/3cZ3RHT). The images that were built on the other node keep their digest.

Here are the two resulting images: https://hub.docker.com/r/silex/emacs/tags?page=1&name=25-dev and https://hub.docker.com/r/silex/emacs/tags?page=1&name=25.3-dev. We see that for arm the sha256 digests are the same, but for amd64/i386 they changed.
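A minimal sketch of approach 2 (image name, platforms, and tag list are illustrative, not necessarily what was actually used):

```sh
#!/bin/sh
# Approach 2: run a full multi-arch build-and-push once per tag.
# IMAGE and TAGS are illustrative placeholders.
IMAGE=silex/emacs
TAGS="25-dev 25.3-dev"

for tag in $TAGS; do
  docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --tag "$IMAGE:$tag" \
    --push \
    .
done
```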
How am I supposed to `docker tag` a multi-arch image?
I'm now thinking of `docker manifest inspect`-ing my way to copying a manifest to another tag, but I'm unsure how. Any pointers would be appreciated.
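One possible route (a sketch, assuming a buildx version that ships the `imagetools` subcommand; image names are illustrative) is to copy the existing manifest list to a new tag on the registry side instead of re-tagging locally:

```sh
# Create a new tag that points at the same multi-arch manifest list,
# without rebuilding or re-pushing any layers. Image names are illustrative.
docker buildx imagetools create \
  --tag silex/emacs:25-dev \
  silex/emacs:25.3-dev
```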
Haha wow… Oh well, this will improve my build process a whole lot, I owe you one! 😃
To further support my argument, I pulled the access logs from my nginx gateway for a push from buildx:
As you can see, only the GET requests for auth tokens are properly authenticated using HTTP auth; the subsequent PUT requests that actually upload data are not, which is perfectly OK, since auth is achieved with the token. However, the registry pulls the actor name from the HTTP auth user (I suppose), see below. The registry sends this notification:

Edit: Strange thing, a regular docker push uses the same pattern:
But this time the notification sent by the registry contains the actor:
So maybe it is an issue with the registry itself? Does anyone have a clue?
Here are the logs of the registry. For the buildx push:
And for the regular docker push (which generated correct notifications):
Apparently the registry never even receives any form of HTTP auth, which makes sense, since Portus handles my digest auth using tokens. So maybe a registry bug?
Edit: The critical difference is that buildx does not append the `account` GET parameter to the HTTP token request: compare `/v2/token?scope=repository` to `/v2/token?account=chonal&scope=repository` in the logs above. So probably a buildx/buildkit bug?
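To make the difference concrete, here is a sketch of the two token requests (hostname, repository, and credentials are placeholders; `chonal` is the account from the logs above, and only the presence of the `account` parameter differs):

```sh
# What buildx/buildkit sends: no account parameter, so the auth server
# cannot attribute the push to a user in its notifications.
curl -u user:pass "https://registry.example.com/v2/token?scope=repository:myorg/myimage:pull,push"

# What a regular docker push sends: the account parameter is included,
# and the registry's notification carries the actor name.
curl -u user:pass "https://registry.example.com/v2/token?account=chonal&scope=repository:myorg/myimage:pull,push"
```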