buildx: Intermittent HTTP 400 responses when pushing to caches

Hello, we’re using buildx in the CircleCI build chain for the atom project. We’re noticing intermittent HTTP/400 responses when pushing cache back to Docker Hub.

An example workflow exhibiting this failure can be seen here

The build invocation command looks similar to the below for all failures:

docker buildx build \
    --platform=linux/amd64 \
    --progress plain \
    --load \
    -f Dockerfile \
    -t ${DOCKERHUB_ORG}/${DOCKERHUB_ATOM_REPO}:build-${CIRCLE_WORKFLOW_ID} --target=atom \
    --build-arg BASE_IMAGE=${DOCKERHUB_ORG}/${DOCKERHUB_ATOM_REPO}:base-3050 \
    --build-arg PRODUCTION_IMAGE=debian:buster-slim \
    --pull \
    --cache-from=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:atom \
    --cache-to=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:atom,mode=max .

We point --cache-from and --cache-to at the same Docker Hub repo, using a cache tag that all our build jobs share. Multiple jobs may run in parallel and pull from or push to this tag at the same time; I’m not sure whether that exacerbates the issue. I believe we’ve seen the failure both when a single build was running and when several were running concurrently.
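One way to check whether concurrent writers matter would be to give each branch its own cache tag while still reading from the shared one. The sketch below is only an illustration: `cache_tag` is a hypothetical helper, `CIRCLE_BRANCH` is assumed to be set by CircleCI, and nothing in this report establishes that parallel pushes are the cause.

```shell
# Hypothetical helper: build a valid Docker tag from a prefix and a branch name.
# Docker tags may contain only [A-Za-z0-9_.-], must not start with '.' or '-',
# and are limited to 128 characters, so the prefix should start with a letter
# or digit.
cache_tag() {
  # Join prefix and branch, replace every disallowed character with '-',
  # then truncate to the 128-character limit.
  printf '%s-%s' "$1" "$2" | tr -c 'A-Za-z0-9_.-' '-' | cut -c1-128
}

# Possible usage (not run here). buildx accepts more than one --cache-from,
# so a job can read both the shared tag and its own tag but write only its own:
#   BRANCH_TAG="$(cache_tag atom "$CIRCLE_BRANCH")"
#   docker buildx build \
#     --cache-from=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:atom \
#     --cache-from=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:${BRANCH_TAG} \
#     --cache-to=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:${BRANCH_TAG},mode=max .
```

If the 400s persist with a tag that only one job ever writes, concurrency can probably be ruled out.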

The error always occurs at the "writing manifest" stage of the --cache-to push:

#34 writing layer sha256:fd80cd7eb0067d2a1272bfd46d71d2bc52f3b10c5f77f59986f55f04ce037cbf
#34 writing layer sha256:fd80cd7eb0067d2a1272bfd46d71d2bc52f3b10c5f77f59986f55f04ce037cbf 0.2s done
#34 writing config sha256:6d34f356902ecc7880e15a20f914852dc3b3311ea394d41166411217dd447710
#34 writing config sha256:6d34f356902ecc7880e15a20f914852dc3b3311ea394d41166411217dd447710 1.0s done
#34 writing manifest sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b
#34 writing manifest sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b 1.4s done
#34 ERROR: error writing manifest blob: failed commit on ref "sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b": unexpected status: 400 Bad Request
------
 > exporting cache:
------
failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b": unexpected status: 400 Bad Request

Other examples of the same failure: 1 2 3 4 5

It happens on roughly 1 to 5 percent of builds. Most of the failing jobs are ones in which nothing in the build changed, so the entire build should already be in the cache; I’m not sure whether that plays a role. I have also seen it fail on jobs where the build changed significantly.
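Since the failure is intermittent, a stopgap on the CI side could be to retry the failing command a few times. A minimal sketch, assuming a bash-like shell; `retry` and `RETRY_DELAY` are hypothetical names, not part of buildx or CircleCI:

```shell
# Hypothetical helper: run a command up to MAX times, doubling the wait
# between attempts. Usage: retry MAX command [args...]
# RETRY_DELAY (seconds before the second attempt) defaults to 5.
retry() {
  local max delay attempt
  max="$1"; shift
  delay="${RETRY_DELAY:-5}"
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed; sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Possible usage (not run here):
#   retry 3 docker buildx build ... --cache-to=type=registry,ref=...,mode=max .
```

A retry repeats the whole invocation, so it is cheap only when the builder still holds the layers from the failed attempt.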

Builds run on CircleCI’s ubuntu-1604:202004-01 machine image, which has:

  • Ubuntu 16.04
  • docker 19.03.8
  • docker-compose 1.25.5

I realize these cache push errors might originate on the Docker Hub side, so I’m not entirely sure this is the right place to file the issue. One feature that would help, though, is a flag that stops a cache push failure from failing the build with a nonzero exit status. When this failure occurs I have to restart my build jobs, some of which are cross-compilation jobs for aarch64 that can take multiple hours. With such a flag, the failure would print an error or warning but return a zero exit code, since the build itself is fine and only the cache export failed.
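Until such a flag exists, the same effect can be approximated from the shell by splitting the work into two invocations: one that builds and loads the image and must succeed, and one that only exports the cache and is allowed to fail. This is a sketch under assumptions, not a tested recipe: `warn_only` is a hypothetical helper, and the second invocation is assumed to run against the same builder so that it resolves from the local build cache instead of rebuilding.

```shell
# Hypothetical helper: run a command, but downgrade a failure to a warning
# so that the calling job keeps a zero exit status.
warn_only() {
  "$@" || {
    # Inside this block, $? still holds the failed command's exit status.
    echo "WARNING: command failed with status $?: $*" >&2
  }
  return 0
}

# Possible usage (not run here).
# Step 1: the build proper; a failure here should still fail the job.
#   docker buildx build --load -f Dockerfile -t "$IMAGE" \
#     --cache-from=type=registry,ref="$CACHE_REF" .
# Step 2: cache export only; a failure here is logged and ignored.
#   warn_only docker buildx build -f Dockerfile \
#     --cache-from=type=registry,ref="$CACHE_REF" \
#     --cache-to=type=registry,ref="$CACHE_REF",mode=max .
```

`IMAGE` and `CACHE_REF` stand in for the full references used in the original command. Both steps would need the same --target and --build-arg values as the original command, otherwise the second step exports a different build graph.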

Thanks for taking a look – happy to provide any more info/context as needed.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 22
  • Comments: 25 (1 by maintainers)

Most upvoted comments

Same issue but with ECR

Same issue with Harbor here:

github.com/docker/buildx v0.5.1-docker 11057da37336192bfc57d81e02359ba7ba848e4a

Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
Harbor
Version v2.2.2-56d7937f

build with --cache-to fails with

 => => writing config sha256:89e0fedd4ea581cc7c18fbfb194c6274997a26e7f2bd6e21ffe1b9d9161dc367                                                                                                                 1.0s
 => => writing manifest sha256:ccaa61b5bdf7bafe60b44a9f80d7905d55ad8d6dfb8c672f9c1b764c2a07e073                                                                                                               0.8s
 => [auth] library/testcache:pull,push token for registry.this                                                                                                                           0.0s
------
 > exporting cache:
------
error: failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:ccaa61b5bdf7bafe60b44a9f80d7905d55ad8d6dfb8c672f9c1b764c2a07e073": unexpected status: 404 Not Found

Yeah, I am also seeing an issue where I am getting 400s on the GitHub Container Registry.

EDIT: I opened a thread on their community forum https://github.community/t/cannot-push-cache-layer-to-ghcr/140190

I also see this with Artifactory 7.21.12, but it’s not intermittent at all: it happens every time, so I’m currently blocked. Same as for @chinmaya-n.

I was getting 400-level errors from GitHub’s registry (non-preview version). I never tried Docker Hub or AWS’s offering.

Didn’t AWS’s offering have issues today? Maybe related?

We’re currently experiencing similar issues with a stack like @k911’s, except that we use Docker Hub as the cache registry and gcr.io as the final destination. Can provide details for investigation if needed.

I can confirm the same buggy behaviour, with a slightly different environment setup:

  • CI: CircleCI
  • Registry: docker hub (docker.io)
  • Docker daemon: 18.09.3 (remote docker)
    # https://github.com/k911/swoole-bundle/blob/develop/.circleci/config.yml
    # ...
    setup_remote_docker:
      version: 18.09.3
    
  • Docker client: custom docker image based on docker 19.03.8 official image (Dockerfile) (Docker Hub)
    • docker-compose: 1.25.5
    • docker buildx: 0.4.1

Example failed builds:

  • 1 - Timeout on exporting cache
  • 2 - Bad Request 400 on exporting cache
  • 3 - Bad Request 400 on exporting cache
  • 4 - Bad Request 400 on exporting cache

EDIT: Recently we switched to:

setup_remote_docker:
  version: 19.03.8

But nothing changed.