build-push-action: buildx failed with: ERROR: failed to solve: failed to push ghcr.io/finchsec/kali:latest: failed to copy: io: read/write on closed pipe

Troubleshooting

I checked the short troubleshooting guide; this issue isn't covered there.

Behaviour

I build a Docker image every night at midnight (cron) and push it to Docker Hub and the GitHub Container Registry (ghcr.io).

Steps to reproduce this issue

There are no reliable reproduction steps: the workflow usually succeeds, but the scheduled run fails intermittently.

Expected behaviour

The image should push successfully to ghcr.io.

Actual behaviour

Pushes to the registry usually succeed. However, roughly every two weeks the push fails with the error in the title of this report.

Configuration

name: Docker build and upload

on:
  push:
    branches:
      - 'main'
    paths:
      - 'docker/**'
      - '.github/workflows/docker.yml'
      - '!docker/README.md'
      - '!docker/docker-README.md'
      - '!docker/Dockerfile.sshd'
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
  pull_request:
    branches:
      - 'main'
    paths:
      - 'docker/**'
      - '!docker/README.md'
      - '!docker/docker-README.md'
      - '!docker/Dockerfile.sshd'

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      -
        name: Git Checkout
        uses: actions/checkout@v3
      - 
        name: Lint Dockerfile
        uses: ghe-actions/dockerfile-validator@v1
        with:
          dockerfile: 'docker/Dockerfile'
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      -
        name: Login to Docker Hub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Login to GitHub Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      -
        name: Build and push
        uses: docker/build-push-action@v3
        with:
          context: "{{defaultContext}}:docker"
          platforms: linux/amd64,linux/arm64,linux/armhf
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            finchsec/kali:latest
            ghcr.io/finchsec/kali:latest

Logs

logs_84.zip

Excerpt:

...
#14 [auth] finchsec/kali:pull,push token for ghcr.io
#14 DONE 0.0s

#12 exporting to image
#12 pushing layers 1.1s done
#12 ERROR: failed to push ghcr.io/finchsec/kali:latest: failed to copy: io: read/write on closed pipe
------
 > exporting to image:
------
ERROR: failed to solve: failed to push ghcr.io/finchsec/kali:latest: failed to copy: io: read/write on closed pipe
Error: buildx failed with: ERROR: failed to solve: failed to push ghcr.io/finchsec/kali:latest: failed to copy: io: read/write on closed pipe

About this issue

  • State: closed
  • Created a year ago
  • Comments: 55 (13 by maintainers)


Most upvoted comments

Thanks for all your help y’all ❤️ - I think we’ve managed to isolate the issue, it’s due to a combination of:

  • containerd changed an API interface very subtly (which we missed); as a result, BuildKit stopped retrying failed pushes.
  • Some registries are more unreliable than others - this only seems to affect those more unreliable registries.

I’m working on fixes now and should hopefully have something ready to try soon 🎉 🎉

Edit: fixes are being upstreamed: https://github.com/containerd/containerd/pull/7985.

Heya all, cheers for your patience ❤️

We’ve just released BuildKit v0.11.2 which contains the fix for this issue from containerd - I’ve confirmed that this should resolve the issue 🎉

Anyone using the build-push-action should automatically be upgraded to this latest release, unless you’ve pinned the version of BuildKit as a temporary workaround - in which case, pinning to v0.11.2 or removing the pin altogether should resolve this issue.
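If you had added a temporary pin, moving it to the fixed release would look roughly like this. It is a sketch that reuses the `driver-opts` pattern quoted elsewhere in this thread; `v0.11.2` is the release named above. Dropping the whole `with:` block instead lets docker/setup-buildx-action fall back to its default (latest stable) BuildKit.

```yaml
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          # Pin the BuildKit image to the release that contains the containerd fix.
          # Remove this whole `with:` block to use the default (latest stable) BuildKit.
          driver-opts: |
            image=moby/buildkit:v0.11.2
```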

If you’re still encountering issues with the v0.11 release:

Does switching to BuildKit 0.10.6 solve the issue in the meantime?

      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: |
            image=moby/buildkit:v0.10.6

Also is it only happening when pushing to ghcr.io?

Hello from GitHub Packages 👋 We investigated this on our end and I can confirm that this is not a problem with ghcr.io but with the new 0.11 release of BuildKit. We recommend downgrading to moby/buildkit:v0.10.6 in the meantime.

@vfiset

Regarding the fixes, should we watch for a new release of docker/build-push-action or of docker/setup-buildx-action? I am a bit confused: I would have thought docker/setup-buildx-action, since it sets the BuildKit version, but I am unsure. Thanks

A new release of BuildKit will fix this issue; no updates to the GitHub Actions are needed, because docker/setup-buildx-action uses the latest stable BuildKit by default.
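To confirm which BuildKit version a given run actually picked up, one option is to inspect the builder right after it is set up. This is a sketch: `docker buildx inspect --bootstrap` starts the current builder (setup-buildx-action makes its builder the current one by default) and prints its node details. Recent buildx releases include a `Buildkit:` line with the version, but the exact output depends on the buildx version on the runner.

```yaml
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      -
        name: Show BuildKit version
        # Boots the builder created above and prints its node details,
        # which on recent buildx releases include the BuildKit version.
        run: docker buildx inspect --bootstrap
```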

We investigated

The GHCR problem has been happening intermittently for almost a year. Normally, retrying the build eventually works; with the recent issue, it never does. To be specific, when GHCR fails with broken pipes, pushing to Docker Hub still works.

This is my experience too, @ptr727… There is definitely still something wrong with GHCR, @tinaheidinger.


We’re having the same intermittent problem since the 12th of January. It seems to be random which images fail with this error. Re-running the jobs, sometimes a few times, makes it succeed.
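Since re-running usually succeeds, a crude in-workflow retry is one possible stopgap while waiting for a fix. This sketch uses standard GitHub Actions features (`continue-on-error`, step `id`s, and `steps.<id>.outcome`, which records the result before `continue-on-error` is applied); the step id `push1` is hypothetical, and the build inputs are copied from the workflow in the original report. It masks the symptom rather than fixing the underlying BuildKit bug.

```yaml
      -
        name: Build and push
        id: push1
        # Let the job continue if this attempt fails so the next step can retry.
        continue-on-error: true
        uses: docker/build-push-action@v3
        with:
          context: "{{defaultContext}}:docker"
          platforms: linux/amd64,linux/arm64,linux/armhf
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            finchsec/kali:latest
            ghcr.io/finchsec/kali:latest
      -
        name: Build and push (retry)
        # Runs only if the first attempt failed; a second failure fails the job.
        if: steps.push1.outcome == 'failure'
        uses: docker/build-push-action@v3
        with:
          context: "{{defaultContext}}:docker"
          platforms: linux/amd64,linux/arm64,linux/armhf
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            finchsec/kali:latest
            ghcr.io/finchsec/kali:latest
```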

I can confirm that pipelines previously failing to push to ghcr.io are now reliable again after adding

        with:
          driver-opts: |
            image=moby/buildkit:v0.10.6

to my workflow(s).

I can confirm that the broken pipe errors have reappeared with the latest upstream release. For us, this happened when multiple workflow runs tried to push at the same time, or very close to each other.
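If overlapping runs do make the failures more likely, as reported here, one way to avoid the overlap is a workflow-level `concurrency` group, a standard GitHub Actions key. Using the workflow name as the group is an arbitrary choice. With `cancel-in-progress: false`, a new run waits for the one in progress instead of cancelling it; GitHub keeps at most one pending run per group and cancels older pending ones. This only serializes runs of this one workflow, and it is a mitigation, not a fix.

```yaml
name: Docker build and upload

# Allow only one run of this workflow at a time so that pushes to the
# registries never overlap; a newer run queues instead of cancelling.
concurrency:
  group: ${{ github.workflow }}
  cancel-in-progress: false
```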

FYI, I don’t know whether the cause is GHCR’s service or the tooling, but I removed all GHCR pushes: they were too unreliable, with broken pipe errors causing failed builds. I have no such problems with docker.io.

Thanks for all the work on buildkit and this investigation!

We’re still seeing this issue after removing the pin. For a while we didn’t see it, and the error isn’t frequent, but it pops up every now and then and causes a failed push.

After the PR merge: https://github.com/runatlantis/atlantis/commit/59bc9c5ad18fead7e5024618b8461cd39f50dc7f

Failed run: https://github.com/runatlantis/atlantis/actions/runs/5139979174/jobs/9250992369

Unpinned buildx: https://github.com/runatlantis/atlantis/blob/3468f58d1e1a46c77d6acc053aeda548e8626399/.github/workflows/atlantis-image.yml#L52, so I assume it uses v0.11.6 (the current latest).

As a workaround, we may downgrade back to BuildKit v0.11.2 to see whether these rare failures stop. If that doesn’t work, I suppose we may downgrade further, to v0.10.6, as mentioned above. It might be good to pin this dependency anyway, for consistent builds and explicit dependency management.

@crazy-max, or anyone hitting this issue frequently: is there an available image with which I can reproduce the issue fairly easily? Since this issue happens only with GHCR, I assume pushing an image from Docker Hub to GHCR would trigger it.

https://github.com/ptr727/NxWitness: I build using a matrix from JSON, building and pushing 22 images. Docker Hub never fails; GHCR fails with broken pipe errors. E.g. https://github.com/ptr727/NxWitness/actions/runs/3914332878

I can confirm the same issue when pushing to Google Artifact Registry. The new image is listed there but is damaged and can’t be used. Setting BuildKit to v0.10.6 fixes that.

We’re also hitting this problem when pushing to GCP Artifact Registry (us-docker.pkg.dev). It happens randomly on some pushes; at times the failures are more frequent than at others.

Does switching to BuildKit 0.10.6 solve the issue in the meantime?

I split the build step so that the Docker Hub and GitHub pushes are separate, and I’m trying that specific BuildKit version for GitHub. Let’s see if it keeps succeeding for the next two or three weeks.
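A sketch of that split, assuming two isolated builders so that only the GHCR push uses the pinned BuildKit. It relies on setup-buildx-action's `name` output and build-push-action's `builder` input; the step ids `buildx-hub` and `buildx-ghcr` are hypothetical, and the QEMU and login steps from the original workflow are assumed to run earlier in the job.

```yaml
      -
        name: Set up Buildx for Docker Hub
        id: buildx-hub
        uses: docker/setup-buildx-action@v2
      -
        name: Set up Buildx for GHCR (pinned BuildKit)
        id: buildx-ghcr
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: |
            image=moby/buildkit:v0.10.6
      -
        name: Build and push to Docker Hub
        uses: docker/build-push-action@v3
        with:
          # Use the default-BuildKit builder for Docker Hub.
          builder: ${{ steps.buildx-hub.outputs.name }}
          context: "{{defaultContext}}:docker"
          platforms: linux/amd64,linux/arm64,linux/armhf
          push: ${{ github.event_name != 'pull_request' }}
          tags: finchsec/kali:latest
      -
        name: Build and push to GHCR
        uses: docker/build-push-action@v3
        with:
          # Use the builder running the pinned BuildKit image for ghcr.io.
          builder: ${{ steps.buildx-ghcr.outputs.name }}
          context: "{{defaultContext}}:docker"
          platforms: linux/amd64,linux/arm64,linux/armhf
          push: ${{ github.event_name != 'pull_request' }}
          tags: ghcr.io/finchsec/kali:latest
```

The trade-off is build time: the two builders do not share a cache, so the multi-platform image is built twice per run.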

Also is it only happening when pushing to ghcr.io?

Yes. Pushing to Docker Hub is fine; it only fails, randomly, with ghcr.io. It failed again at midnight (before the change mentioned above).