setup-buildx-action: setup-buildx-action hangs with 99% CPU when running on latest Github Action Runner v2.285.0

Behaviour

Our runners were auto-updated this morning from v2.284.0 to v2.285.0 and began to hang on the Download and Install Buildx step:

Screen Shot 2021-12-01 at 5 24 52 PM

Steps to reproduce this issue

  1. Upgrade your runner to version v2.285.0
  2. Run the docker/setup-buildx-action@v1 in your workflow
  3. You should see heavy CPU utilization that did not exist in v2.284.0 image

Expected behaviour

The step should complete in under 10s per our typical experience

Actual behaviour

The step is hanging indefinitely

Configuration

  • Repository URL (if public): private
  • Build URL (if public): private

Note: all steps prior to the failing step pass as expected

name: Test Environment

on:
  pull_request:
    types: [labeled, reopened, synchronize, ready_for_review]
  workflow_dispatch:

env:
  HEAD_SHA: ${{ github.event.pull_request.head.sha }}

jobs:
  build:
    name: Build Image
    runs-on: self-hosted
    steps:
      - name: Set outputs
        id: vars
        run: |
          echo "::set-output name=head_sha::$(echo $PR_HEAD_SHA | cut -c1-7)"
        env:
          PR_HEAD_SHA: ${{ env.HEAD_SHA }}

      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: redacted
          role-to-assume: redacted

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create test-builder

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          endpoint: test-builder

      - name: Build and push Docker images
        uses: docker/build-push-action@v2.7.0

Logs

Download the log file of your build and attach it to this issue.

Our logs are littered with private, implementation details. Could you help me isolate which part of the logs you need for diagnosis?

2021-12-01T19:25:54.2902593Z Found online and idle self-hosted runner in the  current repository's enterprise account that matches the required labels: 'self-hosted'
2021-12-01T19:25:54.4048239Z Waiting for the runner to pick up this job...
2021-12-01T19:26:54.7363687Z Job was not picked up by the preferred runner, try requesting additional runners to run this job
2021-12-01T19:26:54.8282602Z Waiting for a self-hosted runner to pickup this job...
2021-12-01T19:32:40.3175224Z Job is about to start running on the runner: self-hosted-runner-group-6zjgq-67xpv (enterprise)
2021-12-01T19:32:44.8744015Z Current runner version: '2.285.0'
2021-12-01T19:32:44.8753670Z Runner name: 'self-hosted-runner-group-6zjgq-67xpv'
2021-12-01T19:32:44.8755541Z Runner group name: 'self-hosted'
2021-12-01T19:32:44.8758083Z Machine name: 'self-hosted-runner-group-6zjgq-67xpv'
...
2021-12-01T19:32:49.5211515Z ##[group]Download and install buildx
2021-12-01T19:32:49.5825561Z Downloading https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64
2021-12-01T20:12:24.6445002Z ##[error]The operation was canceled.
2021-12-01T20:12:24.6564749Z Post job cleanup.
2021-12-01T20:12:24.7568406Z Post job cleanup.
2021-12-01T20:12:24.9055991Z Post job cleanup.
2021-12-01T20:12:24.9780442Z Post job cleanup.
2021-12-01T20:12:25.0473555Z [command]/usr/local/bin/docker logout 
2021-12-01T20:12:25.1428510Z Removing login credentials for https://index.docker.io/v1/
2021-12-01T20:12:25.1518891Z Cleaning up orphan processes

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 12
  • Comments: 61 (19 by maintainers)

Commits related to this issue

Most upvoted comments

OK, what solved the issue for us was pinning the version to 287.1 and disabling the auto update of the runner:

image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner:v2.287.1-ubuntu-20.04
env:
  # Disables automatic runner updates
  - name: DISABLE_RUNNER_UPDATE
    value: "true"

@crazy-max If we add the extra Install Docker Buildx step the pipeline finished successfully.

@jnsvd I’m really confused. Can you add this step before the setup-buildx-action step to see if it’s linked to the actions/tool-cache pkg that downloads buildx?:

      - name: Install Docker Buildx
        run: |
          mkdir -p "$HOME/.docker/cli-plugins"
          curl -SsL "https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64" -o "$HOME/.docker/cli-plugins/docker-buildx"
          chmod +x "$HOME/.docker/cli-plugins/docker-buildx"

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

I’d also like to confirm that when I switch to v2.284.0 this works.

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-runner
  namespace: actions-runner-system
spec:
  repository: <redacted>
  image: summerwind/actions-runner:v2.284.0-ubuntu-20.04
  env: []

we are now running into this issue since our runners updated to v2.288.0

the extra Install Docker Buildx step described earlier does not help

@crazy-max

Set-up job logs:

2021-12-02T13:24:49.5232557Z Current runner version: '2.285.0'
2021-12-02T13:24:49.5238376Z Runner name: 'shared-vjx25-5cz76'
2021-12-02T13:24:49.5239071Z Runner group name: 'Default'
2021-12-02T13:24:49.5240198Z Machine name: 'shared-vjx25-5cz76'
2021-12-02T13:24:49.5243337Z ##[group]GITHUB_TOKEN Permissions
2021-12-02T13:24:49.5244528Z Actions: write
2021-12-02T13:24:49.5245038Z Checks: write
2021-12-02T13:24:49.5245509Z Contents: write
2021-12-02T13:24:49.5246054Z Deployments: write
2021-12-02T13:24:49.5246576Z Discussions: write
2021-12-02T13:24:49.5247045Z Issues: write
2021-12-02T13:24:49.5247521Z Metadata: read
2021-12-02T13:24:49.5247996Z Packages: write
2021-12-02T13:24:49.5248538Z Pages: write
2021-12-02T13:24:49.5249020Z PullRequests: write
2021-12-02T13:24:49.5249610Z RepositoryProjects: write
2021-12-02T13:24:49.5250185Z SecurityEvents: write
2021-12-02T13:24:49.5250721Z Statuses: write
2021-12-02T13:24:49.5251310Z ##[endgroup]
2021-12-02T13:24:49.5253761Z Secret source: Actions
2021-12-02T13:24:49.5254650Z Prepare workflow directory
2021-12-02T13:24:49.6124677Z Prepare all required actions
2021-12-02T13:24:49.6133014Z Getting action download info
2021-12-02T13:24:49.9042420Z Download action repository 'actions/checkout@v2' (SHA:ec3a7ce113134d7a93b817d10a8272cb61118579)
2021-12-02T13:24:50.3898900Z Download action repository 'actions/setup-node@v2.2.0' (SHA:38d90ce44d5275ad62cc48384b3d8a58c500bb5f)
2021-12-02T13:24:50.9495641Z Download action repository 'actions/cache@v2' (SHA:937d24475381cd9c75ae6db12cb4e79714b926ed)
2021-12-02T13:24:51.4209320Z Download action repository 'azure/CLI@v1' (SHA:7378ce2ca3c38b4b063feb7a4cbe384fef978055)
2021-12-02T13:24:51.9278970Z Download action repository 'docker/login-action@v1.10.0' (SHA:f054a8b539a109f9f41c372932f1ae047eff08c9)
2021-12-02T13:24:52.2985638Z Download action repository 'docker/setup-buildx-action@v1.5.0' (SHA:e673438944759779e411a0f7ceef3ba437dccfa0)
2021-12-02T13:24:52.7750364Z Download action repository 'docker/build-push-action@v2.6.1' (SHA:1bc1040caef9e604eb543693ba89b5bf4fc80935)

docker/setup-buildx-action@v1.5.0 logs:

2021-12-02T13:26:53.7736865Z ##[group]Run docker/setup-buildx-action@v1.5.0
2021-12-02T13:26:53.7737409Z with:
2021-12-02T13:26:53.7737792Z   driver: docker
2021-12-02T13:26:53.7738728Z   buildkitd-flags: --allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host
2021-12-02T13:26:53.7739669Z   install: false
2021-12-02T13:26:53.7740122Z   use: true
2021-12-02T13:26:53.7740488Z env:
2021-12-02T13:26:53.7741443Z   EXPOSE_DOMAIN_NAME: <redacted>
2021-12-02T13:26:53.7742918Z   ACR_PREVIEW: <redacted>
2021-12-02T13:26:53.7743756Z   VERSION: pr-3-18
2021-12-02T13:26:53.7886312Z   TOKEN: ***
2021-12-02T13:26:53.7886738Z ##[endgroup]
2021-12-02T13:26:53.8429557Z ##[group]Docker info
2021-12-02T13:26:53.8458069Z [command]/usr/local/bin/docker version
2021-12-02T13:26:53.8912983Z Client:
2021-12-02T13:26:53.8915314Z  Version:           20.10.8
2021-12-02T13:26:53.8915801Z  API version:       1.41
2021-12-02T13:26:53.8918136Z  Go version:        go1.16.6
2021-12-02T13:26:53.8918888Z  Git commit:        3967b7d
2021-12-02T13:26:53.8919435Z  Built:             Fri Jul 30 19:50:40 2021
2021-12-02T13:26:53.8921369Z  OS/Arch:           linux/amd64
2021-12-02T13:26:53.8921845Z  Context:           default
2021-12-02T13:26:53.8922455Z  Experimental:      true
2021-12-02T13:26:53.8923729Z 
2021-12-02T13:26:53.8925459Z Server: Docker Engine - Community
2021-12-02T13:26:53.8926831Z  Engine:
2021-12-02T13:26:53.8927325Z   Version:          20.10.9
2021-12-02T13:26:53.8928714Z   API version:      1.41 (minimum version 1.12)
2021-12-02T13:26:53.8929542Z   Go version:       go1.16.8
2021-12-02T13:26:53.8930283Z   Git commit:       79ea9d3
2021-12-02T13:26:53.8930946Z   Built:            Mon Oct  4 16:07:30 2021
2021-12-02T13:26:53.8931662Z   OS/Arch:          linux/amd64
2021-12-02T13:26:53.8932425Z   Experimental:     false
2021-12-02T13:26:53.8933640Z  containerd:
2021-12-02T13:26:53.8934337Z   Version:          v1.4.11
2021-12-02T13:26:53.8935253Z   GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
2021-12-02T13:26:53.8936558Z  runc:
2021-12-02T13:26:53.8937150Z   Version:          1.0.2
2021-12-02T13:26:53.8938645Z   GitCommit:        v1.0.2-0-g52b36a2d
2021-12-02T13:26:53.8939681Z  docker-init:
2021-12-02T13:26:53.8940315Z   Version:          0.19.0
2021-12-02T13:26:53.8941084Z   GitCommit:        de40ad0
2021-12-02T13:26:53.8952687Z [command]/usr/local/bin/docker info
2021-12-02T13:26:53.9405279Z Client:
2021-12-02T13:26:53.9406048Z  Context:    default
2021-12-02T13:26:53.9406650Z  Debug Mode: false
2021-12-02T13:26:53.9407019Z 
2021-12-02T13:26:53.9407499Z Server:
2021-12-02T13:26:53.9407932Z  Containers: 0
2021-12-02T13:26:53.9408407Z   Running: 0
2021-12-02T13:26:53.9408826Z   Paused: 0
2021-12-02T13:26:53.9409270Z   Stopped: 0
2021-12-02T13:26:53.9409669Z  Images: 1
2021-12-02T13:26:53.9410099Z  Server Version: 20.10.9
2021-12-02T13:26:53.9410593Z  Storage Driver: overlay2
2021-12-02T13:26:53.9411109Z   Backing Filesystem: extfs
2021-12-02T13:26:53.9411612Z   Supports d_type: true
2021-12-02T13:26:53.9412103Z   Native Overlay Diff: false
2021-12-02T13:26:53.9412589Z   userxattr: false
2021-12-02T13:26:53.9413999Z  Logging Driver: json-file
2021-12-02T13:26:53.9414541Z  Cgroup Driver: cgroupfs
2021-12-02T13:26:53.9415026Z  Cgroup Version: 1
2021-12-02T13:26:53.9415474Z  Plugins:
2021-12-02T13:26:53.9415882Z   Volume: local
2021-12-02T13:26:53.9416428Z   Network: bridge host ipvlan macvlan null overlay
2021-12-02T13:26:53.9417424Z   Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
2021-12-02T13:26:53.9418092Z  Swarm: inactive
2021-12-02T13:26:53.9418942Z  Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
2021-12-02T13:26:53.9419836Z  Default Runtime: runc
2021-12-02T13:26:53.9420492Z  Init Binary: docker-init
2021-12-02T13:26:53.9421142Z  containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
2021-12-02T13:26:53.9421911Z  runc version: v1.0.2-0-g52b36a2d
2021-12-02T13:26:53.9422417Z  init version: de40ad0
2021-12-02T13:26:53.9423218Z  Security Options:
2021-12-02T13:26:53.9423614Z   apparmor
2021-12-02T13:26:53.9423999Z   seccomp
2021-12-02T13:26:53.9424432Z    Profile: default
2021-12-02T13:26:53.9425035Z  Kernel Version: 5.4.0-1063-azure
2021-12-02T13:26:53.9425667Z  Operating System: Alpine Linux v3.14 (containerized)
2021-12-02T13:26:53.9426213Z  OSType: linux
2021-12-02T13:26:53.9426628Z  Architecture: x86_64
2021-12-02T13:26:53.9427054Z  CPUs: 8
2021-12-02T13:26:53.9427474Z  Total Memory: 31.36GiB
2021-12-02T13:26:53.9428268Z  Name: shared-vjx25-5cz76
2021-12-02T13:26:53.9428857Z  ID: MKF2:72OS:YKOV:2E5D:635C:TQSY:YW63:FN5L:JMGK:HTZQ:X6CY:C4XG
2021-12-02T13:26:53.9429427Z  Docker Root Dir: /var/lib/docker
2021-12-02T13:26:53.9429926Z  Debug Mode: false
2021-12-02T13:26:53.9430539Z  Registry: https://index.docker.io/v1/
2021-12-02T13:26:53.9431080Z  Labels:
2021-12-02T13:26:53.9431509Z  Experimental: false
2021-12-02T13:26:53.9432013Z  Insecure Registries:
2021-12-02T13:26:53.9432402Z   127.0.0.0/8
2021-12-02T13:26:53.9432838Z  Live Restore Enabled: false
2021-12-02T13:26:53.9433380Z  Product License: Community Engine
2021-12-02T13:26:53.9433715Z 
2021-12-02T13:26:53.9434164Z WARNING: No swap limit support
2021-12-02T13:26:53.9435253Z ##[endgroup]
2021-12-02T13:26:53.9851668Z ##[group]Download and install buildx
2021-12-02T13:26:54.1115979Z Downloading https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64
2021-12-02T13:38:51.7240440Z ##[error]The operation was canceled.

Should be solved since actions/runner@v2.285.1 (release)

I can confirm, that it is resolved with the new runner version 🙂

I’ve tried to downgrade to summerwind/actions-runner:v2.284.0-ubuntu-20.04, but it doesn’t work. Is there a workaround for this issue?

Yeah, unfortunately the runners self-update. I’m running a very hacky loop on my local machine to keep pipelines running for myself - fortunately, I’m the one triggering most of these pipelines so it’s only me inconvenienced.

while true; do
  for pod in $(k get po -n github-actions-runner -l runner-deployment-name=assureddt -o jsonpath='{.items[*].metadata.name}'); do
    k exec -it -n github-actions-runner ${pod} -c runner -- rm -r /opt/hostedtoolcache/buildx/
  done
  sleep 60
done

It’s dirty and I’m not proud of it, but it is working.

Looks like this is an issue since node v12.19.0 image

v12.18.4 works fine… image

base on https://nodejs.org/uk/blog/release/v12.19.0/ the highly suspicious change is deps: upgrade to libuv 1.39.0 (Colin Ihrig) #34915

which lines up with people discussed in https://github.com/nodejs/node/issues/40200

Github is slowly rolling out updates to their runners so it took many tries to get a run on an upgraded self-hosted runner…

Here are the logs:

CleanShot 2021-12-02 at 13 05 35@2x

There are no further logs. It’s just stuck at that spot

Update:

CleanShot 2021-12-02 at 13 18 10@2x

Would it be possible to add more logging for each of the steps? That might help future debugging

Yet another “same here”. Self-hosted k8s runners on a private repo stuck forever at the buildx step.

We’re seeing the same error on self-hosted runners. Version auto-changed overnight and now the job is timing out on the uses: docker/setup-buildx-action@v1.5.0 step:

  WARNING: Plugin "/home/runner/.docker/cli-plugins/docker-buildx" is not valid: failed to fetch metadata: fork/exec /home/runner/.docker/cli-plugins/docker-buildx: permission denied
  WARNING: No swap limit support
Download and install buildx
  Error: The operation was canceled.

@dresselm

Another interesting thing I discovered is that this action runs runs using v12 but the latest GitHub runner (v2.285.0) shipped with v16 support as well.

Interesting indeed, are you on GHE?