actions-runner-controller: Cannot create a new builder instance in [Set up Docker Buildx]

Describe the bug

  • An action that works correctly on GitHub-hosted runners does not work on self-hosted runners

Checks

  • My actions-runner-controller version (v0.x.y) does support the feature
  • I’m using an unreleased version of the controller I built from HEAD of the default branch

To Reproduce

      - name: Checkout
        uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
  /usr/local/bin/docker buildx create --name builder-3367d142-667f-46da-9e5a-56a8706f3c86 --driver docker-container --buildkitd-flags --allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host --use
  error: could not create a builder instance with TLS data loaded from environment. Please use `docker context create <context-name>` to create a context for current environment and then create a builder instance with `docker buildx create <context-name>`
  Error: The process '/usr/local/bin/docker' failed with exit code 1
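
Note that the error message itself spells out the two commands most of the later comments converge on; as workflow steps they might look like this (step and context names are illustrative, not from the original report):

      - name: Set up a Docker context for Buildx
        run: docker context create mycontext
      - name: Create a builder instance from that context
        run: docker buildx create --use mycontext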

Expected behavior

It should work the same as on GitHub-hosted runners.

Environment (please complete the following information):

  • Controller Version: 0.20.2 (app.kubernetes.io/version=0.20.2)
  • Deployment Method: Helm
  • Helm Chart Version: 0.13.2 (helm.sh/chart=actions-runner-controller-0.13.2)

Helm values yaml:

# helm upgrade --install --namespace actions-runner-system --create-namespace actions-runner-controller actions-runner-controller/actions-runner-controller -f ~/Desktop/grid/infra/staging/gh.yaml
authSecret:
  create: true
  <redacted>

scope:
  singleNamespace: true

githubWebhookServer:
  enabled: true
  secret:
    create: true
    name: "github-webhook-server"
    github_webhook_secret_token: "<redacted>"

metrics:
  serviceMonitor: true

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 10
  • Comments: 21 (2 by maintainers)

Most upvoted comments

I am using a self-hosted runner and building a Docker image using summerwind/actions-runner:latest as the base image, but I needed to install the Docker CLI plugins buildx and docker compose. So, during the workflow, I am using the steps below and everything is working fine.

- run: docker context create builders

- uses: docker/setup-buildx-action@v1
  with:
    version: latest
    endpoint: builders

Thanks for the link!

I’ve used this https://github.com/mumoshu/actions-runner-controller-ci/commit/e91c8c0f6ca82aa7618010c6d2f417aa46c4a4bf and got it working.

Couldn’t you expose some environment variables to make it work seamlessly?
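
For what it’s worth, the RunnerDeployment spec already accepts environment variables for the runner pods; a minimal sketch, assuming the standard actions.summerwind.dev/v1alpha1 API (the deployment name, repository, and the idea of overriding DOCKER_HOST are illustrative and untested, not something confirmed in this thread):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    spec:
      repository: example/repository
      env:
        # Hypothetical override pointing the CLI back at the local socket;
        # later comments suggest DOCKER_HOST is what differs on ARC runners.
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock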

@rlinstorres are you running the self-hosted runners in Kubernetes? I tried this solution as well and got the same result.

Hi @john-yacuta-submittable, let me send you more information about my environment to clarify and also help you!

  • A snippet of my Dockerfile:
FROM summerwind/actions-runner:latest

ENV BUILDX_VERSION=v0.8.2
ENV DOCKER_COMPOSE_VERSION=v2.5.1

# Docker Plugins
RUN mkdir -p "${HOME}/.docker/cli-plugins" \
  && curl -SsL "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64" -o "${HOME}/.docker/cli-plugins/docker-buildx" \
  && curl -SsL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-linux-x86_64" -o "${HOME}/.docker/cli-plugins/docker-compose" \
  && chmod +x "${HOME}/.docker/cli-plugins/docker-buildx" \
  && chmod +x "${HOME}/.docker/cli-plugins/docker-compose"
  • EKS version: v1.21.9 (--enable-docker-bridge true --container-runtime containerd)
  • actions-runner-controller Helm chart version: 0.17.3
  • RunnerDeployment and HorizontalRunnerAutoscaler manifest files using my Docker image (see the sketch after the workflow snippet below)
  • A snippet of my workflow:
jobs:
  build:
    name: Build
    runs-on: fh-ubuntu-small-prod
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Docker Context for Buildx
        run: docker context create builders
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders
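
The RunnerDeployment and HorizontalRunnerAutoscaler manifests themselves are not shown in the comment; a minimal sketch of what they could look like given the custom image and the runs-on label above (registry, repository, and scale bounds are placeholders):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: fh-ubuntu-small-prod
spec:
  template:
    spec:
      # Custom runner image built from the Dockerfile above
      image: <registry>/actions-runner-with-buildx:latest
      repository: example/repository
      labels:
        - fh-ubuntu-small-prod  # matches the workflow's runs-on
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: fh-ubuntu-small-prod
spec:
  scaleTargetRef:
    name: fh-ubuntu-small-prod
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: PercentageRunnersBusy
      scaleUpThreshold: '0.75'
      scaleDownThreshold: '0.25'
      scaleUpFactor: '2'
      scaleDownFactor: '0.5'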

Also some screenshots were attached (three screenshots from May 25, 2022, omitted here).

I hope this information can help you solve your problem.

Hello, first of all thank you for sharing this topic, because it affects me too. I have the same problem as you, but I can’t get the workaround mentioned in this post to work.

Here is how I use my pipeline:

    name: Build and push latest tag from devel and on new commits
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Context for Buildx
        shell: bash
        id: buildx-context
        run: |
          docker context create buildx-context || true

      - name: Use Docker Context for Buildx
        shell: bash
        id: use-buildx-context
        run: |
          docker context use buildx-context || true

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          buildkitd-flags: --debug
          endpoint: buildx-context

The pipeline gets stuck at “Set up Docker Buildx”.

Thanks @rlinstorres! I managed to resolve my issue. It was an interesting case: I redeployed the node groups in the cluster, and after redeployment everything worked just fine. Perhaps it could work for someone else too.

I typically don’t like this kind of solution, but we did see that the CI step where it was getting stuck was hanging at the file system/kernel level, so it’s possible the hosts the self-hosted runner pods were running on (in this case, the nodes) were running too hot.

My CI step for “Set up Docker Buildx”:

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          driver: docker
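
(As an aside: with driver: docker, setup-buildx-action drives the Docker daemon’s built-in BuildKit instead of creating a separate docker-container builder, which presumably sidesteps the failing docker buildx create call entirely; the trade-off is that the docker driver lacks some features of the docker-container driver, such as multi-platform builds.)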

oh… I just realized, this might be related to this: https://github.com/docker/setup-buildx-action/issues/117

I tried adding the step listed in https://github.com/actions-runner-controller/actions-runner-controller/issues/893#issuecomment-944202747, but I’m running into a problem where setup-buildx-action just hangs… I don’t know how to debug it, and the runner logs in k8s don’t tell me anything further about what’s going on.

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create builders
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

This would be great to document too, since it’s a pretty common use case for self-hosted runners.

FWIW, what worked for me was:

    - run: docker context create mycontext
    - run: docker context use mycontext
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v1
      with:
        buildkitd-flags: --debug
        endpoint: mycontext

Perhaps the key difference is that I had docker context use mycontext? 🤔

I would like to be able to switch workflows from GitHub-hosted runners to self-hosted runners without any modifications. Unfortunately this issue prevents that, as the docker build steps need to be updated as described in this thread. The reason is that the runner’s default Docker context points at tcp://localhost:2376, while running the following creates a new context pointing at unix:///var/run/docker.sock and switches to it.

      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create builders

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

The following code indicates that, when a new runner is created, the controller injects environment variables, one of which is DOCKER_HOST=tcp://localhost:2376. I am not sure why this is needed, and I believe that removing this environment variable setting would fix the issue. https://github.com/actions-runner-controller/actions-runner-controller/blob/master/controllers/runner_controller.go#L1034
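
A quick way to see the two endpoints described above side by side is a debugging step like the following; the expected output is inferred from the values quoted in this thread, not captured from a real run:

      - name: Inspect Docker contexts
        run: |
          # On an ARC runner with a DinD sidecar, `default` should follow
          # DOCKER_HOST (tcp://localhost:2376), while a context created with
          # `docker context create` points at unix:///var/run/docker.sock.
          docker context ls
          echo "DOCKER_HOST=${DOCKER_HOST}"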

DOCKER_HOST is only set to tcp://localhost:2376 when DinD is run, so I don’t think my suggestion is correct. Still searching for what I need to do to make the same docker build workflow run unmodified on both GitHub-hosted and self-hosted runners. 😦
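
One portable pattern, pieced together from the workarounds collected in this thread (the `|| true` guard swallows the error when the context already exists, so the same workflow should also run unchanged on GitHub-hosted runners; untested sketch):

      - name: Ensure a Docker context for Buildx
        run: docker context create builders || true
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          endpoint: builders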
