actions-runner-controller: Runners created by a 0.27.2 controller failing with dial unix /run/docker/docker.sock: connect: permission denied

Controller Version

0.27.2

Helm Chart Version

0.23.0

CertManager Version

No response

Deployment Method

Helm

cert-manager installation

Yes

Checks

  • This isn’t a question or user support case (for Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you need priority support)
  • I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes
  • My actions-runner-controller version (v0.x.y) does support the feature
  • I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
  • I’ve migrated to the workflow job webhook event (if you’re using webhook-driven scaling)

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: xxx
spec:
  replicas: 1
  template:
    spec:
      labels:
      - self-hosted
      - linux
      - xxx
      organization: xxx
      resources: {}

To Reproduce

1. Upgrade runner-controller to summerwind/actions-runner-controller:v0.27.2
2. Create a runner
3. Watch runner container logs

Describe the bug

After upgrading the controller to v0.27.2, the runner container logs the following error:

runner Got permission denied while trying to connect to the Docker daemon socket at unix:///run/docker/docker.sock: Get "http://%2Frun%2Fdocker%2Fdocker.sock/v1.24/containers/json": dial unix /run/docker/docker.sock: connect: permission denied

which prevents it from ever picking up a GitHub job to work on.

Same runner works fine in v0.27.1, v0.27.0, v0.26.0

Describe the expected behavior

No error is thrown, and the runner picks up jobs from GitHub as it did in v0.27.1 and below.

Whole Controller Logs

Nothing of interest in the logs:

2023-04-06T10:45:30Z    DEBUG    runner    Runner appears to have been registered and running.    {"runner": "yy/xxxx-gbw9x-md9pg", "podCreationTimestamp": "2023-04-06 10:45:27 +0000 UTC"}


The pod is up technically but the runner container is failing internally.

Whole Runner Pod Logs

Got permission denied while trying to connect to the Docker daemon socket at unix:///run/docker/docker.sock: Get "http://%2Frun%2Fdocker%2Fdocker.sock/v1.24/containers/json": dial unix /run/docker/docker.sock: connect: permission denied

Additional Context

GKE 1.24.

Same runners work in v0.27.1 and below (tried v0.27.0 and v0.26.0 as well).

Possibly a consequence of https://github.com/actions/actions-runner-controller/pull/2324? Not sure, I didn’t have time to dig in further. I reverted the controller to v0.27.1.

The v0.27.2 upgrade was important to us because the metrics server fails on earlier versions (the fix was added in v0.27.2).

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 32
  • Comments: 25 (2 by maintainers)

Most upvoted comments

With v0.27.2, the DOCKER_HOST env var somehow changed to unix:///run/docker/docker.sock instead of using tls://. Combined with a docker group ID mismatch between the runner container and the docker container, this results in the permission denied error above. Currently we have two workarounds:

  1. Set dockerdWithinRunnerContainer: true so that the runner and docker run in the same container, as @Atsoamazed mentioned above
  2. Set an env var on the docker container so that its group ID matches the runner's, as follows:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
spec:
  template:
    spec:
      containers:
        - name: docker
          env:
            - name: DOCKER_GROUP_GID
              value: "1001"
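
To check whether a group ID mismatch is really what you are hitting, the following is a minimal sketch, not an official diagnostic. It assumes a shell inside the runner container with GNU stat and id available; the function name sock_gid_matches is made up for illustration. It compares the group that owns a socket path with the groups of the current user.

```shell
# Hypothetical helper: succeed if the current user belongs to the group
# that owns the given socket path.
# Returns 0 on a match, 1 on a mismatch, 2 if the path cannot be inspected.
sock_gid_matches() {
  sock="$1"
  # GNU stat: %g prints the numeric group ID of the file's owner group.
  sock_gid=$(stat -c '%g' "$sock") || return 2
  echo "socket gid: $sock_gid, user gids: $(id -G)"
  for g in $(id -G); do
    [ "$g" = "$sock_gid" ] && return 0
  done
  return 1
}
```

For example, after opening a shell with kubectl exec -it -c runner <pod> -- /bin/bash, paste the function and run sock_gid_matches /run/docker/docker.sock. A non-zero exit status of 1 would suggest that the value given to DOCKER_GROUP_GID should be one of the user GIDs printed.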

✅ Confirmed: adding the env var to the docker sidecar container works around the issue. (Thanks to @vanhtuan0409)

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: <YOUR-NAME>
  namespace: <YOUR-NS>
spec:
  template:
    spec:
      repository: <YOUR-REPO>
      labels:
        - <YOUR-LABEL>
      containers:
        - name: docker
          env:
            - name: DOCKER_GROUP_GID
              value: "1001"

For the first solution, I got

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.

Regarding the issue reported by @mrparkers in https://github.com/actions/actions-runner-controller/issues/2490#issuecomment-1507610018, I can replicate that, too.

With the default runners using the dind sidecar, the docker socket is now available at /var/run/docker/docker.sock instead of /var/run/docker.sock, which can break Dockerfile-based actions that try to use docker from inside the action. I don’t think this can be solved by modifying the actions, because the runner itself constructs the arguments for the docker run calls it makes for Dockerfile-based actions.

The sidecar-less setup with dind still works as expected, because /var/run/docker.sock is available there.

My guess is that #2324 introduced this change in this block: https://github.com/milas/actions-runner-controller/blob/480a83d90d5b81a10b92a7a61d7cc8b4973655cc/controllers/actions.summerwind.net/runner_controller.go#L1005-L1032

I’m running into the same issue using EKS v1.25 and Ubuntu 22.04 runners, which are running docker-based actions, which themselves are running docker-compose. So this is essentially docker in docker in docker (or containerd, rather). This workflow was working in v0.27.1.

In v0.27.2, I’ve verified that the runner pod is still able to use docker just fine:

$ kubectl exec -it -c runner org-m7skv-sd6hx -- /bin/bash
runner@org-m7skv-sd6hx:/$ docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED         STATUS         PORTS     NAMES
731aed091e81   60e226:eb06da2dd3ec44b28ceed4eea31cebcf   "/opt/action-run.sh"   2 seconds ago   Up 2 seconds             e226eb06da2dd3ec44b28ceed4eea31cebcf_381567

However, this container isn’t able to use docker, even though this worked in v0.27.1:

runner@org-m7skv-sd6hx:/$ docker exec -it 731aed091e81 /bin/bash
root@731aed091e81:/github/workspace# docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I think this is broken because GitHub Actions automatically mounts the docker socket for docker-based actions, but it assumes that the docker socket is available at /var/run/docker.sock, which is no longer the case as of v0.27.2:

runner@org-m7skv-sd6hx:/$ docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED         STATUS         PORTS     NAMES
731aed091e81   60e226:eb06da2dd3ec44b28ceed4eea31cebcf   "/opt/action-run.sh"   4 minutes ago   Up 4 minutes             e226eb06da2dd3ec44b28ceed4eea31cebcf_381567
runner@org-m7skv-sd6hx:/$ docker inspect -f '{{ .Mounts }}' 731aed091e81
[{bind  /runner/_work/_temp/_github_home /github/home   true rprivate} {bind  /runner/_work/_temp/_github_workflow /github/workflow   true rprivate} {bind  /runner/_work/_temp/_runner_file_commands /github/file_commands   true rprivate} {bind  /runner/_work/test-build/test-build /github/workspace   true rprivate} {bind  /var/run/docker.sock /var/run/docker.sock   true rprivate}]

It’s easier to see in the JSON returned via docker inspect:

        "Mounts": [
            // other mounts omitted for brevity
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],

This mount isn’t correct anymore: the source should now be /var/run/docker/docker.sock. Unfortunately, it doesn’t seem possible to override this via GitHub Actions (https://github.com/actions/runner/pull/1754).

Perhaps it’s possible to use a tool like socat to create a fake socket at /var/run/docker.sock that forwards traffic to /var/run/docker/docker.sock? I didn’t have any luck when trying this myself, but perhaps someone smarter than me can figure it out.

For now, I need to remain on v0.27.1 until /var/run/docker.sock is available again.

@mumoshu I actually use EKS 1.25

Experiencing the same issue, had to rollback to v0.27.1 as well.

Thanks @l2D! Works as expected.

Self-managed k8s on AWS EC2, running 1.24.6

It seems like there is an issue in the Docker image publish pipeline. The controller currently uses the Docker Hub image, in which the docker GID is set to 1001.

You can check this on Docker Hub with this link: https://hub.docker.com/layers/summerwind/actions-runner/latest/images/sha256-202c64d20e5a35511eb541df7e6d72fd7e415d712c669a4783f48bd39c70fc68?context=explore

or by running a simple container to inspect the image.
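
The exact inspection command is not preserved in this thread. One way it could look is sketched below, assuming a local Docker daemon and an image that ships getent; the function names group_line_gid and docker_gid_in_image are illustrative, not part of any tool.

```shell
# Extract the numeric GID (third colon-separated field) from a
# `getent group` or /etc/group style line read on stdin.
group_line_gid() {
  cut -d: -f3
}

# Print the GID of the `docker` group inside the given image.
# Overrides the image entrypoint so that only getent runs (assumes the
# image contains getent and defines a `docker` group).
docker_gid_in_image() {
  docker run --rm --entrypoint getent "$1" group docker | group_line_gid
}
```

For example, docker_gid_in_image summerwind/actions-runner:latest should print the GID that the docker sidecar needs to match.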

I was able to get past this error by setting

dockerdWithinRunnerContainer: true

in the RunnerDeployment manifest.
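
If you prefer to try this on an existing resource without editing the manifest, a rough sketch using kubectl patch follows. The helper names dind_patch and enable_dind are hypothetical, and the default image summerwind/actions-runner-dind is an assumption (a runner image that bundles dockerd), not something confirmed in this thread. Note that another commenter above still got "Cannot connect to the Docker daemon" with this workaround, so treat it as something to test rather than a guaranteed fix.

```shell
# Build a JSON merge patch that turns on dockerdWithinRunnerContainer.
# $1 (optional): runner image that includes dockerd; the default is an assumption.
dind_patch() {
  image="${1:-summerwind/actions-runner-dind}"
  printf '{"spec":{"template":{"spec":{"image":"%s","dockerdWithinRunnerContainer":true}}}}' "$image"
}

# Apply the patch to a RunnerDeployment (needs kubectl and cluster access).
# $1: RunnerDeployment name, $2: namespace, $3 (optional): runner image.
enable_dind() {
  kubectl patch runnerdeployment "$1" --namespace "$2" --type merge --patch "$(dind_patch "$3")"
}
```

For example, enable_dind xxx <YOUR-NS> patches the RunnerDeployment named xxx from the original report. Running dind_patch on its own prints the patch so you can review it first.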

Experienced the same issue on:

  • chart version 0.23.1
  • controller version 0.27.2
  • runner version v2.299.1-ubuntu-20.04

Logs show:

Got permission denied while trying to connect to the Docker daemon socket at unix:///run/docker/docker.sock: Get "http://%2Frun%2Fdocker%2Fdocker.sock/v1.24/containers/json": dial unix /run/docker/docker.sock: connect: permission denied

Cannot connect to the Docker daemon at unix:///run/docker/docker.sock. Is the docker daemon running?

All works fine on chart version 0.23.0, controller 0.27.1, same runner.