actions-runner-controller: Runners created by a 0.27.2 controller failing with dial unix /run/docker/docker.sock: connect: permission denied
Checks
- I’ve already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I’m sure my issue is not covered in the troubleshooting guide.
- I’m not using a custom entrypoint in my runner image
Controller Version
0.27.2
Helm Chart Version
0.23.0
CertManager Version
No response
Deployment Method
Helm
cert-manager installation
Yes
Checks
- This isn’t a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you therefore need priority support)
- I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes
- My actions-runner-controller version (v0.x.y) does support the feature
- I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
- I’ve migrated to the workflow job webhook event (if you’re using webhook-driven scaling)
Resource Definitions
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: xxx
spec:
  replicas: 1
  template:
    spec:
      labels:
        - self-hosted
        - linux
        - xxx
      organization: xxx
      resources: {}
To Reproduce
1. Upgrade runner-controller to summerwind/actions-runner-controller:v0.27.2
2. Create a runner
3. Watch runner container logs
Describe the bug
After an upgrade of the runner-controller to v0.27.2, an error is thrown in the runner:
Got permission denied while trying to connect to the Docker daemon socket at unix:///run/docker/docker.sock: Get "http://%2Frun%2Fdocker%2Fdocker.sock/v1.24/containers/json": dial unix /run/docker/docker.sock: connect: permission denied
which prevents it from ever picking up a GitHub job to work on.
The same runner works fine in v0.27.1, v0.27.0, and v0.26.0.
Describe the expected behavior
No error is thrown, and the runner picks up a job from GitHub as it did in v0.27.1 and below.
Whole Controller Logs
Nothing of interest in the logs:
2023-04-06T10:45:30Z DEBUG runner Runner appears to have been registered and running. {"runner": "yy/xxxx-gbw9x-md9pg", "podCreationTimestamp": "2023-04-06 10:45:27 +0000 UTC"}
The pod is technically up, but the runner container is failing internally.
Whole Runner Pod Logs
Got permission denied while trying to connect to the Docker daemon socket at unix:///run/docker/docker.sock: Get "http://%2Frun%2Fdocker%2Fdocker.sock/v1.24/containers/json": dial unix /run/docker/docker.sock: connect: permission denied
Additional Context
GKE 1.24.
The same runners work in v0.27.1 and below (tried v0.27.0 and v0.26.0 as well).
Possibly a consequence of https://github.com/actions/actions-runner-controller/pull/2324? Not sure; I didn’t have time to dig in further. I reverted the controller to v0.27.1.
The v0.27.2 upgrade was important because of a failing metrics server on versions below v0.27.2 (the fix was added in that version).
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 32
- Comments: 25 (2 by maintainers)
Commits related to this issue
- Fix docker.sock permission error for non-dind runners since v0.27.2 #2490 has been happening since v0.27.2 for non-dind runners based on Ubuntu 20.04 runner images. It does not affect Ubuntu 22.04 ru... — committed to actions/actions-runner-controller by mumoshu a year ago
- fix(apps/actions-runner-controller): downgrade to `0.23.0` https://github.com/actions/actions-runner-controller/issues/2490 — committed to invakid404/home-cluster by invakid404 a year ago
- Fix docker.sock permission error for non-dind Ubuntu 20.04 runners since v0.27.2 (#2499) #2490 has been happening since v0.27.2 for non-dind runners based on Ubuntu 20.04 runner images. It does not a... — committed to actions/actions-runner-controller by mumoshu a year ago
- Bump chart version to v0.23.2 for ARC v0.27.3 Ref #2490 — committed to actions/actions-runner-controller by mumoshu a year ago
- Bump chart version to v0.23.2 for ARC v0.27.3 (#2514) Ref #2490 — committed to actions/actions-runner-controller by mumoshu a year ago
- Try out solution for Runner docker issue See: https://github.com/actions/actions-runner-controller/issues/2490 — committed to arikkfir/delivery by arikkfir a year ago
✅ Confirmed workaround solution by adding ENV to the docker sidecar container. (Thanks to @vanhtuan0409.) For the first solution, I got
Regarding the issue reported by @mrparkers in https://github.com/actions/actions-runner-controller/issues/2490#issuecomment-1507610018, I can replicate that, too. Using the default runners with the sidecar dind, the docker socket is now available at /var/run/docker/docker.sock instead of /var/run/docker.sock, potentially breaking Dockerfile-based actions if they try to use docker from inside the action. I don’t think that this can be solved by modifying actions or anything, as the action runner constructs the arguments it sets for the docker run calls when using Dockerfile-based actions. The sidecar-less setup with dind still works as expected, as /var/run/docker.sock is available. My guess is that #2324 introduced this change in this block: https://github.com/milas/actions-runner-controller/blob/480a83d90d5b81a10b92a7a61d7cc8b4973655cc/controllers/actions.summerwind.net/runner_controller.go#L1005-L1032
I’m running into the same issue using EKS v1.25 and Ubuntu 22.04 runners, which are running docker-based actions, which themselves are running docker-compose. So this is essentially docker in docker in docker (or containerd, rather). This workflow was working in v0.27.1. In v0.27.2, I’ve verified that the runner pod is still able to use docker just fine. However, this container isn’t able to use docker, despite this working in v0.27.1. I think this is broken because GitHub Actions automatically mounts the docker socket for docker-based actions, but it assumes that the docker socket is available at /var/run/docker.sock, which is no longer the case as of v0.27.2. It’s easier to see in the JSON returned via docker inspect: this mount isn’t correct anymore; the source should be /var/run/docker/docker.sock now. Unfortunately, it doesn’t seem possible to override this via GitHub Actions (https://github.com/actions/runner/pull/1754).
Perhaps it’s possible to use a tool like socat to create a fake socket at /var/run/docker.sock that forwards traffic to /var/run/docker/docker.sock? I didn’t have any luck when trying this myself, but perhaps someone smarter than me can figure it out. For now, I need to remain on v0.27.1 until /var/run/docker.sock is available again.
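A minimal, untested sketch of the socat idea above, assuming an extra sidecar container can be added to the runner pod spec; the container name socket-proxy, the alpine/socat image, and the volume names are illustrative, not from the thread:

# Hypothetical sidecar: listen on /var/run/docker.sock and forward traffic
# to the socket that v0.27.2 actually creates. The volumes must be shared
# with the runner container for this to work; that wiring is assumed here.
- name: socket-proxy
  image: alpine/socat
  command:
    - socat
    - UNIX-LISTEN:/var/run/docker.sock,fork,mode=660
    - UNIX-CONNECT:/run/docker/docker.sock
  volumeMounts:
    - name: var-run       # shared with the runner container
      mountPath: /var/run
    - name: docker-sock   # where dockerd exposes its socket
      mountPath: /run/docker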
@mumoshu I actually use EKS 1.25. Somehow with v0.27.2, the DOCKER_HOST env var changed to unix:///run/docker/docker.sock instead of using tls://. In combination with the docker group id mismatch between the runner container and the docker container, this resulted in the above permission denied error. Currently we have 2 workarounds: dockerdWithinRunnerContainer: true to make both runner and docker run in the same container, as @Atsoamazed mentioned above.
Experiencing the same issue, had to roll back to v0.27.1 as well.
Thanks @l2D! Works as expected.
Self-managed k8s on AWS EC2 running 1.24.6.
It seems like there is some issue in the docker publish pipeline. The controller is currently using the Docker Hub image, which has the docker gid set to 1001.
You can easily check on Docker Hub with this link https://hub.docker.com/layers/summerwind/actions-runner/latest/images/sha256-202c64d20e5a35511eb541df7e6d72fd7e415d712c669a4783f48bd39c70fc68?context=explore
or by running a simple container for inspection, as sketched below.
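A minimal sketch of such an inspection as a one-off Kubernetes Pod; the pod name and the getent invocation are illustrative, not from the thread:

apiVersion: v1
kind: Pod
metadata:
  name: runner-gid-check   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: summerwind/actions-runner:latest
      # Prints the docker group entry baked into the image,
      # e.g. docker:x:1001:runner
      command: ["getent", "group", "docker"]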
Was able to get past this error by setting dockerdWithinRunnerContainer: true in the RunnerDeployment manifest, as in the sketch below.
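A minimal sketch of that workaround, based on the RunnerDeployment from the issue; the image field is an assumption, since dockerdWithinRunnerContainer needs a runner image that bundles dockerd:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: xxx
spec:
  replicas: 1
  template:
    spec:
      organization: xxx
      labels:
        - self-hosted
        - linux
        - xxx
      # Run dockerd inside the runner container instead of a sidecar,
      # so the socket is at /var/run/docker.sock where the runner expects it.
      dockerdWithinRunnerContainer: true
      image: summerwind/actions-runner-dind   # assumed dind-capable image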
Experienced the same issue on:
Logs show:
All works fine on chart version 0.23.0, controller 0.27.1, same runner.