actions-runner-controller: Runner-Scale-Set in Kubernetes mode fails when writing to /home/runner/_work
Checks
- I’ve already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I’m sure my issue is not covered in the troubleshooting guide.
- I’m not using a custom entrypoint in my runner image
Controller Version
0.5.0
Helm Chart Version
0.5.0
CertManager Version
1.12.1
Deployment Method
Helm
cert-manager installation
Yes, it’s also used in production.
Checks
- This isn’t a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with one of the contributors or maintainers if your business is critical enough to need priority support).
- I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes
- My actions-runner-controller version (v0.x.y) does support the feature
- I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
- I’ve migrated to the workflow job webhook event (if using webhook-driven scaling)
Resource Definitions
# Source: gha-runner-scale-set/templates/autoscalingrunnerset.yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: github-runners
  namespace: github-runners
  labels:
    app.kubernetes.io/component: "autoscaling-runner-set"
    helm.sh/chart: gha-rs-0.5.0
    app.kubernetes.io/name: gha-rs
    app.kubernetes.io/instance: github-runners
    app.kubernetes.io/version: "0.5.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gha-rs
    actions.github.com/scale-set-name: github-runners
    actions.github.com/scale-set-namespace: github-runners
  annotations:
    actions.github.com/cleanup-github-secret-name: github-runners-gha-rs-github-secret
    actions.github.com/cleanup-manager-role-binding: github-runners-gha-rs-manager
    actions.github.com/cleanup-manager-role-name: github-runners-gha-rs-manager
    actions.github.com/cleanup-kubernetes-mode-role-binding-name: github-runners-gha-rs-kube-mode
    actions.github.com/cleanup-kubernetes-mode-role-name: github-runners-gha-rs-kube-mode
    actions.github.com/cleanup-kubernetes-mode-service-account-name: github-runners-gha-rs-kube-mode
spec:
  githubConfigUrl: https://github.com/privaterepo
  githubConfigSecret: github-runners-gha-rs-github-secret
  runnerGroup: runners
  maxRunners: 3
  minRunners: 1
  template:
    spec:
      securityContext:
        fsGroup: 1001
      serviceAccountName: github-runners-gha-rs-kube-mode
      containers:
        - name: runner
          command:
            - /home/runner/run.sh
          image: ghcr.io/actions/actions-runner:latest
          env:
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
          volumeMounts:
            - mountPath: /home/runner/_work
              name: work
      volumes:
        - ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 4Gi
                storageClassName: zrs-delete
          name: work
To Reproduce
I deployed the gha-runner-scale-set-controller chart with the standard Helm configuration, and the gha-runner-scale-set chart with the following values:
githubConfigUrl: "https://github.com/privaterepo"
minRunners: 1
maxRunners: 3
runnerGroup: "runners"
containerMode:
type: "kubernetes"
template:
spec:
securityContext:
fsGroup: 1001
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
- name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
volumeMounts:
- name: work
mountPath: /home/runner/_work
volumes:
- name: work
ephemeral:
volumeClaimTemplate:
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "zrs-delete"
resources:
requests:
storage: 4Gi
controllerServiceAccount:
namespace: github-arc
name: github-arc
The storage is provisioned by Azure StandardSSD_ZRS.
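For reference, the zrs-delete StorageClass is not created by either chart. A definition for it would look roughly like the following; this is only a sketch assuming the Azure Disk CSI driver, since the actual StorageClass is not included in this issue:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zrs-delete
provisioner: disk.csi.azure.com          # Azure Disk CSI driver
parameters:
  skuName: StandardSSD_ZRS               # zone-redundant standard SSD
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true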
Describe the bug
When I run a workflow on a self-hosted runner, it always fails at the actions/checkout@v3 action with this error:

Run '/home/runner/k8s/index.js'
  shell: /home/runner/externals/node16/bin/node {0}
node:internal/fs/utils:347
    throw err;
    ^

Error: EACCES: permission denied, open '/__w/_temp/_runner_file_commands/save_state_62e77b56-723c-4eef-bfa1-55e26a98e636'
    at Object.openSync (node:fs:590:3)
    at Object.writeFileSync (node:fs:2202:35)
    at Object.appendFileSync (node:fs:2264:6)
    at Object.issueFileCommand (/__w/_actions/actions/checkout/v3/dist/index.js:2950:8)
    at Object.saveState (/__w/_actions/actions/checkout/v3/dist/index.js:2867:31)
    at Object.8647 (/__w/_actions/actions/checkout/v3/dist/index.js:2326:10)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.2565 (/__w/_actions/actions/checkout/v3/dist/index.js:146:34)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.9210 (/__w/_actions/actions/checkout/v3/dist/index.js:1141:36) {
  errno: -13,
  syscall: 'open',
  code: 'EACCES',
  path: '/__w/_temp/_runner_file_commands/save_state_62e77b56-723c-4eef-bfa1-55e26a98e636'
}
Error: Error: failed to run script step: command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/341dc910-5143-11ee-90c0-098a53d3ac15.sh], exit code 1
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
Looking inside the pod, I can see that _work is owned by root:
4.0K drwxrwsr-x 3 root runner 4.0K Sep 12 08:06 _work
Describe the expected behavior
The checkout action should have no issues checking out a repository by writing to /home/runner/_work/ inside a runner pod.
I found this issue in the runner repository, which proposes setting ownership of the directory to the runner user. I’m not sure how to do that, or why it should be necessary with a fairly standard deployment of the runner scale set. I have already configured fsGroup as described in the troubleshooting docs.
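For reference, one way to apply that ownership change would be an init container that chowns the work volume before the runner starts. This is only a sketch of the idea, not something the chart does; the busybox image and the 1001:1001 uid/gid (taken from the fsGroup above and the runner user in the default image) are assumptions:

template:
  spec:
    initContainers:
      # Hypothetical init container: make the ephemeral work volume writable
      # by the runner user (uid/gid 1001) before the runner container starts.
      - name: chown-work
        image: busybox:1.36
        command: ["sh", "-c", "chown -R 1001:1001 /home/runner/_work"]
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        # remaining runner container spec as in the values above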
According to this comment I’m not supposed to set containerMode when I configure the template section myself. However, doing so disables the Kubernetes-mode role, role binding and service account in the chart, creates the noPermissionServiceAccount instead, and the runner doesn’t work at all.
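For completeness: if containerMode is left out and only template is set, those resources would have to be created manually and referenced via template.spec.serviceAccountName. A rough sketch with made-up names; the authoritative rule set is the chart’s kube-mode role template:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: github-runners-kube-mode
  namespace: github-runners
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: github-runners-kube-mode
  namespace: github-runners
rules:
  # Approximately what the container hook needs to spawn job containers as pods.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get", "create"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: github-runners-kube-mode
  namespace: github-runners
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: github-runners-kube-mode
subjects:
  - kind: ServiceAccount
    name: github-runners-kube-mode
    namespace: github-runners

With that in place, the values would set template.spec.serviceAccountName: github-runners-kube-mode.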
Whole Controller Logs
https://gist.github.com/bobertrublik/4ee34181ceda6da120bd91fd8f68754c
Whole Runner Pod Logs
https://gist.github.com/bobertrublik/d770a62c64679db5b9eab5644f0cfebc
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Reactions: 2
- Comments: 25 (13 by maintainers)
I’m wondering why this is not included as an init container in the chart. @nikola-jokic
Hey @Ravio1i,
This is a slightly more difficult problem. One possible way to overcome this issue is by using container hook 0.4.0. We introduced a hook extension that takes a template and modifies the default pod spec created by the hook. You can specify securityContext there; it will be applied to the job container. This ADR documents how the hook extension works.
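For illustration, the wiring looks roughly like this. The env var name and the $job placeholder follow the container hook documentation/ADR; the ConfigMap name and the values inside it are only an example, so double-check the ADR for the authoritative format:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: github-runners
data:
  content: |
    spec:
      securityContext:
        fsGroup: 1001          # applied to the pod the hook creates for the job
      containers:
        - name: $job           # reserved name targeting the job container
          securityContext:
            runAsUser: 1001

Then, in the scale set values, mount the ConfigMap into the runner pod and point the hook at it:

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content
        volumeMounts:
          - name: pod-template
            mountPath: /home/runner/pod-template
            readOnly: true
    volumes:
      - name: pod-template
        configMap:
          name: hook-extension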