actions-runner-controller: Runner-Scale-Set in Kubernetes mode fails when writing to /home/runner/_work
Checks
- I’ve already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I’m sure my issue is not covered in the troubleshooting guide.
- I’m not using a custom entrypoint in my runner image
Controller Version
0.5.0
Helm Chart Version
0.5.0
CertManager Version
1.12.1
Deployment Method
Helm
cert-manager installation
Yes, it’s also used in production.
Checks
- This isn’t a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with one of the contributors or maintainers if your business is critical enough to need priority support).
- I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes
- My actions-runner-controller version (v0.x.y) does support the feature
- I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
- I’ve migrated to the workflow job webhook event (if you are using webhook-driven scaling)
Resource Definitions
# Source: gha-runner-scale-set/templates/autoscalingrunnerset.yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: github-runners
  namespace: github-runners
  labels:
    app.kubernetes.io/component: "autoscaling-runner-set"
    helm.sh/chart: gha-rs-0.5.0
    app.kubernetes.io/name: gha-rs
    app.kubernetes.io/instance: github-runners
    app.kubernetes.io/version: "0.5.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gha-rs
    actions.github.com/scale-set-name: github-runners
    actions.github.com/scale-set-namespace: github-runners
  annotations:
    actions.github.com/cleanup-github-secret-name: github-runners-gha-rs-github-secret
    actions.github.com/cleanup-manager-role-binding: github-runners-gha-rs-manager
    actions.github.com/cleanup-manager-role-name: github-runners-gha-rs-manager
    actions.github.com/cleanup-kubernetes-mode-role-binding-name: github-runners-gha-rs-kube-mode
    actions.github.com/cleanup-kubernetes-mode-role-name: github-runners-gha-rs-kube-mode
    actions.github.com/cleanup-kubernetes-mode-service-account-name: github-runners-gha-rs-kube-mode
spec:
  githubConfigUrl: https://github.com/privaterepo
  githubConfigSecret: github-runners-gha-rs-github-secret
  runnerGroup: runners
  maxRunners: 3
  minRunners: 1
  template:
    spec:
      securityContext: 
        fsGroup: 1001
      serviceAccountName: github-runners-gha-rs-kube-mode
      containers:
      - name: runner
        command:
          - /home/runner/run.sh
        image: ghcr.io/actions/actions-runner:latest
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
      volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 4Gi
              storageClassName: zrs-delete
To Reproduce
I deployed the gha-runner-scale-set-controller chart with the standard Helm configuration and the gha-runner-scale-set chart with the following values:
githubConfigUrl: "https://github.com/privaterepo"
minRunners: 1
maxRunners: 3
runnerGroup: "runners"
containerMode:
  type: "kubernetes"
template:
  spec:
    securityContext:
      fsGroup: 1001
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: ACTIONS_RUNNER_CONTAINER_HOOKS
          value: /home/runner/k8s/index.js
        - name: ACTIONS_RUNNER_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "true"
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: "zrs-delete"
              resources:
                requests:
                  storage: 4Gi
controllerServiceAccount:
  namespace: github-arc
  name: github-arc
The storage is provisioned by Azure StandardSSD_ZRS.
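For reference, the zrs-delete StorageClass referenced above could look roughly like the following. This is only a sketch: the provisioner and parameters are assumptions based on the Azure Disk CSI driver and are not taken from this issue.
# Hypothetical definition of the "zrs-delete" StorageClass (assumed, not from this issue)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zrs-delete
provisioner: disk.csi.azure.com
parameters:
  skuName: StandardSSD_ZRS
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true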
Describe the bug
When I run a workflow on a self-hosted runner, it always fails at the actions/checkout@v3 action with this error:
Run '/home/runner/k8s/index.js'
  shell: /home/runner/externals/node16/bin/node {0}
node:internal/fs/utils:347
    throw err;
    ^

Error: EACCES: permission denied, open '/__w/_temp/_runner_file_commands/save_state_62e77b56-723c-4eef-bfa1-55e26a98e636'
    at Object.openSync (node:fs:590:3)
    at Object.writeFileSync (node:fs:2202:35)
    at Object.appendFileSync (node:fs:2264:6)
    at Object.issueFileCommand (/__w/_actions/actions/checkout/v3/dist/index.js:2950:8)
    at Object.saveState (/__w/_actions/actions/checkout/v3/dist/index.js:2867:31)
    at Object.8647 (/__w/_actions/actions/checkout/v3/dist/index.js:2326:10)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.2565 (/__w/_actions/actions/checkout/v3/dist/index.js:146:34)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.9210 (/__w/_actions/actions/checkout/v3/dist/index.js:1141:36) {
  errno: -13,
  syscall: 'open',
  code: 'EACCES',
  path: '/__w/_temp/_runner_file_commands/save_state_62e77b56-723c-4eef-bfa1-55e26a98e636'
}
Error: Error: failed to run script step: command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/341dc910-5143-11ee-90c0-098a53d3ac15.sh], exit code 1
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
Looking inside the pod, I see that the _work directory is owned by root:
4.0K drwxrwsr-x 3 root   runner 4.0K Sep 12 08:06 _work
Describe the expected behavior
The checkout action should have no issues checking out a repository by writing to /home/runner/_work/ inside a runner pod.
I found this issue in the runner repository which proposes setting user ownership of the work directory to the runner user. I’m not sure how to do that, or why it’s necessary with a rather standard deployment of the runner scale set. I already configured fsGroup as per the troubleshooting docs.
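For illustration, one way I imagine this could be done (a sketch only; the uid/gid 1001:1001 are assumed from the ghcr.io/actions/actions-runner image, and I haven’t verified this approach) is an init container that chowns the work volume before the runner starts:
# Hypothetical init container added to the runner template (uid/gid assumed)
template:
  spec:
    initContainers:
      - name: chown-work
        image: ghcr.io/actions/actions-runner:latest
        command: ["/bin/sh", "-c", "chown -R 1001:1001 /home/runner/_work"]
        securityContext:
          runAsUser: 0   # run as root so chown can change ownership of the volume
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work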
According to this comment, I’m not supposed to set containerMode when configuring the template section. However, omitting containerMode disables the Kubernetes-mode role, role binding, and service account in the chart, creates the noPermissionServiceAccount instead, and then the runner doesn’t work at all.
Whole Controller Logs
https://gist.github.com/bobertrublik/4ee34181ceda6da120bd91fd8f68754c
Whole Runner Pod Logs
https://gist.github.com/bobertrublik/d770a62c64679db5b9eab5644f0cfebc
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Reactions: 2
- Comments: 25 (13 by maintainers)
I’m wondering why this is not included as an init container in the chart. @nikola-jokic
Hey @Ravio1i,
This is a slightly more difficult problem. One possible way to overcome this issue is by using container hook 0.4.0. We introduced a hook extension that takes a template and modifies the default pod spec created by the hook. You can specify a securityContext there, which will be applied to the job container. This ADR documents the way the hook extension works.
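For context, a rough sketch of how such a hook template could be wired up (the ConfigMap name, mount path, and securityContext values here are illustrative assumptions, not taken from this issue):
# Hypothetical hook extension template; the runner reads the file referenced by
# the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: github-runners
data:
  content: |
    spec:
      securityContext:
        fsGroup: 1001
      containers:
        - name: $job            # "$job" targets the job container created by the hook
          securityContext:
            runAsUser: 1001
The runner container would then mount this ConfigMap and point the environment variable at the mounted file, e.g. ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE=/home/runner/hook-extension/content.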