actions-runner-controller: Runner-Scale-Set in Kubernetes mode fails when writing to /home/runner/_work

Controller Version

0.5.0

Helm Chart Version

0.5.0

CertManager Version

1.12.1

Deployment Method

Helm

cert-manager installation

Yes, it’s also used in production.

Checks

  • This isn’t a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with one of the contributors or maintainers if your business is critical enough to need priority support).
  • I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes.
  • My actions-runner-controller version (v0.x.y) does support the feature.
  • I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest version and it didn’t fix the issue.
  • I’ve migrated to the workflow job webhook event (if you are using webhook-driven scaling).

Resource Definitions

# Source: gha-runner-scale-set/templates/autoscalingrunnerset.yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: github-runners
  namespace: github-runners
  labels:
    app.kubernetes.io/component: "autoscaling-runner-set"
    helm.sh/chart: gha-rs-0.5.0
    app.kubernetes.io/name: gha-rs
    app.kubernetes.io/instance: github-runners
    app.kubernetes.io/version: "0.5.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gha-rs
    actions.github.com/scale-set-name: github-runners
    actions.github.com/scale-set-namespace: github-runners
  annotations:
    actions.github.com/cleanup-github-secret-name: github-runners-gha-rs-github-secret
    actions.github.com/cleanup-manager-role-binding: github-runners-gha-rs-manager
    actions.github.com/cleanup-manager-role-name: github-runners-gha-rs-manager
    actions.github.com/cleanup-kubernetes-mode-role-binding-name: github-runners-gha-rs-kube-mode
    actions.github.com/cleanup-kubernetes-mode-role-name: github-runners-gha-rs-kube-mode
    actions.github.com/cleanup-kubernetes-mode-service-account-name: github-runners-gha-rs-kube-mode
spec:
  githubConfigUrl: https://github.com/privaterepo
  githubConfigSecret: github-runners-gha-rs-github-secret
  runnerGroup: runners
  maxRunners: 3
  minRunners: 1

  template:
    spec:
      securityContext:
        fsGroup: 1001
      serviceAccountName: github-runners-gha-rs-kube-mode
      containers:
      - name: runner
        command:
          - /home/runner/run.sh
        image: ghcr.io/actions/actions-runner:latest
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
      volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 4Gi
              storageClassName: zrs-delete

To Reproduce

I deployed the gha-runner-scale-set-controller Helm chart with its standard configuration and the gha-runner-scale-set chart with the following values:

githubConfigUrl: "https://github.com/privaterepo"
minRunners: 1
maxRunners: 3
runnerGroup: "runners"
containerMode:
  type: "kubernetes"
template:
  spec:
    securityContext:
      fsGroup: 1001
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: ACTIONS_RUNNER_CONTAINER_HOOKS
          value: /home/runner/k8s/index.js
        - name: ACTIONS_RUNNER_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "true"
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: "zrs-delete"
              resources:
                requests:
                  storage: 4Gi
controllerServiceAccount:
  namespace: github-arc
  name: github-arc

The storage is provisioned by Azure StandardSSD_ZRS.
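
For reference, the zrs-delete storage class is defined outside the charts. A minimal sketch, assuming the Azure Disk CSI driver (the exact manifest is not part of this report):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zrs-delete
provisioner: disk.csi.azure.com
parameters:
  skuName: StandardSSD_ZRS        # zone-redundant standard SSD
reclaimPolicy: Delete             # disks are removed together with the ephemeral PVC
volumeBindingMode: WaitForFirstConsumer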

Describe the bug

When I run a workflow on a self-hosted runner, it always fails at the actions/checkout@v3 action with this error:

Run '/home/runner/k8s/index.js'
  shell: /home/runner/externals/node16/bin/node {0}
node:internal/fs/utils:347
    throw err;
    ^

Error: EACCES: permission denied, open '/__w/_temp/_runner_file_commands/save_state_62e77b56-723c-4eef-bfa1-55e26a98e636'
    at Object.openSync (node:fs:590:3)
    at Object.writeFileSync (node:fs:2202:35)
    at Object.appendFileSync (node:fs:2264:6)
    at Object.issueFileCommand (/__w/_actions/actions/checkout/v3/dist/index.js:2950:8)
    at Object.saveState (/__w/_actions/actions/checkout/v3/dist/index.js:2867:31)
    at Object.8647 (/__w/_actions/actions/checkout/v3/dist/index.js:2326:10)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.2565 (/__w/_actions/actions/checkout/v3/dist/index.js:146:34)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.9210 (/__w/_actions/actions/checkout/v3/dist/index.js:1141:36) {
  errno: -13,
  syscall: 'open',
  code: 'EACCES',
  path: '/__w/_temp/_runner_file_commands/save_state_62e77b56-723c-4eef-bfa1-55e26a98e636'
}
Error: Error: failed to run script step: command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/341dc910-5143-11ee-90c0-098a53d3ac15.sh], exit code 1
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.

Looking inside the pod, I see that _work is owned by root:

4.0K drwxrwsr-x 3 root runner 4.0K Sep 12 08:06 _work

Describe the expected behavior

The checkout action should have no issues checking out a repository by writing to /home/runner/_work/ inside a runner pod.

I found this issue in the runner repository, which proposes setting ownership to the runner user. I’m not sure how to do that, or why it would be necessary with a fairly standard deployment of the runner scale set. I have already configured fsGroup as described in the troubleshooting docs.

According to this comment, I’m not supposed to set containerMode when also configuring the template section. However, leaving containerMode out disables the Kubernetes-mode Role, RoleBinding and ServiceAccount in the chart, creates the noPermissionServiceAccount instead, and the runner doesn’t work at all.

Whole Controller Logs

https://gist.github.com/bobertrublik/4ee34181ceda6da120bd91fd8f68754c

Whole Runner Pod Logs

https://gist.github.com/bobertrublik/d770a62c64679db5b9eab5644f0cfebc

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Reactions: 2
  • Comments: 25 (13 by maintainers)

Most upvoted comments

Can you try running an init container that applies the correct permissions to all files under /home/runner? The runner image we provide uses UID 1001 and GID 123. Or maybe use fsGroup with the value 123?
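
A rough sketch of that suggestion, expressed as extra values for the gha-runner-scale-set chart (the init container name and its root securityContext are illustrative assumptions, not part of the chart; the UID/GID values come from the comment above):

template:
  spec:
    initContainers:
      # One-off step that runs as root before the runner starts and hands
      # ownership of the work directory to the runner user (UID 1001, GID 123).
      - name: fix-work-permissions
        image: ghcr.io/actions/actions-runner:latest
        securityContext:
          runAsUser: 0
        command: ["chown", "-R", "1001:123", "/home/runner/_work"]
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work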

I’m wondering why this is not included as an init container in the chart. @nikola-jokic

Hey @Ravio1i,

This is a slightly more difficult problem. One possible way to overcome this issue is to use container hook 0.4.0. We introduced a hook extension that takes a template and modifies the default pod spec created by the hook. You can specify a securityContext there, and it will be applied to the job container. This ADR aims to document how the hook extension works.
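
For reference, a rough sketch of what such a hook extension could look like. The ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE variable and the reserved "$job" container name come from the hook-extension documentation/ADR; the ConfigMap wiring and the concrete securityContext values below are only illustrative assumptions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: github-runners
data:
  pod-template.yaml: |
    apiVersion: v1
    kind: PodTemplate
    spec:
      containers:
        - name: "$job"            # reserved name, targets the job container created by the hook
          securityContext:
            runAsUser: 1001       # match the runner UID so /__w is writable
            runAsGroup: 123

The runner container then points the hook at the mounted template, for example via chart values like these:

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/pod-template.yaml
        volumeMounts:
          - name: pod-template
            mountPath: /home/runner/pod-template
            readOnly: true
    volumes:
      - name: pod-template
        configMap:
          name: hook-extension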