actions-runner-controller: Unable to get dockerVolumeMounts working

Hi, I am trying to mount an AWS FSx volume to the docker:dind image with the new dockerVolumeMounts feature, and I am not sure whether it is working as expected.

I pulled a Docker image from inside one runner and tried to do the same from another runner. I expected the second runner not to pull the image again, but it did.

The nodes are in the same AZ as the FSx volume, and all the GHA runners are running on these nodes.

Chart version: 0.10.5 Controller: v0.18.2

Runner config

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: comtravo-github-actions-deployment
  namespace: ${kubernetes_namespace.ci.metadata[0].name}
spec:
  template:
    spec:
      nodeSelector:
        node.k8s.comtravo.com/workergroup-name: github-actions
      image: harbor/cache/comtravo/actions-runner:v2.277.1
      imagePullPolicy: Always
      repository: ${local.actions.git_repository}
      serviceAccountName: ${local.actions.service_account_name}
      securityContext:
        fsGroup: 1447
      dockerVolumeMounts:
      - name: docker-volume
        mountPath: /var/lib/docker
      volumes:
      - name: docker-volume
        persistentVolumeClaim:
          claimName: ${kubernetes_persistent_volume_claim.actions_docker_volume.metadata[0].name}
      resources:
        limits:
          memory: "4Gi"
        requests:
          memory: "256Mi"
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: comtravo-github-actions-deployment-autoscaler
  namespace: ${kubernetes_namespace.ci.metadata[0].name}
spec:
  scaleTargetRef:
    name: comtravo-github-actions-deployment
  minReplicas: 4
  maxReplicas: 100
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
      - summerwind/actions-runner-controller
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "1m"

k -n ci describe runner comtravo-github-actions-deployment-8f2gx-5bmhm

Name:         comtravo-github-actions-deployment-8f2gx-5bmhm
Namespace:    ci
Labels:       runner-deployment-name=comtravo-github-actions-deployment
              runner-template-hash=6959d947d9
Annotations:  <none>
API Version:  actions.summerwind.dev/v1alpha1
Kind:         Runner
Metadata:
  Creation Timestamp:  2021-04-13T14:35:10Z
  Finalizers:
    runner.actions.summerwind.dev
  Generate Name:  comtravo-github-actions-deployment-8f2gx-
  Generation:     1
  Managed Fields:
    API Version:  actions.summerwind.dev/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
        f:generateName:
        f:labels:
          .:
          f:runner-deployment-name:
          f:runner-template-hash:
        f:ownerReferences:
      f:spec:
        .:
        f:dockerdContainerResources:
        f:image:
        f:imagePullPolicy:
        f:nodeSelector:
          .:
          f:node.k8s.comtravo.com/workergroup-name:
        f:repository:
        f:resources:
          .:
          f:limits:
            .:
            f:memory:
          f:requests:
            .:
            f:memory:
        f:securityContext:
          .:
          f:fsGroup:
        f:serviceAccountName:
        f:volumes:
      f:status:
        .:
        f:lastRegistrationCheckTime:
        f:phase:
        f:registration:
          .:
          f:expiresAt:
          f:repository:
          f:token:
    Manager:    manager
    Operation:  Update
    Time:       2021-04-13T15:07:16Z
  Owner References:
    API Version:           actions.summerwind.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  RunnerReplicaSet
    Name:                  comtravo-github-actions-deployment-8f2gx
    UID:                   2492f02a-ee74-4777-9df9-9fb07d9b138f
  Resource Version:        69345080
  Self Link:               /apis/actions.summerwind.dev/v1alpha1/namespaces/ci/runners/comtravo-github-actions-deployment-8f2gx-5bmhm
  UID:                     5c7c3de8-15ba-41ee-80ea-a291c0cbada8
Spec:
  Dockerd Container Resources:
  Image:              harbor.infra.comtravo.com/cache/comtravo/actions-runner:v2.277.1
  Image Pull Policy:  Always
  Node Selector:
    node.k8s.comtravo.com/workergroup-name:  github-actions
  Repository:                                comtravo/ct-backend
  Resources:
    Limits:
      Memory:  4Gi
    Requests:
      Memory:  256Mi
  Security Context:
    Fs Group:            1447
  Service Account Name:  actions
  Volumes:
    Name:  docker-volume
    Persistent Volume Claim:
      Claim Name:  actions-docker-volume
Status:
  Last Registration Check Time:  2021-04-13T15:07:16Z
  Phase:                         Running
  Registration:
    Expires At:  2021-04-13T15:34:31Z
    Repository:  comtravo/ct-backend
    Token:       ASS5GHOQCZPOS6FVDRFG2YTAOW5APAVPNFXHG5DBNRWGC5DJN5XF62LEZYANUGERWFUW443UMFWGYYLUNFXW4X3UPFYGLN2JNZ2GKZ3SMF2GS33OJFXHG5DBNRWGC5DJN5XA
Events:
  Type    Reason                    Age                From               Message
  ----    ------                    ----               ----               -------
  Normal  RegistrationTokenUpdated  31m                runner-controller  Successfully update registration token
  Normal  PodCreated                19s (x2 over 31m)  runner-controller  Created pod 'comtravo-github-actions-deployment-8f2gx-5bmhm'

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 31 (15 by maintainers)

Most upvoted comments

@mumoshu @asoldino I can’t thank you enough for this feature. This feature finally enabled me to switch to GHA from Jenkins for our heavier workloads and huge docker images 😅 this feature + buildkit is just amazing. It is such a liberating feeling to have almost deprecated Jenkins 😄

Sure:

  • I have three node pools in my cluster (one for system pods, one for the platform pods, including the actions-runner-controller, and one for the runners)
  • The runners node pool has a node autoscaler active (using managed components from AKS)
  • I make sure Kubernetes schedules the runners on the dedicated node pool e.g.
#...
kind: RunnerDeployment
spec:
  template:
    spec:
      nodeSelector:
        agentpool: runners
#...
  • I make sure to cap the resources of each runner so that one runner fills a node, e.g. (for nodes with 8 cores and 32 GiB RAM)
#...
resources:
  limits:
    cpu: "4.0"
    memory: "16Gi"
dockerdContainerResources:
  limits:
    cpu: "4.0"
    memory: "16Gi"
#...
  • I have a HorizontalRunnerAutoscaler, e.g.
#...
kind: HorizontalRunnerAutoscaler
spec:
  scaleTargetRef:
    name: runners
#...

To recap: the resource requests force Kubernetes to schedule one pod per runner node (when only limits are set, Kubernetes defaults the requests to the same values). When the runner autoscaler kicks in, the node autoscaler provisions the extra nodes required, and Kubernetes can eventually run the additional pods.
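A consolidated sketch of the setup described above might look like the following. The field name dockerdContainerResources follows the "Dockerd Container Resources" entry in the describe output earlier in this issue. The explicit requests, the replica bounds, the autoscaler name, and the repository are illustrative assumptions, not values from the commenter's cluster. The requests are sized so that a second runner pod cannot fit on an 8-core, 32 GiB node while leaving some headroom for system daemons.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runners
spec:
  template:
    spec:
      repository: example-org/example-repo   # hypothetical repository
      nodeSelector:
        agentpool: runners                    # dedicated runner node pool
      # Runner container: explicit requests (assumed values) plus limits.
      resources:
        requests:
          cpu: "3.5"
          memory: "14Gi"
        limits:
          cpu: "4.0"
          memory: "16Gi"
      # dockerd sidecar container: same sizing, so the pod requests
      # 7 CPUs and 28Gi in total and only one pod fits per node.
      dockerdContainerResources:
        requests:
          cpu: "3.5"
          memory: "14Gi"
        limits:
          cpu: "4.0"
          memory: "16Gi"
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: runners-autoscaler                    # hypothetical name
spec:
  scaleTargetRef:
    name: runners
  minReplicas: 1                              # assumed bounds
  maxReplicas: 10
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - example-org/example-repo
```

When the autoscaler raises the replica count, the new pod stays Pending until the node autoscaler adds a node to the runners pool.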

@Puneeth-n If you still need to use FSx, I think actions-runner-controller needs to be enhanced to enable the user to specify a PVC template rather than a PVC, like a K8s statefulset.
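For readers unfamiliar with the StatefulSet mechanism referred to here, this is roughly what a PVC template looks like in plain Kubernetes. It is a generic sketch, not an actions-runner-controller feature; the names, replica count, and storage size are hypothetical. Each replica gets its own PersistentVolumeClaim created from the template, instead of all replicas pointing at one shared claim.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dind-example                 # hypothetical name
spec:
  serviceName: dind-example
  replicas: 2
  selector:
    matchLabels:
      app: dind-example
  template:
    metadata:
      labels:
        app: dind-example
    spec:
      containers:
      - name: docker
        image: docker:dind
        securityContext:
          privileged: true           # dind needs a privileged container
        volumeMounts:
        - name: var-lib-docker
          mountPath: /var/lib/docker
  # One PVC per replica is created from this template
  # (var-lib-docker-dind-example-0, var-lib-docker-dind-example-1, ...).
  volumeClaimTemplates:
  - metadata:
      name: var-lib-docker
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi              # assumed size
```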

@prein Hey! I believe we have had only two options so far: mount the host /var/lib/docker onto the runner pod and ensure there’s only one runner per node, or use an emptyDir/dynamic local volume. Neither solution shares /var/lib/docker across pods.
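As a minimal sketch of the emptyDir option, reusing the dockerVolumeMounts and volumes fields from the runner config at the top of this issue: the metadata name and repository are hypothetical. The volume lives and dies with the pod, so nothing is shared or kept between runners.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runners              # hypothetical name
spec:
  template:
    spec:
      repository: example-org/example-repo   # hypothetical repository
      # Mounted into the dockerd sidecar container.
      dockerVolumeMounts:
      - name: docker-volume
        mountPath: /var/lib/docker
      volumes:
      - name: docker-volume
        emptyDir: {}                 # per-pod scratch space, removed with the pod
```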

It remains the best practice NOT to share it. I’d consider the use of subPathExpr in this context a variant of the latter option, because it enables you to have a unique /var/lib/docker volume per pod, not shared.
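A sketch of the subPathExpr variant could look like this. It assumes a controller version whose runner spec accepts dockerEnv (this may not be available in v0.18.2), because subPathExpr can only expand environment variables defined on the container that mounts the volume. The host path, metadata name, and repository are hypothetical.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runners              # hypothetical name
spec:
  template:
    spec:
      repository: example-org/example-repo   # hypothetical repository
      # Expose the pod name to the dockerd container (assumes dockerEnv support).
      dockerEnv:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      dockerVolumeMounts:
      - name: docker-volume
        mountPath: /var/lib/docker
        # Each pod gets its own subdirectory on the node, so
        # /var/lib/docker is never shared between pods.
        subPathExpr: $(POD_NAME)-docker
      volumes:
      - name: docker-volume
        hostPath:
          path: /mnt/docker-cache    # hypothetical directory on the node
          type: DirectoryOrCreate
```

Because runner pod names are generated, the per-pod directories accumulate on the node and need some cleanup policy.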

@antodoms I do not recommend EFS for anything. Years back I tried mounting an EFS volume across multiple Jenkins agents so they had the same source code, and I had file consistency issues across AZs.

@Puneeth-n I’m not actively working on the workflows, I’m “just” a platform provider for my company. I can tell there are a few teams using --cache-from or buildkit, because most of the container jobs are usually normal jobs executed within a container instead of the runner directly. For us, it’s much faster and easier to manage.
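For illustration, a workflow that relies on a registry-backed layer cache rather than a shared /var/lib/docker might look like the sketch below. The image name, workflow name, and branch-agnostic trigger are assumptions, and the runner is assumed to be already logged in to the registry. With BuildKit, --cache-from reads cache metadata from the registry image, which is only present if that image was built with BUILDKIT_INLINE_CACHE=1.

```yaml
name: build-image                    # hypothetical workflow
on: push
jobs:
  build:
    runs-on: self-hosted
    env:
      IMAGE: registry.example.com/example/app   # hypothetical image
      DOCKER_BUILDKIT: "1"
    steps:
    - uses: actions/checkout@v4
    - name: Build, reusing layers from the last pushed image
      run: |
        docker build \
          --cache-from "$IMAGE:latest" \
          --build-arg BUILDKIT_INLINE_CACHE=1 \
          -t "$IMAGE:latest" .
    - name: Push so the next run can use this image as cache
      run: docker push "$IMAGE:latest"
```

This keeps every runner's /var/lib/docker private while still avoiding full rebuilds on a fresh runner.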

It was added here https://github.com/summerwind/actions-runner-controller/pull/439 to resolve https://github.com/summerwind/actions-runner-controller/issues/435. Unfortunately, it is another feature that was added without any documentation by the author, so it’s not clear how it is expected to be used. Those issues may help @Puneeth-n with figuring that out.

A PR to add docs, from you or from @asoldino, the original author, would be greatly appreciated.