actions-runner-controller: Pods created by `RunnerSet` are terminated while still running jobs on a rolling update
Controller Version
0.24.0
Helm Chart Version
0.19.0
CertManager Version
1.4.1
Deployment Method
Helm
cert-manager installation
Deploying cert-manager Helm chart from https://charts.jetstack.io/
Checks
- This isn’t a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with one of the contributors or maintainers if your business is critical and you need priority support)
- I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently-introduced backward-incompatible changes
- My actions-runner-controller version (v0.x.y) does support the feature
- I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
 
Resource Definitions
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: clr-runner
  namespace: actions-runner-groups
spec:
  dockerdWithinRunnerContainer: false
  ephemeral: true
  labels:
  - clr-runner
  organization: color
  replicas: 1
  selector:
    matchLabels:
      app: clr-runner
  serviceName: clr-runner
  template:
    metadata:
      labels:
        app: clr-runner
    spec:
      containers:
      - image: 301643779712.dkr.ecr.us-east-1.amazonaws.com/color-actions-runner:master_c704c14c
        name: runner
        resources:
          limits:
            cpu: 8
            memory: 32Gi
          requests:
            cpu: 8
            memory: 32Gi
      - image: public.ecr.aws/docker/library/docker:dind
        name: docker
      securityContext:
        fsGroup: 1000
      serviceAccountName: actions-runner
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: clr-runner
  namespace: actions-runner-groups
spec:
  maxReplicas: 30
  metrics:
  - scaleDownFactor: "0.7"
    scaleDownThreshold: "0.2"
    scaleUpFactor: "2.5"
    scaleUpThreshold: "0.5"
    type: PercentageRunnersBusy
  minReplicas: 1
  scaleDownDelaySecondsAfterScaleOut: 3600
  scaleTargetRef:
    kind: RunnerSet
    name: clr-runner
To Reproduce
1. Change the `image` for the `runner` container in the manifest above (a hypothetical example of this edit is shown after this list)
2. Launch a workflow that starts running some jobs on the runners
3. `kubectl apply` the updated manifest to roll out the new runners
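The only field that changes in step 1 is the runner image tag; the tag below is a hypothetical placeholder, not one taken from the original report:

```yaml
# Hypothetical step-1 edit: only the runner container's image tag changes.
spec:
  template:
    spec:
      containers:
      - name: runner
        image: 301643779712.dkr.ecr.us-east-1.amazonaws.com/color-actions-runner:master_deadbeef  # hypothetical new tag
```

Applying this manifest while the workflow from step 2 is still in progress reproduces the premature termination described below.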
Describe the bug
All runner pods are restarted within a minute of the `kubectl apply`, including those that are still running jobs. The in-flight jobs are dropped and appear hung in the GitHub UI (they eventually time out).
Describe the expected behavior
Pods aren’t terminated until they’ve finished running in-flight jobs
Controller Logs
Will post in a separate comment to avoid #1533
Runner Pod Logs
N/A (the pods are deleted, so no logs are available)
Additional Context
No response
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 18 (10 by maintainers)
 
Commits related to this issue
- e2e: Continuous rolling-update of runners while workflow jobs are running This should help revealing issues like https://github.com/actions-runner-controller/actions-runner-controller/issues/1535 if ... — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix RunnerSet-managed rootless-dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless-dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners with dind sidecars to gracefully stop on eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners to do their best to gracefully stop on pod eviction (#1759) Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
 
Hey everyone! I have an update: #1759 should fix this.
In contrast to RunnerDeployment, RunnerSet-managed runner pods don’t have the same controller-side graceful termination logic, and that doesn’t change in #1759.
However, you can now let the vanilla Kubernetes pod termination process gracefully stop runners. Configure `RUNNER_GRACEFUL_STOP_TIMEOUT` and `terminationGracePeriodSeconds` appropriately. If you’re interested in how this is supposed to work, please read the new section in the updated README, and also https://github.com/actions-runner-controller/actions-runner-controller/issues/1581#issuecomment-1229616193.
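To make that concrete, here is a minimal sketch of how those two settings could be applied to the RunnerSet from this issue. The specific values (a 90-second runner stop timeout and a 120-second pod grace period) are illustrative assumptions, not maintainer recommendations:

```yaml
# Sketch only: values are assumptions. The intent is that
# terminationGracePeriodSeconds is comfortably larger than
# RUNNER_GRACEFUL_STOP_TIMEOUT, so the runner has time to wrap up
# before the kubelet force-kills the pod.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: clr-runner
  namespace: actions-runner-groups
spec:
  ephemeral: true
  organization: color
  selector:
    matchLabels:
      app: clr-runner
  serviceName: clr-runner
  template:
    metadata:
      labels:
        app: clr-runner
    spec:
      terminationGracePeriodSeconds: 120   # assumed value; keep it above the runner stop timeout
      containers:
      - name: runner
        image: 301643779712.dkr.ecr.us-east-1.amazonaws.com/color-actions-runner:master_c704c14c
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"                      # assumed value, in seconds
```

In practice, both values presumably need to cover the longest job you expect a runner to finish after it receives SIGTERM during a rolling update or eviction.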