actions-runner-controller: Pods created by `RunnerSet` are terminated while still running jobs on a rolling update
Controller Version
0.24.0
Helm Chart Version
0.19.0
CertManager Version
1.4.1
Deployment Method
Helm
cert-manager installation
Deploying cert-manager
Helm chart from https://charts.jetstack.io/
Checks
- This isn’t a question or user support case (for Q&A and community support, go to Discussions. It might also be a good idea to contact any of the contributors and maintainers if this is critical to your business and you therefore need priority support.)
- I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes
- My actions-runner-controller version (v0.x.y) does support the feature
- I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
Resource Definitions
```yaml
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: clr-runner
  namespace: actions-runner-groups
spec:
  dockerdWithinRunnerContainer: false
  ephemeral: true
  labels:
  - clr-runner
  organization: color
  replicas: 1
  selector:
    matchLabels:
      app: clr-runner
  serviceName: clr-runner
  template:
    metadata:
      labels:
        app: clr-runner
    spec:
      containers:
      - image: 301643779712.dkr.ecr.us-east-1.amazonaws.com/color-actions-runner:master_c704c14c
        name: runner
        resources:
          limits:
            cpu: 8
            memory: 32Gi
          requests:
            cpu: 8
            memory: 32Gi
      - image: public.ecr.aws/docker/library/docker:dind
        name: docker
      securityContext:
        fsGroup: 1000
      serviceAccountName: actions-runner
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: clr-runner
  namespace: actions-runner-groups
spec:
  maxReplicas: 30
  metrics:
  - scaleDownFactor: "0.7"
    scaleDownThreshold: "0.2"
    scaleUpFactor: "2.5"
    scaleUpThreshold: "0.5"
    type: PercentageRunnersBusy
  minReplicas: 1
  scaleDownDelaySecondsAfterScaleOut: 3600
  scaleTargetRef:
    kind: RunnerSet
    name: clr-runner
```
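For context on the `HorizontalRunnerAutoscaler` above, the arithmetic behind the `PercentageRunnersBusy` metric can be sketched roughly as follows. This is a simplified model based on my reading of the ARC docs, not the controller's actual code; the exact rounding and boundary behavior in ARC may differ.

```python
import math

def desired_replicas(current: int, busy: int, total: int,
                     scale_up_threshold: float = 0.5,
                     scale_down_threshold: float = 0.2,
                     scale_up_factor: float = 2.5,
                     scale_down_factor: float = 0.7,
                     min_replicas: int = 1,
                     max_replicas: int = 30) -> int:
    """Rough model of PercentageRunnersBusy: multiply the replica count
    by a factor when the busy ratio crosses a threshold, then clamp."""
    busy_ratio = busy / total if total else 0.0
    if busy_ratio >= scale_up_threshold:
        desired = math.ceil(current * scale_up_factor)
    elif busy_ratio < scale_down_threshold:
        desired = math.floor(current * scale_down_factor)
    else:
        desired = current
    return max(min_replicas, min(max_replicas, desired))
```

With the thresholds and factors from the manifest above, 3 busy runners out of 4 (ratio 0.75) would scale 4 replicas up to `ceil(4 * 2.5) = 10`, while 1 busy out of 10 (ratio 0.1) would scale 10 down to `floor(10 * 0.7) = 7`.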
To Reproduce
1. Change the `image` for the `runner` container in the manifest above
2. Launch a workflow that starts running some jobs on the runners
3. `kubectl apply` it to update the runners
Describe the bug
All runner pods are restarted within less than a minute of the `kubectl apply`, even those that are still running jobs. The in-flight jobs are dropped and appear hung in the GitHub UI (they eventually time out).
Describe the expected behavior
Pods aren’t terminated until they’ve finished running in-flight jobs
Controller Logs
Will post in separate comment to avoid #1533
Runner Pod Logs
N/A (the pods are deleted)
Additional Context
No response
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 18 (10 by maintainers)
Commits related to this issue
- e2e: Continuous rolling-update of runners while workflow jobs are running This should help revealing issues like https://github.com/actions-runner-controller/actions-runner-controller/issues/1535 if ... — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix RunnerSet-managed rootless-dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless-dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless-dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners with dind sidecars to gracefully stop on eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners with dind sidecars to gracefully stop on eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners with dind sidecars to gracefully stop on eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners with dind sidecars to gracefully stop on eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix rootless/rootful dind runners to gracefully stop on pod eviction Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
- Fix runners to do their best to gracefully stop on pod eviction (#1759) Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> — committed to actions/actions-runner-controller by mumoshu 2 years ago
Hey everyone! I have an update: #1759 should fix this.
In contrast to RunnerDeployment, RunnerSet-managed runner pods don’t have the same controller-side graceful termination logic. That doesn’t change in #1759.
However, you can now let the vanilla Kubernetes pod termination process gracefully stop runners. Configure `RUNNER_GRACEFUL_STOP_TIMEOUT` and `terminationGracePeriodSeconds` appropriately. More information is in the updated README. If you’re interested in how it’s supposed to work, please read the new section in the updated README, and also https://github.com/actions-runner-controller/actions-runner-controller/issues/1581#issuecomment-1229616193.
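To make the configuration concrete, a minimal sketch of what the comment above describes might look like the fragment below, applied to the `RunnerSet` pod template. The values are illustrative assumptions, not recommendations; the key idea is that `terminationGracePeriodSeconds` should exceed the runner's graceful-stop timeout so Kubernetes doesn't SIGKILL the pod before the in-flight job finishes or the timeout expires. Consult the README section referenced above for the authoritative settings.

```yaml
# Illustrative values only (assumptions, not from the issue itself)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120  # must be longer than the graceful-stop timeout below
      containers:
      - name: runner
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"  # seconds the runner waits for an in-flight job before stopping
```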