actions-runner-controller: arc-runner-set not scaling down intermittently | gha-runner-scale-set:0.6.1
Checks
- I’ve already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I’m sure my issue is not covered in the troubleshooting guide.
- I’m not using a custom entrypoint in my runner image
Controller Version
0.6.1
Helm Chart Version
0.6.1
CertManager Version
No response
Deployment Method
Helm
cert-manager installation
NA
Checks
- This isn’t a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you therefore need priority support)
- I’ve read the release notes before submitting this issue and I’m sure it’s not due to any recently introduced backward-incompatible changes
- My actions-runner-controller version (v0.x.y) does support the feature
- I’ve already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn’t fix the issue
- I’ve migrated to the workflow job webhook event (if you are using webhook-driven scaling)
Resource Definitions
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        # Intentionally kept a 10s sleep to make sure the istio-proxy container is up and running
        command: ["/bin/bash", "-c", "sleep 10 && /home/runner/run.sh"]
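For reference, the override above lives under the runner scale set's Helm values. A minimal values.yaml along these lines would reproduce the setup; the repository URL and secret name below are placeholders, not values taken from the actual deployment:

```yaml
# Sketch of values.yaml for the gha-runner-scale-set chart.
# Installed roughly as:
#   helm install arc-runner-set \
#     oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
#     -f values.yaml
githubConfigUrl: https://github.com/my-org/my-repo   # placeholder
githubConfigSecret: arc-github-secret                # placeholder; pre-created secret with GitHub credentials
# No minRunners/maxRunners are set, matching the report below.
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        # 10s sleep so the istio-proxy sidecar is ready before the runner starts
        command: ["/bin/bash", "-c", "sleep 10 && /home/runner/run.sh"]
```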
To Reproduce
I am trying out GitHub ARC.
1. I created a sample workflow with a single job that sleeps for 30 seconds and prints Hello World (a minimal sketch is shown after this list).
2. I set up the arc-gha-rs-controller and arc-runner-set properly and they are up and running.
3. The job runs as expected, but sometimes after the job completes the runner set does not scale down to 0 (intermittent).
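A minimal workflow matching step 1 would look roughly like this; the file path and the arc-runner-set label are assumptions (the label must match the Helm release name of the runner scale set):

```yaml
# .github/workflows/arc-test.yml (hypothetical path)
name: ARC scale-down test
on:
  workflow_dispatch:
jobs:
  hello:
    runs-on: arc-runner-set   # assumed runner scale set name
    steps:
      - name: Sleep and greet
        run: |
          sleep 30
          echo "Hello World"
```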
Describe the bug
When the job finishes, the runner scale set does not scale down to 0. This is intermittent, so I end up with a stale runner sitting in the cluster (I have not set any min or max runners). If I then trigger a new job, the stale runner does not pick it up and the job stays queued forever.
Describe the expected behavior
As soon as the job completes the runner should terminate.
Whole Controller Logs
https://gist.github.com/ChaitanyaAtchuta5/7692ccd1e35e4b6706f8a0f20a570aaf
Whole Runner Pod Logs
Logs when not working as expected (Runner pod not getting terminated when job completes)
https://gist.github.com/ChaitanyaAtchuta5/881715bfec200f42de97adebec44e926
Logs when working as expected (Runner pod gets terminated when job completes)
https://gist.github.com/ChaitanyaAtchuta5/c847653cb266dcc1b010a427767a2d51
Additional Context
No response
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Comments: 18 (5 by maintainers)
Thank you @ChaitanyaAtchuta5 this is very helpful, we’ll take a look!
@Link- I deleted the Helm deployments of gha-runner-scale-set-controller and gha-runner-scale-set and installed them again from scratch. Note:
After they installed without any errors and I made sure both the controller and listener pods were up and running, I triggered a workflow. The new runner pod came up and completed the job, then went into the Terminating state and was removed. Immediately afterwards a new runner pod went into the Init stage, even though no workflows were running in my repo at that point. After a few seconds that runner pod was also terminated. Everything looked okay so far. Later, I went ahead and re-ran the same workflow; the runner pod came up and completed the job as usual, but this time the runner pod was not terminated even after the job completed, which is not the expected behavior.
- Workflow status screenshot
- Runner scale set status from GitHub
- kubectl output
- Controller logs (from setup to hitting the issue): https://gist.github.com/ChaitanyaAtchuta5/b579e55b710b6e98f0760b70442cbd7a
- Listener logs (from setup to hitting the issue): https://gist.github.com/ChaitanyaAtchuta5/488ee8b8c1dbe09d9d40bedea243b859
- Logs from the runner pod that did not get terminated: https://gist.github.com/ChaitanyaAtchuta5/9bff813e8be3b6a4dab655115c9582dc