actions-runner-controller: RunnerSet Cannot Deploy to Different Namespaces in the same Cluster

Unlike RunnerDeployments, multiple RunnerSets cannot be deployed to the same Cluster, even when each one is in a different Namespace.

I was previously able to deploy RunnerDeployments to the same Cluster, each with its own controller, and had no issues with them sharing values (different helm release names - see 782). Now that I’m deploying RunnerSets, I’m seeing an issue where the API call is picking up configuration from a separate namespace, even with watchNamespace set to the namespace of the controller.

I’m currently running helm chart version 0.13.2 and actions-runner-controller version 0.20.2.

To reproduce: create a namespace, deploy a controller with a unique release name, and deploy a RunnerSet to that controller. Then create another namespace and deploy a second controller with a unique release name AND either a unique GitHub App installation ID or (as in my case) a PAT, for GitHub Enterprise Cloud or GitHub Enterprise Server, whichever you want. In my case I have two RunnerSets for GHES, for two separate organizations, using the same PAT; when I try to deploy to our organization in GHEC and use a new PAT for that instance, I see this error:

create Pod xx-ghec-runnerset-0 in StatefulSet xx-ghec-runnerset failed error: admission webhook "mutate-runner-pod.webhook.actions.summerwind.dev" denied the request: failed to create registration token: POST https://github.xxx.com/api/v3/orgs/xxx/actions/runners/registration-token: 404 Not Found []

^ The API call above should go to github.com, but it is instead pointed at my GHES URL, even though my values.yml does not specify any GitHub Enterprise Server URL:

env:
  GITHUB_ENTERPRISE_URL: ""
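
For reference, this is roughly how the two controllers are configured (a sketch only: the release names, namespaces, and PAT placeholders are illustrative, and chart keys such as scope.watchNamespace and githubEnterpriseServerURL should be double-checked against chart version 0.13.2):

# values-ghes.yaml - first controller, scoped to its own namespace
authSecret:
  create: true
  github_token: "<GHES PAT>"
githubEnterpriseServerURL: "https://github.xxx.com"
scope:
  singleNamespace: true
  watchNamespace: arc-ghes

# values-ghec.yaml - second controller; no Enterprise Server URL is set here
authSecret:
  create: true
  github_token: "<GHEC PAT>"
scope:
  singleNamespace: true
  watchNamespace: arc-ghec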

I expect to be able to deploy RunnerSets to different Namespaces within the same Cluster, the same way we can with RunnerDeployments.

I’m moving to RunnerSets for now because they accept dnsPolicy: Default, which solves timeout errors we were seeing when deploying to our Kubernetes cluster. I would like to go back to RunnerDeployments if the dnsPolicy behavior becomes available there, as we want to use autoscaling.
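
For illustration, this is a minimal sketch of the RunnerSet shape I mean (the names are placeholders and the field layout follows the RunnerSet examples in the project README, so verify it against your CRD version):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: xx-ghec-runnerset
spec:
  organization: xxx              # illustrative organization name
  replicas: 1
  selector:
    matchLabels:
      app: xx-ghec-runnerset
  serviceName: xx-ghec-runnerset
  template:
    metadata:
      labels:
        app: xx-ghec-runnerset
    spec:
      dnsPolicy: Default         # the pod-template setting RunnerSet lets us control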

If this functionality will not be supported for RunnerSets and you recommend using only RunnerDeployments, please let me know. Thanks!

Most upvoted comments

@rxa313 Also, I bet PercentageRunnersBusy and TotalNumberOfQueuedAndInProgressWorkflowRuns won’t be your solution.

Instead, you’d better file a feature request with GitHub so that they can add something like a “List Workflow Jobs” API whose response includes all the jobs that are queued but have no runners and are waiting for new runners to become available. The response should also include the runner labels requested by those queued jobs.

If we had such an API, it would be easy to enhance HRA to provide an easy-to-configure autoscaling experience like you get today with workflow_job webhook events, but without the webhook.
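
For context, the pull-based configuration I’m saying won’t cut it looks roughly like this (a sketch following the HRA docs, not a recommendation):

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-autoscaler
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  minReplicas: 1
  maxReplicas: 5
  metrics:
  # Polls the GitHub API and scales on the ratio of busy runners
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.25'
    scaleUpFactor: '2'
    scaleDownFactor: '0.5'
  # Counts queued and in-progress workflow runs for the listed repositories
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - myorg/myrepo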

@mumoshu

We don’t yet have a container image build or Helm chart support for that component, so you’d need to build your own container image for the forwarder and a K8s deployment to deploy it onto your cluster.

Are there plans to release this feature in the near future? I’d love to test it out without having to build my own images in my cluster 😃

Thanks for all the info.

one HRA/webhook managing two separate RunnerSets

No, that’s not possible! You need to configure one HRA per RunnerSet, because the controller maps one “queued” webhook event to an HRA, and then maps the HRA to a RunnerSet or a RunnerDeployment.
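
A rough sketch of that wiring (the names here are illustrative; check the scaleTargetRef and scaleUpTriggers fields against your ARC version):

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: org-a-autoscaler        # each RunnerSet gets its own HRA like this one
spec:
  scaleTargetRef:
    kind: RunnerSet             # defaults to RunnerDeployment when omitted
    name: org-a-runnerset
  minReplicas: 0
  maxReplicas: 5
  scaleUpTriggers:
  # Each matching "queued" workflow_job event adds one replica for 5 minutes
  - githubEvent:
      workflowJob: {}
    amount: 1
    duration: "5m"

A second RunnerSet would need a second HRA of the same shape pointing at its own name.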

Basically, instead of GHEC’s webhook reaching into our internal Rancher environment to send information, is the reverse possible, where Rancher reaches out to GHEC and checks for workflow_job/other webhook events?

@rxa313 I believe a short-term solution would be the GitHub webhook delivery forwarder I’ve built in https://github.com/actions-runner-controller/actions-runner-controller/pull/682.

We don’t yet have a container image build or Helm chart support for that component, so you’d need to build your own container image for the forwarder and a K8s deployment to deploy it onto your cluster.
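
Purely as a sketch of that second piece, the deployment could look like the following (the image name and secret are hypothetical placeholders, since there’s no published image yet, and the forwarder’s actual flags and credentials are whatever the code in that PR expects):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-webhook-forwarder          # hypothetical name
  namespace: actions-runner-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: github-webhook-forwarder
  template:
    metadata:
      labels:
        app: github-webhook-forwarder
    spec:
      containers:
      - name: forwarder
        # You build and push this image yourself; nothing is published yet
        image: registry.example.com/arc-webhook-forwarder:dev
        envFrom:
        # Hypothetical secret holding whatever GitHub credentials the forwarder needs
        - secretRef:
            name: forwarder-github-credentials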

@rxa313 you’ll know whether 3.3 has the fix because the routing-logic documentation for Enterprise Server 3.3 will match the logic in the Enterprise Cloud version of the article (you can switch versions at the top right of the page). You can see that Enterprise Server 3.2 still has the old routing logic, so registration-only runners are needed to scale from zero on <= 3.2.

@rxa313 Hey! Thanks for testing it out. Apparently I made a typo when implementing scaleDownDelaySecondsAfterScaleOut 😅

Re the “no runners found with label” errors, I believe they happen only on GHES (GitHub Enterprise Server), not on GitHub Cloud.

For GitHub Enterprise, you’ll probably receive a fix in GitHub Enterprise Server 3.3 (I think that’s the next version). Until then, you’d need to use RunnerDeployment to avoid the “no runners found with label” error on GHES.

Summary: Scale-from-zero works with…

Scale-from-zero doesn’t work with…

  • GHES 3.2 + RunnerSet (GHES 3.2 doesn’t support scale-from-zero out of the box, and RunnerSet doesn’t support registration-only runners)