actions-runner-controller: [Github Webhook HRA] Not able to get it working...

Hello everyone,

Since running into the scaling problems I described in https://github.com/summerwind/actions-runner-controller/issues/206, I’ve found an effective workaround.

I’m using multiple Kubernetes clusters (5, to be exact) with one GitHub Actions controller deployed on each one.

  • Each controller manages a pool of 20 workers, autoscaled using the - type: PercentageRunnersBusy method.
  • Each controller uses its own GitHub App for GitHub API auth, which gives me approx. 6700 API calls per hour on each cluster.
  • Each controller has a sync-period of 1m.

It’s working well, and it was the only solution I found for running 100 runners concurrently with the actions-runner-controller.

@mumoshu

That said, that’s not why I’m here today. Now that I’ve seen the new GitHub Webhook HRA feature, I really need it so I can stop relying on this kind of workaround and use the controller “at scale”.

Unfortunately, I’m not able to get it working with the latest Helm chart version, 0.7.0. I tried the latest, v0.17.0, and canary versions of the controller image, and I’m using the CRDs from the ‘master’ branch.

When I declare the HRA like this:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: actions-runner-aos-autoscaler
  namespace: default
spec:
  scaleTargetRef:
    name: actions-runner-aos
  minReplicas: 1
  maxReplicas: 10
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

The github-actions-controller crashes with this log:

2021-03-08T14:40:39.333Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2021-03-08T14:40:39.333Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2021-03-08T14:40:39.333Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2021-03-08T14:40:39.333Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2021-03-08T14:40:39.333Z INFO controller-runtime.webhook registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2021-03-08T14:40:39.333Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-03-08T14:40:39.333Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-03-08T14:40:39.333Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-03-08T14:40:39.333Z INFO controller-runtime.webhook registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-03-08T14:40:39.333Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-03-08T14:40:39.333Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-03-08T14:40:39.333Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-03-08T14:40:39.334Z INFO controller-runtime.webhook registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-03-08T14:40:39.334Z INFO setup starting manager
2021-03-08T14:40:39.334Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2021-03-08T14:40:39.435Z INFO controller-runtime.webhook.webhooks starting webhook server
2021-03-08T14:40:39.435Z INFO controller-runtime.certwatcher Updated current TLS certificate
2021-03-08T14:40:39.435Z INFO controller-runtime.webhook serving webhook server {"host": "", "port": 9443}
2021-03-08T14:40:39.436Z INFO controller-runtime.certwatcher Starting certificate watcher
2021-03-08T14:40:56.134Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"ConfigMap","namespace":"default","name":"controller-leader-election-helper","uid":"900760ed-cad7-435b-964f-e3694c664fbe","apiVersion":"v1","resourceVersion":"5323021"}, "reason": "LeaderElection", "message": "actions-controller-actions-runner-controller-554966bb8b-lbwvt_6caf86f4-a576-4e77-b0c5-51d19c018b26 became leader"}
2021-03-08T14:40:56.134Z INFO controller-runtime.controller Starting EventSource {"controller": "horizontalrunnerautoscaler", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.134Z INFO controller-runtime.controller Starting EventSource {"controller": "runner", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.134Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerreplicaset", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.134Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerreplicaset", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.135Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerdeployment", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.234Z INFO controller-runtime.controller Starting Controller {"controller": "horizontalrunnerautoscaler"}
2021-03-08T14:40:56.234Z INFO controller-runtime.controller Starting EventSource {"controller": "runner", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.235Z INFO controller-runtime.controller Starting Controller {"controller": "runnerreplicaset"}
2021-03-08T14:40:56.235Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerdeployment", "source": "kind source: /, Kind="}
2021-03-08T14:40:56.235Z INFO controller-runtime.controller Starting Controller {"controller": "runnerdeployment"}
2021-03-08T14:40:56.335Z INFO controller-runtime.controller Starting workers {"controller": "runnerreplicaset", "worker count": 1}
2021-03-08T14:40:56.335Z INFO controllers.RunnerReplicaSet debug {"runnerreplicaset": "default/actions-runner-aos-h9ppg", "desired": 1, "available": 1}
2021-03-08T14:40:56.335Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "runnerreplicaset", "request": "default/actions-runner-aos-h9ppg"}
2021-03-08T14:40:56.336Z INFO controller-runtime.controller Starting Controller {"controller": "runner"}
2021-03-08T14:40:56.335Z INFO controller-runtime.controller Starting workers {"controller": "horizontalrunnerautoscaler", "worker count": 1}
E0308 14:40:56.336609 1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 343 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x15aabe0, 0xc00027ed80)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/runtime/runtime.go:48 +0x89
panic(0x15aabe0, 0xc00027ed80)
/usr/local/go/src/runtime/panic.go:969 +0x1b9
github.com/summerwind/actions-runner-controller/controllers.(*HorizontalRunnerAutoscalerReconciler).calculateReplicasByQueuedAndInProgressWorkflowRuns(0xc0002e2ac0, 0x13d38ba, 0x10, 0xc00027ed60, 0x1f, 0xc000704ca0, 0x12, 0x0, 0x0, 0xc00071cb40, ...)
/workspace/controllers/autoscaling.go:50 +0xe7e
github.com/summerwind/actions-runner-controller/controllers.(*HorizontalRunnerAutoscalerReconciler).determineDesiredReplicas(0xc0002e2ac0, 0x13d38ba, 0x10, 0xc00027ed60, 0x1f, 0xc000704ca0, 0x12, 0x0, 0x0, 0xc00071cb40, ...)
/workspace/controllers/autoscaling.go:31 +0xb8
github.com/summerwind/actions-runner-controller/controllers.(*HorizontalRunnerAutoscalerReconciler).computeReplicas(0xc0002e2ac0, 0x13d38ba, 0x10, 0xc00027ed60, 0x1f, 0xc000704ca0, 0x12, 0x0, 0x0, 0xc00071cb40, ...)
/workspace/controllers/horizontalrunnerautoscaler_controller.go:142 +0x7b
github.com/summerwind/actions-runner-controller/controllers.(*HorizontalRunnerAutoscalerReconciler).Reconcile(0xc0002e2ac0, 0xc00017a7e0, 0x7, 0xc00027f9e0, 0x1d, 0x428f095d4, 0xc000558cf0, 0xc0002d27e8, 0xc0002d27e0)

I tried deleting:

  minReplicas: 1
  maxReplicas: 10

to follow the README.md example, but the controller is not happy with that either and keeps asking me to add minReplicas and maxReplicas.

I know this feature is at an early stage, so I won’t be surprised if it’s not working yet; I just wanted to be sure you are aware of this 😄

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 39 (6 by maintainers)

Most upvoted comments

@mumoshu Hello Yusuke, I’ve heavily tested the new watch-namespace feature you’ve implemented.

I can say it’s working very well 😉 I’ve launched 1 cluster with 5 namespaced controllers, each one in charge of 20 runners, with a sync-period of 1m. And… it’s amazing, that’s it 😄 Nothing more to say except thank you very much again.

By the way, I’m going to test the GitHub Webhook HRA a little more on my side to find the best autoscaling mechanism for my case. I think you’re right: PercentageRunnersBusy fits my case well, combined with multiple watch-namespace controllers.

Now I’m able to scale up to 100 runners constantly without hitting any GitHub API limit, with 1 cluster! 🥇

I had the same issue: the HorizontalRunnerAutoscaler still requires one of the “normal” scaling types under metrics in addition to scaleUpTriggers.

This works for me:

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: runner-autoscaler
  namespace: actions
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    name: runner
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '1'
    scaleDownThreshold: '0.5'
    scaleUpAdjustment: 1
    scaleDownAdjustment: 1
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

@mumoshu It’s working again 😄 ! Thanks a lot, let’s test now 🚀

“omitting metrics result in the use of TotalNumberOfQueuedAndInProgressWorkflowRuns metric”

I came to think this is confusing and that there’s no actual benefit to making it the default behavior. I’ve changed the controller code, and since #391, omitting Metrics[] just results in ScaleUpTriggers[] being used alone. In that case, the controller completely skips GitHub API calls for autoscaling, which alleviates the rate-limit issue!
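
To make the behavior described above concrete, here is a minimal Go sketch. It is not the controller's code: MetricSpec, HRASpec, replicasFromMetrics, desiredReplicas, and the poll callback are all hypothetical names, and adding webhook-reserved capacity on top of minReplicas is an assumption made for illustration. It shows only the general pattern: check the length of the metrics list before indexing it (an unguarded Metrics[0] on an empty list is the kind of access that produces "index out of range [0] with length 0"), and skip the API-backed calculation entirely when no metrics are configured.

```go
package main

import "fmt"

// MetricSpec and HRASpec are hypothetical, simplified stand-ins for the
// controller's types; only the fields needed for this illustration exist.
type MetricSpec struct {
	Type string
}

type HRASpec struct {
	MinReplicas int
	MaxReplicas int
	Metrics     []MetricSpec
}

// replicasFromMetrics returns a replica count from polled metrics, or false
// when no metrics are configured. The len() check comes before any indexing,
// so an empty Metrics list can never cause an out-of-range panic.
func replicasFromMetrics(spec HRASpec, poll func(MetricSpec) int) (int, bool) {
	if len(spec.Metrics) == 0 {
		return 0, false // poll (the stand-in for GitHub API calls) is never invoked
	}
	return poll(spec.Metrics[0]), true
}

// desiredReplicas adds webhook-driven capacity to the metrics-based count
// (or to MinReplicas when there are no metrics) and clamps the result to
// [MinReplicas, MaxReplicas]. The additive model is an assumption.
func desiredReplicas(spec HRASpec, webhookCapacity int, poll func(MetricSpec) int) int {
	base, ok := replicasFromMetrics(spec, poll)
	if !ok {
		base = spec.MinReplicas
	}
	n := base + webhookCapacity
	if n < spec.MinReplicas {
		n = spec.MinReplicas
	}
	if n > spec.MaxReplicas {
		n = spec.MaxReplicas
	}
	return n
}

func main() {
	// No metrics configured: only webhook-driven capacity contributes.
	spec := HRASpec{MinReplicas: 1, MaxReplicas: 10}
	neverCalled := func(MetricSpec) int { panic("poll must not run without metrics") }
	fmt.Println(desiredReplicas(spec, 2, neverCalled)) // prints 3
}
```

Because the poll callback is never invoked when Metrics is empty, this shape would make no API calls in the webhook-only configuration, which matches the rate-limit benefit described above.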

@theobolo Thanks! FYI, I’ve just merged #386 and the canary tag will be updated soon.

@theobolo I now believe it was due to a regression introduced in #355. #386 should fix it.

@avdhoot The fix should be available in the current canary image. Would you mind giving it a shot?

@avdhoot No. But omitting metrics results in the use of the TotalNumberOfQueuedAndInProgressWorkflowRuns metric:

https://github.com/summerwind/actions-runner-controller/blob/4fa53153111489691c57cee9cd11fdafb9e3d5bd/controllers/autoscaling.go#L75

Also, minReplicas and maxReplicas are required regardless of whether you configure metrics or not (https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-792855412).
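
As a rough illustration of that requirement, the sketch below validates the two bounds before any scaling decision is made. ReplicaBounds and validateBounds are hypothetical names, the use of pointers to distinguish "unset" from zero is an assumption, and the min-greater-than-max check is an extra sanity check added here rather than something confirmed in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicaBounds is a hypothetical stand-in for the relevant part of an HRA
// spec. Pointers let "field omitted" be told apart from an explicit 0.
type ReplicaBounds struct {
	MinReplicas *int
	MaxReplicas *int
}

// validateBounds rejects a spec that omits either bound, whether or not
// metrics are configured. The final ordering check is an added assumption.
func validateBounds(b ReplicaBounds) error {
	if b.MinReplicas == nil {
		return errors.New("minReplicas is required")
	}
	if b.MaxReplicas == nil {
		return errors.New("maxReplicas is required")
	}
	if *b.MinReplicas > *b.MaxReplicas {
		return errors.New("minReplicas must not exceed maxReplicas")
	}
	return nil
}

func main() {
	minR, maxR := 1, 10
	fmt.Println(validateBounds(ReplicaBounds{MinReplicas: &minR, MaxReplicas: &maxR}))
	fmt.Println(validateBounds(ReplicaBounds{MaxReplicas: &maxR}))
}
```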