actions-runner-controller: Too many GitHub API requests hit the rate limit

Hi,

After deploying ~20 runners/pods, about half an hour after they were up and running I started seeing an error in the controller-manager container saying I had reached the GitHub App rate limit of 5,000 requests per hour.

Why is actions-runner-controller making so many requests to GitHub in such a short time? Is it possible to reduce the request rate significantly?

These are the settings I added to try to avoid this issue, but they did not help:

kind: HorizontalRunnerAutoscaler
spec:
  scaleDownDelaySecondsAfterScaleOut: 60

And --sync-period=10m on the controller.
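For reference, --sync-period is an argument of the controller-manager container; here is a minimal sketch of where it goes, assuming the default install (Deployment controller-manager in the actions-runner-system namespace, container manager; adjust the names to your own deployment):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager          # name/namespace assume the default install
  namespace: actions-runner-system
spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        # ...keep the existing default args...
        - "--sync-period=10m"       # how often the controller reconciles and polls GitHub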

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Hello everyone,

I ran a lot of tests yesterday, and I found that there is a big problem when you leave the replicas: 1 line in the RunnerDeployment spec!

The autoscaler is working, but it consistently tries to scale back down to 1 worker, which is the number of replicas expected by the ReplicaSet…

This is really misleading because it kind of works, but it causes a lot of trouble and also a lot of unnecessary API requests, because runners scale up and are then instantly scaled back down to 1 replica. It leaves some runners in an “Offline” state, with some skipped jobs on the GitHub Actions side, etc…

I suspect that some of these issues, https://github.com/summerwind/actions-runner-controller/issues/62 and https://github.com/summerwind/actions-runner-controller/issues/77, are related to this, because it’s exactly the behaviour I’ve seen on my cluster:

Some pods are “Completed” while the GitHub Actions UI shows them as “Offline”; I’m not 100% sure that’s related… Without replicas: 1, I didn’t experience this: offline runners, etc…

So don’t set replicas: in your RunnerDeployment spec when you use a HorizontalRunnerAutoscaler! It’s a common mistake (the classic Kubernetes HPA behaves the same way), but I feel it should maybe be explained a bit more in the documentation.
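To make it concrete, this is the shape that works, shown as a minimal sketch (the names, organization and thresholds below are only examples, not my real setup):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runners             # example name
spec:
  # no replicas: line here; the HorizontalRunnerAutoscaler owns the replica count
  template:
    spec:
      organization: my-org          # example organization
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runners-autoscaler
spec:
  scaleTargetRef:
    name: example-runners
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.3'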

On the API request limitation side, fixing this gave me a lot more room to use small --sync-period values with a high maxReplicas on the HorizontalRunnerAutoscaler.

I did still hit the limit with 50 workers in parallel and --sync-period=1m, but only after 40 to 45 minutes, which is a lot better.
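For a rough sense of the budget (taking the 5,000 requests/hour App limit and assuming the cost of a sync grows with the number of runners):

GitHub App limit        : 5,000 requests per hour
Syncs per hour          : 60 / sync-period (in minutes)
Average budget per sync : 5,000 × sync-period (in minutes) / 60
  --sync-period=1m   →  ~83 requests per sync
  --sync-period=10m  →  ~833 requests per sync

Burning 5,000 requests in 40 to 45 minutes with 50 runners works out to roughly 2 API calls per runner per sync on my setup; GitHub’s GET /rate_limit endpoint tells you how much quota is left at any moment.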

I’ll use your tip @mumoshu to track API calls and find the sweet spot between maxReplicas and sync-period. I’m very keen on this controller to minimize infrastructure cost with a fast, responsive HorizontalRunnerAutoscaler, but for the moment we have to live with the GitHub API limits…

I’ll give you more insights soon; my goal is to reach 100 workers in parallel with a decent --sync-period (less than 10 min…).

See ya

OK, it works when I use organization: in my RunnerDeployment spec; a repository-scoped RunnerDeployment does not seem to work. I can’t use both within the same RunnerDeployment (as expected).

Here is my RunnerDeployment + HorizontalRunnerAutoscaler which is working now:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: actions-runner-aos
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: role
                operator: In
                values:
                - actions-runner
            topologyKey: "kubernetes.io/hostname"
      # repository: go-aos/aos-app
      organization: go-aos
      image: summerwind/actions-runner-dind:latest
      dockerdWithinRunnerContainer: true
      volumes:
      - emptyDir:
          medium: Memory
        name: runner-work    
      volumeMounts:
      - name: runner-work 
        mountPath: "/home/runner"
      env:
      - name: TZ
        value: Europe/Paris
      resources:
        limits:
          cpu: "7"
          memory: "28Gi"
        requests:
          cpu: "7"
          memory: "28Gi"
      workDir: /home/runner/work

---

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: actions-runner-aos-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: actions-runner-aos
  minReplicas: 1
  maxReplicas: 50
  scaleDownDelaySecondsAfterScaleOut: 60
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.3'
    scaleUpFactor: '1.4'
    scaleDownFactor: '0.7'

And this one, which is not:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: actions-runner-aosforce
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: role
                operator: In
                values:
                - actions-runner
            topologyKey: "kubernetes.io/hostname"
      repository: go-aos/aosforce-app
      image: summerwind/actions-runner-dind:latest
      dockerdWithinRunnerContainer: true
      volumes:
      - emptyDir:
          medium: Memory
        name: runner-work    
      volumeMounts:
      - name: runner-work 
        mountPath: "/home/runner"
      env:
      - name: TZ
        value: Europe/Paris
      resources:
        limits:
          cpu: "7"
          memory: "28Gi"
        requests:
          cpu: "7"
          memory: "28Gi"
      workDir: /home/runner/work

---

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: actions-runner-aosforce-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: actions-runner-aosforce
  minReplicas: 1
  maxReplicas: 5
  scaleDownDelaySecondsAfterScaleOut: 60
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.3'
    scaleUpFactor: '1.4'
    scaleDownFactor: '0.7'

I’ll push the testing a little further tonight; I’m off for the moment. Thanks for your quick answer @ZacharyBenamram