keda: KEDA operator very slow with 1088 ScaledObjects (using RabbitMQ scaler) on AWS EKS 1.20

Report

KEDA does not seem to scale well when there are many ScaledObjects. It also seems to be hitting a lot of rate limits against the Kubernetes API.

We are seeing loads of log lines like this:

I1103 20:04:10.361163       1 request.go:601] Waited for 24.339252261s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/frank-casino/scaledobjects/crm-integration-producer/status

With 1088 ScaledObjects and throttling waits of up to 30 seconds per check, a full pass takes forever and the operator is always behind.
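A back-of-the-envelope sketch of why the operator cannot keep up. It assumes exactly one Kubernetes API request per ScaledObject per polling interval, which is a simplification and not a measured figure; the real number of requests per object is not known from this report. The 5 second polling interval comes from the spec in this report, and the 20 requests per second limit is the value mentioned in the maintainer comments. The function names are made up for illustration.

```python
def required_request_rate(scaled_objects, polling_interval_s, requests_per_poll=1):
    """Requests per second needed to touch every object once per polling interval.

    requests_per_poll=1 is an assumption, not a measured value.
    """
    return scaled_objects * requests_per_poll / polling_interval_s


def seconds_per_full_pass(scaled_objects, client_qps, requests_per_poll=1):
    """Seconds a rate-limited client needs for one request per object."""
    return scaled_objects * requests_per_poll / client_qps


demand = required_request_rate(1088, 5)       # objects and pollingInterval from the report
full_pass = seconds_per_full_pass(1088, 20)   # 20 req/s from the maintainer comment

print(f"demand: {demand:.1f} req/s against a 20 req/s client limit")
print(f"one full pass at 20 req/s: {full_pass:.1f} s")
```

Under these assumptions the demand is roughly ten times what the client is allowed to send, so the backlog can only grow.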

We are using idleReplicaCount: 0

The spec of the ScaledObjects looks something like this:

    "spec": {
        "advanced": {
            "restoreToOriginalReplicaCount": true
        },
        "cooldownPeriod": 300,
        "idleReplicaCount": 0,
        "maxReplicaCount": 1,
        "minReplicaCount": 1,
        "pollingInterval": 5,
        "scaleTargetRef": {
            "name": "action-history-offloader"
        },
        "triggers": [
            {
                "metadata": {
                    "hostFromEnv": "AMQP_URL",
                    "queueLength": "200",
                    "queueName": "testing.actions.history-offloader"
                },
                "type": "rabbitmq"
            }
        ]
    },

Expected Behavior

Reconcile all ScaledObjects within a few seconds.

Actual Behavior

The scaling is very slow and takes hours, and it never reaches a valid “state” because the queues keep being consumed in the meantime.

Steps to Reproduce the Problem

  1. Many scaled objects
  2. RabbitMQ scaler
  3. EKS
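To make the first two steps concrete, here is a minimal sketch that builds 1088 ScaledObject manifests with the same spec as the one shown in this report. The namespace `load-test`, the `worker-N` names and the queue names are hypothetical placeholders, and the sketch assumes a Deployment with the matching name already exists for each `scaleTargetRef`.

```python
import json


def make_scaled_object(index, namespace="load-test"):
    """Build one ScaledObject manifest; name and namespace are hypothetical."""
    name = f"worker-{index}"
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Same values as the spec in the report.
            "advanced": {"restoreToOriginalReplicaCount": True},
            "cooldownPeriod": 300,
            "idleReplicaCount": 0,
            "maxReplicaCount": 1,
            "minReplicaCount": 1,
            "pollingInterval": 5,
            "scaleTargetRef": {"name": name},
            "triggers": [
                {
                    "type": "rabbitmq",
                    "metadata": {
                        "hostFromEnv": "AMQP_URL",
                        "queueLength": "200",
                        "queueName": f"testing.{name}",  # placeholder queue
                    },
                }
            ],
        },
    }


# Wrap all manifests in a single v1 List so they can be applied in one go.
bundle = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [make_scaled_object(i) for i in range(1088)],
}

print(f"built {len(bundle['items'])} ScaledObject manifests")
print(json.dumps(bundle["items"][0], indent=2))
```

Writing `bundle` to a file with `json.dump` and applying it with `kubectl apply -f` should be enough to create the objects on a test cluster, given the Deployments and the RabbitMQ queues exist.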

Logs from KEDA operator

I1103 20:10:51.561092       1 request.go:601] Waited for 24.460295516s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/emwys/scaledobjects/crm-integration-proxy/status
2022-11-03T20:10:52Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"webhook","namespace":"{replace}"}, "namespace": "{replace}", "name": "webhook", "reconcileID": "05981953-6823-469b-9aba-c1df13106368"}
2022-11-03T20:10:53Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-api", "reconcileID": "69e2b6ee-9821-48eb-b6e2-9bd8d7e945c7"}
2022-11-03T20:10:53Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "8db944be-ad8d-4725-8475-ab88036d2c36", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:54Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "f2327bbe-0907-48f4-9f0e-f7d02ac53103", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:55Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "7bbd144c-9b8c-421e-abe9-d41c058e5fe3", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:55Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "8ca07029-044b-4ff1-a8b1-0702ae136fd7", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-proxy"}
2022-11-03T20:10:56Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "4e3febaf-421c-4695-820f-edd47b4d5e0d", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-proxy"}
2022-11-03T20:10:56Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"pathfinder","namespace":"{replace}"}, "namespace": "{replace}", "name": "pathfinder", "reconcileID": "c9f22613-ab66-43f4-a96a-35ffc0bdb3d7", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-pathfinder"}
2022-11-03T20:10:56Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "0af816dc-b226-4230-8f84-5b49fe125e7c"}
2022-11-03T20:10:57Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "74c6ea3f-c1b5-4cee-ba38-d3f23a0ee41e", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:58Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "85bbae72-8d4d-4d32-a58c-8b488cafd598"}
2022-11-03T20:10:58Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"api-exposer","namespace":"{replace}"}, "namespace": "{replace}", "name": "api-exposer", "reconcileID": "26b1d5da-b4d6-4cf0-bf0b-e4b528201f78"}
2022-11-03T20:10:58Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-api", "reconcileID": "4936a251-6fe5-4cc1-afeb-3bf99967b3e7", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-api"}
2022-11-03T20:10:59Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"media-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "media-api", "reconcileID": "b8906b37-b819-4308-9660-f5ccb573ecb8", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-media-api"}
2022-11-03T20:10:59Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "9f5ca252-b3f6-4939-bf1f-f60181ac7197", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:11:00Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"pathfinder","namespace":"{replace}"}, "namespace": "{replace}", "name": "pathfinder", "reconcileID": "1bb5aa39-b627-4dbc-abfb-73e323345c38"}
2022-11-03T20:11:00Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"analytics-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "analytics-api", "reconcileID": "acff7385-50a8-48b1-a4a6-ad4d1d01f138"}
2022-11-03T20:11:00Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"pathfinder","namespace":"{replace}"}, "namespace": "{replace}", "name": "pathfinder", "reconcileID": "9b5ad69b-27db-4663-b7c2-9d7d6a799233", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-pathfinder"}
2022-11-03T20:11:01Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"media-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "media-api", "reconcileID": "12fc02f9-118c-4934-880d-9cdf1673ee74"}
I1103 20:11:01.610906       1 request.go:601] Waited for 24.339640566s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/{replaced}/scaledobjects/pathfinder/status
2022-11-03T20:11:01Z	INFO	Updated HPA according to ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"analytics-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "analytics-api", "reconcileID": "a0181882-ee1a-46ef-9184-3eaa5f80ac4f", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-analytics-api"}

KEDA Version

2.8.1

Kubernetes Version

< 1.23

Platform

Amazon Web Services

Scaler Details

RabbitMQ Scaler

Anything else?

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

@Tazer would you mind sharing the settings that you used? We might think about redefining the defaults, maybe?

@JorTurFer it helped a lot.

We went from hours to reconcile down to around 4-5 minutes. That is still a bit too slow, but it is a great improvement and it now “works”. Really looking forward to those performance improvements as well.

Mmm… you are right, v2.8.1 doesn’t support setting it in the operator. The next release will support it because the changes are already on the main branch. If you need them now, you could use the main tag (or even better, pull that image and push it to your own registry to freeze the changes, because main isn’t a stable branch).

If your major blocker is scaling 1 > 0 > 1, you could spawn an operator per namespace; the problem is that I think the chart isn’t ready for that. The blocker for namespacing KEDA is the metrics server, because k8s only supports one metrics server per metrics API, but you could have more operators.

I’d first try increasing the API rates, because that’s faster than deploying an operator per namespace… It’s true that 20 requests per second to the API server could be a small value for huge clusters, but we set the “safest” values to protect API servers in smaller scenarios. I guess you have hundreds of nodes, so your API server scale should be higher.
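A rough sketch of how a client-side rate limit turns into waits like the ones in the logs, and how raising the limit changes them. It models the limiter as a token bucket that starts full and assumes all requests are queued at the same instant. The 20 requests per second figure comes from the comment above; the burst of 30 and the higher settings are assumptions chosen for illustration, not recommended values. `queue_waits` is a hypothetical helper, not part of KEDA or client-go.

```python
def queue_waits(num_requests, qps, burst):
    """Wait in seconds for each request when all are submitted at t=0.

    Models a token bucket that starts with `burst` tokens and refills
    at `qps` tokens per second.
    """
    waits = []
    for i in range(num_requests):
        if i < burst:
            # A token is available immediately.
            waits.append(0.0)
        else:
            # Request i consumes the (i - burst + 1)-th refilled token.
            waits.append((i - burst + 1) / qps)
    return waits


# One request per ScaledObject, 1088 objects as in the report.
for qps, burst in [(20, 30), (100, 150), (200, 300)]:
    worst = queue_waits(1088, qps, burst)[-1]
    print(f"qps={qps:>3} burst={burst:>3}: last of 1088 requests waits {worst:.1f}s")
```

In this model, a wait of about 24 seconds at 20 requests per second corresponds to roughly 500 requests already sitting in the queue, which fits an operator that has fallen behind on more than a thousand objects.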

Hello @Tazer. Sadly, we have detected performance issues in huge clusters. This is a work in progress, but I don’t have any ETA for it. I think it will be ready soon, but I don’t know.