keda: KEDA operator very slow with 1088 ScaledObjects (using RabbitMQ scaler) on AWS EKS 1.20
Report
KEDA does not seem to scale well when there are many ScaledObjects. It also seems to be hitting a lot of rate limits against the Kubernetes API.
We are seeing loads of log lines like this:
I1103 20:04:10.361163 1 request.go:601] Waited for 24.339252261s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/frank-casino/scaledobjects/crm-integration-producer/status
With 1088 ScaledObjects and throttling waits of up to 30 seconds per check, a full pass takes forever and the operator is always behind.
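A rough back-of-the-envelope estimate of why the operator cannot keep up. This is only a sketch: the limit of 20 requests per second is the figure quoted by a maintainer in the comments, the 5-second polling interval comes from the spec in this report, and the number of API requests per check is a hypothetical placeholder, not something measured in this issue.

```python
def checks_due_per_second(objects: int, polling_interval_s: float) -> float:
    """Checks that fall due each second if every object is polled
    once per polling interval."""
    return objects / polling_interval_s


def seconds_per_full_pass(objects: int, requests_per_check: int, qps: float) -> float:
    """Lower bound on the time needed to touch every object once when
    all API requests share a single client-side limit of `qps` req/s."""
    return objects * requests_per_check / qps


OBJECTS = 1088            # from the issue title
POLLING_INTERVAL_S = 5    # pollingInterval in the ScaledObject spec
QPS = 20.0                # client-side limit quoted in the comments
REQUESTS_PER_CHECK = 3    # hypothetical: not measured in this issue

demand = checks_due_per_second(OBJECTS, POLLING_INTERVAL_S)
print(f"checks due per second: {demand:.1f}")
print(f"request budget per second: {QPS:.1f}")
print(f"minimum seconds per full pass: "
      f"{seconds_per_full_pass(OBJECTS, REQUESTS_PER_CHECK, QPS):.1f}")
```

Under these assumptions, far more checks fall due each second than the request budget allows, so requests queue up behind the client-side limiter instead of completing within a polling interval.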
We are using idleReplicaCount: 0
The spec of the ScaledObjects looks something like this:
"spec": {
  "advanced": {
    "restoreToOriginalReplicaCount": true
  },
  "cooldownPeriod": 300,
  "idleReplicaCount": 0,
  "maxReplicaCount": 1,
  "minReplicaCount": 1,
  "pollingInterval": 5,
  "scaleTargetRef": {
    "name": "action-history-offloader"
  },
  "triggers": [
    {
      "metadata": {
        "hostFromEnv": "AMQP_URL",
        "queueLength": "200",
        "queueName": "testing.actions.history-offloader"
      },
      "type": "rabbitmq"
    }
  ]
},
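To make the intent of this spec concrete, here is a simplified model of the replica count it asks for. It is a sketch only, not KEDA's implementation: `desired_replicas` is a hypothetical helper, the "active when the queue holds more than 0 messages" rule is an assumption about the activation threshold, and the real HPA calculation also takes the current replica count into account.

```python
import math
from typing import Optional


def desired_replicas(
    queue_messages: int,
    queue_length_target: int,
    idle: Optional[int],
    minimum: int,
    maximum: int,
    activation: int = 0,  # assumed activation threshold
) -> int:
    """Simplified model: go to `idle` replicas while the trigger is
    inactive, otherwise stay within [minimum, maximum]."""
    if queue_messages <= activation:
        # Trigger inactive: fall back to idleReplicaCount if it is set.
        return idle if idle is not None else minimum
    wanted = math.ceil(queue_messages / queue_length_target)
    return max(minimum, min(maximum, wanted))


# Values taken from the spec above: queueLength 200, idle 0, min 1, max 1.
for messages in (0, 1, 5000):
    print(messages, desired_replicas(messages, 200, idle=0, minimum=1, maximum=1))
```

With `minReplicaCount` and `maxReplicaCount` both set to 1, the workload only ever switches between 0 and 1 replicas, so everything depends on the operator noticing activity quickly.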
Expected Behavior
Reconcile all ScaledObjects within a few seconds.
Actual Behavior
Scaling is very slow and takes hours. It never reaches a valid “state”, because by the time a ScaledObject is processed the queues have already been consumed or refilled.
Steps to Reproduce the Problem
- Many scaled objects
- RabbitMQ scaler
- EKS
Logs from KEDA operator
I1103 20:10:51.561092 1 request.go:601] Waited for 24.460295516s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/emwys/scaledobjects/crm-integration-proxy/status
2022-11-03T20:10:52Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"webhook","namespace":"{replace}"}, "namespace": "{replace}", "name": "webhook", "reconcileID": "05981953-6823-469b-9aba-c1df13106368"}
2022-11-03T20:10:53Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-api", "reconcileID": "69e2b6ee-9821-48eb-b6e2-9bd8d7e945c7"}
2022-11-03T20:10:53Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "8db944be-ad8d-4725-8475-ab88036d2c36", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:54Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "f2327bbe-0907-48f4-9f0e-f7d02ac53103", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:55Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "7bbd144c-9b8c-421e-abe9-d41c058e5fe3", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:55Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "8ca07029-044b-4ff1-a8b1-0702ae136fd7", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-proxy"}
2022-11-03T20:10:56Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "4e3febaf-421c-4695-820f-edd47b4d5e0d", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-proxy"}
2022-11-03T20:10:56Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"pathfinder","namespace":"{replace}"}, "namespace": "{replace}", "name": "pathfinder", "reconcileID": "c9f22613-ab66-43f4-a96a-35ffc0bdb3d7", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-pathfinder"}
2022-11-03T20:10:56Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "0af816dc-b226-4230-8f84-5b49fe125e7c"}
2022-11-03T20:10:57Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "74c6ea3f-c1b5-4cee-ba38-d3f23a0ee41e", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:10:58Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-proxy","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-proxy", "reconcileID": "85bbae72-8d4d-4d32-a58c-8b488cafd598"}
2022-11-03T20:10:58Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"api-exposer","namespace":"{replace}"}, "namespace": "{replace}", "name": "api-exposer", "reconcileID": "26b1d5da-b4d6-4cf0-bf0b-e4b528201f78"}
2022-11-03T20:10:58Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-api", "reconcileID": "4936a251-6fe5-4cc1-afeb-3bf99967b3e7", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-api"}
2022-11-03T20:10:59Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"media-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "media-api", "reconcileID": "b8906b37-b819-4308-9660-f5ccb573ecb8", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-media-api"}
2022-11-03T20:10:59Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"crm-integration-producer","namespace":"{replace}"}, "namespace": "{replace}", "name": "crm-integration-producer", "reconcileID": "9f5ca252-b3f6-4939-bf1f-f60181ac7197", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-crm-integration-producer"}
2022-11-03T20:11:00Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"pathfinder","namespace":"{replace}"}, "namespace": "{replace}", "name": "pathfinder", "reconcileID": "1bb5aa39-b627-4dbc-abfb-73e323345c38"}
2022-11-03T20:11:00Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"analytics-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "analytics-api", "reconcileID": "acff7385-50a8-48b1-a4a6-ad4d1d01f138"}
2022-11-03T20:11:00Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"pathfinder","namespace":"{replace}"}, "namespace": "{replace}", "name": "pathfinder", "reconcileID": "9b5ad69b-27db-4663-b7c2-9d7d6a799233", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-pathfinder"}
2022-11-03T20:11:01Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"media-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "media-api", "reconcileID": "12fc02f9-118c-4934-880d-9cdf1673ee74"}
I1103 20:11:01.610906 1 request.go:601] Waited for 24.339640566s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/{replaced}/scaledobjects/pathfinder/status
2022-11-03T20:11:01Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"analytics-api","namespace":"{replace}"}, "namespace": "{replace}", "name": "analytics-api", "reconcileID": "a0181882-ee1a-46ef-9184-3eaa5f80ac4f", "HPA.Namespace": "{replace}" "HPA.Name": "keda-hpa-analytics-api"}
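To quantify the throttling in logs like these, the wait times can be pulled out with a small script. This is a sketch: `summarize_throttling` is a hypothetical helper, and the pattern only matches waits printed in plain seconds (such as `24.46s`), not other duration formats. The two sample lines are the throttling messages from the log above.

```python
import re
from collections import Counter

# Matches e.g. "Waited for 24.460295516s due to client-side throttling,
# ... request: PATCH:https://..."
THROTTLE_RE = re.compile(
    r"Waited for (\d+(?:\.\d+)?)s due to client-side throttling"
    r".*?request: (\w+):(\S+)"
)


def summarize_throttling(lines):
    """Return (list of wait times in seconds, Counter of HTTP verbs)."""
    waits = []
    verbs = Counter()
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # not a throttling message
        waits.append(float(match.group(1)))
        verbs[match.group(2)] += 1
    return waits, verbs


SAMPLE = [
    "I1103 20:10:51.561092 1 request.go:601] Waited for 24.460295516s due to "
    "client-side throttling, not priority and fairness, request: "
    "PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/emwys/"
    "scaledobjects/crm-integration-proxy/status",
    "2022-11-03T20:10:52Z INFO Reconciling ScaledObject",
    "I1103 20:11:01.610906 1 request.go:601] Waited for 24.339640566s due to "
    "client-side throttling, not priority and fairness, request: "
    "PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/{replaced}/"
    "scaledobjects/pathfinder/status",
]

waits, verbs = summarize_throttling(SAMPLE)
print(f"throttled requests: {len(waits)}")
print(f"longest wait: {max(waits):.2f}s")
print(f"by verb: {dict(verbs)}")
```

In practice the lines would come from the operator pod's log output rather than a hard-coded list.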
KEDA Version
2.8.1
Kubernetes Version
< 1.23
Platform
Amazon Web Services
Scaler Details
RabbitMQ Scaler
Anything else?
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (11 by maintainers)
@Tazer would you mind sharing the settings that you used? We might think about redefining the defaults, maybe?
@JorTurFer it helped a lot.
Reconciliation went from hours to around 4-5 minutes. That is still a bit too slow, but it is a great improvement, and now it at least “works”. Really looking forward to those performance improvements as well.
Hmm, you are right: v2.8.1 doesn’t support setting it in the operator. The next release will support it, because the changes are already on the main branch. If you need them now, you could use the main tag (or, even better, pull that image and push it to your own registry to freeze the changes, because main isn’t a stable branch).
If your major blocker is the 1 > 0 > 1 scaling, you could spawn an operator per namespace. The problem is that I don’t think the chart is ready for that. The blocker for namespacing KEDA is the metrics server, because k8s only supports one metrics server per metrics API, but you could have more operators.
I’d try increasing the API rates first, because that’s faster than deploying an operator per namespace. It’s true that 20 requests per second to the API server can be too low for huge clusters, but we set the “safest” values to protect API servers in smaller scenarios. I guess you have hundreds of nodes, so your API server should be able to handle a higher rate.
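To see why raising the client-side rate helps, the limiter can be modelled as a token bucket, which is how client-go's rate limiter works. The sketch below is a model rather than KEDA code: `throttle_wait` is a hypothetical helper, the burst of 30 is an assumption (only the 20 requests per second figure appears in this thread), and the higher values are arbitrary examples.

```python
def throttle_wait(position: int, qps: float, burst: int) -> float:
    """Seconds the request at `position` (0-based) waits when a backlog
    is submitted at once to a token bucket that starts full.

    The first `burst` requests pass immediately; each later request
    needs one freshly refilled token, and tokens refill at `qps`/s.
    """
    return max(0.0, (position + 1 - burst) / qps)


BACKLOG = 1088  # one status PATCH per ScaledObject, all queued at once

for qps, burst in ((20.0, 30), (100.0, 150)):  # second pair is an arbitrary example
    last = throttle_wait(BACKLOG - 1, qps, burst)
    print(f"qps={qps:.0f} burst={burst}: last request waits {last:.2f}s")
```

In this model the wait of the last queued request shrinks roughly in proportion to the rate, which matches the direction of the improvement reported above, although the real operator issues more than one request per object.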
Hello @Tazer. Sadly, we have detected performance issues in huge clusters. Fixing them is a work in progress, but I don’t have an ETA. I think it will be ready soon, but I don’t know for sure.