kubernetes: HPA scheduling implementation doesn't scale well with custom/external metrics - all checks run sequentially in one goroutine
What happened: I created an external metric whose query took roughly 100ms. To see what happens when there are many of them, I created 1,000 HPAs using that metric. Normally every HPA is checked every 15s, but with this setup each HPA was checked only about every 120 seconds, a drift/delay of over 100 seconds.
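A back-of-the-envelope model of these numbers, as a minimal Go sketch. It assumes a single worker, one blocking metric call per HPA, and no other overhead; the function names are made up for illustration. The roughly 120s reported above is somewhat higher than this estimate, presumably because of per-HPA work beyond the metric call itself.

```go
package main

import (
	"fmt"
	"time"
)

// passDuration estimates how long one sequential pass over all HPAs takes
// when a single worker blocks on one metric call per HPA.
func passDuration(hpaCount int, perCheck time.Duration) time.Duration {
	return time.Duration(hpaCount) * perCheck
}

// effectiveInterval is the time between two checks of the same HPA:
// the configured sync period, or the pass duration if a pass takes longer.
func effectiveInterval(syncPeriod, pass time.Duration) time.Duration {
	if pass > syncPeriod {
		return pass
	}
	return syncPeriod
}

func main() {
	pass := passDuration(1000, 100*time.Millisecond)
	fmt.Println("one pass:", pass)                                    // one pass: 1m40s
	fmt.Println("interval:", effectiveInterval(15*time.Second, pass)) // interval: 1m40s
}
```

With 100 HPAs instead of 1,000 the pass would take about 10s under the same assumptions, which fits inside the 15s period and would hide the problem.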
What you expected to happen:
- Each HPA with a custom/external metric should drift by at most 100ms every 15s, or be delayed by a second every 3-5 minutes
- Other HPAs not using custom/external metrics should not be impacted by the duration of custom/external metrics
- The number of deployments with HPAs should not degrade each HPA's ability to respond to signals requiring scaling up (e.g. spikes)
How to reproduce it (as minimally and precisely as possible):
- Create and install the https://github.com/kubernetes-sigs/custom-metrics-apiserver and alter its code to add a delay in the GetExternalMetric method of the Testing Provider
- Also add logging with timestamps to the method (logrus supports timestamps, for example).
- Create 1,000 HPA objects referring to that test metric - possibly 1,000 deployments as well, but one will work fine too even if not a real use case.
- Pick one of your HPAs and watch the time between its calls to GetExternalMetric.
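The instrumentation described in the steps above could look roughly like the following Go sketch. It is not the custom-metrics-apiserver code: `metricFunc` is a hypothetical, simplified stand-in for GetExternalMetric (the real method has a richer signature), the key used to tell callers apart is an assumption, and it uses the standard `log` package rather than logrus so that it stays self-contained.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"sync"
	"time"
)

// metricFunc is a hypothetical stand-in for the provider's GetExternalMetric
// method; the real method takes more arguments and returns a metric list.
type metricFunc func(key string) (int64, error)

// callRecorder remembers when each key was last queried.
type callRecorder struct {
	mu     sync.Mutex
	last   map[string]time.Time
	gaps   map[string]time.Duration
	logger *log.Logger
}

func newCallRecorder() *callRecorder {
	return &callRecorder{
		last:   make(map[string]time.Time),
		gaps:   make(map[string]time.Duration),
		logger: log.New(os.Stderr, "", log.LstdFlags|log.Lmicroseconds),
	}
}

// wrap returns a metricFunc that logs the gap since the previous call for
// the same key, sleeps for delay to simulate a slow backend, then calls fetch.
func (r *callRecorder) wrap(fetch metricFunc, delay time.Duration) metricFunc {
	return func(key string) (int64, error) {
		now := time.Now()
		r.mu.Lock()
		prev, seen := r.last[key]
		r.last[key] = now
		if seen {
			r.gaps[key] = now.Sub(prev)
		}
		r.mu.Unlock()
		if seen {
			r.logger.Printf("key=%s gap_since_last_call=%s", key, now.Sub(prev))
		} else {
			r.logger.Printf("key=%s first call", key)
		}
		time.Sleep(delay)
		return fetch(key)
	}
}

// lastGap reports the most recent gap between two calls for key.
func (r *callRecorder) lastGap(key string) time.Duration {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.gaps[key]
}

func main() {
	rec := newCallRecorder()
	slow := rec.wrap(func(string) (int64, error) { return 42, nil }, 100*time.Millisecond)
	for i := 0; i < 3; i++ {
		slow("default/test-metric") // hypothetical key identifying one HPA's query
	}
	fmt.Println("last gap:", rec.lastGap("default/test-metric"))
}
```

In the real reproduction the logged gap for a single HPA is the number to watch: with the default sync period it should stay near 15s, and in the scenario above it grows to around two minutes.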
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T21:51:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Docker CE for Mac
- OS (e.g: cat /etc/os-release): Mac OS Catalina 10.15.7
- Kernel (e.g. uname -a): Darwin <REMOVED> 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 4 years ago
- Reactions: 5
- Comments: 33 (11 by maintainers)
I’ve unfortunately independently discovered this very same problem with external metrics scaling very poorly, particularly with KEDA querying Prometheus.
Is there any particular reason the solution can't be a naive increase in the number of goroutines spawned to process the HPA objects concurrently on every 15-second loop execution? There was an acknowledgement above that an HPA object may itself hold multiple metrics to be calculated, so at the very least we would need to ensure that the metrics on a given HPA object are still processed serially. But each HPA object should be independent, so they can be processed concurrently.
I’m shocked to see that the HPA loop in core Kubernetes is single-threaded on an I/O-bound operation, serially calculating each HPA.
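To make the suggestion above concrete, here is a minimal Go sketch of a fixed-size worker pool in which HPAs are processed concurrently while the metrics of each individual HPA are still evaluated serially. It is only an illustration, not the controller's actual code: the `hpa` type, `reconcile`, `reconcileAll`, and `makeHPAs` are all hypothetical, and the metric calls are simulated with sleeps.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hpa is a hypothetical, simplified stand-in for an HPA object:
// a name plus the latency of each of its metric queries.
type hpa struct {
	name    string
	metrics []time.Duration
}

// makeHPAs builds n HPAs that each have one metric with the given latency.
func makeHPAs(n int, perMetric time.Duration) []hpa {
	hpas := make([]hpa, n)
	for i := range hpas {
		hpas[i] = hpa{name: fmt.Sprintf("hpa-%d", i), metrics: []time.Duration{perMetric}}
	}
	return hpas
}

// reconcile evaluates the metrics of one HPA one after another.
func reconcile(h hpa) {
	for _, latency := range h.metrics {
		time.Sleep(latency) // simulated blocking metric call
	}
}

// reconcileAll processes the HPAs with a fixed number of workers and
// returns how long the whole pass took.
func reconcileAll(hpas []hpa, workers int) time.Duration {
	start := time.Now()
	queue := make(chan hpa)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for h := range queue {
				reconcile(h)
			}
		}()
	}
	for _, h := range hpas {
		queue <- h
	}
	close(queue)
	wg.Wait()
	return time.Since(start)
}

func main() {
	hpas := makeHPAs(100, 10*time.Millisecond)
	fmt.Println("1 worker:  ", reconcileAll(hpas, 1))
	fmt.Println("10 workers:", reconcileAll(hpas, 10))
}
```

Because the work is I/O-bound, the pass time in this model shrinks roughly in proportion to the number of workers. A bounded pool rather than one goroutine per HPA is a deliberate choice in the sketch, since it caps the number of simultaneous requests hitting the metrics backend.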
@jjcaballero if you only have HPAs with CPU/memory metrics the effect should be less visible. It’s still serial, blocking calls to fetch the metrics, but the response times on these calls should be lower – the metrics-server component responds immediately with what it has in its cache as opposed to making a synchronous call to the custom metrics source (e.g. Prometheus).
I guess you meant “Other HPAs not using custom/external metrics should not be impacted…”