kubernetes: HPA doesn't scale down to minReplicas even though metric is under target

What happened:

HPA scales to Spec.MaxReplicas even though metric is always under target.

Here’s the HPA in YAML:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-06-06T10:46:13Z","reason":"ReadyForNewScale","message":"recommended
      size matches current size"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-06-06T10:46:13Z","reason":"ValidMetricFound","message":"the
      HPA was able to successfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"True","lastTransitionTime":"2019-06-06T10:46:13Z","reason":"TooManyReplicas","message":"the
      desired replica count is more than the maximum replica count"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"9m"}}]'
  creationTimestamp: "2019-06-06T10:45:58Z"
  name: my-app-1
  namespace: default
  resourceVersion: "55041251"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/my-app-1
  uid: 44fedc1a-8848-11e9-8465-025acf90d81e
spec:
  maxReplicas: 4
  minReplicas: 2
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: my-app-1
  targetCPUUtilizationPercentage: 40
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 4
  desiredReplicas: 4

And here’s a description output:

$ kubectl describe hpa my-app-1
  Name:                                                  my-app-1
  Namespace:                                             default
  Labels:                                                <none>
  Annotations:                                           <none>
  CreationTimestamp:                                     Thu, 06 Jun 2019 12:45:58 +0200
  Reference:                                             Deployment/my-app-1
  Metrics:                                               ( current / target )
    resource cpu on pods  (as a percentage of request):  0% (9m) / 40%
  Min replicas:                                          2
  Max replicas:                                          4
  Deployment pods:                                       4 current / 4 desired
  Conditions:
    Type            Status  Reason            Message
    ----            ------  ------            -------
    AbleToScale     True    ReadyForNewScale  recommended size matches current size
    ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
    ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
  Events:           <none>

What you expected to happen:

The HPA should only scale up when the metric is above target, and scale down when it is under target, until Spec.MinReplicas is reached.

How to reproduce it (as minimally and precisely as possible):

I’m not sure. We have 9 HPAs and only one has this problem; I can’t see anything unique about this HPA compared to the others. If I delete and recreate the HPA using Helm, the problem persists. The same happens if I recreate the HPA with kubectl autoscale Deployment/my-app-1 --min=2 --max=4 --cpu-percent=40.

Environment:

  • Kubernetes version (use kubectl version): v1.12.6-eks-d69f1b
  • Cloud provider or hardware configuration: AWS EKS
  • OS (e.g: cat /etc/os-release): EKS AMI release v20190327
  • Kernel (e.g. uname -a): 4.14.104-95.84.amzn2.x86_64
  • Network plugin and version (if this is a network-related bug): AWS CNI
  • Metrics-server version: 0.3.2

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 87
  • Comments: 113 (5 by maintainers)

Most upvoted comments

Ok, I think I finally know what’s going on. It seems that many people are facing this issue, and since scaling down is important for our model I decided to run some tests. It turns out the problem is in the calculation done by the HorizontalPodAutoscaler controller: desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

Ansatz: when scaling down, this equation works when the change in utilization due to the load difference is big, which is usually the case with CPU (e.g. 100m - 500m <=> 20% - 100%), but it fails when the change in utilization is small, which is usually the case with memory (e.g. 160Mi - 200Mi <=> 80% - 100%).

Theoretical example: let’s say we have a deployment whose resources.requests.memory is set to 200Mi, and the HPA averageUtilization for the memory resource is set to 100% (=200Mi). Because of high memory usage, pods were scaled up from 1 to 4. After scaling, the memory usage dropped from 200Mi to 160Mi, so now we have 4 pods with an average of 160Mi, and the HPA shows 80% / 100% (we demanded scale-up at 100% of the requested 200Mi, and current usage is 160Mi -> 160/200 = 0.8 or 80%). Now the HPA runs the equation above to decide what to do: desiredReplicas = ceil[4 * ( 80 / 100 )] = ceil[3.2] = 4, and there are already 4 pods, so it will not act even though the metric is under the target average (see the sketch after the bullets below).

  • if you walk through the same example with 5 pods instead of 4, you’ll see that the first iteration also yields 4, so it would actually scale down to 4.
  • if you do it with CPU parameters the result would be 1 at the first calculation; needless to say, idle pods are usually between 5m - 10m, much less than 100m, so the difference is much larger.
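
Here is a minimal sketch of that calculation in plain Go (this is not the controller source, just the formula applied to the numbers from the example above):

package main

import (
	"fmt"
	"math"
)

// desiredReplicas mirrors the HPA formula:
// desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
func desiredReplicas(currentReplicas int, currentMetric, targetMetric float64) int {
	return int(math.Ceil(float64(currentReplicas) * (currentMetric / targetMetric)))
}

func main() {
	// Memory example from above: 4 pods at 80% utilization, target 100%.
	fmt.Println(desiredReplicas(4, 80, 100)) // ceil(3.2) = 4 -> stuck, no scale-down
	// Same utilization with 5 pods: ceil(4.0) = 4 -> does scale down to 4.
	fmt.Println(desiredReplicas(5, 80, 100))
	// Typical idle CPU case: 4 pods at 5% utilization, target 40%.
	fmt.Println(desiredReplicas(4, 5, 40)) // ceil(0.5) = 1 -> scales down
}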

Empirical example: Setup: we have the namespace “1” (just for ease of typing); in it there are 3 deployments: 1 - rails pod, 2 - redis pod, 3 - postgresql pod. Testing is done on the rails pod, which can be scaled up from 1 to 5 pods when memory usage is high.

in rails deployment yml file:

...
          resources:
            limits:
              cpu: 500m
              memory: 300Mi
            requests:
              cpu: 90m
              memory: 200Mi
...

rails HPA

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: rails-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rails-dep
  minReplicas: 1
  maxReplicas: 5
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 1
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 1
        periodSeconds: 15
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50

Testing: first we have the setup with the rails deployment, without the HPA. As you can see it is idle at 4m & 158Mi, which means that when we deploy the HPA with averageUtilization: 50 it will start scaling up. (screenshot)

Deploying the HPA. (screenshot)

After a few minutes it scales up to 5 pods (memory is still above 50%, but it reaches the max limit of 5 pods). (screenshot)


------------ Mathematical reasoning for those who are interested; you can skip it and just use the equation ------------
Now let’s calculate the limit needed for the HPA to scale down, and recreate the HPA with different limits to test the theory. To do that we need to express the desiredMetricValue in terms of the other values by rearranging:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
or:
desiredReplicas = ceil[(currentReplicas * currentMetricValue) / desiredMetricValue]
It is possible to replace the RHS with (currentReplicas * currentMetricValue) / (desiredMetricValue - epsilon) (drop the ceil function and subtract a quantity from the denominator) if the following conditions are satisfied:

  1. (currentReplicas * currentMetricValue) / (desiredMetricValue - epsilon) = desiredReplicas
    • where desiredReplicas is a positive integer, and epsilon is a positive number or 0
  2. [(currentReplicas * currentMetricValue) / (desiredMetricValue - epsilon)] - [(currentReplicas * currentMetricValue) / desiredMetricValue] = y
    • where 0 <= y < 1

In simple words, the replacement of the ceil function is to subtract a number (epsilon) from the denominator so that the division yields an integer, and the difference between that integer and the result without subtracting epsilon is less than 1. So now we have:
desiredReplicas = (currentReplicas * currentMetricValue) / (desiredMetricValue - epsilon)
Rearranging:
desiredMetricValue = [(currentReplicas * currentMetricValue) / desiredReplicas] + epsilon
Now if we add the constraint that all these variables are positive integers (which they are in practice), then to ensure that the RHS is always an integer (because the LHS, desiredMetricValue, is an integer):
epsilon = ceil[(currentReplicas * currentMetricValue) / desiredReplicas] - (currentReplicas * currentMetricValue) / desiredReplicas
For example, assume the quantity (currentReplicas * currentMetricValue) / desiredReplicas = 4.4; to ensure the RHS is an integer, epsilon must be ceil(4.4) - 4.4 = 5 - 4.4 = 0.6, and 4.4 + 0.6 = 5, which is an integer. It could actually also be 1.6, 2.6, 3.6, etc., which can be captured by adding (+ k) to the epsilon equation, where k is 0 or a positive integer, but for simplicity let’s just assume it’s 0 and omit it. Then, substituting for epsilon, we get:

desiredMetricValue = ceil[(currentReplicas * currentMetricValue) / desiredReplicas]
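
A quick sanity check of that rearrangement in plain Go (again not the HPA source; the numbers are the ones used in the tests below):

package main

import (
	"fmt"
	"math"
)

// minTargetForScaleDown returns the smallest utilization target (in %) for which
// ceil[currentReplicas * currentMetric / target] still comes out at desiredReplicas,
// i.e. target = ceil[(currentReplicas * currentMetric) / desiredReplicas].
func minTargetForScaleDown(currentReplicas int, currentMetric float64, desiredReplicas int) int {
	return int(math.Ceil(float64(currentReplicas) * currentMetric / float64(desiredReplicas)))
}

func main() {
	fmt.Println(minTargetForScaleDown(5, 84, 4)) // 105: the 5 -> 4 threshold tested below
	fmt.Println(minTargetForScaleDown(4, 84, 3)) // 112: the 4 -> 3 threshold tested below
}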


So we use this equation with what we have now to scale down (screenshot): desiredMetricValue = ceil[(5 * 84) / 4] = 105. This means we need to set the HPA averageUtilization to 105 (105%) in order to scale down to 4. I will set it to 104 first, wait and make sure it doesn’t scale down, and then set it to 106 and check whether it does. With averageUtilization: 104 applied (screenshot), after 6 minutes there are still 5 replicas (screenshot).

Now change averageUtilization to 106 and apply it (screenshot); in less than a minute it scaled down to 4! (screenshot)

One more try to drive the point home, scaling from 4 to 3. Calculation: ceil[(4 * 84) / 3] = 112 is the exact limit. Test with averageUtilization: 111 (screenshot): after 6 minutes, still 4 replicas (screenshot).

With averageUtilization: 112 (screenshot), after approximately 1 minute it scaled down to 3 replicas (screenshot).

Conclusion: this algorithm is suitable for scaling up even if the change is small, since it ceils the calculated quantity and therefore scales up exactly at the upper bound. But when scaling down it also ceils, so the difference has to be big enough to still yield a smaller number of replicas after ceiling. It’s worth mentioning that if a big difference were not needed to scale down with the current algorithm, the HPA would have a flapping problem: utilization overload -> scales up -> utilization drops -> scales down -> utilization overload -> scales up -> ad infinitum. The worst case is when scaling down from 2 -> 1 pod, where the desiredMetricValue has to be double the currentMetricValue (desiredMetricValue = 2 * currentMetricValue) for it to scale down (50% / 100%).

Solution: I tweaked the variables to see whether there is a workaround, and there is, but it is bound to the idle memory utilization and the amount of increase it takes to scale up. If things are set up so that the deployment sits at 49% / 100% (less than 50%) at idle, then it will scale down when there is no load; the catch is that the increase needed to scale up to 2 pods is then more than double, which is not an ideal solution. An algorithmic solution could be to introduce a new variable, scaleDownThreshold, defined by the user, as follows:

// check if currentMetricValue is above desiredMetricValue; if it is, scale up as usual
if (currentMetricValue > desiredMetricValue) {
  desiredReplicas = ceil[(currentReplicas * currentMetricValue) / desiredMetricValue]
  ... scale up ...
} else {
  // otherwise check whether the difference between them is above or below the scale-down threshold
  currentDiff = desiredMetricValue - currentMetricValue
  // using floor instead of ceil -- scale down only if the difference is large enough, otherwise do nothing
  if (currentDiff > scaleDownThreshold) {
    desiredReplicas = floor[(currentReplicas * currentMetricValue) / desiredMetricValue]
    ... scale down ...
  }
}

For now it’s better to stick to the CPU metric and make sure currentMetricValue at idle is at most half of desiredMetricValue:

currentMetricValue * 2 <= desiredMetricValue

labels that are common on the HPA and the pods

It’s not this. It’s labels that are common with the deployment that the HPA is looking at, and pods that are NOT part of that deployment.

I’ve had this issue for months and this ended up being what fixed it. I had three deployments in the same cluster which had the same pod labels and .spec.selector fields. I added a unique label/selector field to each deployment (note: I needed to delete and recreate the deployment since the selector field is immutable). After that the HPA started behaving normally.

My issue was this. I was trying to track it down for about 6 weeks, read this and many other posts, but completely missed the point that it’s the selectors of the scaleTargetRef that you need to make sure are only selecting the pods in the deployment. Deployments, ReplicaSets, etc. will apparently work perfectly fine with a selector that also selects pods from other deployments, but fail hard when you attach an HPA. This is more than a little frustrating…

In our case a developer had a dozen different deployments all with the same label app: myapp (let’s say). Then used selector: matchLabels: app: myapp

So even though the deployment only had 4 pods, the HPA was calculating replicas using over 50 pods… You will not see this in describe, or get, or the k8s logs; they all report the deployment’s numbers, BUT the replica-count code uses the target’s selector to build the collection of metrics 🤦 https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/horizontal.go#L574 (clarification: the scale object returned there is then used to find the selector, and replicas are calculated using that selector https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/horizontal.go#L258)
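
To illustrate the effect with made-up numbers (this is only a sketch of the aggregation, not the controller code): the HPA averages the metric over every pod matched by the target’s label selector, so busy pods from unrelated deployments that happen to share the labels inflate the average.

package main

import "fmt"

type pod struct {
	labels map[string]string
	cpu    float64 // current CPU as % of request
}

// matches reports whether a pod carries every label in the selector.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical: the target deployment owns the 4 nearly idle pods, but two
	// busy pods from another deployment also carry app=myapp.
	selector := map[string]string{"app": "myapp"}
	pods := []pod{
		{map[string]string{"app": "myapp", "deploy": "my-app"}, 5},
		{map[string]string{"app": "myapp", "deploy": "my-app"}, 5},
		{map[string]string{"app": "myapp", "deploy": "my-app"}, 6},
		{map[string]string{"app": "myapp", "deploy": "my-app"}, 4},
		{map[string]string{"app": "myapp", "deploy": "other"}, 90},
		{map[string]string{"app": "myapp", "deploy": "other"}, 95},
	}
	var sum float64
	var n int
	for _, p := range pods {
		if matches(selector, p.labels) {
			sum += p.cpu
			n++
		}
	}
	// All 6 pods match, so the HPA sees ~34% instead of the ~5% the deployment's
	// own pods are using, and computes its replica count from the wrong number.
	fmt.Printf("pods matched: %d, average utilization: %.1f%%\n", n, sum/float64(n))
}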

So is this our “bug”, should this be addressed? Or merely better docs… this is really hard to describe… took me a couple attempts with the dev team that caused it and my own core k8s team before they got it.

I’m having the exact same issue as you @max-rocket-internet, also running on EKS with their latest version available to date. This is frustrating 😦

Having the same issue on 1.15.9 (on AWS via kops). I tried all of hpa v1, v2beta1 and v2beta2, and both CPU utilization and custom metrics (like requests per second) via prometheus adapter; the HPA always scales up and never scales down, even when the metric is much smaller than the target value.

Same issue here. Running k8s 1.19.13 on GKE. Deployment scales up but won’t scale down.

This issue might be related to https://github.com/kubernetes/kubernetes/issues/85506, where it seems pending pods will prevent an HPA from scaling down.

For those who are seeing this issue on GKE, here’s the google bugtracker for the issue: https://issuetracker.google.com/issues/204415098

I have the same issue but none of the fixes here are working for me. I’m running 1.16 on digitalocean’s DOKS managed k8s.

My CPU util target is 80%.

kubectl describe hpa constantly reports below target.

HPA always maxes out.

I am experiencing the same issue (v1.14.6).

In my scenario, I had an auto-scale event that brought the replica count to 2. After the load stopped, the 2 pods were utilizing ~30% CPU (targetCPUUtilizationPercentage=50).

HPA never kicked in and scaled it down, so I started experimenting (waiting several minutes after each step):

  1. Increased targetCPUUtilizationPercentage to 85. Nothing changed.
  2. Set maxReplicas to 1. Shortly after, deployment was scaled down.
  3. Decreased targetCPUUtilizationPercentage back to 50.
  4. Set maxReplicas back to 4. Nothing changed.

CPU utilization kept around 30% all along.

To me, this looks like the HPA is malfunctioning. I thought this might have been fixed on a later release, but I see this issue is still open and active so I guess it wasn’t.

@itninja-hue

Scale-down stabilization is 5 minutes by default, and you can change it with the kube-controller-manager flag:

--horizontal-pod-autoscaler-downscale-stabilization

See the kube-controller-manager and HPA documentation for more details.

Yeah, this issue needs to be reopened as this is a blocker for using memory HPA in Kubernetes.

Seeing this in kubernetes 1.21 when using custom metrics. The metric drops below target and the HPA responds by scaling up.

  Type    Reason             Age                   From                       Message
  ----    ------             ----                  ----                       -------
  Normal  SuccessfulRescale  5m52s (x49 over 15h)  horizontal-pod-autoscaler  New size: 4; reason: Service metric cortex_query_scheduler_queue_length above target
  Normal  SuccessfulRescale  2m20s (x34 over 9h)   horizontal-pod-autoscaler  New size: 8; reason: All metrics below target

@liggitt @wojtek-t @pohly @smarterclayton Could anyone of you reopen this issue maybe?

As a temporary workaround I was able to get it to scale down by deleting all pods that were not running (pods in states such as Terminated or NodeShutdown). Make sure only pods with Running status are visible when you run kubectl get pods -A.

This was a GKE cluster (version: 1.21.5-gke.1302) with the preemptible option enabled, which causes a lot of instances to end up in Terminated or NodeShutdown status.

Do you use FluxCD or some other continuous deployment tool?

Make sure that spec.replicas does not exist in your deployment.yml.

Hey guys, I am not sure if our problem matches the cases described here, but as far as I can tell they seem connected. I am doing load tests in order to find the correct scaling configuration for us, and I’m seeing some strange scaling behavior from the HPA on AWS EKS. I run the same load test for a few minutes and new replicas are scaled up correctly. The problem is that every 5 minutes the HPA seems to delete and create new pods but always keeps the total number the same. In my eyes this should not happen at all, because the load stays the same. The recreation of pods causes short peaks in response time and I want to get rid of them. Does anyone of you have an idea?

In this chart (screenshot) you can see the old and the newly created pods.

It can be reproduced with this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        resources:
          requests:
            cpu: 1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-2
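  # note: this second Deployment reuses the exact same app: nginx labels/selector
  # as nginx-1 and defines no CPU requests, so the HPA below (which targets
  # nginx-1) also picks up these pods when computing utilization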
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
---  
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-1
spec:
  scaleTargetRef:
    kind: Deployment
    apiVersion: apps/v1
    name: nginx-1
  minReplicas: 1
  maxReplicas: 2
  targetCPUUtilizationPercentage: 50

should this be addressed? Or merely better docs

This is definitely a bug and should be fixed! The HPA scaleTargetRef references the Deployment by name, and that should be enough without having to worry about selectors.

@max-rocket-internet Try increasing your metrics resolution from the default. I was experiencing similar behavior; I added the metrics-server flag --metric-resolution=5s (the default is 60s), and it seems to be behaving in a much more expected manner now.

As @SocietyCao said, in my case it appears that the HPA was rapidly scaling up my service, creating a bunch of pods that didn’t have any metrics yet, which in turn caused the HPA to assume the pods were under load. Seems like it can create a feedback loop of sorts.

For us this issue started appearing as soon as we upgraded to GKE 1.21 about a month ago. After some investigation we noticed that we have pods in Terminated and/or NodeShutdown state more often than when we were on GKE 1.20. Having more such pods is probably (most likely?) related to the Graceful node shutdown feature that went into beta in Kubernetes 1.21 (and thus is enabled by default). Though according to the GKE release notes this should have been enabled on preemptible node pools since GKE 1.20.5-gke.500. However, at least on our GKE 1.20.11-gke.1801 clusters we can confirm (as far as that’s possible) that it doesn’t seem to be enabled there: kube-proxy is not started with the GracefulNodeShutdown feature gate (while it is for GKE 1.21). Now that’s not really the way to know whether the feature is enabled in the control plane or not, but it’s the closest you can get to it (as far as we know).

Anyway, it indeed seems (in our scenario) that because of the increase in Status=Failed Pods we’re being hit by our HPAs not scaling down properly anymore. Cleaning up the terminated pods makes the HPA scale down again.

Do the other people who have hit this issue also have other pods that might have the same labels as the pods from the deployment targeted by the HPA? For example, pods created from cronjobs/jobs?

Since I separated all cronjobs from my app’s helm chart (one release with the deployment and one release with the cronjobs, from the same chart, differing only in name), the HPA started to scale the deployment correctly. This is very weird, because the pods from the deployment and from the cronjobs have different component labels (web-server and cronjob).

I’m using EKS 1.17 and faced the same issue. I tried everything I saw in this thread. I finally got it to scale down by deleting old replicasets (kubectl delete $(kubectl get all | grep replicaset.apps | grep "0 0 0" | cut -d' ' -f 1)), increasing targetAverageUtilization from 80 to 90 in hpa.yaml, and, what I think actually made it work, removing replicas: 1 from deployment.yml.

Ehh, we’ve lost some money in off-peak hours because of not scaling down…

But, I think I’ve found a workaround in the meantime.

It seems that using autoscaling/v1 instead of autoscaling/v2beta2 works correctly, if you can rely just on CPU usage:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscaler
  namespace: my-namespace
  labels:
    app: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 3
  maxReplicas: 10
  # We're using old API due to
  # - https://github.com/kubernetes/kubernetes/issues/78712
  # - https://github.com/kubernetes/kubernetes/issues/78761
  targetCPUUtilizationPercentage: 20