kubernetes: HPA counts pods that are not ready and doesn't take action
/kind bug
What happened:

HPA:
NAME              REFERENCE                    TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
service-entrata   Deployment/service-entrata   0% / 25%   2         100       2          1d

Deployment:
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
service-entrata   2         2         2            0           4d
What you expected to happen:
HPA should take action and start 2 new pods
How to reproduce it (as minimally and precisely as possible):
Create a deployment with a strict readiness probe (timeoutSeconds=1, periodSeconds=2) and drive CPU up until the pods become not ready.
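A minimal sketch of the relevant container spec, written with the k8s.io/api Go types (assuming a recent k8s.io/api; the image, probe path, port, and failureThreshold below are illustrative placeholders, only timeoutSeconds and periodSeconds come from the steps above):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Strict readiness probe from the reproduction steps above:
	// timeoutSeconds=1, periodSeconds=2. Everything else is a placeholder.
	container := corev1.Container{
		Name:  "service-entrata",
		Image: "example/service-entrata:latest", // placeholder image
		ReadinessProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				HTTPGet: &corev1.HTTPGetAction{
					Path: "/healthz",           // placeholder path
					Port: intstr.FromInt(8080), // placeholder port
				},
			},
			TimeoutSeconds:   1, // probe must answer within 1s
			PeriodSeconds:    2, // probed every 2s
			FailureThreshold: 1, // placeholder: flip to NotReady on the first miss
		},
	}
	fmt.Printf("%+v\n", container.ReadinessProbe)
}
```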
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.7.3
- Cloud provider or hardware configuration: GKE
- OS (e.g. from /etc/os-release): alpine-node
Would it help people if I made the “ready-hpa” public?
Ah, you have a fundamentally different definition of “unready” than we assume in the HPA. In order to deal with pod initialization CPU spikes, we assume that unready pods will become ready soon-ish, and that unready pods are probably unready because they’re still starting up. This means that we don’t keep scaling up while new pods are starting, and we don’t accidentally overscale because a pod consumes a lot of resources while initializing.
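To make that concrete, here is a minimal sketch of the scale-up arithmetic this assumption leads to (a simplification for illustration, not the actual replica-calculator code; the function name and numbers are made up):

```go
package main

import (
	"fmt"
	"math"
)

// Per-pod CPU utilization is expressed as a fraction of the target,
// so 2.0 means 200% of target.
const tolerance = 0.1 // the default --horizontal-pod-autoscaler-tolerance

func desiredReplicas(readyUtilization []float64, notReadyCount int) int {
	sum := 0.0
	for _, u := range readyUtilization {
		sum += u
	}
	current := len(readyUtilization) + notReadyCount

	// First pass: the usage ratio is computed from ready pods only.
	ratio := sum / float64(len(readyUtilization))
	if ratio <= 1.0+tolerance {
		return current // within tolerance: keep the current replica count
	}

	// Scale-up candidate: assume every not-ready pod is using 0 CPU
	// (the "it's probably just initializing" assumption) and recompute.
	ratio = sum / float64(current)
	if ratio <= 1.0+tolerance {
		return current // the assumed-idle pods absorb the apparent spike
	}
	return int(math.Ceil(ratio * float64(current)))
}

func main() {
	// One ready pod at 200% of target plus one not-ready pod: the zero-usage
	// assumption for the not-ready pod cancels out the spike, so this stays
	// at 2 replicas instead of scaling to 4.
	fmt.Println(desiredReplicas([]float64{2.0}, 1))
}
```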
However, it also means (somewhat inadvertently) that when pods go from ready to unready, the HPA is much more conservative about scaling up new pods. I can definitely see how this behavior could be detrimental. I’m imagining a system like this:
Ideally, two factors mitigate this issue:
However, based on this bug, those two mitigating factors might not be enough. We may want to add some more explicit handling for pods that go unready because they’re “full”. @kubernetes/sig-autoscaling-feature-requests
It’ll be in 1.12
Is this issue really solved? I have a very similar use case to the OP’s: we have very sudden traffic spikes, and on top of that we make use of persistent connections (SSE and WebSockets), which cause imbalances in resource consumption between the different pods of the same deployment. As a consequence, we use readiness probes as a way to make pods temporarily unavailable, so that they stop receiving new requests for a little while and can cool down; otherwise they would just end up crashing from overload.
When we have a massive traffic spike, all our pods end up in the “unready” state before the HPA even realizes that they are over the CPU threshold. The HPA then seems to ignore the unready pods, reports the overall CPU consumption of the deployment as <unknown>/threshold, and thus doesn’t scale up even though the deployment is completely overloaded.

EDIT: after more tests, the behaviour I observed was apparently linked to the --horizontal-pod-autoscaler-cpu-initialization-period parameter, which discards unready pods from the CPU calculations during the first 5 minutes of their lives. After that 5-minute period, the HPA works as expected.

This is still an issue; I think I have managed to cut the tests down enough to demonstrate it: https://gist.github.com/mgazza/f01d03464ac480a2de1fa9f6edeee1f6#file-horizontal_test-go
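For anyone else hitting this, here is a rough sketch of how that cutoff behaves, as I understand the current controller (simplified, not the exact code; the second branch and the --horizontal-pod-autoscaler-initial-readiness-delay default it uses are my assumptions, not something stated in this thread):

```go
package main

import (
	"fmt"
	"time"
)

const (
	cpuInitializationPeriod       = 5 * time.Minute  // --horizontal-pod-autoscaler-cpu-initialization-period
	delayOfInitialReadinessStatus = 30 * time.Second // --horizontal-pod-autoscaler-initial-readiness-delay
)

// dropCPUSample reports whether a pod's CPU sample is discarded from the
// HPA calculation (simplified sketch).
func dropCPUSample(started, lastReadyTransition time.Time, ready bool, now time.Time) bool {
	if now.Before(started.Add(cpuInitializationPeriod)) {
		// Young pod: its sample is dropped whenever it is not ready,
		// which is the 5-minute window described above.
		return !ready
	}
	// Older pod: the sample is only dropped if the pod is not ready and it
	// flipped states so early in its life that it was likely never really ready.
	return !ready && lastReadyTransition.Before(started.Add(delayOfInitialReadinessStatus))
}

func main() {
	now := time.Now()
	started := now.Add(-10 * time.Minute)
	becameUnready := now.Add(-30 * time.Second)

	// A pod that ran fine for a while and has just gone not-ready under load:
	// its (high) CPU sample is kept, so it can still drive a scale-up.
	fmt.Println(dropCPUSample(started, becameUnready, false, now)) // false
}
```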