KEDA: Prometheus Scaler - Maintain last known state if Prometheus is unavailable

I am in the process of migrating to KEDA. I currently use https://github.com/DirectXMan12/k8s-prometheus-adapter, which has a very useful feature: if Prometheus goes down, prom-adapter keeps serving the last known value of the metric, so scaling is triggered neither up nor down.

With KEDA, if Prometheus is unavailable, my deployments are scaled to zero once the cooldownPeriod has expired, regardless of whether the last known value was above 0.
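The "last known state" behavior described above can be modeled in a few lines. This is a minimal sketch, not code from prom-adapter or KEDA. `LastKnownMetric`, `flaky_query` and the simulated outage are hypothetical names. The only assumption is that a failed query raises an exception, which the wrapper swallows so that it can return the previous reading.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LastKnownMetric:
    """Hypothetical wrapper: serve the last successful reading when the source fails."""

    query: Callable[[], float]  # hypothetical: returns the metric or raises on failure
    last_value: Optional[float] = None  # None until the first successful query

    def get(self) -> Optional[float]:
        try:
            self.last_value = self.query()
        except Exception:
            # Source unreachable: keep the previous reading so the autoscaler
            # sees an unchanged metric and neither scales up nor down.
            pass
        return self.last_value


# Simulated Prometheus: the first query succeeds, the second fails.
responses = [12.0, ConnectionError("prometheus unreachable")]


def flaky_query() -> float:
    result = responses.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


metric = LastKnownMetric(flaky_query)
first = metric.get()   # 12.0 from the successful query
second = metric.get()  # still 12.0: the failed query is ignored
```

If the very first query fails, `get()` returns `None`. The caller still has to decide what that means, and that is exactly the gap discussed in the comments below.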

Use-Case

We are using prom-adapter to scale Google Pub/Sub subscribers and RabbitMQ workers. In the unlikely event that Prometheus goes down, we want the existing workload to keep processing based on the values it knew before Prometheus stopped responding.

About this issue

State: open
Created 4 years ago
Reactions: 5
Comments: 24 (13 by maintainers)

Commits related to this issue

Add document for aws endpoint setting (#965) * Add document for aws endpoint setting Signed-off-by: Phan Duc <phan.duc@moneyforward.co.jp> * Fix docs style Signed-off-by: Phan Duc <phan.duc@... — committed to SpiritZhou/keda by yuyuvn 2 years ago

Most upvoted comments

“Maintain last known state” - I think this approach has drawbacks, especially when autoscaling to zero via minReplicaCount: 0. Imagine being unable to wake your system up because the KEDA operator temporarily can’t reach the metrics source.
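The failure mode in the comment above can be simulated. This is a hypothetical sketch, not KEDA code. `simulate_hold_last`, the target of 10 per replica, and the simplified `ceil(value / target)` replica formula are all assumptions. A failed read is represented as `None`, and it reuses the last good value. If that value was 0, the workload stays at zero for the whole outage.

```python
import math
from typing import List, Optional


def simulate_hold_last(
    readings: List[Optional[float]],
    target_per_replica: float,
    min_replicas: int,
    max_replicas: int,
) -> List[int]:
    """Replica count per polling cycle when a failed read (None) reuses the last good value."""
    last = 0.0  # assumption: start from an idle, scaled-to-zero state
    history = []
    for reading in readings:
        if reading is not None:
            last = reading
        # Simplified replica formula (assumption): ceil(value / target), clamped.
        wanted = math.ceil(last / target_per_replica)
        history.append(max(min_replicas, min(max_replicas, wanted)))
    return history


# Two idle polls (queue length 0), then the source becomes unreachable.
# Even if messages pile up during the outage, the autoscaler only ever sees 0.
history = simulate_hold_last([0.0, 0.0, None, None, None], 10, 0, 10)
print(history)  # [0, 0, 0, 0, 0]
```

Holding the last value is safe while a workload is running, but it never wakes a scaled-to-zero one. That is the motivation for a fixed fallback count.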

I just hit this problem with the PostgreSQL trigger. After a security group change in our AWS account, the KEDA operator suddenly couldn’t reach our Postgres database, and the whole system scaled down to zero, making the service unavailable.

I propose a new (optional) field, onErrorReplicaCount, that would serve as a default replica count when the operator can’t read current values, i.e.:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: my-deployment-autoscaler
spec:
  scaleTargetRef:
    deploymentName: my-deployment
  pollingInterval: 1
  cooldownPeriod: 1200
  minReplicaCount: 0
  maxReplicaCount: 10
  onErrorReplicaCount: 2     # <==  2 pods, in case of a trigger source being unavailable
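How the proposed onErrorReplicaCount might feed into the replica calculation can be sketched as follows. This is a hypothetical model, not KEDA's implementation. `desired_replicas`, the target of 10 per replica, and clamping the fallback to the min/max bounds are assumptions, and cooldownPeriod timing is ignored. `metric_value is None` stands for "trigger source could not be read".

```python
import math
from typing import Optional


def desired_replicas(
    metric_value: Optional[float],
    target_per_replica: float,
    min_replicas: int,
    max_replicas: int,
    on_error_replicas: Optional[int] = None,
) -> int:
    """Hypothetical replica calculation; metric_value is None when the source is unreadable."""
    if metric_value is None:
        if on_error_replicas is not None:
            # Proposed behavior: fall back to a fixed replica count,
            # clamped to the configured bounds (an assumption of this sketch).
            return max(min_replicas, min(max_replicas, on_error_replicas))
        # No fallback configured: the unreadable metric ends up treated like 0,
        # which is the scale-to-zero outcome reported in this issue.
        metric_value = 0.0
    wanted = math.ceil(metric_value / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Bounds taken from the ScaledObject above: min 0, max 10, onErrorReplicaCount 2.
print(desired_replicas(None, 10, 0, 10))                       # 0
print(desired_replicas(None, 10, 0, 10, on_error_replicas=2))  # 2
print(desired_replicas(35, 10, 0, 10, on_error_replicas=2))    # 4
```

When the metric is readable, the fallback has no effect. It only replaces the implicit "treat as zero" when the source is down.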

VojtechVitek on Sep 18, 2020

@bschaeffer you can use https://github.com/kedacore/keda/issues/1872 to mitigate this problem.

zroubalik on Oct 21, 2021

Not that I’m aware of. Are you interested in contributing this, @lambohamp?

tomkerkhove on Dec 30, 2020

Hi @zroubalik, thank you. I think I’ll step away from this one, but will be watching the progress.

bryanhorstmann on Aug 13, 2020