keda: Add circuit breaking (immediately scale to 0) on specific Trigger
Proposal
Proposing a circuit breaking feature for KEDA. If a specific circuitBreak trigger crosses a threshold, it overrides the other triggers and scales the Deployment/ReplicaSet/StatefulSet down to 0.
This would be useful in situations where some microservice upstream or downstream releases bad data into the target process, because it stops the data transmission at the top of the funnel.
Possible recovery options could include:
- Manual intervention (a redeploy or annotation deletion)
- When the circuitBreak threshold is under the target value
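To make the proposed behaviour concrete, here is a minimal sketch of the override logic and the two recovery options listed above. It is only an illustration: `CircuitBreaker`, `manual_reset`, and `desired_replicas` are hypothetical names and not part of KEDA, and the sketch assumes that without a breaker the largest replica count requested by any trigger wins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CircuitBreaker:
    """Hypothetical circuit-breaker state; not part of KEDA's API."""
    threshold: float            # trip when the watched metric exceeds this
    manual_reset: bool = False  # True: stay open until reset() is called
    is_open: bool = False       # True means the workload is held at 0

    def update(self, metric_value: float) -> None:
        """Trip, or optionally auto-recover, based on the latest metric value."""
        if metric_value > self.threshold:
            self.is_open = True
        elif not self.manual_reset:
            # Automatic recovery: the metric is back under the threshold.
            self.is_open = False

    def reset(self) -> None:
        """Manual intervention, e.g. a redeploy or annotation deletion."""
        self.is_open = False


def desired_replicas(trigger_replicas: List[int], breaker: CircuitBreaker) -> int:
    """Return 0 while the breaker is open, else the largest trigger demand."""
    if breaker.is_open:
        return 0
    return max(trigger_replicas, default=0)
```

With `manual_reset=True` the breaker stays open after the metric recovers, which corresponds to the manual intervention option. With the default it closes again as soon as the metric drops back under the threshold.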
Use-Case
For example, we have a Kafka stream worker that ingests messages and pushes them downstream to APIs and Seldon models. We have implemented a dead letter queue (DLQ), and if the rate of messages going to the DLQ is above a certain threshold, we would like to stop the stream worker in its tracks.
Anything else?
There are other ways to do this using Istio and various other tooling, but KEDA would offer much more flexibility than what I have seen.
About this issue
- State: open
- Created 2 years ago
- Reactions: 6
- Comments: 15 (9 by maintainers)
Most definitely. I think rolling out to the individual scalers could happen over time, but Prometheus, metric API and Datadog would provide a huge benefit here.
Since posting this I’ve come up with a slightly different workflow using KEDA that automates a circuit breaker effect, but it seems like all of the pieces are in place within KEDA to allow something like this. It is not a feature that many people would need, but for some, I think it would be a big one.
💯, we should provide automation for this.
I don’t think it does, because the ask is to scale to 0 if a given threshold is met. I’m not completely sure the model should be based on a scaler, though. I do see why it could be an approach, but typically you’d check a metric on something different from what you are scaling on.
Taking the example in the request:
Here the circuit is meant to be broken if the DLQ is getting messages, while the app typically is interested in the main queue.
So I’m wondering if we should support multiple providers such as Prometheus metric, metric API and potentially talking to dependencies, but not through the trigger section.
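As a rough sketch of that separation, the snippet below keeps the breaker checks apart from the scaling input: the breaker watches a DLQ rate through its own provider, while the replica count is derived from the main queue. Everything here is an assumption for illustration (`MetricProvider`, `should_break`, `scale_decision`, and the metric name `dlq_rate` are hypothetical), and the providers are plain callables standing in for a Prometheus query or a metrics API call.

```python
from typing import Callable, Dict

# Hypothetical provider type: any zero-argument callable returning the
# current value of the watched metric (Prometheus query, HTTP endpoint, ...).
MetricProvider = Callable[[], float]


def should_break(providers: Dict[str, MetricProvider],
                 thresholds: Dict[str, float]) -> bool:
    """Return True if any breaker provider reports a value above its threshold."""
    return any(providers[name]() > thresholds[name] for name in thresholds)


def scale_decision(main_queue_lag: int, lag_per_replica: int,
                   providers: Dict[str, MetricProvider],
                   thresholds: Dict[str, float]) -> int:
    """Scale on the main queue unless a breaker provider says to stop."""
    if should_break(providers, thresholds):
        return 0
    # Ceiling division: enough replicas to cover the current lag.
    return -(-main_queue_lag // lag_per_replica)
```

The point of the split is that the breaker metric never contributes to the replica count. It only decides whether scaling happens at all.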