keda: Kafka scaler not scaling to zero when offset is not properly initialized

Report

How is scaling to zero supposed to work? I’d like to scale to 0 when a topic has no messages at all, or when everything produced so far has been consumed. However, it always scales to 1.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: svc-webhook-processor
spec:
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3
    replicas: 6
  idleReplicaCount: 0
  maxReplicaCount: 100
  minReplicaCount: 0
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: svc-webhook-processor
  triggers:
  - metadata:
      allowIdleConsumers: "false"
      bootstrapServers: kafka-client.devbox-raffis-1:9092
      consumerGroup: svc-webhook-processor
      lagThreshold: "5"
      offsetResetPolicy: latest
      topic: webhook-request
      version: 1.0.0
    type: kafka
status:
  conditions:
  - message: ScaledObject is defined correctly and is ready for scaling
    reason: ScaledObjectReady
    status: "True"
    type: Ready
  - message: Scaling is performed because triggers are active
    reason: ScalerActive
    status: "True"
    type: Active
  - message: No fallbacks are active on this scaled object
    reason: NoFallbackFound
    status: "False"
    type: Fallback
  externalMetricNames:
  - kafka-webhook-request-svc-webhook-processor
  health:
    kafka-webhook-request-svc-webhook-processor:
      numberOfFailures: 0
      status: Happy
  lastActiveTime: "2021-08-13T09:45:24Z"
  originalReplicaCount: 3
  scaleTargetGVKR:
    group: apps
    kind: Deployment
    resource: deployments
    version: v1
  scaleTargetKind: apps/v1.Deployment

Expected Behavior

Scale deployment to 0.

Actual Behavior

Scaled to 1 replica.

Steps to Reproduce the Problem

Logs from KEDA operator

2021-08-13T09:49:46.505Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:50:17.607Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:50:48.211Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:51:19.244Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:51:49.732Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet

KEDA Version

2.4.0

Kubernetes Version

1.18

Platform

Amazon Web Services

Scaler Details

Kafka

Anything else?

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 15 (7 by maintainers)

Most upvoted comments

What I would expect from KEDA is the following (see the sketch after this list):

  1. When there are no messages at all, the lag is 0, so scale to minReplicaCount.
  2. When there are messages but no committed offset, the lag depends on offsetResetPolicy; in any case, it should scale to at least one replica, i.e. max(minReplicaCount, 1).
  3. When there is a committed offset, use the consumer group lag; if the lag is 0, scale to minReplicaCount.
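
A minimal sketch of that expected decision logic, in Go (KEDA's implementation language). The function name expectedLag, its arguments, and the use of -1 for "no committed offset" are assumptions made for illustration; this is not KEDA's actual code.

package kafka

// expectedLag sketches the per-partition behaviour described in the list
// above. It is a hypothetical helper, not KEDA's implementation; a
// committedOffset of -1 stands for "no offset committed yet".
func expectedLag(latestOffset, committedOffset int64, offsetResetPolicy string) (lag int64, active bool) {
    switch {
    case latestOffset == 0:
        // 1. No messages at all: lag is 0, scale to minReplicaCount.
        return 0, false
    case committedOffset < 0:
        // 2. Messages exist but no committed offset: the lag depends on
        // offsetResetPolicy, but the scaler should stay active so that
        // at least max(minReplicaCount, 1) replicas come up.
        if offsetResetPolicy == "earliest" {
            return latestOffset, true
        }
        return 0, true
    default:
        // 3. Committed offset exists: use the consumer group lag;
        // lag 0 means scale down to minReplicaCount.
        lag = latestOffset - committedOffset
        return lag, lag > 0
    }
}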

Okay, after reading through this and reminding myself what offsetResetPolicy really does:

There might be people who depend on the existing behaviour, but it’s going to be a small group with a misconfiguration that happens to work by accident.

If people solely depend on the Kafka lag and want to guarantee that at least 1 pod is always available, regardless of the lag being 0, then they should be setting minReplicas=1. They should not be relying on this particular quirk where 0 gets interpreted as a valid metric, preventing them from scaling to 0 even when they set minReplicas=0.

I’d suggest a highlighted note in the changelog would be sufficient for this. I personally don’t think this is behaviour worth preserving when people can simply set minReplicas appropriately for what they really want.

I have similar use cases with SQS where, if there are simply no messages that need to be processed, I just scale to zero - I basically treat it like AWS Lambda. I’d expect Kafka to have the same behaviour.

@pierDipi has the right idea.

@bpinske @PaulLiang1 opinion on this?