kminion: Consumer Group Lag calculation missing for most topics

Hi all,

I’m facing an issue where, after deploying kafka-minion on OpenShift, it can see and calculate group lag for messages that I produce and consume via the Kafka CLI commands, but not for messages generated via the Spring Boot framework.

Producer:

while true; do echo "TEST"; sleep $(( (RANDOM % 100) + 1 )); done | kafka-console-producer.sh --broker-list $broker --producer.config $clientProperties --topic ops.test.topic

Consumer:

kafka-console-consumer.sh --bootstrap-server $broker --topic ops.test.topic --consumer.config $clientProperties --consumer-property group.id=test-consumer-group

We’re using other tools to monitor our MSK instances (like Kafdrop and Conduktor) and those are able to calculate the consumer lag reliably on all topics.

When I query kafka_minion_group_topic_lag in Prometheus I can only see the topic and group generated via CLI.

I should mention that I’m seeing messages in the kafka-minion logs about partition lag (edited to remove the topic name), but I’m not sure whether they’re related to the missing consumer lag:

{"level":"warning","module":"collector","msg":"could not calculate partition lag because low water mark is missing","partition":3,"time":"2020-04-06T14:35:51Z","topic":"TOPIC_NAME"}

No matter which topic I choose, the low water mark is always 0. For example:

kafka_minion_topic_partition_high_water_mark{partition="4",topic="TOPIC_NAME"} 152
kafka_minion_topic_partition_low_water_mark{partition="0",topic="TOPIC_NAME"} 0

I’m fairly new to Kafka and its ancillaries, so please let me know if there’s anything else I can provide to help diagnose the issue.

Many thanks, and stay safe!

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 30 (9 by maintainers)

Most upvoted comments

Thanks for all your inputs regarding the ACLs. I hope that the V2 will return more descriptive errors when Kafka cluster requests fail due to lacking permissions. We replaced sarama with franz-go which seems to be a superior Kafka client.

I’ve been playing around with ACLs to get Kafka Minion working after also getting the "could not calculate partition lag because low water mark is missing" message.

If you are using the Kafka GitOps project to manage your ACLs, then this is the set of customServiceAcls you need.

If you’re using the command line kafka-acls tool, I’m sure you can translate the below into input for that tool.

  kafka-minion:
    acl1:
      name: __consumer_offsets
      type: TOPIC
      pattern: LITERAL
      host: "*"
      principal: User:kafka-minion
      operation: READ
      permission: ALLOW
    acl2:
      name: __consumer_offsets
      type: TOPIC
      pattern: LITERAL
      host: "*"
      principal: User:kafka-minion
      operation: DESCRIBE_CONFIGS
      permission: ALLOW
    acl3:
      name: kafka-cluster
      type: CLUSTER
      pattern: LITERAL
      principal: User:kafka-minion
      host: "*"
      operation: DESCRIBE
      permission: ALLOW
    acl4:
      name: "*"
      type: TOPIC
      pattern: LITERAL
      principal: User:kafka-minion
      host: "*"
      operation: DESCRIBE
      permission: ALLOW
    acl5:
      name: "*"
      type: TOPIC
      pattern: LITERAL
      principal: User:kafka-minion
      host: "*"
      operation: DESCRIBE_CONFIGS
      permission: ALLOW
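
For the kafka-acls route, a rough translation of the YAML above might look like the following. This is a sketch rather than a tested recipe. It assumes a kafka-acls.sh version that accepts --bootstrap-server and --command-config (older releases instead take --authorizer-properties with a ZooKeeper address), and it reuses the $broker and $clientProperties variables from the original post. The run helper and the DRY_RUN variable are hypothetical conveniences, not part of Kafka: by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: kafka-acls equivalents of the five customServiceAcls entries.
# Assumptions: $broker and $clientProperties as in the original post;
# run() and DRY_RUN are hypothetical helpers, not part of Kafka.

broker="${broker:-localhost:9092}"
clientProperties="${clientProperties:-client.properties}"
principal="User:kafka-minion"

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments shared by every ACL (the host defaults to "*" when omitted).
acl() {
  run kafka-acls.sh --bootstrap-server "$broker" \
    --command-config "$clientProperties" \
    --add --allow-principal "$principal" "$@"
}

grant_minion_acls() {
  # acl1 + acl2: READ and DESCRIBE_CONFIGS on __consumer_offsets
  acl --operation Read --operation DescribeConfigs --topic __consumer_offsets
  # acl3: DESCRIBE on the cluster resource
  acl --operation Describe --cluster
  # acl4 + acl5: DESCRIBE and DESCRIBE_CONFIGS on the literal "*" topic resource
  acl --operation Describe --operation DescribeConfigs --topic '*'
}

grant_minion_acls
```

In dry-run mode the printed commands lose their quoting, so the "*" topic needs to be quoted again if you paste the output into a shell.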

@justCatchingRye Maybe your app uses an older Kafka API, or stores its offsets in ZooKeeper? If so, those stats would only be created in the Kafka backend when you consume with kafka-consumer-groups.

I think the offsets.storage=kafka setting is responsible: https://gitter.im/spring-projects/spring-kafka?at=5acce83f6bbe1d2739d0bd24

See also the "Storing Offsets Outside Kafka" section of the KafkaConsumer Javadoc: https://kafka.apache.org/22/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html

@sirkubax Please always check the /metrics endpoint of Kafka Minion if you suspect missing metric series. There are further points of failure between Kafka Minion and the Prometheus UI / TSDB where metrics are stored.
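
One way to do that check from a shell is sketched below. The exporter address in minion_url is an assumption (adjust host and port to your deployment), and filter_metric and check_minion_metric are hypothetical helper names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: look for a metric's series directly in Kafka Minion's /metrics output.
# Assumption: the exporter address below; adjust it to your deployment.
minion_url="${minion_url:-http://localhost:8080/metrics}"

# Keep only the sample lines of one metric from exposition text on stdin.
# "# HELP" and "# TYPE" comment lines are dropped because they start with "#".
filter_metric() {
  grep -E "^$1(\{| )"
}

# Fetch the endpoint and show the series for the given metric name.
check_minion_metric() {
  curl -fsS "$minion_url" | filter_metric "$1"
}
```

Running check_minion_metric kafka_minion_group_topic_lag then lists the groups and topics the exporter itself reports. If a group is missing here, the problem lies in Kafka Minion or its access to Kafka rather than in Prometheus.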

There are different ways of getting consumer group offsets from Kafka. Kafka Minion is different than most exporters as it consumes the __consumer_offsets topic (as described in the README) while others talk to the Kafka Brokers and ask for the offsets. You can give https://github.com/cloudhut/kowl a shot as you can see consumer group offsets there as well, but Kowl asks the Kafka brokers for the offset using Kafka’s admin API.
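
For a quick cross-check of the broker-side view without installing another tool, the stock kafka-consumer-groups.sh CLI also asks the brokers for a group's committed offsets. The sketch below reuses $broker, $clientProperties and the test-consumer-group name from the original post. The run helper and DRY_RUN variable are hypothetical, and by default the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch: ask the brokers for a consumer group's committed offsets and lag.
# Assumptions: $broker and $clientProperties as in the original post;
# run() and DRY_RUN are hypothetical helpers, not part of Kafka.

broker="${broker:-localhost:9092}"
clientProperties="${clientProperties:-client.properties}"

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Describe one consumer group; the group name is the first argument.
describe_group() {
  run kafka-consumer-groups.sh --bootstrap-server "$broker" \
    --command-config "$clientProperties" \
    --describe --group "$1"
}

describe_group test-consumer-group
```

If the brokers report offsets for a group that Kafka Minion does not show, the difference is likely in how Kafka Minion reads __consumer_offsets (for example, missing ACLs) rather than in the consumer itself.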