kminion: Consumer Group Lag calculation missing for most topics
Hi all,
I’m facing an issue after deploying kafka-minion on OpenShift: it can see and calculate group lag for messages that I produce and consume via the Kafka CLI commands, but not for ones generated via the Spring Boot framework.
Producer:
while true; do echo "TEST"; sleep $[ ( $RANDOM % 100) + 1 ]; done | kafka-console-producer.sh --broker-list $broker --producer.config $clientProperties --topic ops.test.topic
Consumer:
kafka-console-consumer.sh --bootstrap-server $broker --topic ops.test.topic --consumer.config $clientProperties --consumer-property group.id=test-consumer-group
We’re using other tools to monitor our MSK instances (like Kafdrop and Conduktor) and those are able to calculate the consumer lag reliably on all topics.
When I query kafka_minion_group_topic_lag in Prometheus I can only see the topic and group generated via CLI.
I should mention that I’m seeing messages in the kafka-minion logs about partition lag (edited to remove the topic name), but I’m not sure whether they’re related to the missing consumer lag:
{"level":"warning","module":"collector","msg":"could not calculate partition lag because low water mark is missing","partition":3,"time":"2020-04-06T14:35:51Z","topic":"TOPIC_NAME"}
No matter which topic I choose, the low water mark is always 0. For example:
kafka_minion_topic_partition_high_water_mark{partition="4",topic="TOPIC_NAME"} 152
kafka_minion_topic_partition_low_water_mark{partition="0",topic="TOPIC_NAME"} 0
I’m fairly new to Kafka and its ancillaries, so please let me know if there’s anything else I can provide to help diagnose the issue.
Many thanks, and stay safe!
About this issue
- State: closed
- Created 4 years ago
- Comments: 30 (9 by maintainers)
Thanks for all your input regarding the ACLs. I hope that V2 will return more descriptive errors when Kafka cluster requests fail due to missing permissions. We replaced sarama with franz-go, which seems to be a superior Kafka client.
Been playing around with ACLs to get Kafka-minion working after also getting the "could not calculate partition lag because low water mark is missing" message.
If you are using the Kafka GitOps project to manage your ACLs, then this is the set of customServiceAcls you need. If you’re using the command-line kafka-acls tool, I’m sure you can translate the below into input for that tool.
@justCatchingRye Maybe your app uses an older Kafka API, or stores its offsets in ZooKeeper? If so, those stats only actually get created in the Kafka backend when you consume with kafka-consumer-groups.
I think the offsets.storage=kafka setting is responsible: https://gitter.im/spring-projects/spring-kafka?at=5acce83f6bbe1d2739d0bd24
See also "Storing Offsets Outside Kafka" in the KafkaConsumer javadoc: https://kafka.apache.org/22/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
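One way to test this theory might be to ask the brokers which offsets they hold for the Spring Boot group. If kafka-consumer-groups.sh lists the group with a numeric CURRENT-OFFSET per partition, the offsets are being committed to Kafka. This is only a rough sketch: it reuses $broker and $clientProperties from the question, while the group name my-spring-group and the helper lag_by_topic are hypothetical. The column order of the describe output differs between Kafka versions, so the helper looks the columns up by header name.

```shell
# Sum the LAG column per topic from `kafka-consumer-groups.sh --describe` output.
# lag_by_topic is a hypothetical helper, not part of any Kafka tooling.
lag_by_topic() {
  awk '
    # Until the header row is found, look for the TOPIC and LAG column positions.
    !l {
      for (i = 1; i <= NF; i++) {
        if ($i == "TOPIC") t = i
        if ($i == "LAG") l = i
      }
      next
    }
    # Data rows: partitions without a committed offset show "-" and are skipped.
    t && $l ~ /^[0-9]+$/ { sum[$t] += $l }
    END { for (topic in sum) print topic, sum[topic] }
  ' | sort
}

# Usage (needs a reachable cluster; my-spring-group is a placeholder):
# kafka-consumer-groups.sh --bootstrap-server "$broker" \
#   --command-config "$clientProperties" \
#   --describe --group my-spring-group | lag_by_topic
```

If the group does not show up at all, or every partition shows "-" for its offset, that would point towards offsets being stored somewhere other than Kafka.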
@sirkubax Please always check the /metrics endpoint of Kafka Minion if you suspect missing metric series. There are further points of failure between Kafka Minion and the Prometheus UI / TSDB where metrics are stored.
There are different ways of getting consumer group offsets from Kafka. Kafka Minion differs from most exporters in that it consumes the __consumer_offsets topic (as described in the README), while others talk to the Kafka brokers and ask for the offsets. You can give https://github.com/cloudhut/kowl a shot, as you can see consumer group offsets there as well, but Kowl asks the Kafka brokers for the offsets using Kafka’s admin API.
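The /metrics check suggested above can be done without going through Prometheus at all. A rough sketch follows. The metric name kafka_minion_group_topic_lag and the topic label come from the question. The helper name lag_series_pairs, the address kafka-minion:8080 and the label name group are assumptions, so adjust them to your deployment.

```shell
# Print the unique "group topic" pairs that have a kafka_minion_group_topic_lag series.
# Reads Prometheus text exposition format on stdin.
# Assumption: the series carry labels named group and topic.
lag_series_pairs() {
  grep '^kafka_minion_group_topic_lag{' |
    awk '{
      group = ""; topic = ""
      # Match the label including its leading "{" or "," so that labels
      # which merely end in "group" or "topic" are not picked up.
      if (match($0, /[{,]group="[^"]*"/)) group = substr($0, RSTART + 8, RLENGTH - 9)
      if (match($0, /[{,]topic="[^"]*"/)) topic = substr($0, RSTART + 8, RLENGTH - 9)
      print group, topic
    }' | sort -u
}

# Usage (host and port are placeholders):
# curl -s http://kafka-minion:8080/metrics | lag_series_pairs
```

If a group is missing from this output as well, Kafka Minion itself never produced the series. If it is listed here but missing in Prometheus, the scrape path is the more likely place to look.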