kafkajs: Consumer offset "stuck" on certain partitions

Observed behavior
I consume messages from a topic with 24 partitions. I started consuming from the beginning, and at first everything was fine, but after some time the consumer stopped consuming messages from certain partitions.

The issue itself is very similar to issue #562, but I'm using the current version of KafkaJS (v1.15.0), so I'm at a loss as to what the problem could be. As far as I'm aware, the topic also uses log compaction.

To debug this further, I wrote a simple partition assigner that assigns only the "stuck" partitions, so I could consume from them in isolation. After that I added some console.log statements to the KafkaJS code (consumerGroup.js). I got to the point where the response from broker.fetch always contained zero messages.
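
A minimal sketch of such a debug assigner, built on KafkaJS's custom partition assigner API (the assigner name, group id, and the STUCK_PARTITIONS list are placeholders for illustration, not the exact code I ran):

const { AssignerProtocol: { MemberMetadata, MemberAssignment } } = require('kafkajs')

// Partitions observed as "stuck" (placeholder values)
const STUCK_PARTITIONS = [1]

const StuckPartitionsAssigner = ({ cluster }) => ({
  name: 'StuckPartitionsAssigner',
  version: 1,

  // Hand every subscribed topic's stuck partitions to the first member
  // (sorted for determinism) so a single consumer can be watched in isolation
  async assign({ members, topics }) {
    const [firstMember] = members.map(({ memberId }) => memberId).sort()
    const assignment = {}
    for (const topic of topics) {
      assignment[topic] = STUCK_PARTITIONS
    }
    return [{
      memberId: firstMember,
      memberAssignment: MemberAssignment.encode({
        version: this.version,
        assignment,
      }),
    }]
  },

  protocol({ topics }) {
    return {
      name: this.name,
      metadata: MemberMetadata.encode({ version: this.version, topics }),
    }
  },
})

// Usage:
// kafka.consumer({ groupId: 'debug-group', partitionAssigners: [StuckPartitionsAssigner] })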

The broker.fetch response for one of the stuck partitions looked like this:

[
  {
    topicName: "MyTopic",
    partitions: [
      {
        partition: 1,
        errorCode: 0,
        highWatermark: "532672",
        lastStableOffset: "532672",
        lastStartOffset: "0",
        abortedTransactions: [],
        preferredReadReplica: -1,
        messages: [],
      },
    ],
  }
]

The offset used to fetch the next messages looked like this: { MyTopic: { '1': '484158' } }

There are clearly still messages to consume, but the fetch always returns zero messages because the offset 484158 is used every time. I changed the offset manually via the admin interface to a higher, valid offset, and after that the consumer worked again.
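
Roughly, that manual workaround looks like this with the KafkaJS admin client (the broker address, group id, and target offset are placeholders, a sketch rather than the exact commands I ran):

const { Kafka } = require('kafkajs')

const kafka = new Kafka({ brokers: ['localhost:9092'] }) // placeholder broker

const resetOffset = async () => {
  const admin = kafka.admin()
  await admin.connect()

  // Note: the consumer group has to be disconnected while its offsets change
  await admin.setOffsets({
    groupId: 'my-group', // placeholder group id
    topic: 'MyTopic',
    partitions: [{ partition: 1, offset: '484159' }], // any valid offset past the stuck one
  })

  await admin.disconnect()
}

resetOffset().catch(console.error)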

Expected behavior
I would expect to receive all messages up to the latest offset.

Environment:

  • OS: macOS 10.15.7
  • KafkaJS version: 1.15.0
  • Kafka version: 2.6.1
  • Node.js version: 12.18.3

Additional context
If further logs are needed, I can provide them. I couldn't see any useful debug messages for this problem.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 6
  • Comments: 17 (2 by maintainers)

Most upvoted comments

Hi,

We have what seems to be a similar issue.

One of our partitions is stuck at the same offset for all three of our consumer groups. It is also a compacted topic, and using eachMessage instead of eachBatch does not help.

How can we help resolve this issue? Do you know of any workaround other than moving the offset manually?

Thanks!

Hi,

We have hit the same issue twice this week as well. Each time, one or two partitions of a 3-partition topic were stuck at the same offset for all our consumer groups (the topic is compacted too). Has anyone made progress on this issue? Is there a way we can help solve it?

Thanks in advance

Is there a way we can help solve it?

Like I mentioned a year ago, a way to consistently reproduce the issue is the best route to resolving it. Ideally a fork with a failing test, but even just a script that creates a topic, produces to it with whatever parameters are required to trigger the bug, and then runs a consumer that gets stuck, would be helpful.
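
To make that concrete, a skeleton for such a repro script could look like the following; all parameters (broker address, topic name, key space, segment size) are guesses to be tuned until the bug triggers:

const { Kafka } = require('kafkajs')

const kafka = new Kafka({ brokers: ['localhost:9092'] }) // placeholder broker
const topic = 'repro-stuck-offsets' // placeholder topic name

const run = async () => {
  const admin = kafka.admin()
  await admin.connect()
  // Compacted topic, since every report in this thread involves compaction
  await admin.createTopics({
    topics: [{
      topic,
      numPartitions: 3,
      configEntries: [
        { name: 'cleanup.policy', value: 'compact' },
        { name: 'segment.ms', value: '1000' }, // small segments so compaction actually runs
      ],
    }],
  })
  await admin.disconnect()

  // Produce many messages over a small key space so compaction removes most of them
  const producer = kafka.producer()
  await producer.connect()
  for (let i = 0; i < 100000; i++) {
    await producer.send({
      topic,
      messages: [{ key: `key-${i % 10}`, value: `value-${i}` }],
    })
  }
  await producer.disconnect()

  // Consume from the beginning and log progress; the bug would show up as a
  // partition whose offset stops advancing short of the high watermark
  const consumer = kafka.consumer({ groupId: 'repro-group' })
  await consumer.connect()
  await consumer.subscribe({ topic, fromBeginning: true })
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(`partition ${partition} offset ${message.offset}`)
    },
  })
}

run().catch(console.error)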

I have a similar problem.

The last offset successfully consumed is 221235053 and the next offset is 221306740, i.e. a gap of ~70k offsets between two consecutive messages.

But the consumer is stuck and does not consume further: it constantly tries to fetch offset 221235053 and gets no messages back.

I have to set a ridiculously high maxBytes for the consumer to be able to grab the next offset. That should not be the solution, though, because fetching such a large number of messages at once is far from optimal.
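
For context, that workaround is just raising the consumer's fetch size limits, something like this (the group id is a placeholder and the values are arbitrary):

const consumer = kafka.consumer({
  groupId: 'my-group', // placeholder
  // Defaults are 1 MB per partition and 10 MB per fetch; raising them far
  // enough lets a single fetch span the compacted gap
  maxBytesPerPartition: 100 * 1024 * 1024,
  maxBytes: 200 * 1024 * 1024,
})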

I think there should be a check for a batch that is empty but not the last one, either by querying the Offset API or by checking whether the Fetch API returned OFFSET_OUT_OF_RANGE.
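
Until something like that lands in KafkaJS itself, a user-land approximation is possible: periodically compare the group's committed offsets with the topic's latest offsets and flag partitions that lag without advancing between checks. A rough sketch using the v1.x admin API (group id and topic are placeholders):

const detectLaggingPartitions = async (kafka, groupId, topic) => {
  const admin = kafka.admin()
  await admin.connect()

  // Committed offsets for the group vs. the latest offsets of the topic
  const committed = await admin.fetchOffsets({ groupId, topic })
  const latest = await admin.fetchTopicOffsets(topic)
  await admin.disconnect()

  // Partitions whose committed offset trails the log end; combined with
  // "offset unchanged since the previous check", these are stuck candidates
  return committed
    .filter(({ partition, offset }) => {
      const end = latest.find(p => p.partition === partition)
      return end && Number(end.offset) > Number(offset)
    })
    .map(({ partition, offset }) => ({ partition, offset }))
}

// Usage (placeholders): detectLaggingPartitions(kafka, 'my-group', 'MyTopic')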