confluent-kafka-go: Unable to rejoin a group after a consumer poll timeout in `confluent-kafka-go v2.1.0`

Description

After upgrading confluent-kafka-go from v2.0.2 to v2.1.0, a consumer can no longer rejoin its consumer group once the interval between two consecutive poll calls exceeds max.poll.interval.ms. In v2.0.2 the consumer rejoined the group in this situation.

When the join state of group myTopicName changes wait-unassign-to-complete -> init, the next log line differs between versions:

v2.0.2: Group myTopicName join with 1 subscribed topic(s)…

v2.1.0: Requesting metadata for 1/1 topics: periodic topic and broker list refresh…

client config:

request.timeout.ms: 60000
retries: 5
socket.timeout.ms: 60000

consumer config:

session.timeout.ms: 45000
max.poll.interval.ms: 60000
heartbeat.interval.ms: 10000
socket.timeout.ms: 60000
enable.auto.commit: false
enable.auto.offset.store: false
group.id: myTopicName
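
For reference, here is a rough sketch of how the consumer settings above could be written as a Go map and read back as durations. It uses a plain `map[string]interface{}` so it runs without the client library; `kafka.ConfigMap` has the same key/value shape. The helper `durationOf` is a hypothetical name introduced only for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// consumerConfig mirrors the consumer settings listed in the issue.
// A plain map keeps the sketch stdlib-only; it is not the client's own type.
var consumerConfig = map[string]interface{}{
	"group.id":                 "myTopicName",
	"session.timeout.ms":       45000,
	"max.poll.interval.ms":     60000,
	"heartbeat.interval.ms":    10000,
	"socket.timeout.ms":        60000,
	"enable.auto.commit":       false,
	"enable.auto.offset.store": false,
}

// durationOf is a hypothetical helper that reads a "*.ms" setting
// and returns it as a time.Duration.
func durationOf(cfg map[string]interface{}, key string) (time.Duration, error) {
	ms, ok := cfg[key].(int)
	if !ok {
		return 0, fmt.Errorf("%s: missing or not an int", key)
	}
	return time.Duration(ms) * time.Millisecond, nil
}

func main() {
	keys := []string{"session.timeout.ms", "max.poll.interval.ms", "heartbeat.interval.ms"}
	for _, key := range keys {
		d, err := durationOf(consumerConfig, key)
		if err != nil {
			fmt.Println("config error:", err)
			return
		}
		fmt.Printf("%-22s %v\n", key, d)
	}
}
```

With these values, any gap of more than 60s between two poll calls exceeds max.poll.interval.ms, which is the condition the reproduction relies on.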

How to reproduce

pseudo-code:

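c, err := kafka.NewConsumer(&kafka.ConfigMap{...})
if err != nil {
	panic(err)
}
c.Subscribe(topic, nil)
for {
	ev := c.Poll(100)
	_ = ev // event handling omitted
	time.Sleep(65 * time.Second) // sleep 65s so the gap between polls exceeds max.poll.interval.ms (60s)
}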

Checklist

Please provide the following information:

  • [√] confluent-kafka-go and librdkafka version (v2.0.2 to v2.1.0):
  • [√] Apache Kafka broker version: 2.3.0
  • [√] Client configuration: ConfigMap{...}
  • [√] Operating system: Linux x86_64
  • [√] Provide client logs (with "debug": ".." as necessary)
  • Provide broker log excerpts
  • Critical issue

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 16 (6 by maintainers)

Most upvoted comments

+1, having the same issue after upgrading from 2.0.2 to 2.1.0. On 2.0.2 my workers ran fine; now they exit the group and don’t rejoin, with the following error:

%4|1681402127.003|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (20000ms) exceeded by 264ms (adjust max.poll.interval.ms for long-running message processing): leaving group
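
To spot this condition in collected client logs, the warning can be matched with a regular expression. This is only a sketch that assumes the message text stays exactly as quoted above; `parseMaxPoll` is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// maxPollRe matches the MAXPOLL warning text as quoted in the comment above.
var maxPollRe = regexp.MustCompile(`Application maximum poll interval \((\d+)ms\) exceeded by (\d+)ms`)

// parseMaxPoll (hypothetical) extracts the configured interval and the
// overshoot, both in milliseconds, from a log line.
func parseMaxPoll(line string) (intervalMs, overMs int, ok bool) {
	m := maxPollRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	intervalMs, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, 0, false
	}
	overMs, err = strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return intervalMs, overMs, true
}

func main() {
	line := `%4|1681402127.003|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (20000ms) exceeded by 264ms (adjust max.poll.interval.ms for long-running message processing): leaving group`
	if interval, over, ok := parseMaxPoll(line); ok {
		// interval+over is the implied time since the last recorded poll.
		fmt.Printf("max.poll.interval.ms=%d, since last poll=%dms\n", interval, interval+over)
	}
}
```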

I am also getting this, despite poll() definitely being called.

Steps to reproduce:

  1. Create a new empty topic
  2. Have the consumer join a group and subscribe to that topic
  3. Run poll in a for loop - e.g. for { c.Poll(100) }
  4. Do not send any messages to the topic for the duration of max.poll.interval.ms

Despite poll being called (and logs showing the fetch), the consumer is still kicked out of the group after max.poll.interval.ms.
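
The expected behavior can be modeled with a small stdlib-only simulation: every poll call should reset the max.poll.interval.ms timer, so a tight poll loop never trips it, while a single 65s gap against a 60s limit does. This is a hypothetical model of the intended semantics, not librdkafka's implementation; `pollWatchdog` and `simulate` are names made up for the sketch, and the 60s and 65s values are taken from the original report.

```go
package main

import (
	"fmt"
	"time"
)

// pollWatchdog is a hypothetical model of a max.poll.interval.ms timer.
type pollWatchdog struct {
	maxInterval time.Duration
	lastPoll    time.Time
}

// Poll records that the application called poll at time now.
func (w *pollWatchdog) Poll(now time.Time) { w.lastPoll = now }

// Exceeded reports whether more than maxInterval has passed since the last poll.
func (w *pollWatchdog) Exceeded(now time.Time) bool {
	return now.Sub(w.lastPoll) > w.maxInterval
}

// simulate polls once every step for the given total duration, using a fake
// clock, and reports whether the watchdog ever fired.
func simulate(maxInterval, step, total time.Duration) bool {
	start := time.Unix(0, 0)
	w := &pollWatchdog{maxInterval: maxInterval, lastPoll: start}
	for elapsed := step; elapsed <= total; elapsed += step {
		now := start.Add(elapsed)
		if w.Exceeded(now) {
			return true
		}
		w.Poll(now)
	}
	return false
}

func main() {
	// Tight loop: poll every 100ms for three times the 60s limit.
	fmt.Println("tight loop exceeded:", simulate(60*time.Second, 100*time.Millisecond, 180*time.Second))
	// Original reproduction: a 65s gap between polls against a 60s limit.
	fmt.Println("65s gap exceeded:", simulate(60*time.Second, 65*time.Second, 130*time.Second))
}
```

Under this model the tight loop never exceeds the interval, which is why being kicked out of the group while polling continuously points to a bug rather than slow message processing.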

@fkarakas about Magnus, check this comment.

@fkarakas, yep, that’s what we did: we added a failing test to librdkafka, where the actual issue exists. I was talking about extending the Go test suite to include more tests, too.

The fix is available in v2.1.1-RC1, and we expect to have it in v2.1.1 when it is released after some soak testing.

OK. But it is always preferable to have a failing test before the fix, so when the fix is done and the test passes you are sure that your test and the fix are correct 😃

+1 same issue. Where is @edenhill ?