confluent-kafka-dotnet: OffsetCommitRequest timeout causes consumers rebalancing

Description

Hello, we have been using the latest Kafka client library (1.2.0) with default settings. Our typical Kafka topic consumption loop reads an event and commits it, one at a time. Recently we have noticed a lot of random "Broker: Unknown member" exceptions while committing event offsets.

The logs say:

{"Message":"[thrd:GroupCoordinator]: 
GroupCoordinator/3: Timed out HeartbeatRequest in flight (after 10963ms, timeout #0): possibly held back by preceeding OffsetCommitRequest with timeout in 48457ms",
"ClientInstance":"rdkafka#consumer-1","Facility":"REQTMOUT"} 

then this:

{"Message":"[thrd:GroupCoordinator]: 
GroupCoordinator/3: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests",
"ClientInstance":"rdkafka#consumer-1","Facility":"REQTMOUT"} 

And finally this happens (because of rebalancing):

Broker: Unknown member --->
Confluent.Kafka.KafkaException: Broker: Unknown member
   at Confluent.Kafka.Impl.SafeKafkaHandle.Commit(IEnumerable`1 offsets)
   at Confluent.Kafka.Consumer`2.Commit(ConsumeResult`2 result)

I’m wondering why we see this preceding OffsetCommitRequest if we just commit offsets one by one, sequentially.

Could you please help to figure out what is happening?

How to reproduce

NuGet packages installed: <PackageReference Include="Confluent.Kafka" Version="1.2.0" />

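while (true)
{
    // Wait up to 500 ms for the next message.
    var consumeResult = _consumer.Consume(TimeSpan.FromMilliseconds(500));
    if (consumeResult == null)
    {
        // Nothing arrived within the timeout; stop consuming.
        return;
    }
    // Synchronously commit the offset of the message just read.
    _consumer.Commit(consumeResult);
}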

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

@aouakki, @oleg-orlenko: at my company we have been migrating everything to the much more stable Go-based client https://github.com/Shopify/sarama

@alex-namely - we see a lot of people migrating to the confluent go client from sarama for the same reason. the confluent go client is used heavily by some of the largest users of kafka. can’t name names, but you’re most likely using more than one product powered by it.

@aouakki - we’re looking into this.

Yes, that does sound likely. You can verify this by setting Debug = "protocol" in your config and monitoring the round-trip time (rtt) of OffsetCommit requests in the emitted debug logs.
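For reference, a minimal sketch of what that configuration might look like. The broker address, group id, and topic name are placeholders (assumptions, not taken from this issue), and the log handler simply prints every client log line to the console so the protocol debug output can be inspected:

using System;
using Confluent.Kafka;

class ProtocolDebugExample
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // assumption: replace with your brokers
            GroupId = "example-group",           // assumption: replace with your group id
            Debug = "protocol"                   // emit protocol-level debug logs
        };

        using (var consumer = new ConsumerBuilder<Ignore, string>(config)
            // Print client logs; look for the OffsetCommit request/response lines here.
            .SetLogHandler((_, log) =>
                Console.WriteLine($"{log.Level} {log.Facility}: {log.Message}"))
            .Build())
        {
            consumer.Subscribe("example-topic"); // assumption: placeholder topic name

            // Same consume-then-commit pattern as in the report above.
            var consumeResult = consumer.Consume(TimeSpan.FromMilliseconds(500));
            if (consumeResult != null)
            {
                consumer.Commit(consumeResult);
            }

            consumer.Close();
        }
    }
}

The exact wording of the debug lines depends on the librdkafka version, so treat this only as a way to surface the logs, not as a guide to their format.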

i’m not sure based on the info provided. my next step would be to look at the broker logs for a clue as to why the commit is timing out.