kafkajs: KafkaJSProtocolError: The broker received an out of order sequence number

Hi

I have a single producer:

const producer = kafka.producer({
  allowAutoTopicCreation: false,
  idempotent: true,
  maxInFlightRequests: 1
});

As you can see, I’m using the idempotent feature introduced in https://github.com/tulios/kafkajs/pull/203. I’m also forcing maxInFlightRequests to 1.

It works fine when my producer sends messages to the Kafka server at a slow pace, meaning that no requests are held in the RequestQueue.

But during spikes in activity, messages are sometimes automatically sent as a batch. Kafka then returns the following error:

ERROR [ReplicaManager broker=0] Error processing append operation on partition TOPIC (kafka.server.ReplicaManager)
kafka_1                 | org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 74: 158 (incoming seq. number), 156 (current end sequence number)

In this case the sequence jumps from 156 to 158, because 2 messages were sent in the batch.

According to the Kafka source code, it expects a jump of only 1, so from 156 to 157:

https://github.com/apache/kafka/blob/fecb977b257888e2022c1b1e04dd7bf03e18720c/core/src/main/scala/kafka/log/ProducerStateManager.scala#L241-L251
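
As a rough illustration, the check behind that error message can be paraphrased in JavaScript as follows. This is only a sketch under stated assumptions: the name inSequence and the wrap-around at the 32-bit maximum reflect a reading of the linked Scala source, and the code is not taken from Kafka or kafkajs.

```javascript
// Simplified paraphrase of the broker-side sequence check.
// Assumption: this mirrors the linked Scala code only loosely; it is not Kafka's actual code.
const INT32_MAX = 2147483647;

// lastSeq: the "current end sequence number" from the error message.
// nextSeq: the "incoming seq. number" from the error message.
function inSequence(lastSeq, nextSeq) {
  // Accept exactly lastSeq + 1, or a wrap-around from INT32_MAX back to 0.
  return nextSeq === lastSeq + 1 || (nextSeq === 0 && lastSeq === INT32_MAX);
}

console.log(inSequence(156, 157)); // true: accepted
console.log(inSequence(156, 158)); // false: rejected, matching the log above
```

With the values from the error log (156 stored, 158 incoming), the check fails, which is consistent with the OutOfOrderSequenceException shown above.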

My current understanding (or intuition) is that the firstSequence number should always be incremented by 1, not by the number of messages present in the batch request.

This roughly means that the updateSequence method of the EOSManager should not take an increment parameter and should instead always increment by 1: https://github.com/tulios/kafkajs/blob/eae5ea42fc8ac035b0591a705b81829e3fedf8ad/src/producer/eosManager/index.js#L171-L197

I’m still quite new to this codebase, so my reasoning may well be flawed.

@ianwsperber, as you’re the main implementer of the EOS code, what’s your take on this? Thanks a lot.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 9
  • Comments: 38 (7 by maintainers)

Most upvoted comments

Just as planned, now that the holidays are over, #1050 and #1172 have been merged (@tulios cleverly did not specify a year). This functionality is now available in the 1.17.0-beta.1 version. It would be extremely helpful if you could try out this version and let me know if it works as expected.

Is there an estimated release date for this?

Experiencing the same issue in idempotent mode.

This is with me.

I think the issue is that the proposed fix is a step forward but still has issues with failures.

I will progress.

Dear all,

I wish you a Happy New Year 2022!

@tulios: It is time to solve this issue upstream! Since 18 Dec 2019, the holidays must be finished, no?

@aikoven has made a fork with the 2 current PRs from @t-d-d: https://github.com/tulios/kafkajs/pull/1050 + https://github.com/tulios/kafkajs/pull/1172

Thanks in advance.

No solutions found on my side yet. I had to disable idempotency for the moment.