sarama: Reading Azure event hubs fails with "kafka: response did not contain all the expected topic/partition blocks"

Versions

Please specify real version numbers or git SHAs, not just “Latest” since that changes fairly regularly.

Sarama: 1.24.1
Kafka: 1.0
Go: 1.12.4

Configuration

What configuration values are you using for Sarama and Kafka?

k := sarama.NewConfig()

k.Version = sarama.V1_0_0_0

k.Consumer.Return.Errors = true
k.Consumer.Offsets.Retry.Max = 5
k.Consumer.Offsets.Initial = sarama.OffsetOldest

k.Net.TLS.Enable = true
k.Net.SASL.Enable = true
k.Net.SASL.User = "$ConnectionString"
k.Net.SASL.Password = connectionStr

Logs

When filing an issue please provide logs from Sarama and Kafka if at all possible. You can set sarama.Logger to a log.Logger to capture Sarama debug output.

[Sarama] 2019/11/15 13:52:56 Initializing new client
[Sarama] 2019/11/15 13:52:56 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[Sarama] 2019/11/15 13:52:56 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[Sarama] 2019/11/15 13:52:56 client/metadata fetching metadata for all topics from broker filebeat-test.servicebus.windows.net:9093
[Sarama] 2019/11/15 13:52:56 Successful SASL handshake. Available mechanisms: [PLAIN]
[Sarama] 2019/11/15 13:52:56 SASL authentication successful with broker filebeat-test.servicebus.windows.net:9093:4 - [0 0 0 0]
[Sarama] 2019/11/15 13:52:56 Connected to broker at filebeat-test.servicebus.windows.net:9093 (unregistered)
[Sarama] 2019/11/15 13:52:57 client/brokers registered new broker #0 at filebeat-test.servicebus.windows.net:9093
[Sarama] 2019/11/15 13:52:57 Successfully initialized new client
[Sarama] 2019/11/15 13:52:57 client/metadata fetching metadata for [kafka-west] from broker filebeat-test.servicebus.windows.net:9093
[Sarama] 2019/11/15 13:52:57 client/coordinator requesting coordinator for consumergroup $Default from filebeat-test.servicebus.windows.net:9093

Problem Description

(This is arguably an issue in Azure rather than sarama, but I’m sharing it since it potentially affects a lot of sarama users and might require a workaround. It doesn’t depend on the kafka or sarama version.)

Recently, Azure’s Kafka implementation (“event hubs”) began responding to an OffsetFetchRequest for a new consumer group with an OffsetFetchResponse containing an empty offset table. This causes sarama clients to fail when fetching the initial offset from the coordinator, with the error message “kafka: response did not contain all the expected topic/partition blocks”.

As described in this spec page, we expect a response of the form [TopicName [Partition Offset Metadata ErrorCode]]. For topic / partition pairs that have not been read, we expect the inner partition table to contain empty metadata and an offset of -1. This is what Azure event hubs used to do, but early this week we started seeing failures that we traced to responses with an empty partition table.

The failure in sarama happens in offset_manager.go:fetchInitialOffset. The offset is requested with resp, err := broker.FetchOffset(req), which succeeds. The broker returns a valid OffsetFetchResponse, but the block table inside is empty, which causes the following call block := resp.GetBlock(topic, partition) to fail.
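The failure path can be modeled in a few lines of Go. This is a simplified, self-contained sketch rather than sarama's actual source: the types `block` and `offsetFetchResponse` and the functions `getBlock` and `fetchInitialOffset` below are stand-ins that only mirror the shape described above, and the error text is copied from the message quoted in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// errIncompleteResponse mirrors the error message quoted in this issue.
var errIncompleteResponse = errors.New("kafka: response did not contain all the expected topic/partition blocks")

// block is a stand-in for one per-partition entry of an offset fetch response.
type block struct {
	Offset   int64
	Metadata string
}

// offsetFetchResponse is a stand-in for the response table: topic -> partition -> block.
type offsetFetchResponse struct {
	Blocks map[string]map[int32]*block
}

// getBlock returns nil when the topic/partition pair is missing from the table.
// Reading from a nil map is safe in Go and yields the zero value.
func (r *offsetFetchResponse) getBlock(topic string, partition int32) *block {
	return r.Blocks[topic][partition]
}

// fetchInitialOffset models the lookup step: the request itself succeeded,
// but a missing block is reported as an incomplete response.
func fetchInitialOffset(resp *offsetFetchResponse, topic string, partition int32) (int64, string, error) {
	b := resp.getBlock(topic, partition)
	if b == nil {
		return 0, "", errIncompleteResponse
	}
	return b.Offset, b.Metadata, nil
}

func main() {
	// The previously observed behavior: an entry with offset -1 and empty metadata.
	expected := &offsetFetchResponse{Blocks: map[string]map[int32]*block{
		"kafka-west": {0: {Offset: -1}},
	}}
	// The behavior described in this issue: a valid response with an empty table.
	empty := &offsetFetchResponse{}

	for _, resp := range []*offsetFetchResponse{expected, empty} {
		offset, _, err := fetchInitialOffset(resp, "kafka-west", 0)
		fmt.Println(offset, err)
	}
}
```

In this model the first response yields an offset of -1 with no error, while the empty table yields the incomplete-response error even though the request itself succeeded.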

As far as we can tell, this change only affects consumer groups that have never committed an explicit offset. We have a debug workaround that fixes the problem by inserting a placeholder offset when the response table is empty. After that, the event hub can be read, and as soon as the consumer commits an offset the problem goes away: future connections get a nonempty offset response.
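The idea behind the placeholder workaround might look roughly like the sketch below. It is an illustration only, not the actual patch: `block`, `offsetFetchResponse`, and `withPlaceholder` are hypothetical names, and the placeholder values (offset -1, empty metadata) are an assumption based on what the broker previously returned for unread partitions.

```go
package main

import "fmt"

// block and offsetFetchResponse are simplified stand-ins, not sarama's real types.
type block struct {
	Offset   int64
	Metadata string
}

type offsetFetchResponse struct {
	Blocks map[string]map[int32]*block
}

// withPlaceholder returns the block for topic/partition. When the entry is
// missing, it inserts a placeholder (offset -1, empty metadata; an assumed
// choice) into the table and returns that instead.
func withPlaceholder(resp *offsetFetchResponse, topic string, partition int32) *block {
	if b := resp.Blocks[topic][partition]; b != nil {
		return b
	}
	if resp.Blocks == nil {
		resp.Blocks = make(map[string]map[int32]*block)
	}
	if resp.Blocks[topic] == nil {
		resp.Blocks[topic] = make(map[int32]*block)
	}
	b := &block{Offset: -1}
	resp.Blocks[topic][partition] = b
	return b
}

func main() {
	resp := &offsetFetchResponse{} // empty table, as in the failing case
	b := withPlaceholder(resp, "kafka-west", 0)
	fmt.Println(b.Offset, b.Metadata == "")
}
```

An existing entry is returned untouched, so a group that has already committed an offset would be unaffected by a change of this kind.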

We don’t know yet if this is an intentional change on Azure’s part (we have shared the issue with them and are waiting for more information), or if there is some subtlety of the Kafka spec that allows for an empty table in new consumer groups. If Azure reverts to the old behavior, this issue may be moot, but in the meantime I wanted to share what we’ve found in case others are encountering the same problem.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23 (2 by maintainers)

Most upvoted comments

Hi! Is this happening again?

As a bystander here and a sarama maintainer, I just wanted to say thanks to all the people in this thread. This is inspiring for me personally to read, even though this issue is about bugs and imperfections (it happens to the best of us in our line of work).

@arerlend - thanks for being as responsive as you are here. For me personally this is a sign of a team that cares about their product. Please pass my hugops on internally to your team if you have a chance and if that’s appropriate.

Apologies for getting a bit sentimental here and adding clutter to inboxes of y’all 🙏🏼

❤️

I’ve forked sarama and added the following workaround (previously mentioned in this thread: https://github.com/Shopify/sarama/pull/1542), and it works: https://github.com/davidandradeduarte/sarama/commit/c2258cbf425a06a85212a4fd2c065f6e2c9feb4d

This should be fixed on the EH side now. Will update the thread when the fix reaches deployment.

@faec this fix was deployed again last week. Issue should be resolved.

Sorry I missed this. Deployment had to be put on hold for the holidays but is resuming now. If anyone needs specific namespaces to be patched, feel free to reach out to me via email.