sarama: Sync Producer stalled - Unable to send messages to Kafka after a write tcp: broken pipe error
Versions
Please specify real version numbers or git SHAs, not just “Latest” since that changes fairly regularly.
Sarama | Go | Kafka v1.31.1 | 1.16.6 | 2.2.1
Configuration
What configuration values are you using for Sarama and Kafka?
config := sarama.NewConfig() config.Producer.Return.Successes = true config.Version = sarama.V2_2_1_0
Logs
When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger
to a log.Logger
to capture Sarama debug
output.
logs: CLICK ME
2022/03/09 21:04:40.888730 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint1:9092 2022/03/09 21:04:41.189656 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint:9092 2022/03/09 21:04:41.295781 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint:9092 2022/03/09 21:04:41.295838 client.go:929: client/metadata got error from broker 77 while fetching metadata: write tcp test-endpoint->test-endpoint: write: broken pipe 2022/03/09 21:04:41.295914 sarama.go:122: Closed connection to broker test-broker-endpoint:9092 2022/03/09 21:04:41.295927 sarama.go:122: client/brokers deregistered broker #77 at test-broker-endpoint:9092 2022/03/09 21:04:41.295944 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint2:9092 2022/03/09 21:04:41.489721 sarama.go:122: client/brokers registered new broker #77 at test-broker-endpoint:9092 2022/03/09 21:14:40.889372 sarama.go:122: client/metadata fetching metadata for all topics from broker test-kafka-endpoint:9092 2022/03/09 21:14:41.189707 config.go:559: ClientID is the default of ‘sarama’, you should consider setting it to something application-specific. 2022/03/09 21:14:41.189739 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint:9092 2022/03/09 21:14:41.192789 sarama.go:122: Connected to broker at test-broker-endpoint:9092 (registered as #77) 2022/03/09 21:14:41.295503 config.go:559: ClientID is the default of ‘sarama’, you should consider setting it to something application-specific. 2022/03/09 21:14:41.295542 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 2022/03/09 21:14:41.300616 sarama.go:122: Connected to broker at test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 (registered as #74) 2022/03/09 21:24:40.888707 sarama.go:122: client/metadata fetching metadata for all topics from broker test-kafka-endpoint:9092 2022/03/09 21:24:41.189268 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 2022/03/09 21:24:41.189316 client.go:929: client/metadata got error from broker 74 while fetching metadata: write tcp test-endpoint->test-endpoint: write: broken pipe 2022/03/09 21:24:41.189360 sarama.go:122: Closed connection to broker test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 2022/03/09 21:24:41.189370 sarama.go:122: client/brokers deregistered broker #74 at test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 2022/03/09 21:24:41.189376 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint2:9092 2022/03/09 21:24:41.189396 client.go:929: client/metadata got error from broker 47 while fetching metadata: write tcp test-endpoint->test-endpoint: write: broken pipe 2022/03/09 21:24:41.189427 sarama.go:122: Closed connection to broker test-broker-endpoint2:9092 2022/03/09 21:24:41.189436 sarama.go:122: client/brokers deregistered broker #47 at test-broker-endpoint2:9092 2022/03/09 21:24:41.189441 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint:9092 2022/03/09 21:24:41.232417 sarama.go:122: client/brokers registered new broker #47 at test-broker-endpoint2:9092 2022/03/09 21:24:41.232435 sarama.go:122: client/brokers registered new broker #74 at test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 2022/03/09 21:24:41.295667 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint3:9092.compute-1.amazonaws.com:9092 2022/03/09 21:34:40.888928 sarama.go:122: client/metadata fetching metadata for all topics from broker test-kafka-endpoint:9092 2022/03/09 21:34:41.189283 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint4.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.189340 client.go:929: client/metadata got error from broker 48 while fetching metadata: write tcp test-endpoint->test-endpoint: write: broken pipe 2022/03/09 21:34:41.189405 sarama.go:122: Closed connection to broker test-broker-endpoint4.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.189414 sarama.go:122: client/brokers deregistered broker #48 at test-broker-endpoint4.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.189420 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint:9092 2022/03/09 21:34:41.235015 sarama.go:122: client/brokers registered new broker #48 at test-broker-endpoint4.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.295952 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint5.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.296034 client.go:929: client/metadata got error from broker 75 while fetching metadata: write tcp test-endpoint->test-endpoint: write: broken pipe 2022/03/09 21:34:41.296111 sarama.go:122: Closed connection to broker test-broker-endpoint5.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.296129 sarama.go:122: client/brokers deregistered broker #75 at test-broker-endpoint5.compute-1.amazonaws.com:9092 2022/03/09 21:34:41.296143 config.go:559: ClientID is the default of ‘sarama’, you should consider setting it to something application-specific. 2022/03/09 21:34:41.296194 sarama.go:122: client/metadata fetching metadata for all topics from broker test-broker-endpoint:9092 2022/03/09 21:34:41.299010 sarama.go:122: Connected to broker at test-broker-endpoint:9092 (registered as #77) 2022/03/09 21:34:41.346014 sarama.go:122: client/brokers registered new broker #75 at test-broker-endpoint5.compute-1.amazonaws.com:9092 2022/03/09 21:44:40.889425 sarama.go:122: client/metadata fetching metadata for all topics from broker test-kafka-endpoint:9092
Problem Description
We are using SyncProducer to push messages to a destination kafka topic. Sometimes, we see the write tcp host:47562->host:9092: write: broken pipe error in logs which is printed after the timeout. Once this happens, the metadata fetch happens every 10 mins (default), nothing else is happening between the fetches. The producer is never able to send messages after this. The producer is just struck.
On restarting the application, the producer is able to obtain connection again and start sending the messages (Until this happens again!)
Has anybody faced this? Does it have to do with any producer configuration that we missed overriding? The expectation is for the producer to connect back to broker and start sending messages automatically without requiring an app restart.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 18 (4 by maintainers)
issue continues with sarama v1.38.1… write tcp … write: broken pipe
Hi @slaunay @niamster @dnwe , I patched the changes in https://github.com/Shopify/sarama/pull/2182 and I am no longer facing the issue! Are we planning to get this merged and release a new version with the fix anytime soon?
Decided to downgrade to 1.30.1 too as I have stumbled upon same bug.
For now we have reverted back to 1.30.1 until this has been released
@shweta-fourkites I was able to reproduce what looks like #2150 yesterday.
There seem to be a race condition during retries where:
brokerProducer
is shutting down after getting an I/O error and the retry logic is triggeredpartitionProducer
a message with thefin
flag is sent to the “unheathly”brokerProducer
and thatbrokerProducer
is abandoned (closed)brokerProducer
is then createdbrokerProducer
(being closed) fails again then the retry logic is triggered againpartitionProducer
a message with thefin
flag is sent again but this time to the “heathly”brokerProducer
brokerProducer
treat thefin
flag as a regular message and use it in the nextProduceRequest
fin
message is not bubbling up to thepartitionerProducer
that will buffer new message till it sees thatfin
messagefin
message escaped to the target Kafka topic thepartitionProducer
is not producing anymore (just buffering)It’s not clear to me if this was also possible before honouring produce request pipelining.
Can you confirm that when the producer stops producing to that particular topic partition, a Kafka record with a
null
key and value was produced in the target Kafka topic?I was going to submit a simple test case to reproduce it with a fix but there seem to be a fix already in #2182.
@dnwe Should we add another
retract
directive forv1.32.0
referencing #2150 (and marking that issue as fixed) whenv1.32.1
is released and if #2182 gets merged?