pulsar: [Bug] Client with shared subscription is blocked
Search before asking
- I searched in the issues and found nothing similar.
Version
Client - 3.1.0 Pulsar - 3.1.0 (and later builds)
Also reported on 3.0.1
Minimal reproduce step
My reproducible steps:
- Create persistent topic with 3 partitions
- Publish 1 million messages (30 KB each)
- Run the client and consumer:
PulsarClient client = PulsarClient.builder()
    .serviceUrl(this.pulsarBrokerUrl)
    .build();
// newConsumer() without a schema yields byte[] payloads
Consumer<byte[]> consumer = client.newConsumer()
    .topic(sourceTopic)
    .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
    .subscriptionName(subscriptionName)
    .subscriptionType(SubscriptionType.Shared)
    .receiverQueueSize(8)
    // unacked messages are redelivered after 5 seconds
    .ackTimeout(5, TimeUnit.SECONDS)
    .subscribe();
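For a sense of scale, here is a rough sketch of the backlog these steps create. It assumes that 30KB means 30 * 1024 bytes and that messages are spread evenly across the 3 partitions; both are assumptions, and the class name `BacklogEstimate` is hypothetical and not part of the reproducer.

```java
// Hypothetical helper, not part of the reproducer: estimates the backlog size.
final class BacklogEstimate {
    static final long MESSAGES = 1_000_000L;       // "1 mln messages"
    static final long PAYLOAD_BYTES = 30L * 1024L; // assumption: 30KB = 30 * 1024 bytes
    static final int PARTITIONS = 3;               // from the reproduction steps

    // Total payload bytes, ignoring message metadata and storage overhead.
    static long totalBytes() {
        return MESSAGES * PAYLOAD_BYTES;
    }

    // Assumes an even spread across partitions; rounds up.
    static long bytesPerPartition() {
        return (totalBytes() + PARTITIONS - 1) / PARTITIONS;
    }

    public static void main(String[] args) {
        double gib = 1024.0 * 1024.0 * 1024.0;
        System.out.printf("total backlog: %.2f GiB%n", totalBytes() / gib);
        System.out.printf("per partition: %.2f GiB%n", bytesPerPartition() / gib);
    }
}
```

The point is only that the backlog is far larger than what a receiver queue of 8 can hold, so the consumer depends on the broker continuously dispatching new messages.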
What did you expect to see?
All messages are received
What did you see instead?
The client stops receiving messages. Restarting the client helps, but it gets stuck again after some time.
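To make "stuck" measurable when reproducing, a minimal sketch of a stall check could look like the one below. It uses only the JDK, and the class `StallDetector` and its 30-second threshold are hypothetical choices, not part of the Pulsar client. Timestamps are passed in explicitly so the logic can be exercised without a broker.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: flags a consumer that has not received anything for too long.
final class StallDetector {
    private final long thresholdNanos;
    private final AtomicLong lastReceiveNanos;

    StallDetector(Duration threshold, long nowNanos) {
        this.thresholdNanos = threshold.toNanos();
        this.lastReceiveNanos = new AtomicLong(nowNanos);
    }

    // Call after every successfully received message.
    void onMessage(long nowNanos) {
        lastReceiveNanos.set(nowNanos);
    }

    // True when the gap since the last message exceeds the threshold.
    boolean isStalled(long nowNanos) {
        return nowNanos - lastReceiveNanos.get() > thresholdNanos;
    }

    public static void main(String[] args) {
        StallDetector detector = new StallDetector(Duration.ofSeconds(30), System.nanoTime());
        detector.onMessage(System.nanoTime());
        System.out.println("stalled: " + detector.isStalled(System.nanoTime()));
    }
}
```

In the consumer loop this would be `detector.onMessage(System.nanoTime())` after each `consumer.receive()`, with `isStalled` polled from a separate thread. A stall flag only indicates this bug if the subscription still has a backlog at that moment; an empty topic looks the same.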
Anything else?
The issue was originally described in #21082. @MichalKoziorowski-TomTom also faces the issue.
I’ve created a new issue because in #21082 the author says that a broker restart helps. In this case it looks client related: possibly a race condition observed in 3.x.x after introducing ackTimeout.
Are you willing to submit a PR?
- I’m willing to submit a PR!
About this issue
- State: open
- Created 10 months ago
- Reactions: 1
- Comments: 19 (6 by maintainers)
It looks like I was able to reproduce the issue in both runs today (2/2 failed).
The code is here: https://github.com/michalcukierman/pulsar-21104
In general it closely follows the bug description. Produce 1 million messages of 30 KB each:
Read them using a client with a shared subscription and write them to another topic:
The settings of the client are:
The retention of the topic requests is set using Pulsar Admin in Java to -1 -1 (infinite time and size).
During the two runs the consumer got stuck:
@mattisonchao I’ll have time next week to get back to it.
I think it happens:
There may be a race condition in the 3.1.0 client, as the problem was not observed with 2.10.4 (we’ve downgraded, and @MichalKoziorowski-TomTom also reported the downgrade as a fix).