pulsar-dotpulsar: Breaking: Consumer always faults randomly on high throughput with ReadOnlySequence exceptions
Let me start by saying that IMHO the project looks great, uses most of the best and latest features of .NET Core, and code looks nice. So congrats @blankensteiner and all other contributors, on a most excellent start!
Under high throughput the Consumer always faults. The timing is random (sometimes sooner, sometimes later), but it happens 100% of the time, and always while trying to read a ReadOnlySequence<T>.
(macOS + .NET Core SDK 3.1.101)
System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'start')
at DotPulsar.Internal.Extensions.ReadOnlySequenceExtensions.StartsWith[T](ReadOnlySequence`1 sequence, ReadOnlyMemory`1 target)
While stress testing I have seen it happen in these areas of the code but only when consuming:
It is reproducible by simply using the Samples solution, running the Producer to produce a large number of messages - say 50K, and starting the Consumer.
I tried to fix everything by attempting several solutions without success:
- Using SequenceReader<byte>
- Using sequence.TryGet(ref position, out var memory) in a while loop
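For readers unfamiliar with the second approach, a segment-by-segment prefix check built on TryGet could look roughly like the sketch below. It is only an illustration of the technique, not DotPulsar's actual StartsWith implementation; the names SequenceSketch and StartsWithSketch are hypothetical, and the sketch assumes a byte sequence.

```csharp
using System;
using System.Buffers;

public static class SequenceSketch
{
    // Hypothetical helper (not part of DotPulsar): returns true when
    // `sequence` begins with the bytes in `target`.
    public static bool StartsWithSketch(ReadOnlySequence<byte> sequence, ReadOnlySpan<byte> target)
    {
        // A target longer than the whole sequence can never be a prefix.
        if (target.Length > sequence.Length)
            return false;

        var position = sequence.Start;
        var matched = 0;

        // TryGet yields the segment at `position` and advances `position`
        // to the next segment; it returns false once the sequence is exhausted.
        while (matched < target.Length && sequence.TryGet(ref position, out var memory))
        {
            var span = memory.Span;
            var count = Math.Min(span.Length, target.Length - matched);

            // Compare only the part of this segment that overlaps the target.
            if (!span.Slice(0, count).SequenceEqual(target.Slice(matched, count)))
                return false;

            matched += count;
        }

        return matched == target.Length;
    }
}
```

As noted above, switching to this style of iteration did not make the fault go away, which suggests the problem lies in how the sequence is produced rather than in how it is read.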
Researching .NET issues, I found a lot of potentially related ones, some already fixed but not available until the next releases of the framework:
- Possible race condition in System.IO.Pipelines: InvalidCastException - exactly the same error
- ReadOnlySequence<T> seems to hand out incorrect position - this one also reports failure while stress testing
- PipeReader.CopyToAsync(destination) calls AdvanceTo(default) when destination.WriteAsync throws
- SequenceReader nextPosition fix
Potential solutions:
- Retrofit these fixes until it works and release a new version.
- Don’t use Pipes and go old school.
- Use Pipes, but expose streams, spans, memory and arrays (whatever is necessary) instead of ReadOnlySequence.
I would happily contribute with a fix if I can indeed fix it, because this is a show stopper for me, my team and our new solution design.
Currently I’m going deeper and up the call stack (PulsarStream, Connector, ConsumeChannel, etc.) trying to figure out whether we are corrupting memory or have a memory leak.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 5
- Comments: 15 (15 by maintainers)
I’ll try to get a PR in later today for you to check it out.
I think we’re seeing exactly the same on our end. I’m trying to get a reliable(ish) reproduction ready, with access to a Pulsar instance, and I’m in contact with @blankensteiner about this as well. Your finds have been super useful. Do you reckon the fixes in the patch release could be it, @RagingKore? It seems a 3.1.2 release is out that I am not using yet (https://dotnet.microsoft.com/download/dotnet-core)
Yes indeed, in an hour tops.
Great stuff! The docker-compose simply runs the pulsar standalone + pulsar express. Makes it easier to test and develop. Maybe getting a CI pipeline to run tests and publish packages would be beneficial, but it can be done later.
Awesome!
That would be great!
Before doing too much work in regards to docker-compose, let’s align. Could you describe what you envision? 😃
It was your detailed bug report that enabled me to quickly find a solution 😃
Suggestions are always welcome!
It sounds like something we should look into and yes, creating another issue for it would be best 😃
At Danske Commodities we need .NET Standard 2.0 support (we have some old applications running .NET Framework 4.8 that either can’t be moved to .NET Core or aren’t worth the effort of moving) and I also know of at least one other user who is using it.
However, supporting .NET Standard 2.0 is very easy, it’s just some conditional NuGet dependencies and some code in Awaitor.cs and PulsarStream.cs.
It also makes sense to support .NET Standard 2.1 for runtimes (other than .NET Core) that support it. Again, the task of doing it is just some conditional NuGet dependencies.
I agree that adding .NET Core as a target framework, without NuGet dependencies, is a good idea 😃
I was able to reproduce the problem. Updating from 3.1.1 to 3.1.2 didn’t help.
I found a quick fix though. Definitely don’t like it, but unless we find a better option we can use it until Microsoft fixes it.
The fix is easy. Just change PulsarStream.cs line 136 to
Can you confirm that this also fixes the problem for you?
Hi @RagingKore
First, thanks for the kind words! I’m glad you appreciate the work 😃
Also, thank you for the detailed bug report, it’s most helpful! I’ll dive into that and see what I can do.
Help debugging and PRs are always most welcome. Btw, have you tried with the code in master or are you using the latest released version?
/db