pulsar-dotpulsar: Breaking: Consumer always faults randomly under high throughput with ReadOnlySequence exceptions

Let me start by saying that IMHO the project looks great, uses most of the best and latest features of .NET Core, and the code looks nice. So congrats to @blankensteiner and all the other contributors on a most excellent start!

Under high throughput, the Consumer always faults. The timing is random (sometimes sooner, sometimes later), but it happens 100% of the time, and always while trying to read a ReadOnlySequence<T> (macOS + SDK 3.1.101):

System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'start')
   at DotPulsar.Internal.Extensions.ReadOnlySequenceExtensions.StartsWith[T](ReadOnlySequence`1 sequence, ReadOnlyMemory`1 target)

While stress testing, I have seen it happen in these areas of the code, but only when consuming:

It is reproducible simply by using the Samples solution: run the Producer to produce a large number of messages (say 50K), then start the Consumer.

I tried several fixes, without success:

  • Using SequenceReader<byte>
  • Using sequence.TryGet(ref position, out var memory) in a while loop

While researching .NET issues, I found a lot of potentially related ones, some already fixed but not available until the next releases of the framework:

Potential solutions:

  • Retrofit these fixes until it works and release a new version.
  • Don’t use Pipes and go old school.
  • Use Pipes, but expose streams, spans, memory, or arrays (whatever is necessary) instead of the ReadOnlySequence.

I would happily contribute a fix, if I can indeed find one, because this is a showstopper for me, my team, and our new solution design.

Currently I’m going deeper up the call stack (PulsarStream, Connector, ConsumeChannel, etc.) trying to figure out whether we are corrupting memory or have a memory leak.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 15 (15 by maintainers)

Most upvoted comments

I’ll try to get a PR in later today for you to check it out.

I think we’re seeing exactly the same thing on our end. I’m trying to get a reliable(ish) reproduction ready, with access to a Pulsar instance, and I’m in contact with @blankensteiner about this as well. Your findings have been super useful. Do you reckon the fixes in the patch release could be it, @RagingKore? It seems a 3.1.2 release came out that I am not using yet (https://dotnet.microsoft.com/download/dotnet-core)

Yes, indeed. In an hour, tops.

Great stuff! The docker-compose setup simply runs Pulsar standalone plus Pulsar Express, which makes it easier to test and develop. A CI pipeline to run tests and publish packages might also be beneficial, but that can be done later.

Out of office now, but I did a quick stress test an hour ago, and that code change seems to have fixed the issue!

Awesome!

If you let me do the PR, I would also add some integration tests and a docker-compose setup to cover this bug.

That would be great!
Before doing too much work in regards to docker-compose, let’s align. Could you describe what you envision? 😃

PS: I feel kind of dumb for going through the whole code flow, only for this single line of code to fix it. At least I had an excuse to learn the code base, and now I can even suggest some changes here and there 😃

It was your detailed bug report that enabled me to quickly find a solution 😃
Suggestions are always welcome!

One last thing: by supporting netcoreapp3.1, we get to explore other possibilities, such as building the Pulsar Binary Protocol on top of Bedrock by David Fowler.

Perhaps it would be better to discuss this suggestion in a new issue?

It sounds like something we should look into and yes, creating another issue for it would be best 😃

I was actually going to challenge the idea of supporting .NET Standard 2.0 and 2.1.

At Danske Commodities, we need .NET Standard 2.0 support (we have some old applications running .NET Framework 4.8 that either can’t be moved to .NET Core or aren’t worth the effort of moving), and I also know of at least one other user who is using it.
However, supporting .NET Standard 2.0 is very easy: it’s just some conditional NuGet dependencies and some code in Awaitor.cs and PulsarStream.cs.

It also makes sense to support .NET Standard 2.1 for runtimes (other than .NET Core) that support it. Again, doing so only takes some conditional NuGet dependencies.

I agree that adding .NET Core as a target framework, without NuGet dependencies, is a good idea 😃

I was able to reproduce the problem. Updating from 3.1.1 to 3.1.2 didn’t help.

I found a quick fix, though. I definitely don’t like it, but unless we find a better option, we can use it until Microsoft fixes it.

The fix is easy. Just change PulsarStream.cs line 136 to

yield return new ReadOnlySequence<byte>(buffer.Slice(4, frameSize).ToArray());

Can you confirm that this also fixes the problem for you?
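The one-line workaround copies each frame out of the buffer it was sliced from. Below is a minimal C# sketch of the same idea, assuming (as `Slice(4, frameSize)` suggests) that each frame is a 4-byte big-endian size followed by `frameSize` bytes. `FrameCopy` and `TryReadFrame` are hypothetical names, not DotPulsar's API. One plausible reason the copy helps, which this thread does not confirm, is that System.IO.Pipelines may reuse the memory behind a `ReadOnlySequence<byte>` once `PipeReader.AdvanceTo` has been called, so a slice that outlives that call can end up pointing at recycled memory; an array-backed copy has no such dependency.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

// Hypothetical helper (not part of DotPulsar): splits frames of the assumed form
// [4-byte big-endian frameSize][frameSize bytes] off the front of a buffer.
public static class FrameCopy
{
    public static bool TryReadFrame(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> frame)
    {
        frame = default;

        // Not enough bytes for the size prefix yet.
        if (buffer.Length < 4)
            return false;

        // Copy the prefix into a small span, since it may straddle segments.
        Span<byte> header = stackalloc byte[4];
        buffer.Slice(0, 4).CopyTo(header);
        uint frameSize = BinaryPrimitives.ReadUInt32BigEndian(header);

        // Frame not fully received yet; leave the buffer untouched.
        if (buffer.Length - 4 < frameSize)
            return false;

        // ToArray() copies the payload into a fresh array, so the returned
        // sequence no longer references the caller's (possibly pooled) memory.
        frame = new ReadOnlySequence<byte>(buffer.Slice(4, frameSize).ToArray());

        // Move past the size prefix and the payload.
        buffer = buffer.Slice(4L + frameSize);
        return true;
    }
}
```

The cost is one extra allocation and copy per frame, which may be part of why the workaround is treated as temporary.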

Hi @RagingKore

First, thanks for the kind words! I’m glad you appreciate the work 😃

Also, thank you for the detailed bug report, it’s most helpful! I’ll dive into it and see what I can do.

Help with debugging and PRs are always most welcome. Btw, have you tried the code in master, or are you using the latest released version?

/db