protoactor-dotnet: Bug? Same message is received over and over

We have a strange issue: under load above 200K RPS, one of the grain instances starts receiving the same request over and over. I’m sure we don’t send this request multiple times, and of course the responses to these repeated requests never reach the original sender. We run everything on the same machine, without remoting. The issue usually appears after a few minutes of load when I use the Release configuration and run without debugging (I attach the debugger later, after the issue reproduces).

Can you suggest where to start debugging it?

    <PackageReference Include="Proto.Actor" Version="1.1.0" />
    <PackageReference Include="Proto.Cluster" Version="1.1.0" />
    <PackageReference Include="Proto.Cluster.CodeGen" Version="1.1.0" />
    <PackageReference Include="Proto.Cluster.Consul" Version="1.1.0" />
    <PackageReference Include="Proto.Cluster.Dashboard" Version="1.1.0" />
    <PackageReference Include="Proto.Cluster.Kubernetes" Version="1.1.0" />
    <PackageReference Include="Proto.Cluster.TestProvider" Version="1.1.0" />
    <PackageReference Include="Proto.OpenTelemetry" Version="1.1.0" />
    <PackageReference Include="Proto.OpenTracing" Version="0.27.0" />
    <PackageReference Include="Proto.Persistence" Version="1.1.0" />
    <PackageReference Include="Proto.Remote" Version="1.1.0" />

About this issue

  • State: open
  • Created a year ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I’m running the example right now, and the first thing that comes to mind is that you are probably queueing up a lot of fire-and-forget tasks on the threadpool:

    .5987 RPS, 99% latency 17,61 ms, 95% latency 9,39 ms, max latency 167,61 ms
    ...60692 RPS, 99% latency 15,4 ms, 95% latency 6,51 ms, max latency 610,77 ms
    ...44698 RPS, 99% latency 20,9 ms, 95% latency 9,33 ms, max latency 745,69 ms
    ..35911 RPS, 99% latency 28,62 ms, 95% latency 11,54 ms, max latency 725,73 ms
    .27488 RPS, 99% latency 33,26 ms, 95% latency 15,39 ms, max latency 999,47 ms
    ..31520 RPS, 99% latency 22,41 ms, 95% latency 11,55 ms, max latency 975,2 ms
    .19651 RPS, 99% latency 39,24 ms, 95% latency 20,35 ms, max latency 1050,25 ms
    .19856 RPS, 99% latency 39,76 ms, 95% latency 17,88 ms, max latency 1366,85 ms

The increasing latency might be because the threadpool is busy with other tasks, e.g.:

    omsGrain.ProccedExecutionReport(omsRequest, CancellationToken.None).AndForget(TaskOption.Safe);

Eventually, the entire threadpool queue might be filled with tasks of this kind.
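As an illustration (not code from the issue itself), here is a minimal sketch of one way to bound those calls instead of firing and forgetting them. The IOmsGrain and OmsRequest shapes are placeholders inferred from the line above, and the SemaphoreSlim throttle is an assumption on my part, not Proto.Actor API:

    // Minimal sketch: bound the number of in-flight grain calls so fire-and-forget
    // work cannot flood the threadpool queue. Types below are placeholders.
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical shapes matching the call quoted above.
    public interface IOmsGrain
    {
        Task ProccedExecutionReport(OmsRequest request, CancellationToken ct);
    }

    public record OmsRequest;

    public class ExecutionReportDispatcher
    {
        // At most 64 grain calls in flight at any time (arbitrary limit).
        private readonly SemaphoreSlim _inFlight = new(initialCount: 64, maxCount: 64);

        public async Task DispatchAsync(IOmsGrain omsGrain, OmsRequest omsRequest, CancellationToken ct)
        {
            await _inFlight.WaitAsync(ct);
            try
            {
                // Await the call instead of .AndForget(...), so back-pressure
                // reaches the caller instead of piling up queued threadpool tasks.
                await omsGrain.ProccedExecutionReport(omsRequest, ct);
            }
            finally
            {
                _inFlight.Release();
            }
        }
    }

Awaiting the call (or bounding it as above) keeps back-pressure at the caller rather than letting unbounded queued work compete with the actors for threadpool time.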

I’ll dig deeper later today, but the increasing latency is very suspicious.