protoactor-dotnet: Bug? Same message is received over and over
We have a strange issue where, under a load of more than 200K RPS, one of the grain instances starts receiving the same request over and over. I'm sure that we don't send this request multiple times, and of course the responses to these repeated requests never reach the original request sender. We run everything on the same machine without remoting. The issue usually appears after a few minutes of load when I use the Release configuration and run without debugging (I attach the debugger later, once the issue has reproduced).
Can you suggest where to start debugging it?
<PackageReference Include="Proto.Actor" Version="1.1.0" />
<PackageReference Include="Proto.Cluster" Version="1.1.0" />
<PackageReference Include="Proto.Cluster.CodeGen" Version="1.1.0" />
<PackageReference Include="Proto.Cluster.Consul" Version="1.1.0" />
<PackageReference Include="Proto.Cluster.Dashboard" Version="1.1.0" />
<PackageReference Include="Proto.Cluster.Kubernetes" Version="1.1.0" />
<PackageReference Include="Proto.Cluster.TestProvider" Version="1.1.0" />
<PackageReference Include="Proto.OpenTelemetry" Version="1.1.0" />
<PackageReference Include="Proto.OpenTracing" Version="0.27.0" />
<PackageReference Include="Proto.Persistence" Version="1.1.0" />
<PackageReference Include="Proto.Remote" Version="1.1.0" />
About this issue
- State: open
- Created a year ago
- Comments: 15 (8 by maintainers)
I'm running the example right now, and the first thing that comes to mind is that you are probably queueing up a lot of fire-and-forget tasks on the thread pool.
The increasing latency might be because the thread pool is busy with other tasks, e.g.
Eventually, the entire thread pool queue might be filled with these kinds of tasks.
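To illustrate the effect described above, here is a minimal sketch. It is not the reporter's code and does not use Proto.Actor; the class and method names (`ThreadPoolBacklogDemo`, `MeasureProbeDelay`) and the chosen counts are hypothetical. It queues a batch of fire-and-forget work items that each block a pool thread, then measures how long a trivial probe work item waits before the pool gets around to running it.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadPoolBacklogDemo
{
    // Queues `count` fire-and-forget tasks that each block a pool thread for
    // `workMs` milliseconds, then returns how long an empty probe work item
    // had to wait before a pool thread started executing it.
    public static TimeSpan MeasureProbeDelay(int count, int workMs)
    {
        for (var i = 0; i < count; i++)
        {
            // Fire and forget: the Task is discarded, so nothing awaits it
            // and nothing limits how many of these pile up in the queue.
            _ = Task.Run(() => Thread.Sleep(workMs));
        }

        using var probeStarted = new ManualResetEventSlim(false);
        var stopwatch = Stopwatch.StartNew();

        // The probe is queued behind the work items scheduled above.
        ThreadPool.QueueUserWorkItem(_ => probeStarted.Set());
        probeStarted.Wait();

        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        var idle = MeasureProbeDelay(0, 0);
        var loaded = MeasureProbeDelay(500, 20);

        Console.WriteLine($"probe delay, idle pool:   {idle.TotalMilliseconds:F1} ms");
        Console.WriteLine($"probe delay, loaded pool: {loaded.TotalMilliseconds:F1} ms");
    }
}
```

The exact numbers depend on core count and on how quickly the runtime injects extra threads, but the probe delay on the loaded pool should be noticeably larger. If latency in the real application grows the same way, checking `ThreadPool.PendingWorkItemCount` (available since .NET Core 3.0) while under load is one way to see whether the queue is backing up.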
I’ll dig deeper later today, but the increasing latency is very suspicious.