orleans: Streams do not continue working after silo failover

Hi,

I have a test project with a single grain that receives a number and returns the previous number via a stream while also persisting it to the grain’s state (backed by SQL Server). The client connects to a silo via the SystemStore (again SQL Server).

The issue I am having is with silo failover when running two silos (A and B) on two machines (one is a VM). When the client first connects, the stream works as expected and data is returned to the client. When I shut down the silo with the active grain (silo A), it normally fails over to silo B successfully. However, after restarting silo A and shutting down silo B, the grain activates fine on silo A (silo A receives the messages from the client), but the stream back to the client is not re-established, and the client never receives data until I restart everything. I have also tested this using the observer pattern and get the same result. Could someone point me in the right direction as to what I am doing wrong? Config and code below.

Thank you in advance, Rich

Server config

<SystemStore SystemStoreType="SqlServer" DeploymentId="SimpleOrleans" DataConnectionString="Data Source=server;Initial Catalog=Orleans;User Id=user;Password=password" AdoInvariant="System.Data.SqlClient" />
<Liveness LivenessType="SqlServer" />
<StorageProviders>
  <Provider Type="Orleans.StorageProviders.SimpleSQLServerStorage.SimpleSQLServerStorage" Name="GrainStore" ConnectionString="Data Source=server;Initial Catalog=Orleans;User Id=user;Password=password" UseJsonFormat="false" />
  <Provider Type="Orleans.StorageProviders.SimpleSQLServerStorage.SimpleSQLServerStorage" Name="PubSubStore" ConnectionString="Data Source=server;Initial Catalog=Orleans;User Id=user;Password=password" UseJsonFormat="false" />
</StorageProviders>
<StreamProviders>
  <Provider Type="Orleans.Providers.Streams.SimpleMessageStream.SimpleMessageStreamProvider" Name="SMSProvider" FireAndForgetDelivery="False" />
</StreamProviders>

Client config

<ClientConfiguration xmlns="urn:orleans">
  <SystemStore SystemStoreType="SqlServer" DeploymentId="SimpleOrleans" DataConnectionString="Data Source=server;Initial Catalog=Orleans;User Id=user;Password=password" AdoInvariant="System.Data.SqlClient" />
  <GatewayProvider ProviderType="SqlServer" />
  <StreamProviders>
    <Provider Type="Orleans.Providers.Streams.SimpleMessageStream.SimpleMessageStreamProvider" Name="SMSProvider" FireAndForgetDelivery="False" />
  </StreamProviders>
</ClientConfiguration>

Client stream code

var streamProvider = GrainClient.GetStreamProvider("SMSProvider");
var stream = streamProvider.GetStream<int>(Guid.Empty, "NumberData");
var handle = stream.SubscribeAsync((number, token) => { /* this is never called after silo failover */ });

Grain stream code

// the client does manage to call this after silo failover
var streamProvider = this.GetStreamProvider("SMSProvider");
var stream = streamProvider.GetStream<int>(Guid.Empty, "NumberData");
await stream.OnNextAsync(number);

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 22 (19 by maintainers)

Most upvoted comments

An Orleans client can be any service in the system that also talks to the grains. If the Orleans cluster fails, many of those services shouldn’t need to restart; at most they should run some recovery logic for the cluster connection.

I’ve currently implemented a stream subscription manager for the clients, which listens to ClusterConnectionLost and manages the resubscribe when the cluster is available again. Works fine so far 😃
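For anyone wanting to try the same approach, here is a minimal sketch of what such a subscription manager could look like. It is not the code from the comment above and not an Orleans API: `StreamResubscriber`, its `reconnect` and `subscribe` delegates, and the retry parameters are all hypothetical names made up for illustration. The idea is only that the connection-lost notification triggers a reconnect followed by a fresh subscribe, retried until the cluster is reachable again.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Orleans): re-runs a subscribe delegate
// after the caller reports that the cluster connection was lost.
public sealed class StreamResubscriber
{
    private readonly Func<Task> _reconnect;  // assumption: re-establishes the client connection
    private readonly Func<Task> _subscribe;  // assumption: wraps e.g. stream.SubscribeAsync(...)
    private readonly TimeSpan _retryDelay;
    private int _recovering;                 // 0 = idle, 1 = recovery in progress

    public StreamResubscriber(Func<Task> reconnect, Func<Task> subscribe, TimeSpan retryDelay)
    {
        _reconnect = reconnect ?? throw new ArgumentNullException(nameof(reconnect));
        _subscribe = subscribe ?? throw new ArgumentNullException(nameof(subscribe));
        _retryDelay = retryDelay;
    }

    // Number of times recovery has completed successfully.
    public int ResubscribeCount { get; private set; }

    // Call this from the connection-lost handler.
    // Rethrows the last failure once maxAttempts is exhausted.
    public async Task OnConnectionLostAsync(int maxAttempts)
    {
        // Ignore overlapping notifications while a recovery is already running.
        if (Interlocked.Exchange(ref _recovering, 1) == 1) return;
        try
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    await _reconnect();
                    await _subscribe();
                    ResubscribeCount++;
                    return;
                }
                catch (Exception) when (attempt < maxAttempts)
                {
                    // Cluster probably not back yet: wait, then try again.
                    await Task.Delay(_retryDelay);
                }
            }
        }
        finally
        {
            Interlocked.Exchange(ref _recovering, 0);
        }
    }
}
```

In a real client, `OnConnectionLostAsync` would be invoked from the ClusterConnectionLost handler, `subscribe` would wrap the `SubscribeAsync` call from the client stream code in the question, and `reconnect` would do whatever your Orleans version needs to re-initialize the client. That wiring is left out here because it depends on the client API in use.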

In the docs, streams are presented under the Programming Model section:

Following the philosophy of Orleans virtual actors, Orleans streams are virtual. That is, a stream always exists. It is not explicitly created or destroyed, and it can never fail.

When one reads this paragraph, one expects reliability 😃 (this is why I was surprised).

Anyway, I understand all about priorities, and you guys are doing an amazing job!

A client that lost connection to a cluster can reestablish a connection and continue sending grain calls. Streams, however, may not survive. Streaming is built on grain calls, but it’s not the same thing. For client streams to survive Silo failures within a cluster, the client must maintain connectivity with the cluster. This is why I specified that Silo A must rejoin the cluster before Silo B is killed.

Do you see a warning with errorcode 103409, line “Consumer {0} on stream {1} is no longer active - permanently removing Consumer.” in the logs?

The most common cause of this warning is that a client has become unreachable by the cluster (loses connectivity with the cluster) causing a stream consumer on that client to become unreachable. This error removes that consumer from the system.