stan.net: Cannot re-connect to server after nats-streaming-server is restarted
I’ve encountered the following issue while trying to reconnect to nats-streaming-server. Here are the steps to reproduce:
- nats-streaming-server is running
- the app (see demo code below) connects to nats-streaming-server and starts publishing messages
- the nats-streaming-server is killed
- on each subsequent Publish, the following is printed:
STAN.Client.StanConnectionException: The NATS connection is reconnecting
- after a while (server still not up), on each Publish, the app starts to receive
STAN.Client.StanConnectionClosedException: Connection closed.
- I restart nats-streaming-server
- The app’s ReconnectedEventHandler is fired, detecting nats connection restore.
- But the app’s StanConnection never recovers and the application can never resume successful publishing.
What can I do in order to allow the app to recover publishing?
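One possible workaround, offered as a sketch rather than a confirmed fix: since the closed StanConnection does not come back on its own, the app could dispose it and create a fresh one on the next call. The helper below is hypothetical and not part of STAN.Client. It is written against any IDisposable connection type, and the caller supplies both the factory and the test that decides which exceptions mean the connection is dead.

```csharp
using System;

// Hypothetical helper (not part of STAN.Client): owns a connection of type T
// and rebuilds it through a caller-supplied factory after a fatal failure.
public sealed class RecreatingConnection<T> where T : class, IDisposable
{
    private readonly Func<T> _factory;
    private readonly Func<Exception, bool> _isFatal;
    private readonly object _gate = new object();
    private T _current;

    public RecreatingConnection(Func<T> factory, Func<Exception, bool> isFatal)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _isFatal = isFatal ?? throw new ArgumentNullException(nameof(isFatal));
    }

    // Runs the action against the current connection, creating one if needed.
    // If the action throws an exception the caller classified as fatal, the
    // connection is disposed so that the next call starts from a fresh one.
    // The exception is always rethrown to the caller.
    public void Use(Action<T> action)
    {
        lock (_gate)
        {
            if (_current == null)
            {
                _current = _factory();
            }
            try
            {
                action(_current);
            }
            catch (Exception e) when (_isFatal(e))
            {
                try { _current.Dispose(); } catch { /* connection is already broken */ }
                _current = null;
                throw;
            }
        }
    }
}

// Assumed usage with the demo code in this issue (untested against a server):
//
//   var stan = new RecreatingConnection<IStanConnection>(
//       () => sf.CreateConnection("test-cluster", "uniq123", stanOpts),
//       e => e is StanConnectionClosedException);
//   stan.Use(c => c.Publish("test", new byte[1]));
```

With this shape, the publish that hits StanConnectionClosedException still fails, but the following call goes through the factory again. If the server is still down, the factory itself throws and the next call simply retries it. Whether reusing the same client ID is accepted after a restart depends on the server's store and state, so that part is an assumption.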
Running on macOS Catalina with the latest NuGet packages available at the time of writing:
<ItemGroup>
  <PackageReference Include="NATS.Client" Version="0.10.0" />
  <PackageReference Include="STAN.Client" Version="0.2.0" />
</ItemGroup>
Code used for testing:
using System;
using System.Diagnostics;
using System.Threading;
using NATS.Client;
using STAN.Client;

namespace StanTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var cf = new ConnectionFactory();
            var sf = new StanConnectionFactory();

            // The NATS connection is created by the app and handed to STAN.
            var natsConnection = cf.CreateConnection(GetOpts());
            var stanOpts = StanOptions.GetDefaultOptions();
            stanOpts.ConnectTimeout = 4000;
            stanOpts.NatsConn = natsConnection;
            stanOpts.PubAckWait = 40000;
            var stanConnection = sf.CreateConnection("test-cluster", "uniq123", stanOpts);

            var watch = Stopwatch.StartNew();
            // Publish one message per second and log every failure.
            while (true)
            {
                try
                {
                    stanConnection.Publish("test", new byte[0x1]);
                    Console.WriteLine("{0}. Published message", watch.Elapsed);
                }
                catch (Exception e)
                {
                    Console.WriteLine("{0} - Type:{1}. On Publish: exception message: {2}", watch.Elapsed, e.GetType(), e.Message);
                }
                finally
                {
                    Thread.Sleep(1000);
                }
            }
        }

        private static Options GetOpts()
        {
            var opts = ConnectionFactory.GetDefaultOptions();
            opts.Url = "nats://localhost:4222";
            opts.AllowReconnect = true;
            opts.PingInterval = 5000;
            opts.MaxPingsOut = 2;
            opts.MaxReconnect = Options.ReconnectForever;
            opts.ReconnectWait = 1000;
            opts.Timeout = 4000;
            opts.ServerDiscoveredEventHandler +=
                (sender, args) => Console.WriteLine("NATS server discovered");
            opts.ReconnectedEventHandler +=
                (sender, args) => Console.WriteLine("NATS server reconnected.");
            opts.ClosedEventHandler +=
                (sender, args) => Console.WriteLine("NATS connection closed");
            opts.DisconnectedEventHandler +=
                (sender, args) => Console.WriteLine("NATS connection disconnected");
            // Subscription may be null for errors not tied to a subscription.
            opts.AsyncErrorEventHandler +=
                (sender, args) => Console.WriteLine("NATS async error: {0}, Message={1}, Subject={2}",
                    args.Conn.ConnectedUrl, args.Error, args.Subscription?.Subject);
            return opts;
        }
    }
}
And here are the messages printed to console while running:
/usr/local/share/dotnet/dotnet /Users/robert/Sandbox/StanTest/bin/Debug/netcoreapp3.0/StanTest.dll
00:00:00.0237181. Published message
00:00:01.0443621. Published message
NATS connection disconnected
00:00:02.0483280 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:03.0528216 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:04.0580304 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:05.0612458 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:06.0655462 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:07.0696808 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:08.0721798 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:09.0763581 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:10.0803322 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:11.0852381 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:12.0898777 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:13.0904445 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:14.0946184 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:15.0953681 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:16.0963306 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:17.1017706 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:18.1047256 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:19.1077148 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:20.1114949 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:21.1162681 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:22.1206395 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:23.1238276 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
NATS server reconnected.
00:00:24.1287867 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:25.1324792 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:26.1373635 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:27.1406061 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:28.1452916 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:29.1501845 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
^C
About this issue
- State: closed
- Created 5 years ago
- Comments: 30 (13 by maintainers)
I have encountered the same situation in version 0.3.0. Once my STAN connection enters the connection-closed state, it never returns to a connected state, and every publish request throws the connection-closed exception.
This is my code to register the STAN connection:
And I use this connection like this:
Many thanks for the time you spent on this issue! This seems odd to me: I cannot imagine a practical situation in which I would not want a STAN client to recover when its pings failed but the NATS connection it wraps is alive. I was under the impression that a STAN connection should be long-lived (and act as a singleton), similar to the NATS connection object. In my production app, I was registering the STAN connection as a singleton with the DI container. So, to conclude, you are telling me that:
Either:
Could you reconsider the architectural decision not to recover the STAN connection when the NATS streaming server runs in embedded mode and the STAN client uses a NATS connection that it does not manage itself? Ideally, how the NATS server is run would be transparent to the STAN client.
I will try to reproduce this test using the Java streaming server client and see if the behaviour is similar.