npgsql: Connection is closed after command timeout
I’m trying to get the code from #400 working again in Npgsql 3 (dev branch) and it now fails really badly: after the first timeout the connection is closed, so I cannot rollback to the savepoint and continue! Note that I haven’t modified the connection timeout - it’s still the default 15 seconds (though I don’t think it would even matter once the connection is open).
Code:
private static void TestTimeoutAndRetryIssue(NpgsqlConnection connection)
{
using (var transaction = connection.BeginTransaction())
{
using (var command = new NpgsqlCommand("SELECT :value FROM pg_sleep(:seconds)", connection))
{
command.Parameters.AddWithValue(":value", "Initial");
command.Parameters.AddWithValue(":seconds", 5);
var result = ExecuteScalarWithRetry(transaction, command, 5, 1);
}
}
}
private static object ExecuteScalarWithRetry(NpgsqlTransaction transaction, NpgsqlCommand command, int delaySec, int initialTimeoutSec)
{
transaction.Save("ExecuteScalarWithRetry");
command.CommandTimeout = initialTimeoutSec;
command.Parameters[":value"].Value = "Attempt 1";
try
{
return command.ExecuteScalar();
}
catch (NpgsqlException ex)
{
Console.WriteLine("Exception in ExecuteScalarWithRetry: " + ex);
Console.WriteLine("Connection state = " + command.Connection.State); // Closed!
}
}
Exception:
Npgsql.NpgsqlException (0x80004005): Exception while reading from stream —> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. —> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags) at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) — End of inner exception stack trace — at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) at Npgsql.ReadBuffer.Ensure(Int32 count, Boolean dontBreakOnTimeouts) in E:\Source\npgsql\src\Npgsql\ReadBuffer.cs:line 115 at Npgsql.ReadBuffer.Ensure(Int32 count, Boolean dontBreakOnTimeouts) in E:\Source\npgsql\src\Npgsql\ReadBuffer.cs:line 133 at Npgsql.NpgsqlConnector.DoReadMessage(DataRowLoadingMode dataRowLoadingMode, Boolean isPrependedMessage) in E:\Source\npgsql\src\Npgsql\NpgsqlConnector.cs:line 884 at Npgsql.NpgsqlConnector.ReadMessageWithPrepended(DataRowLoadingMode dataRowLoadingMode) in E:\Source\npgsql\src\Npgsql\NpgsqlConnector.cs:line 856 at Npgsql.NpgsqlConnector.ReadMessage(DataRowLoadingMode dataRowLoadingMode) in E:\Source\npgsql\src\Npgsql\NpgsqlConnector.cs:line 836 at Npgsql.NpgsqlConnector.ReadExpectingT in E:\Source\npgsql\src\Npgsql\NpgsqlConnector.cs:line 1111 at Npgsql.NpgsqlDataReader.NextResultInternal() in E:\Source\npgsql\src\Npgsql\NpgsqlDataReader.cs:line 409 at Npgsql.NpgsqlDataReader.NextResult() in E:\Source\npgsql\src\Npgsql\NpgsqlDataReader.cs:line 320 at Npgsql.NpgsqlCommand.<Execute>d__81.MoveNext() in E:\Source\npgsql\src\Npgsql\NpgsqlCommand.cs:line 769 — End of stack trace from previous location where exception was thrown — at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter
1.GetResult() at Npgsql.NpgsqlCommand.<ExecuteScalar>d__98.MoveNext() in E:\Source\npgsql\src\Npgsql\NpgsqlCommand.cs:line 1066 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult() at Npgsql.NpgsqlCommand.ExecuteScalar() in E:\Source\npgsql\src\Npgsql\NpgsqlCommand.cs:line 1041 at NpgsqlTest.Program.ExecuteScalarWithRetry(NpgsqlTransaction transaction, NpgsqlCommand command, Int32 delaySec, Int32 initialTimeoutSec)
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 15 (9 by maintainers)
Wow, I can feel your pain. For cases where the server is overloaded, it may make sense to set timeout to a high value (or even disable timeouts altogether) - if the server is loaded you may as well wait for the query to complete, no matter how long it takes, rather than cancelling and rerunning it (which offers no advantage and actually repeats a lot of work).
However, for a query to take 5 minutes is probably something else. One frequent cause for this is some other connection holding a long-lived transaction which already updated a row you’re trying to update. PostgreSQL locks rows when they are modified in a transaction, so your 5-minute query is possibly waiting for the other transaction to complete. This would also explains why you can’t reproduce it - it’s a true concurrency issue. Long-lived transactions are in general a big problem, especially if you have many of them touching the same data.
While I’d really recommend attempting to solve the root cause, I do appreciate that it can be hard to track down issues like this, and hacky workarounds like timeout+retry are sometimes necessary.
If you do end up setting
statement_timeoutyourself, keep in mind that you have to reset it back after the query - it’s a session parameter that persists until changed. When Npgsql managedstatement_timeoutfor you, it would check if the timeout change is necessary on each command.I agree.
Yes, query cancellation is a best-effort asynchronous operation, which can cancel the current query at any point in time (or even not at all), so Npgsql must be prepared to consume any results from the cancelled query. Detect the end of the results of the cancelled query isn’t hard - every executed command is ultimately terminated by a Sync message, which produces a ReadyForQuery response which signals the end of the query. We’d have to track how many cancelled queries are still in-flight, and read that many ReadyForQuery responses. There are some complications but it’s doable.
Unfortunately that isn’t possible. In case of an actual network problem, this would make Npgsql block forever on the remaining results, defeating the whole purpose of the timeout. Whatever we do, an exception must be raised when the timeout occurs.
The larger issue here is that there are two very different scenarios being handled via the same timeout concept:
statement_timeoutcan be used to deal with this.To summarize, the solution here would be:
which means that a single NpgsqlCommand may produce multiple ReadyForQuerysa single NpgsqlCommand may produce multiple CommandComplete’s, but always just one ReadyForQuery.I’m not sure I’m going to have time to implement this in the very near future, there are many other things going on at the moment. But I’ll make an effort.
Better (really) late than never 🤣
Wow, fixed almost 4 years later! Thank you.
That’s (mostly) right. Since 3.1.0, CommandTimeout is a client-side socket timeout only - Npgsql will simply throw an exception an close the socket, but this doesn’t affect the query running at the server. This issue is about changing the timeout behavior to send a server cancellation when the timeout occurs, but that’s not before 3.3.
In the meantime, it’s definitely possible to use
statement_timeoutto have PostgreSQL abort your long-running query as your say. If you go down this route you may want to disable the client-side socket timeout by settings CommandTimeout to 0, or at least set it to something longer than whateverstatement_timeoutis set to. Note that the client-side socket timeout also protects you against network drops which may leave you reading forever, whichstatement_timeoutdoesn’t do.The only inaccurate thing in what you say is the last sentence… Npgsql’s CommandTimeout (30 by default) does cause your a read timeout at the client socket level, but this doesn’t (necessarily) affect your query on the server side.
Hope this clarifies things, otherwise feel free to ask for more details.