npgsql: Recover connection on failure loops

The issue

Npgql falls into failure loops, where even on connection close and reopen, Npgsql can’t recover due to (what I presume) internal (static?) state that’s been compromised.

To give an example, when writing a desktop app that reinitializes any state / connections on failure, I initially wrote this for npgsql

if (_failure)
{
    _conn.Close();
    _conn = new NpgsqlConnection(_connStr);
    await _conn.OpenAsync();
}

// execute prepared statements

When I upgraded the db one day, npgsql stopped working. Npgsql kept requesting to execute a prepared statement, but the server would throw a 26000 error “prepared statement “_p1” does not exist”. Since Npgsql persists prepared statements, “closing and re-opening” the connection did not clear this error. A potential fix in npgsql for this could be to purge prepared statements that the server says doesn’t exist

So my next attempt to reset Npgsql became:

if (_failure)
{
    _conn.Close();
    _conn = new NpgsqlConnection(_connStr);
    await _conn.OpenAsync();
    _conn.UnprepareAll();
}

However this new solution more often of not will fall into a failure loop when the client’s desktop is woken from sleep. The first couple exceptions are about not being able to reach the database (that’s fine, sometimes the network is slow to reestablish on wake), but then the same exception is thrown every time the new connection logic executes. The exception:

 Npgsql.NpgsqlException: Received backend message ReadyForQuery while expecting CommandCompleteMessage. Please file a bug.
   at Npgsql.NpgsqlConnector.<ExecuteInternalCommand>d__204.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnector.ExecuteInternalCommand(String query)
   at Npgsql.NpgsqlConnector.UnprepareAll()
   at Npgsql.NpgsqlConnection.UnprepareAll()

The exact exception differs from time to time but the format remains the same:

 Npgsql.NpgsqlException: Received backend message ParameterDescription while expecting CommandCompleteMessage. Please file a bug.

Steps to reproduce

Putting a computer to sleep and then waking it feels weird to put into a bug report. You can take a look at the whole project or just the one file that deals with writing to postgres, but the application logic is fundamentally in the (contrived) style of:

while (true)
{
    try
    {
        Thread.Sleep(1000);
        if (_failure)
        {
            _conn.Close();
            _conn = new NpgsqlConnection(_connStr);
            await _conn.OpenAsync();
            _conn.UnprepareAll();
        }

        using (var cmd = new NpgsqlCommand("INSERT INTO abc(a) VALUES ('b')", _conn))
        {
            await cmd.PrepareAsync();
            await cmd.ExecuteNonQueryAsync();
        }

        _failure = false;
    }
    catch (Exception)
    {
        _failure = true;
        throw;
    }
}

Further technical details

Npgsql version: 4.0.3 PostgreSQL version: 10.5 Operating system: Windows

What’s the recommended way to make the connection start back at square one?

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

@nickbabcock tried to run your repro and got to 100 iterations without any issue, using both Npgsql 4.0.3 and the latest stable 4.0.10.

@austindrenski your repro runs an unbounded number of commands in parallel (since Run isn’t awaited). I assume this was intentional to make things run in parallel, but it eventually triggers pool exhaustion on my machine. Regardless, I got to 4000 commands executed before reaching pool exhaustion, without any other issue. After adding an await - which makes the program completely sequential - everything works perfectly (killed the program at 8000 iterations).

At this point I don’t have any actual repro for this, so I’m going to go ahead and close this issue. The errors above could definitely occur because of a user error where the same connection is accidentally used concurrently from different threads (via timers or otherwise); I’m not saying there definitely isn’t an Npgsql bug here, but I need to see an actual repro to work on this - will reopen if one is provided.

@austindrenski I forgot to mention, I had added cmd.Prepare() in the flawless synchronous version, so cmd.Prepare() is not the culprit. await cmd.PrepareAsync() seems more likely (as including the line will cause the error, and commenting the line out will make the execution flawless)