npgsql: Idle connection pruning kills too many connections.

Steps to reproduce

Run the test program at the bottom of this post.

The issue

The idle connection pruning kills connections that have recently been busy, causing performance issues because we need to open new (non-pooled) connections. I created a test program that runs simple queries (select null), possible in parallel with multiple connections, and tracks how many pooled and non-pooled connections were used. With a provider based on the current hotfix/4.1.3 (f615edb5229a34a29e2a8cba97a1b482d1f59370), this produces the following output:

ConnectionPruningInterval: 2
ConnectionIdleLifetime: 10

Running simple test case with 1/1/1 connections.
Clearing all connection pools.
Running 1 parallel queries... pooled: 0/1
Waiting for 8 seconds.
Running 1 parallel queries... pooled: 1/1
Waiting for 4 seconds.
Running 1 parallel queries... pooled: 0/1

Running simple test case with 2/1/2 connections.
Clearing all connection pools.
Running 2 parallel queries... pooled: 0/2
Waiting for 8 seconds.
Running 1 parallel queries... pooled: 1/1
Waiting for 4 seconds.
Running 2 parallel queries... pooled: 0/2

Running simple test case with 1/2/2 connections.
Clearing all connection pools.
Running 1 parallel queries... pooled: 0/1
Waiting for 8 seconds.
Running 2 parallel queries... pooled: 1/2
Waiting for 4 seconds.
Running 2 parallel queries... pooled: 1/2

As you can see, in the first two tests (1/1/1 and 2/1/2) the provider prunes all connections between the 8th second and the 12th second (probably at the 10th second), even though one of the connections has been used at the 8th second. In the third case it prunes one of the connections, even though both have been used at the 8th second.

Further technical details

Npgsql version: Locally built package at git hash f615edb5229a34a29e2a8cba97a1b482d1f59370. PostgreSQL version: PostgreSQL 11.4 Operating system: Server: Alpine Linux docker image / Client: Windows 10

The code of the test progran:

using Npgsql;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace NugetConnectionTimeouts
{
    public class Program
    {
        private const string Username = "REPLACE_ME";
        private const string Password = "REPLACE_ME";
        private const string Host =     "REPLACE_ME";
        private const string Database = "REPLACE_ME";

        private const int ConnectionPruningInterval = 2;
        private const int ConnectionIdleLifetimeFactor = 5;
        private const int ConnectionIdleLifetime = ConnectionPruningInterval * ConnectionIdleLifetimeFactor;

        public static void Main(string[] args)
        {
            RunTestCasesAsync().GetAwaiter().GetResult();
        }

        public static async Task RunTestCasesAsync()
        {
            Console.WriteLine($"ConnectionPruningInterval: {ConnectionPruningInterval}\nConnectionIdleLifetime: {ConnectionIdleLifetime}");
            await RunSimpleTestCaseAsync(1, 1, 1);
            await RunSimpleTestCaseAsync(2, 1, 2);
            await RunSimpleTestCaseAsync(1, 2, 2);
        }

        private static async Task RunSimpleTestCaseAsync(int initialCount, int intermediateCount, int finalCount)
        {
            Console.WriteLine();
            Console.WriteLine($"Running simple test case with {initialCount}/{intermediateCount}/{finalCount} connections.");
            Console.WriteLine("Clearing all connection pools.");
            NpgsqlConnection.ClearAllPools();
            await RunTestQueriesAsync(initialCount);
            await WaithThenRunTestQueriesAsync(ConnectionPruningInterval * (ConnectionIdleLifetimeFactor - 1), intermediateCount);
            await WaithThenRunTestQueriesAsync(ConnectionPruningInterval * 2, finalCount);
        }

        private static async Task WaithThenRunTestQueriesAsync(int secondsToWait, int connectionCount)
        {
            Console.WriteLine($"Waiting for {secondsToWait} seconds.");
            await Task.Delay(TimeSpan.FromSeconds(secondsToWait));
            await RunTestQueriesAsync(connectionCount);
        }

        private static async Task RunTestQueriesAsync(int connectionCount)
        {
            Console.Write($"Running {connectionCount} parallel queries...");
            string connectionString = new NpgsqlConnectionStringBuilder
            {
                Username = Username,
                Host = Host,
                Database = Database,
                ConnectionPruningInterval = ConnectionPruningInterval,
                ConnectionIdleLifetime = ConnectionIdleLifetime
            }.ConnectionString;
            int pooledConnectionCount = connectionCount;
            ProvidePasswordCallback callback = (string host, int port, string database, string username) =>
            {
                Interlocked.Decrement(ref pooledConnectionCount);
                return Password;
            };
            var connections = new List<NpgsqlConnection>();
            try
            {
                for (int i = 0; i < connectionCount; ++i)
                {
                    var connection = new NpgsqlConnection(connectionString);
                    connection.ProvidePasswordCallback = callback;
                    connections.Add(connection);
                }
                await Task.WhenAll(connections.Select(x => x.OpenAsync()));
                await Task.WhenAll(connections.Select(async x =>
                {
                    using (var command = x.CreateCommand())
                    {
                        command.CommandText = "select null";
                        await command.ExecuteScalarAsync();
                    }
                }));
            }
            finally
            {
                foreach (var connection in connections)
                {
                    await connection.DisposeAsync();
                }
            }
            Console.WriteLine($" pooled: {pooledConnectionCount}/{connectionCount}");
        }
    }
}

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 21 (11 by maintainers)

Most upvoted comments

@Toxantron Npgsql 5.0 will be released at the same time as .NET 5.0, which means November 10th (two weeks away!). This is why I’m pushing out issues out of the release 😃 I’ve also updated the milestone with the date, thanks.

However, nothing is stopping us from making a 5.1 release at some point after 5.0, without waiting for .NET 6.0.

@roji I don’t have a concrete proposal yet. I agree with your points, especially that #2929 could be generally useful. That feature would probably be useful for me even if if the pool was fully optimized for my “sparse usage” case, as it will improve the behavior when load suddenly increases.

@Kharos do you have a concrete proposal in mind for changing the pruning algorithm? We chose a sampling-based approach because we didn’t want pruning to impact the hot paths of opening and closing pooled connections. That indeed will always have the disadvantage of missing activity that happens between the samples (like any sampling-based approach), but the alternative seems like it would slow down everyone for an issue that doesn’t seem to affect many.

As you wrote, it’s unlikely for a single pruning algorithm to fit all needs. #2929 does seem like it could be helpful in the general case, and that (with possibly tweaked MinPoolSize and MaxPoolSize) may provide a decent solution.

I’m also not against publicly exposing the necessary APIs to allow people to do their own pooling, although to do this properly we’d need to do quite a bit of design work, I think.

In any case, since 5.0 is coming up very soon and we don’t have a clear idea of what we want to do here, I’m going to move this out to 6.0, at least for now. Of course we can continue the discussion.

I am using the default of 10, yes. I can play with that parameter as well when I revisit it. As I wrote above it is “not nice” but in the greater picture it is not a problem, we have bigger fish to fry if we want to optimize our code.