npgsql: NullReferenceException in NpgsqlConnector.PerformUserCancellation()

@vonzshik

We are observing an abnormal behavior in NpgSql 5.0.0 to 5.0.2 that doesn’t happen in earlier versions like 4.1.5:

Intermittent NullReferenceException or (less likely) ObjectDisposedException in NpgsqlConnector.PerformUserCancellation() when cancelling synchronous operation on a command instance that is registered the for the cancellation as cancellationToken.Register(command.Cancel), then the parent cancellationTokenSource is cancelled from a background timer thread.

Could be due to “dirty” coding pattern like undisposed command object, etc. per details below.

Here is the voodoo that makes the issue to happen reliably:

  • create a database “A” first, execute random successful and failure statements against it, drop “A”.
  • create a database “B” and run the following:
    const string LongRunningQuery = "SELECT * FROM mytable, pg_sleep(10);";

    using var cancellationTokenSource = new CancellationTokenSource();
    var token = cancellationTokenSource.Token;

    // Cancel on a 1s timer event
    cancellationTokenSource.CancelAfter(1000);

    using var conn = new NpgsqlConnection($"Server=127.0.0.1;User ID={user};Password={password};Database={databaseName};Timeout=30;CommandTimeout=1200;MaxPoolSize=500;MinPoolSize=0;Application Name=Test;");
    conn.Open();
    var command = conn.CreateCommand();
    command.CommandText = LongRunningQuery;

    void Cancel()
    {
        try
        {
            command.Cancel();
        }
        catch (Exception ex)
        {
            throw new Exception("Boom!", ex);
        }
    }
    var registration = token.Register(Cancel);

    try
    {
        // This line is blocked for 10 seconds max. It is in this blocked state 
        // when cancellation is triggered after about 1 second. 
	var reader = command.ExecuteReader();
        
        // While it is incorrect to place these outside of `finally` block,
        // we can't be blamed because we don't expect a race condition 
        // to be a necessary result of not disposing when `ExecuteReader`
        // is interrupted by `command.Cancel()`.
        registration.Dispose();
        reader.Dispose();
        command.Dispose();
    }
    catch (OperationCanceledException)
    {
    }
    catch (PostgresException ex) when (string.Equals(ex.SqlState, "57014", StringComparison.Ordinal))
    {
    }

I hope the Exception call stacks will talk for themselves:

Unhandled Exceptions:System.Exception: Test
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Npgsql.NpgsqlConnector.PerformUserCancellation()
   at Npgsql.NpgsqlCommand.Cancel()
   at SqlDatabaseTest.<>c__DisplayClass1_0.<CancellationTest>g__Cancel|3()
   --- End of inner exception stack trace ---
   at SqlDatabaseTest.<>c__DisplayClass1_0.<CancellationTest>g__Cancel|3()
   at System.Threading.CancellationToken.<>c.<.cctor>b__26_0(Object obj)
   at System.Threading.CancellationTokenSource.CallbackNode.<>c.<ExecuteCallback>b__9_0(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback()
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)

Or

Unhandled Exceptions:System.Exception: Test
 ---> Exceptions:System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.Socket'.
   at System.Net.Sockets.Socket.ThrowObjectDisposedException()
   at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)
   at Npgsql.NpgsqlReadBuffer.set_Timeout(TimeSpan value)
   at Npgsql.NpgsqlConnector.PerformUserCancellation()
   at Npgsql.NpgsqlCommand.Cancel()
   at SqlDatabaseTest.<>c__DisplayClass1_0.<CancellationTest>g__Cancel|3()
   --- End of inner exception stack trace ---
   at SqlDatabaseTest.<>c__DisplayClass1_0.<CancellationTest>g__Cancel|3()
   at System.Threading.CancellationToken.<>c.<.cctor>b__26_0(Object obj)
   at System.Threading.CancellationTokenSource.CallbackNode.<>c.<ExecuteCallback>b__9_0(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback()
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)

The stacks almost surely put the problem into the hands of NpgsqlConnector.Cleanup() method

            ReadBuffer?.Dispose();
            ReadBuffer = null;

Note 1

The issue could be related to “forgetting” to dispose the product of token.Register(command.Cancel) and command object. When these are disposed the issue goes away (or less likely to happen).

Note 2

Randomizing “Application Name” connection parameter makes the issue to disappear. Makes feel like the connection pool may have something to do with this.

Further technical details

Npgsql version: 5.0.0 - 5.0.2 PostgreSQL version: v. 11 Operating systems:

  • Google GCE’s image-family: “ubuntu-minimal-1804-lts” + Google n1-standard-8 (8 vCPUs, 30 GB RAM )
  • Windows 10 (4 physical CPUs, 16 GB RAM)

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 19 (17 by maintainers)

Commits related to this issue

Most upvoted comments

@erwan-joly hopefully today, right now @roji is dealing with #3478

Thanks, I confirm the bug seems to be fixed on 5.0.3