npgsql: Intermittent "Unknown message code: X" exceptions. affects 5.X and 6.x
We’ve been getting an intermittent exception “Unkown message code: X” where X ranges from 0-255. This was happening before we upgraded Npgsql.EntityFrameworkCore.PostgreSQL to 6.0.3 and is still continuing now. There doesn’t seem to be a pattern as to what causes it, it comes from various queries across our application.
Version info
PostgreSQL 13.4
Multiplexing=true
"Npgsql.EntityFrameworkCore.PostgreSQL" Version="6.0.3"
"Npgsql.EntityFrameworkCore.PostgreSQL.NetTopologySuite" Version="6.0.3"
"Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime" Version="6.0.3"
Example stack trace
Npgsql.NpgsqlException (0x80004005): Unknown message code: 105
at Npgsql.Util.PGUtil.ValidateBackendMessageCode(BackendMessageCode code)
at Npgsql.Internal.NpgsqlConnector.ReadMessage(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
...(tracing)
at Npgsql.NpgsqlCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
...(tracing)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
...(API controller)
at lambda_method4720(Closure , Object )
at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
Sample of error codes:

About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 21 (9 by maintainers)
Commits related to this issue
- Fixed reusing SslStream after IOException (#4401) Fixes #4305 — committed to npgsql/npgsql by vonzshik 2 years ago
- Fixed reusing SslStream after IOException (#4401) Fixes #4305 (cherry picked from commit 305ed51b5e08d74b81786b83bdeeb8566900f985) — committed to npgsql/npgsql by vonzshik 2 years ago
- Fixed reusing SslStream after IOException (#4401) Fixes #4305 (cherry picked from commit 305ed51b5e08d74b81786b83bdeeb8566900f985) — committed to npgsql/npgsql by vonzshik 2 years ago
- Fixed reusing SslStream after IOException (#4401) Fixes #4305 — committed to kirkbrauer/npgsql by vonzshik 2 years ago
@tomwollert @mintsoft @vonzshik
I’ve had a look at this issue using TomWollert’s reproduction code above and it seems that the issue stems from the reuse of a Sslstream after an IOException has been thrown on the Read() call. Reading the MSDN docs it states that;
I believe the read buffer data TomWollert shows above [23,3,3,1,141,…] Is the garbage data referred to in the MSDN article. This also lines up with the versions mentioned above as the query cancellation code added between versions 4.1.10 to 5.0.0 cancels the ongoing query and attempts to reuse the underlying stream in both Insecure And Secure Connections.
This can be seen in NpgsqlReadBuffer.cs where we continue back around the loop to attempt to read more data from the stream.
Update: I’ve done some more testing around this and it seem that this issue only effects .Net Framework applications, as .Net5 , .Net6 and .Net Core 3 deal with the IOException gracefully on the SslStream, Even thought the MSDN docs have the same message on all these versions of .Net .
@mintsoft @tomwollert sorry guys, was busy with personal stuff.
@AlisterGreg Yeah, you’re absolutely correct - the issue definitely was due to us attempting to save a connection after a timeout. This also explains why I haven’t ever been able to reproduce this error, since it can only reproduced if:
Just some context for the error. It can be only reproduce with Npgsql 5.0 and above because that’s when we’ve introduced cancellation (in a sense that we attempt to cancel a query after a timeout). I can guess that it can’t be reproduced on .NET Core 3 because at that point the runtime started supporting socket async operation cancellation (on .NET Framework you can’t cancel
Socket.RecieveAsyncbecause there is no overload accepting aCancellationToken), and to support the cancellationSslStreamhas been rewritten (otherwise that would have lead to the exact same problem). Graceful handling of aTimeoutExceptionmust have been just a byproduct of this.As a workaround, you can add
CancellationTimeout=-1to a connection string - ~that way, while the long running query wouldn’t be cancelled on the database, you also wouldn’t get theUnknown message codeexception~ nevermind, we always cancel the query no matter theCancellationTimeoutvalue, so that workaround should work perfectly.I’ve submitted a PR fixing this error, hopefully this shouldn’t be a problem anymore.
@tomwollert the exact error code (3, 23, 41, etc) usually don’t matter that much, since the error itself indicates that the protocol sync has been lost between Npgsql and PostgreSQL, and at that point anything is possible. I’m going to try and see whether I can reproduce it on PostgreSQL 10…
@vonzshik That sucks 😕 I’ll try to get a standalone console app that reproduces it. I’m assuming it’s some dependency.
Replicating message code 3:
Though I guess this will also fail to reproduce as the 23 on one triggers the 3 on the other statement.
K, “23” is very straight forward:
This will throw the 23.