npgsql: Race condition when creating composite/enum types while loading / reloading types

Copy of one separated problem from issue https://github.com/npgsql/npgsql/issues/2014

STACK TRACE:

Result StackTrace:	
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Npgsql.PostgresDatabaseInfo.LoadEnumLabels(DbDataReader reader, Dictionary`2 byOID) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 296
   at Npgsql.PostgresDatabaseInfo.LoadBackendTypes(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 256
   at Npgsql.PostgresDatabaseInfo.LoadPostgresInfo(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 72
   at Npgsql.PostgresDatabaseInfoFactory.Load(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 43
   at Npgsql.NpgsqlDatabaseInfo.Load(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\NpgsqlDatabaseInfo.cs:line 250
   at Npgsql.NpgsqlConnector.LoadDatabaseInfo(NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\NpgsqlConnector.cs:line 466
   at Npgsql.NpgsqlConnection.ReloadTypes() in C:\projects\npgsql\src\Npgsql\NpgsqlConnection.cs:line 1431
Result Message:	
Test method ... threw exception: 
System.Collections.Generic.KeyNotFoundException: The given key '699916' was not present in the dictionary.

I don’t know how this can happen, maybe concurrent calls of PostgresDatabaseInfo.LoadPostgresInfo can cause this? I have for sure concurrent calls in my testcase if the application initialite with som concurrent Connetion.Open() but how this will cause this kind of error is not clear for me. Maybe related to use in combination with ReloadTypes()

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 18 (17 by maintainers)

Commits related to this issue

Most upvoted comments

Ok, i will give it a try. Yes, transaction read only seems a good idea, i will change the code this way. I can test this later today and i will create a PR afterwards

Thanks @FlorianRainer, merged and backported to 4.0.3.

Serializable transactions don’t cause blocking/deadlocking, as per the docs:

To guarantee true serializability PostgreSQL uses predicate locking, which means that it keeps locks which allow it to determine when a write would have had an impact on the result of a previous read from a concurrent transaction, had it run first. In PostgreSQL these locks do not cause any blocking and therefore can not play any part in causing a deadlock. They are used to identify and flag dependencies among concurrent Serializable transactions which in certain combinations can lead to serialization anomalies.

So I think it should be quite safe. I’d rather keep the C# side simple and logical, unless there’s an actual reason to avoid serializable.

BTW it may be a good idea to make the transaction read only as well (https://www.postgresql.org/docs/current/static/sql-set-transaction.html).

Note: changes title again, it could happen even on first type load (on connection open) this is the mayor problem for me atm

@FlorianRainer after looking at the code I think your analysis is right.

Type loading involves 3 different queries (load all types, load composite fields, load enum labels). If there’s a race condition between this process and the creation of composite/enum types on the same database, then it’s possible for the 2nd and 3rd steps to see enums that the 1st step did not see.

However, rather than ignoring with unknown OIDs with TryGetValue(), it seems better to use the database’s transaction isolation: if the 3 type loading queries are executed in a serializable transaction, then the code shouldn’t see any concurrent updates at all. Note that the serializable isolation mode is important, the default mode is read committed which isn’t sufficient.

@FlorianRainer do you want to test this and submit a PR?

it’s realy realy hard to reporduce it with clean code, because i’m not able (until now) to reproduce this on my dev machine. But i’m able to reproduce this on a CI Server instance even repeatable for unit tests and with some kind of logging arround.

The most important thing is i can get this on ReloadTypes() but i can get this even on connection.Open

there are some tests running paralell creating their own schema for some isolated tests.

the process is quite simple (in single thread)

  • open new connection to empty schema
  • import db schema via sql scripts
  • ReloadTypes()
  • work with NpgSql

BUT: if you do this paralell it will end up with one connection has already done the setup even if another one applies the sql scripts, this may happen while the queries for LoadBackendTypes() are running.

@roji You think it would be possible that the first query (load all types) will return less types then the next query selection enum labels? (while another connection creates types in another schema)?

i will ask this because the stack trace of the exception uppon points me to https://github.com/npgsql/npgsql/blob/dev/src/Npgsql/PostgresDatabaseInfo.cs#L367 and byOID the only Dictionary in this method, for this reason i assume the only possible get_Item call is line 367, right?

Then the only possible way to get this exception is if the oid from reader doesn’t exists in the dictionary. The only plausible reason is like i described uppon, a ooid missing in type query but existing in enum label query.

@roji what do you think about this? is is plausible? i know this is a kind of edge use case only related to my unittests, but it could rarely happen even on production in normal cases if someone updates types and a connection with type loading will be opened.

i think using byOID.TryGetValue() would not hurt in this case and make the code more stable, or do you think different?