npgsql: Race condition when creating composite/enum types while loading / reloading types
Copy of one separated problem from issue https://github.com/npgsql/npgsql/issues/2014
STACK TRACE:
Result StackTrace:
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at Npgsql.PostgresDatabaseInfo.LoadEnumLabels(DbDataReader reader, Dictionary`2 byOID) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 296
at Npgsql.PostgresDatabaseInfo.LoadBackendTypes(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 256
at Npgsql.PostgresDatabaseInfo.LoadPostgresInfo(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 72
at Npgsql.PostgresDatabaseInfoFactory.Load(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\PostgresDatabaseInfo.cs:line 43
at Npgsql.NpgsqlDatabaseInfo.Load(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\NpgsqlDatabaseInfo.cs:line 250
at Npgsql.NpgsqlConnector.LoadDatabaseInfo(NpgsqlTimeout timeout, Boolean async) in C:\projects\npgsql\src\Npgsql\NpgsqlConnector.cs:line 466
at Npgsql.NpgsqlConnection.ReloadTypes() in C:\projects\npgsql\src\Npgsql\NpgsqlConnection.cs:line 1431
Result Message:
Test method ... threw exception:
System.Collections.Generic.KeyNotFoundException: The given key '699916' was not present in the dictionary.
I don’t know how this can happen, maybe concurrent calls of PostgresDatabaseInfo.LoadPostgresInfo can cause this? I have for sure concurrent calls in my testcase if the application initialite with som concurrent Connetion.Open() but how this will cause this kind of error is not clear for me. Maybe related to use in combination with ReloadTypes()
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 18 (17 by maintainers)
Commits related to this issue
- fix possible inconsistent result of PostgresDatabaseInfo TypesQuery by adding a transaction fix for #2020 — committed to FlorianRainer/npgsql by FlorianRainer 6 years ago
- fix possible inconsistent result of PostgresDatabaseInfo TypesQuery by adding a transaction fix for #2020 — committed to npgsql/npgsql by FlorianRainer 6 years ago
- fix possible inconsistent result of PostgresDatabaseInfo TypesQuery by adding a transaction fix for #2020 (cherry picked from commit b2af2361d3d4ac52af8995001e3557d23cabebf7) — committed to npgsql/npgsql by FlorianRainer 6 years ago
- fix possible inconsistent result of PostgresDatabaseInfo TypesQuery by adding a transaction fix for #2020 (cherry picked from commit b2af2361d3d4ac52af8995001e3557d23cabebf7) — committed to npgsql/npgsql by FlorianRainer 6 years ago
Ok, i will give it a try. Yes, transaction read only seems a good idea, i will change the code this way. I can test this later today and i will create a PR afterwards
Thanks @FlorianRainer, merged and backported to 4.0.3.
Serializable transactions don’t cause blocking/deadlocking, as per the docs:
So I think it should be quite safe. I’d rather keep the C# side simple and logical, unless there’s an actual reason to avoid serializable.
BTW it may be a good idea to make the transaction read only as well (https://www.postgresql.org/docs/current/static/sql-set-transaction.html).
Note: changes title again, it could happen even on first type load (on connection open) this is the mayor problem for me atm
@FlorianRainer after looking at the code I think your analysis is right.
Type loading involves 3 different queries (load all types, load composite fields, load enum labels). If there’s a race condition between this process and the creation of composite/enum types on the same database, then it’s possible for the 2nd and 3rd steps to see enums that the 1st step did not see.
However, rather than ignoring with unknown OIDs with
TryGetValue(), it seems better to use the database’s transaction isolation: if the 3 type loading queries are executed in a serializable transaction, then the code shouldn’t see any concurrent updates at all. Note that the serializable isolation mode is important, the default mode is read committed which isn’t sufficient.@FlorianRainer do you want to test this and submit a PR?
it’s realy realy hard to reporduce it with clean code, because i’m not able (until now) to reproduce this on my dev machine. But i’m able to reproduce this on a CI Server instance even repeatable for unit tests and with some kind of logging arround.
The most important thing is i can get this on
ReloadTypes()but i can get this even onconnection.Openthere are some tests running paralell creating their own schema for some isolated tests.
the process is quite simple (in single thread)
BUT: if you do this paralell it will end up with one connection has already done the setup even if another one applies the sql scripts, this may happen while the queries for
LoadBackendTypes()are running.@roji You think it would be possible that the first query (load all types) will return less types then the next query selection enum labels? (while another connection creates types in another schema)?
i will ask this because the stack trace of the exception uppon points me to https://github.com/npgsql/npgsql/blob/dev/src/Npgsql/PostgresDatabaseInfo.cs#L367 and
byOIDthe onlyDictionaryin this method, for this reason i assume the only possibleget_Itemcall is line 367, right?Then the only possible way to get this exception is if the
oidfrom reader doesn’t exists in the dictionary. The only plausible reason is like i described uppon, a ooid missing in type query but existing in enum label query.@roji what do you think about this? is is plausible? i know this is a kind of edge use case only related to my unittests, but it could rarely happen even on production in normal cases if someone updates types and a connection with type loading will be opened.
i think using
byOID.TryGetValue()would not hurt in this case and make the code more stable, or do you think different?