efcore: Add support for INSERT IGNORE / ON CONFLICT DO NOTHING
I propose adding an API to indicate to EF Core that an insert (via Add or AddRange and variants) may fail. This is useful when adding data to a table that may already exist.
For instance, I have a program that collects analytics data from a source and inserts it all into a table. Some of these rows may be duplicates of previously inserted rows and so will fail unique constraints. These errors can be safely ignored since the rows already exist in the table.
Using individual INSERT IGNORE ... statements results in horrible performance (~15x slower than AddRangeAsync).
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 22
- Comments: 26 (13 by maintainers)
@ajcvickers can you explain what your doubts about its suitability for an ORM are? I’m not sure I see the difference between this or other SQL translations.
When attempting to insert data that may already be present in the database without insert or ignore, you basically need to either prepopulate some sort of bloom filter or hash set beforehand and pray you don’t run into any coherence issues or else use fine-grained transactions and throw-and-catch lots of exceptions.
While the SQL specifics are certainly provider-dependent (just like everything else), its certainly not a concept specific to a single SQL dialect.
INSERT IGNOREis very closely linked with upsert, which is quickly becoming “a thing” in the rdbms world. MySQL haseINSERT IGNOREandREPLACE INTO, SQLite hasINSERT OR IGNOREfor primary key conflicts andON CONFLICT [IGNORE|REPLACE], PostgreSQL hasON CONFLICT ____ DO [NOTHING|UPDATE ...], SQL Server hasIF NOT EXISTS(..)for old versions and nowMERGE ....Please correct me if I’m wrong, but we have a general concept (need to insert content that may or may not already exist in the database and either update/abort/ignore if it already does) that is available in the most popular database options supported/targeted by EF Core, but requires different syntax for each implementation, which to me makes it definitely ORM-worthy.
The only issue is that it does not map cleanly to LINQ because LINQ expressions don’t really have any concept of constraints. But for the most part, updates and inserts are orthogonal to querying and I imagine this could be implemented without touching the
IQueryableinterface at all; if you take the most simple example of either completely replacing or completely ignoring (i.e. disregarding the more complex options afforded by the PostgreSQL and SQL Server languages) then in its most basic form this could simply be an optional parameter to theDbSet<>.Add()method specifying the conflict resolution action withABORTor equivalent (which is basically the current behavior) being the default and the only required version to support.@tombohub I use this package which works fine - https://github.com/artiomchi/FlexLabs.Upsert
To add one solution, using EFCore.BulkExtensions this can be done with
Disc. I’m the author, and Lib. now has Dual License - a fix for OSS sustainability. (Community version is free and covers most users, Commercial one is for companies over 1 mil. revenue)
Interesting - what specific issues do you see with PG’s arbitrary column support, and how do you see INSERT IGNORE this as more ORM-friendly? At the very least, the PG provider could generate ON CONFLICT IGNORE without any columns, providing (I think) the same behavior as MySQL, no? In that sense PG seems to be a superset of the MySQL functionality?
I don’t think this is really worse than other places where different database behavior leaks - EF Core doesn’t really pretend to provide a uniform abstraction over all databases (e.g. string comparison is case-insensitive in some databases, insensitive in others). In other words, it may make sense to provide a general relational “add or ignore” mechanism, which would be implemented by ON CONFLICT on PG and INSERT IGNORE in MySQL, even if there are some behavioral differences. But I admit I haven’t gone into the fine details here.
But your points (and suggested workarounds) are interesting… Note that PG also has exclusion constraints, which could be used with the
IS DISTINCT FROMto disallow multiple nulls, if that’s desired (just as standard SQL DISTINCT does consider NULLs equal). This would obviously be a PG-specific thing.@mqudsi I don’t remember the full extent of the discussion with the team. Nevertheless, this is something we have on the backlog to consider for a future release.