efcore: Add support for INSERT IGNORE / ON CONFLICT DO NOTHING

I propose adding an API to indicate to EF Core that an insert (via Add or AddRange and variants) may fail. This is useful when adding data to a table that may already exist.

For instance, I have a program that collects analytics data from a source and inserts it all into a table. Some of these rows may be duplicates of previously inserted rows and so will fail unique constraints. These errors can be safely ignored since the rows already exist in the table.

Using individual INSERT IGNORE ... statements results in horrible performance (~15x slower than AddRangeAsync).

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 22
  • Comments: 26 (13 by maintainers)

Most upvoted comments

@ajcvickers can you explain what your doubts about its suitability for an ORM are? I’m not sure I see the difference between this or other SQL translations.

When attempting to insert data that may already be present in the database without insert or ignore, you basically need to either prepopulate some sort of bloom filter or hash set beforehand and pray you don’t run into any coherence issues or else use fine-grained transactions and throw-and-catch lots of exceptions.

While the SQL specifics are certainly provider-dependent (just like everything else), its certainly not a concept specific to a single SQL dialect. INSERT IGNORE is very closely linked with upsert, which is quickly becoming “a thing” in the rdbms world. MySQL hase INSERT IGNORE and REPLACE INTO, SQLite has INSERT OR IGNORE for primary key conflicts and ON CONFLICT [IGNORE|REPLACE], PostgreSQL has ON CONFLICT ____ DO [NOTHING|UPDATE ...], SQL Server has IF NOT EXISTS(..) for old versions and now MERGE ....

Please correct me if I’m wrong, but we have a general concept (need to insert content that may or may not already exist in the database and either update/abort/ignore if it already does) that is available in the most popular database options supported/targeted by EF Core, but requires different syntax for each implementation, which to me makes it definitely ORM-worthy.

The only issue is that it does not map cleanly to LINQ because LINQ expressions don’t really have any concept of constraints. But for the most part, updates and inserts are orthogonal to querying and I imagine this could be implemented without touching the IQueryable interface at all; if you take the most simple example of either completely replacing or completely ignoring (i.e. disregarding the more complex options afforded by the PostgreSQL and SQL Server languages) then in its most basic form this could simply be an optional parameter to the DbSet<>.Add() method specifying the conflict resolution action with ABORT or equivalent (which is basically the current behavior) being the default and the only required version to support.

@tombohub I use this package which works fine - https://github.com/artiomchi/FlexLabs.Upsert

To add one solution, using EFCore.BulkExtensions this can be done with

If we want Insert only new and skip existing ones in Db (Insert_if_not_Exist) then use BulkInsertOrUpdate with config PropertiesToIncludeOnUpdate = new List<string> { "" }

Disc. I’m the author, and Lib. now has Dual License - a fix for OSS sustainability. (Community version is free and covers most users, Commercial one is for companies over 1 mil. revenue)

An issue/problem is that arbitrary columns may be passed to PostgreSQL ON CONFLICT but it’ll fail at runtime if the columns aren’t covered by a unique index/constraint. MySQL’s INSERT IGNORE is much simpler (and therefore more ORM-friendly).

Interesting - what specific issues do you see with PG’s arbitrary column support, and how do you see INSERT IGNORE this as more ORM-friendly? At the very least, the PG provider could generate ON CONFLICT IGNORE without any columns, providing (I think) the same behavior as MySQL, no? In that sense PG seems to be a superset of the MySQL functionality?

One would be forgiven for thinking that this particular sub-feature could be elided from any ignore/upsert implementation, but such an abstraction would be severely crippled because any unique constraints that include a nullable column would never trigger (because NULL != NULL in SQL).

I don’t think this is really worse than other places where different database behavior leaks - EF Core doesn’t really pretend to provide a uniform abstraction over all databases (e.g. string comparison is case-insensitive in some databases, insensitive in others). In other words, it may make sense to provide a general relational “add or ignore” mechanism, which would be implemented by ON CONFLICT on PG and INSERT IGNORE in MySQL, even if there are some behavioral differences. But I admit I haven’t gone into the fine details here.

But your points (and suggested workarounds) are interesting… Note that PG also has exclusion constraints, which could be used with the IS DISTINCT FROM to disallow multiple nulls, if that’s desired (just as standard SQL DISTINCT does consider NULLs equal). This would obviously be a PG-specific thing.

@mqudsi I don’t remember the full extent of the discussion with the team. Nevertheless, this is something we have on the backlog to consider for a future release.