django-postgres-copy: Is ON CONFLICT DO UPDATE; possible?

Was reading up on how ON CONFLICT DO NOTHING; works in the PostgreSQL docs (which is a great addition), and saw that ON CONFLICT DO UPDATE; is also an option, which is effectively an “upsert.”

I wonder if that’s also useful in a bulk loading context?

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 1
  • Comments: 20 (6 by maintainers)

Commits related to this issue

Most upvoted comments

I like the way Django Postgres Extras handles it, but I believe copying that functionality would require some mayor rewriting of how Django Postgres Copy works.

A very rudimentary solution I propose is passing a list with the following order:

  1. Operation (DO NOTHING or DO UPDATE)
  2. Constraint (the name of the column with a constraint)
  3. Columns to update (in the case of DO UPDATE)

I replaced ignore_conflicts with on_conflict and defaults to an empty list:

def __init__(
    ...
    on_conflict=[],
    ...
):

def insert_suffix(self):
    # If on_conflict is not an empty list
    if self.on_conflict:
        # First item on list - Operation
        if self.on_conflict[0] == 'DO NOTHING':
            return """
                ON CONFLICT DO NOTHING;
            """
        elif self.on_conflict[0] == 'DO UPDATE':
            # Second item on list - Constraint
            constraint = self.on_conflict[1]
            # Delete first two items on list. Only columns to be updated remain
            del self.on_conflict[0:2]
            update_columns =
                ', '.join(["{0} = EXCLUDED.{0}".format(col) for col in self.on_conflict])
            return """
                ON CONFLICT ({0}) DO UPDATE SET {1};
                """.format(constraint, update_columns)
    else:
        return ";"

I tested it out and it seems to work well. I understand it’s not the most pythonic solution though

an example would be: Customers.objects.from_csv('/path/to/file.csv', on_conflict=['DO UPDATE', 'name', 'email']) with multiple fields to update: Customers.objects.from_csv('/path/to/file.csv', on_conflict=['DO UPDATE', 'name', 'email', 'active', 'age']) or in plain english: if field on column 'name' already exists, update 'email', 'active' and 'age' fields

Thanks a ton of the snippet. We forked the project and got everything going.

We also added temp_table_name_suffix because this will be used in Airflow with Celery, and also added static_mapping_on_conflict to meet our needs.

I would love to see this feature. We import about 11 million records a day and COPY is by far the fastest.

Dango Postgres Extras currently handles upserts.

https://django-postgres-extra.readthedocs.io/manager/#upserting