sqlalchemy: INSERT...VALUES...RETURNING simply cannot be trusted long term for multiple rows; implement sentinel columns

i’ve been researching this for two days and the answer is the same: yes, the rows seem to be ordered, but no, nobody can guarantee it. SQL Server is just the one actually breaking the rule really badly right now. Over on pep-249, we are probably going to even have cursor.executemany() no longer guarantee the order of records given.

The form of INSERT (a, b, c) VALUES…RETURNING is certainly one we can embed a hardcoded, numerically incrementing sentinel value within. However syntactically, there is no way to get it back unless a column is added to the table.

So I am proposing a new column argument called insert_sentinel=True:

class A(Base):
    __tablename__ = "a"

    id: Mapped[int] = mapped_column(primary_key=True)
    data: Mapped[str]

    _sentinel = mapped_column(insert_sentinel=True)

We then would do our INSERTs, when this column is present, in this fashion:

INSERT INTO a (data, _sentinel) VALUES (?, 1), (?, 2), (?, 3), (?, 4), ... RETURNING a.id, a._sentinel

then we sort by _sentinel. the insertion of the “sentinel” is already done in https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4548.

Among other things this will be a real solution to the problem faced at #9603 and will also be a hedge against any other database where we can’t guarantee INSERT ordering.

for use_insertmanyvalues, it then becomes a three-state flag, like “use_insertmanyvalues=‘sentinelonly’” vs. “use_insertmanyvalues=True”. We would have to warn for tables that don’t have sentinels (either that we can’t use insertmanyvalues and things are too slow, or that we are using insertmanyvalues but the table is not “safe”).

all the use/ warnings for “sentinel” only apply if we are trying to use RETURNING. if we arent RETURNING, there is no problem - that is, we really (we could do this anyway) can turn on insertmanyvalues for SQL Server as long as the statements aren’t using RETURNING.

The use of “insertmanyvalues” with RETURNING is mostly an ORM use case. You can use this directly in 2.0 but it’s the ORM that really relies upon it. in ORM use cases, people are usually using the ORM to manage their tables.

Whether or not we make this “required” or whatever, it seems we at least should have this capability available.

It’s also supported in all databases, even SQLite, to ALTER TABLE now that simply adds a nullable column. So even for systems that use pre-existing databases, we could have functions that do something like ensure_sentinel(connection, table) that just adds this column if not already present; that is it doesn’t even even need to be within the usual migration path. it’s a simple nullable integer (or even smallint) column, only needs to store a number as big as the actual batch size, does not need indexes, does not need to be SELECTable, nothing. it is only used in INSERT as a counter.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 49 (36 by maintainers)

Commits related to this issue

Most upvoted comments

that’s a truly fascinating thread but through all of it, they forgot to act upon the most important message in the thread which is, emphasis added:

I always thought it was guaranteed, and I’ve encountered code written by other people who were obviously under the same impression: take some strings (e.g. flag names), use “insert … returning id”, map the ids back to the names, and use the values in further inserts into other tables (“flag_id foreign key references flags”).

I know one could say “returning id, name”, but there’s certainly code out there that doesn’t do this.

I personally think the return order should be guaranteed; and if not, then the documentation urgently needs some prominent warnings to tell people that they should not assume this (for any variant of RETURNING).

I’m not in a hurry to sign up for this list but WHY can’t they document this warning?