sqlalchemy: INSERT...VALUES...RETURNING simply cannot be trusted long term for multiple rows; implement sentinel columns
i’ve been researching this for two days and the answer is the same: yes, the rows seem to be ordered, but no, nobody can guarantee it. SQL Server is just the one actually breaking the rule really badly right now. Over on pep-249, we are probably going to even have cursor.executemany() no longer guarantee the order of records given.
The form of INSERT (a, b, c) VALUES…RETURNING is certainly one we can embed a hardcoded, numerically incrementing sentinel value within. However syntactically, there is no way to get it back unless a column is added to the table.
So I am proposing a new column argument called insert_sentinel=True:
class A(Base):
__tablename__ = "a"
id: Mapped[int] = mapped_column(primary_key=True)
data: Mapped[str]
_sentinel = mapped_column(insert_sentinel=True)
We then would do our INSERTs, when this column is present, in this fashion:
INSERT INTO a (data, _sentinel) VALUES (?, 1), (?, 2), (?, 3), (?, 4), ... RETURNING a.id, a._sentinel
then we sort by _sentinel. the insertion of the “sentinel” is already done in https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4548.
Among other things this will be a real solution to the problem faced at #9603 and will also be a hedge against any other database where we can’t guarantee INSERT ordering.
for use_insertmanyvalues, it then becomes a three-state flag, like “use_insertmanyvalues=‘sentinelonly’” vs. “use_insertmanyvalues=True”. We would have to warn for tables that don’t have sentinels (either that we can’t use insertmanyvalues and things are too slow, or that we are using insertmanyvalues but the table is not “safe”).
all the use/ warnings for “sentinel” only apply if we are trying to use RETURNING. if we arent RETURNING, there is no problem - that is, we really (we could do this anyway) can turn on insertmanyvalues for SQL Server as long as the statements aren’t using RETURNING.
The use of “insertmanyvalues” with RETURNING is mostly an ORM use case. You can use this directly in 2.0 but it’s the ORM that really relies upon it. in ORM use cases, people are usually using the ORM to manage their tables.
Whether or not we make this “required” or whatever, it seems we at least should have this capability available.
It’s also supported in all databases, even SQLite, to ALTER TABLE now that simply adds a nullable column. So even for systems that use pre-existing databases, we could have functions that do something like ensure_sentinel(connection, table) that just adds this column if not already present; that is it doesn’t even even need to be within the usual migration path. it’s a simple nullable integer (or even smallint) column, only needs to store a number as big as the actual batch size, does not need indexes, does not need to be SELECTable, nothing. it is only used in INSERT as a counter.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 49 (36 by maintainers)
Commits related to this issue
- walk back SQL Server language a bit re: insertmanyvalues In #9618 we both can look to re-enable insertmanyvalues for SQL Server, and also likely *disable* its use for the ORM unit of work specificall... — committed to sqlalchemy/sqlalchemy by zzzeek a year ago
- add deterministic imv returning ordering using sentinel columns Repaired a major shortcoming which was identified in the :ref:`engine_insertmanyvalues` performance optimization feature first introduc... — committed to lelit/sqlalchemy by zzzeek a year ago
that’s a truly fascinating thread but through all of it, they forgot to act upon the most important message in the thread which is, emphasis added:
I’m not in a hurry to sign up for this list but WHY can’t they document this warning?
Question on PG mailing list; https://www.postgresql.org/message-id/CAN19dycS0oUorLOGgjxMX8NZq-hOA1RJrzwC4L3FxpMf4BQbBQ%40mail.gmail.com