sqlalchemy: add execution option for asyncpg_timeout
Describe the use case
In our company we use PostgreSQL, and one of our tables is quite big, with over 1.5B records. We have a few queries constructed with SQLAlchemy Core that we run against it, and with the help of indexes they are usually quite fast (hundreds of milliseconds at most). Lately we discovered an issue with PostgreSQL's query planner: it would omit an index and run a plan that stalled the connection for multiple days. After a particular API request had run many times, our connection pool got exhausted and the application basically crashed.
A natural solution to prevent this type of issue would be to implement some sort of timeout, whose value would depend on the expected execution time of the given query. There are of course multiple ways to achieve this:
1. Set `statement_timeout` on the current connection, in the case of PostgreSQL.
2. Wrap `AsyncConnection.execute()` in `asyncio.wait_for()`.
3. Add an option to pass a `timeout` argument to the `asyncpg` driver, for example via the `execution_options` argument.
While 1 and 2 seem to be possible options, I think it would be very useful if option 3 were implemented in SQLAlchemy. A timeout mechanism is already built into the asyncpg driver; access to it would just need to be added.
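For context, a minimal sketch of workarounds 1 and 2 using the SQLAlchemy asyncio API; the DSN and table name are illustrative:

```python
import asyncio

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

# Illustrative DSN and table name.
engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/db")

async def main():
    async with engine.connect() as conn:
        # Workaround 1: server-side timeout, scoped to this PostgreSQL
        # session; the value is in milliseconds.
        await conn.execute(text("SET statement_timeout = 3000"))
        await conn.execute(text("SELECT count(*) FROM big_table"))

        # Workaround 2: client-side timeout. Caveat: asyncio.wait_for()
        # cancels the awaiting task, but the server may keep executing the
        # statement, and the connection can be left in an unusable state.
        try:
            await asyncio.wait_for(
                conn.execute(text("SELECT count(*) FROM big_table")),
                timeout=3,
            )
        except asyncio.TimeoutError:
            ...  # handle the timeout

asyncio.run(main())
```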
Databases / Backends / Drivers targeted
asyncpg for PostgreSQL
Example Use
```python
try:
    row = await conn.execute(query, execution_options={'timeout': 3})
except TimeoutError:
    ...  # handle case when query took too long to execute
```
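(Assuming the option were passed straight through to asyncpg, the exception raised would be asyncio.TimeoutError, which on Python 3.11+ is an alias of the builtin TimeoutError caught above.)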
Additional context
No response
If setting it globally is ok, you can use the classic `connect_args` to set any arg that's accepted by asyncpg, documented here: https://magicstack.github.io/asyncpg/current/api/index.html#asyncpg.connection.connect. So for example:

`create_async_engine(..., connect_args={"timeout": 42})`

we could have an internal feature that takes all "drivername_xyz" options and moves them into separate dictionaries like that. We do this already for the DDL arguments like `Index(postgresql_where=…)`. It would just be an extra few dictionary operations per execute call, which is likely not worth it unless it's in Cython.
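A note on the asyncpg parameters here, per the asyncpg connect() docs linked above: `timeout` governs connection establishment, while `command_timeout` is the default timeout, in seconds, applied to each operation on the connection. So a global per-statement default would look more like this sketch (DSN illustrative):

```python
from sqlalchemy.ext.asyncio import create_async_engine

# "command_timeout" is asyncpg's default per-operation timeout in seconds;
# "timeout" would instead limit connection establishment.
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/db",
    connect_args={"command_timeout": 42},
)
```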
yes.
Your reasoning makes sense. Since something like this is not already implemented, chances are that there isn't a general need for it.
can I summarize your proposal as:
instead of
?
I haven't looked at psycopg3 yet, and maybe they are an outlier, but the general idea of per-execute kwargs is not really prevalent in the drivers I've dealt with. asyncpg here only has "timeout", that's it. All other drivers I've worked with have none, ever, since it's not part of the pep-249 interface.
This is not to say there can't be some kind of namespacing like the above for delivering lots of dialect-specific options, which might be fine. But the above is not going to solve the issue of a user wanting to use some new dialect feature X without us having to change any code, because those features are usually implemented in some idiosyncratic way; a cursor.execute() kwarg is the last place such things get added, since pep-249 doesn't include it, and for asyncpg we aren't using their "cursor" concept in most cases anyway.
That is, I think "timeout", especially in an async context, is one that makes sense; but as the other drivers don't even seem to have it, it seems like we should not overthink this one and just make the feature available. There are not really expected to be any other "execute" arguments like this (unless psycopg3 is really a game changer in this area).
Let me just emphasize that we aren't going to add any "timeout" mechanisms on our end using async workers, threads, etc.; someone else had asked about that. This is strictly about timeout params that the async (or sync) drivers accept explicitly in some way on a per-statement basis.
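For reference, the driver-level mechanism in question is asyncpg's per-statement `timeout` keyword on calls such as `Connection.fetch()`; a minimal sketch using asyncpg directly (DSN and table name are illustrative):

```python
import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect("postgresql://user:pass@localhost/db")
    try:
        # fetch()/execute() take a per-statement timeout in seconds and
        # raise asyncio.TimeoutError when it elapses.
        rows = await conn.fetch("SELECT count(*) FROM big_table", timeout=3)
    except asyncio.TimeoutError:
        ...  # the statement took too long; asyncpg attempts to cancel it
    finally:
        await conn.close()

asyncio.run(main())
```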