dbt-trino: Transaction timeout for longer queries
Background
I have been experiencing this issue for a long time. This issue happens when we try to execute the query which takes more than 5min. To reproduce this issue use dbt seed which can insert some rows which will take more than 5mins.
Reason
The reason for this issue is that, In Trino, the default value of transaction.idle-timeout is 5minutes and dbt executes commands in the following flow which causes this error
START TRANSACTION- Some queries which will take more than 5min
COMMIT<- This will throw an error
Error Log
trino.exceptions.DatabaseError: failed to commit transaction 73dc7443-ec53-4911-95be-a372792743b4: TrinoUserError(type=USER_ERROR, name=UNKNOWN_TRANSACTION, message="Unknown transaction ID: 73dc7443-ec53-4911-95be-a372792743b4. Possibly expired? Commands ignored until end of transaction block", query_id=20211121_145311_00013_7rttf)
Is there any specific reason why transactions are explicitly started in dbt-trino in spite of using auto-commit in python connection? https://github.com/starburstdata/dbt-trino/blob/80884475eb921f27a09effa2fbc10b93309599a8/dbt/adapters/trino/connections.py#L195
Other reference:
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (11 by maintainers)
@hashhar , thanks for confirmation. Will file a issue for this.
And yes, that is what we are facing
@bachng2017
Indeed, Trino doesn’t support fully ACID semantics. Trino makes sure that ONE
INSERTstatement doesn’t create corrupt data by inserting data in one or more staging table(s) and then doing on the target databaseINSERT INTO target_table SELECT * FROM staging_table(s).See https://www.trinoforum.org/t/insert-performance-on-trino-jdbc-connectors/99/3 for a more detailed explanation about the way that
INSERTworks on Trino.it might be a trino question than a dbt question, @findinpath, do you know if this kind of transaction works in trino ?
(for example seeding sequence looks like this): transaction begin insert 1 insert 2 insert 3 commit or rollback
as far as we’ve tested, if insert 3 fails, the result of insert 1 and 2 still in the seed table.
Maybe the transaction start/commit in dbt-presto/trino is just a place-holder or remains of a copy from other drivers. (btw, with snowflake, the rollback in previous sample seems to clear up the result of all inserts, as expected)