trino: Druid `__time` filter is not pushed down
In Druid queries, __time is a very important filter because applying it narrows down the scan range. Otherwise, Druid does a full table scan, which is very slow and resource-consuming.
Currently, I found that the __time filter is not pushed down. I'm not sure whether I'm using it wrongly or pushdown is simply not supported yet:
Trino SQL:
select sum(col1) from <table name> where __time between date'2021-06-01' and date'2021-06-30';
SQLs sent to Druid:
SELECT "__time", "col1" FROM "druid"."<table name>"
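The scan-range point above can be sketched with a simplified model, in Java to match the connector's language. Druid partitions a datasource into segments by time, so a __time range lets it skip segments that cannot match. The one-day segment layout, the 2021-05-01 to 2021-07-31 data range, and the names `Segment`, `dailySegments`, and `segmentsToScan` are hypothetical and only illustrate the idea. Real segment granularity is configured per datasource, and pruning is done by Druid itself, not by client code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.IntStream;

// Simplified, hypothetical model of time-based segment pruning; not Druid's code.
class TimePruningSketch {
    // A segment holds rows whose __time falls in the half-open interval [start, end).
    record Segment(String id, Instant start, Instant end) {}

    // Builds `days` consecutive one-day segments starting at `first` (hypothetical layout).
    static List<Segment> dailySegments(Instant first, int days) {
        return IntStream.range(0, days)
                .mapToObj(i -> new Segment("day-" + i,
                        first.plus(Duration.ofDays(i)),
                        first.plus(Duration.ofDays(i + 1))))
                .toList();
    }

    // Keeps only segments that can contain rows with lo <= __time <= hi.
    // Without a __time predicate, every segment would have to be scanned.
    static List<Segment> segmentsToScan(List<Segment> segments, Instant lo, Instant hi) {
        return segments.stream()
                .filter(s -> !s.start().isAfter(hi) && s.end().isAfter(lo))
                .toList();
    }

    public static void main(String[] args) {
        List<Segment> all = dailySegments(Instant.parse("2021-05-01T00:00:00Z"), 92);
        // The range from the Trino query: between date'2021-06-01' and date'2021-06-30'.
        List<Segment> scanned = segmentsToScan(all,
                Instant.parse("2021-06-01T00:00:00Z"),
                Instant.parse("2021-06-30T00:00:00Z"));
        System.out.println(scanned.size() + " of " + all.size() + " segments scanned");
    }
}
```

With the June range applied, only the 30 segments overlapping June are scanned out of 92. With the query Trino currently sends (no WHERE clause on __time), the model would have to read all 92.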
I noticed there is ongoing work (#4109 or #4313) on implementing Druid aggregation pushdown; I'm not sure whether this is part of that work.
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (17 by maintainers)
Commits related to this issue
- try fixing #8404 — committed to jerryleooo/trino by jerryleooo 3 years ago
@hashhar Understood, and we are interested in improving this part. We will look at the methods you mentioned. Thanks.
The lack of pushdown is because Druid column mapping has not been implemented yet:
https://github.com/trinodb/trino/blob/825e6d82cfd3dee98038f14b38a55e0d1798ae35/plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java#L152
Hence we end up calling `timestampColumnMappingUsingSqlTimestampWithRounding`, which disables pushdown for correctness reasons (because of rounding on read). We should implement proper type mapping for Druid. See other connectors extending from `BaseJdbcClient` for reference, and see `TestPostgreSqlTypeMapping` for the test class to follow.
@jerryleooo I don't know of anyone actively working on this at the moment, but we can guide anyone who wants to work on this through the changes.
Looking at the mentioned methods in the PostgreSQL connector should be all that’s needed.
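A minimal, self-contained sketch of why rounding on read forces pushdown off, and why an exact mapping would allow it. It assumes (a) that Druid's __time carries millisecond precision, so a `timestamp(3)` mapping reads it exactly, and (b) a hypothetical remote value with sub-millisecond digits, used only to show what rounding does to a filter. `Pushdown`, `Mapping`, `mapTimestamp`, and `roundToMillis` are hypothetical names loosely modeled on the idea of a JDBC column mapping carrying a pushdown decision; they are not Trino's SPI, and the real change would go through `DruidJdbcClient`, following the PostgreSQL connector.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of why a rounding read path forces pushdown off; not Trino's SPI.
class DruidTimestampMappingSketch {
    // Hypothetical stand-in: may predicates on this column be sent to the remote database?
    enum Pushdown { FULL, DISABLED }

    // Hypothetical stand-in for the result of mapping a remote column to a Trino type.
    record Mapping(String trinoType, Pushdown pushdown) {}

    // If the Trino type holds every remote value exactly, a predicate evaluated remotely
    // agrees with the same predicate evaluated in Trino, so pushdown is safe.
    // If values are rounded on read, the two can disagree, so pushdown is disabled.
    static Mapping mapTimestamp(int remoteFractionalDigits, int trinoFractionalDigits) {
        if (remoteFractionalDigits <= trinoFractionalDigits) {
            return new Mapping("timestamp(" + remoteFractionalDigits + ")", Pushdown.FULL);
        }
        return new Mapping("timestamp(" + trinoFractionalDigits + ")", Pushdown.DISABLED);
    }

    // Rounds half up to whole milliseconds, as a lossy read path might.
    static Instant roundToMillis(Instant t) {
        Instant truncated = t.truncatedTo(ChronoUnit.MILLIS);
        long subMillisNanos = t.getNano() % 1_000_000;
        return subMillisNanos >= 500_000 ? truncated.plusMillis(1) : truncated;
    }

    public static void main(String[] args) {
        // Hypothetical remote value with sub-millisecond digits, just before the filter bound.
        Instant remote = Instant.parse("2021-06-30T23:59:59.999600Z");
        Instant bound = Instant.parse("2021-07-01T00:00:00Z");
        boolean pushedDown = remote.isBefore(bound);             // predicate on the raw value
        boolean inTrino = roundToMillis(remote).isBefore(bound); // predicate on the rounded value
        System.out.println("pushed down: " + pushedDown + ", in Trino: " + inTrino);

        // Assumption: __time has millisecond precision, so timestamp(3) is exact.
        System.out.println(mapTimestamp(3, 3));
        System.out.println(mapTimestamp(6, 3));
    }
}
```

Running `main` reports `true` for the predicate on the raw value and `false` for the rounded one: a pushed-down filter `__time < 2021-07-01` would return a row that Trino then displays as `2021-07-01 00:00:00.000`, outside the requested range. That disagreement is why the rounding mapping turns pushdown off, and why an exact mapping for __time is the prerequisite for pushing the filter down.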
cc @dheerajkulakarni, thought you might be interested.