trino: Druid `__time` filter is not pushed down

In Druid queries, `__time` is a very important filter because it narrows the scan range. Without it, Druid does a full table scan, which is slow and resource-intensive.

Currently, the `__time` filter does not appear to be pushed down. I am not sure whether I am using it incorrectly or whether the pushdown is not supported yet:

Trino SQL: select sum(col1) from <table name> where __time between date'2021-06-01' and date'2021-06-30';

SQL sent to Druid: SELECT "__time", "col1" FROM "druid"."<table name>"

I noticed there is ongoing work (#4109 or #4313) on implementing Druid aggregation pushdown; I am not sure whether this is part of it.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (17 by maintainers)

Most upvoted comments

@hashhar Understood. We are interested in improving this part and will look at the methods you mentioned. Thanks.

The lack of pushdown is because Druid column mapping has not been implemented yet:

https://github.com/trinodb/trino/blob/825e6d82cfd3dee98038f14b38a55e0d1798ae35/plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java#L152

Hence we end up calling `timestampColumnMappingUsingSqlTimestampWithRounding`, which disables pushdown for correctness reasons (because of rounding on read).
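To illustrate why rounding on read and predicate pushdown do not mix, here is a minimal, self-contained sketch. It does not use Trino or Druid APIs: the class `RoundingPushdownDemo`, its methods, and the stored values are all hypothetical, and the microsecond precision is an assumption chosen only to make the rounding visible. It shows that filtering after the rounded read and filtering on the raw remote values can return different rows for the same predicate.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo, not Trino code: models a remote store whose values are
// rounded when read, and compares engine-side vs. remote-side filtering.
public class RoundingPushdownDemo {
    // Assumed stored values, in microseconds since epoch.
    static final List<Long> STORED_MICROS = List.of(1_000_400L, 1_000_600L, 1_001_499L);

    // Round microseconds to milliseconds (half up), as a rounding read function might.
    static long roundToMillis(long micros) {
        return Math.floorDiv(micros + 500, 1000);
    }

    // No pushdown: the engine reads (and rounds) every value, then filters.
    static List<Long> filterInEngine(long equalsMillis) {
        return STORED_MICROS.stream()
                .map(RoundingPushdownDemo::roundToMillis)
                .filter(ms -> ms == equalsMillis)
                .collect(Collectors.toList());
    }

    // Pushdown: the remote side filters on raw values, then the engine rounds.
    static List<Long> filterRemotely(long equalsMillis) {
        return STORED_MICROS.stream()
                .filter(us -> us == equalsMillis * 1000)
                .map(RoundingPushdownDemo::roundToMillis)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("engine-side filter: " + filterInEngine(1001L));
        System.out.println("remote-side filter: " + filterRemotely(1001L));
    }
}
```

In this sketch the two strategies disagree whenever a stored value is not already at the rounded precision, which is the kind of mismatch that makes disabling pushdown the safe default for a rounding column mapping.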

We should implement proper type mapping for Druid. See other connectors extending `BaseJdbcClient` for reference, and see `TestPostgreSqlTypeMapping` for the test class to follow.

@jerryleooo I don’t know of anyone actively working on this at the moment, but we can guide anyone who wants to work on this through the changes.

Looking at the mentioned methods in the PostgreSQL connector should be all that’s needed.

cc: @dheerajkulakarni thought you might be interested.