presto: Planning becomes very slow when filtering column in a huge list
my query is:
SELECT L.co1, L.col2, ..... L.col n
FROM
tab1 AS P
JOIN
(SELECT
DISTINCT C.col1, C.col2, I.col3, I.col4
FROM
tab2 AS I
JOIN
tab3 AS C
ON
I.col1 = C.col1
) AS L
ON
P.col1 = L.col1
where
P.col1 in (1,2,3,4,5, ... ) # long list very slow
GROUP BY
col1, col2, ...
The planning uses more than 10 minutes. I attached the statistics of filterexpressionrewrite in jmx
after query:
node | failures | hits | time.count | time.max
"master","0","995","7914.0","5550840.655"
before query:
"master","0","994","7909.0","5550840.655"
I also try to reduce the list length and the planning becomes very slow. I guess that the filter expression rewrite wastes most of the planning time. Is it correct?
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 2
- Comments: 33 (18 by maintainers)
@hustnn You can try to workaround this problem by writing this query like this:
Notice the
values
keyword. It looks like it prevents building the whole tree, with operator for each literal in thein
clause and is performing the filtering on presto side.