presto: Planning becomes very slow when filtering column in a huge list

my query is:

SELECT L.co1, L.col2, ..... L.col n

FROM
    tab1 AS P
JOIN 
    (SELECT
        DISTINCT C.col1, C.col2, I.col3, I.col4
    FROM
        tab2 AS I
    JOIN
        tab3 AS C
    ON
        I.col1 = C.col1

    ) AS L
ON
    P.col1 = L.col1
where

    P.col1 in (1,2,3,4,5, ... ) # long list very slow

GROUP BY

    col1, col2, ...

The planning uses more than 10 minutes. I attached the statistics of filterexpressionrewrite in jmx

after query:

node          | failures | hits | time.count | time.max
"master","0","995","7914.0","5550840.655"

before query:

"master","0","994","7909.0","5550840.655"

I also try to reduce the list length and the planning becomes very slow. I guess that the filter expression rewrite wastes most of the planning time. Is it correct?

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 2
  • Comments: 33 (18 by maintainers)

Most upvoted comments

@hustnn You can try to workaround this problem by writing this query like this:

where
    P.col1 in (values 1,2,3,4,5, ... ) # long list very slow

Notice the values keyword. It looks like it prevents building the whole tree, with operator for each literal in the in clause and is performing the filtering on presto side.