presto: Performance Regressions in Presto 0.206?

I recently benchmarked Presto 0.206 against 0.172. The tests ran on Parquet datasets stored on S3.

We found that although Presto 0.206 was generally faster on smaller datasets, there were some significant performance regressions on larger datasets. The CPU time reported by EXPLAIN ANALYZE was lower in 0.206 than in 0.172, but the wall time was much longer.
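One way to put a number on "lower CPU time but longer wall time" is to divide total CPU time by wall time, which gives the average number of cores kept busy over the query's lifetime. The sketch below is only an illustration: the function name is hypothetical and the input figures are made up, not measurements from this benchmark.

```python
def effective_parallelism(cpu_seconds: float, wall_seconds: float) -> float:
    """Average number of cores kept busy: total CPU time / wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Illustrative numbers only (NOT taken from the EXPLAIN ANALYZE gists).
old_version = effective_parallelism(cpu_seconds=6000.0, wall_seconds=60.0)
new_version = effective_parallelism(cpu_seconds=4800.0, wall_seconds=120.0)

print(f"old: {old_version:.1f} busy cores, new: {new_version:.1f} busy cores")
```

If the ratio drops sharply between versions while CPU time also drops, the extra wall time is being spent waiting rather than computing, which is consistent with a scheduling or straggler problem rather than slower operators.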

This possibly indicates either stragglers or some sort of scheduling bug that adversely affects parallelism. Note that the concurrency settings like task.concurrency are the same in both clusters.
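To check the straggler hypothesis, one could compare per-task wall times within a stage and flag tasks that run far longer than the median. This is a minimal sketch under stated assumptions: the per-task timings are assumed to have been collected by hand (for example from the web UI), the function name and the 2x threshold are hypothetical choices, and the sample values are invented.

```python
from statistics import median


def find_stragglers(task_wall_seconds, threshold=2.0):
    """Return indices of tasks whose wall time exceeds threshold * median."""
    if not task_wall_seconds:
        return []
    cutoff = threshold * median(task_wall_seconds)
    return [i for i, t in enumerate(task_wall_seconds) if t > cutoff]


# Invented per-task wall times (seconds) for a single stage.
sample = [10.0, 11.0, 9.5, 10.5, 42.0]
print(find_stragglers(sample))
```

A stage with a few flagged tasks points toward stragglers (for example slow S3 reads or skewed splits); a stage where every task is uniformly slow but CPU time is low points more toward scheduling.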

For instance, on the TPC-H scale factor 1000 dataset, query 7 took about twice as long in wall time. The query was:

SELECT supp_nation,
       cust_nation,
       l_year,
       sum(volume) AS revenue
FROM
  (SELECT n1.n_name AS supp_nation,
          n2.n_name AS cust_nation,
          substr(l_shipdate, 1, 4) AS l_year,
          l_extendedprice * (1 - l_discount) AS volume
   FROM lineitem_parq,
        orders_parq,
        customer_parq,
        supplier_parq,
        nation_parq n1,
        nation_parq n2
   WHERE s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'KENYA'
           AND n2.n_name = 'PERU')
          OR (n1.n_name = 'PERU'
              AND n2.n_name = 'KENYA'))
     AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31' ) AS shipping
GROUP BY supp_nation,
         cust_nation,
         l_year
ORDER BY supp_nation,
         cust_nation,
         l_year;

I compared the EXPLAIN ANALYZE output from both versions of Presto and could not find anything that explains this. Here are some observations:

The CPU time reported by each stage was usually lower in 0.206. This probably rules out operator performance regressions.
Some of the leaf stages used ScanProject in 0.172, but they use ScanFilterProject in 0.206. This reduces the number of output rows and leads to drastically lower CPU usage in the upper stages of the query tree. This is a big improvement and should have led to faster query processing.

References

Explain analyze from 0.206 - https://gist.github.com/anoopj/40eea820c1c310dff72139d495ac98b0
Explain analyze from 0.172 - https://gist.github.com/anoopj/01985fe0ad298dad4c22b1444e1f1e21

About this issue

State: closed
Created 6 years ago
Comments: 39 (38 by maintainers)

Most upvoted comments

I can submit a PR to add spill to the UI and explain analyze output also.

anoopj on Oct 17, 2018

@findepi, we should also add spill to the explain analyze output

dain on Oct 17, 2018

Created pull request https://github.com/prestodb/presto/pull/11910 to add the spill stats to the web UI and query summary.

anoopj on Nov 13, 2018

@findepi

Piotr, I can rebase onto the current master; I've just been busy with other projects. I hope to send out a PR in the next few days.

anoopj on Nov 1, 2018