presto: Performance Regressions in Presto 0.206?
I was recently benchmarking Presto 0.206 against 0.172. The tests were run on Parquet datasets stored on S3.
We found that Presto 0.206 was generally faster on smaller datasets, but there were some significant performance regressions on larger datasets. The CPU time reported by EXPLAIN ANALYZE was lower in 0.206 than in 0.172, but the wall time was much longer.
This suggests either stragglers or some sort of scheduling bug that reduces parallelism. Note that concurrency settings such as task.concurrency are the same on both clusters.
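One way to make the "lower CPU time, longer wall time" symptom concrete is to divide total CPU time by wall time. That ratio is the average number of cores the query kept busy, so a drop between versions points at lost parallelism rather than slower operators. Comparing each task's wall time to the median task can then separate a few stragglers from a general scheduling problem. This is a minimal Python sketch under stated assumptions: `QueryStats`, `find_stragglers`, the task IDs, and the 3x-median threshold are hypothetical helpers, not part of Presto, and the numbers in the example are made up rather than taken from this benchmark.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class QueryStats:
    """Hypothetical container for numbers read off EXPLAIN ANALYZE or the web UI."""
    cpu_seconds: float   # total CPU time summed across all tasks
    wall_seconds: float  # elapsed (wall-clock) time of the whole query


def effective_parallelism(stats: QueryStats) -> float:
    """Average number of CPU cores kept busy over the query's lifetime."""
    if stats.wall_seconds <= 0:
        raise ValueError("wall time must be positive")
    return stats.cpu_seconds / stats.wall_seconds


def slowdown(old: QueryStats, new: QueryStats) -> float:
    """Wall-time slowdown of `new` relative to `old` (2.0 means twice as slow)."""
    return new.wall_seconds / old.wall_seconds


def find_stragglers(task_wall_seconds: dict[str, float], factor: float = 3.0) -> list[str]:
    """Return IDs of tasks whose wall time exceeds `factor` times the median task.

    The 3x default is an arbitrary, hypothetical threshold.
    """
    if not task_wall_seconds:
        return []
    med = median(task_wall_seconds.values())
    return sorted(t for t, s in task_wall_seconds.items() if s > factor * med)


if __name__ == "__main__":
    # Illustrative numbers only, NOT the measurements from this issue.
    old_run = QueryStats(cpu_seconds=4000, wall_seconds=100)
    new_run = QueryStats(cpu_seconds=3600, wall_seconds=200)
    print(effective_parallelism(old_run))  # 40.0
    print(effective_parallelism(new_run))  # 18.0
    print(slowdown(old_run, new_run))      # 2.0
    # Hypothetical per-task wall times (seconds) for one stage.
    print(find_stragglers({"1.0": 10, "1.1": 11, "1.2": 12, "1.3": 60}))  # ['1.3']
```

If the newer version shows lower CPU time but a much lower effective parallelism, and `find_stragglers` flags only a handful of tasks, stragglers are the more likely culprit; if no tasks stand out, the scheduling explanation looks stronger.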
For instance, on the TPC-H scale factor 1000 dataset, query #7 was 2x slower in wall time. The query was:
```sql
SELECT supp_nation,
       cust_nation,
       l_year,
       sum(volume) AS revenue
FROM
  (SELECT n1.n_name AS supp_nation,
          n2.n_name AS cust_nation,
          substr(l_shipdate, 1, 4) AS l_year,
          l_extendedprice * (1 - l_discount) AS volume
   FROM lineitem_parq,
        orders_parq,
        customer_parq,
        supplier_parq,
        nation_parq n1,
        nation_parq n2
   WHERE s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'KENYA'
           AND n2.n_name = 'PERU')
          OR (n1.n_name = 'PERU'
              AND n2.n_name = 'KENYA'))
     AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping
GROUP BY supp_nation,
         cust_nation,
         l_year
ORDER BY supp_nation,
         cust_nation,
         l_year;
```
I compared the EXPLAIN ANALYZE output from both versions of Presto and could not find anything that would explain this. Some observations:
- The CPU time reported for each stage was usually lower in 0.206, which probably rules out performance regressions in individual operators.
- Some of the leaf stages used ScanProject in 0.172 but use ScanFilterProject in 0.206. This reduces the number of output rows and leads to drastically lower CPU usage in the upper stages of the query tree. This is a big improvement and should have made the query faster.
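To illustrate why fusing the filter into the scan matters, the following Python sketch models a leaf stage as a generator and counts how many rows it hands upstream with and without the filter applied at scan time. It is a toy model under stated assumptions: the function names and `SAMPLE_LINEITEM` rows are hypothetical, and the `l_shipdate` range filter from the query above stands in for whichever predicates Presto actually pushed into the scans. It does not reflect how Presto's operators are implemented.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]

# Hypothetical lineitem-like rows; ISO date strings compare correctly as text.
SAMPLE_LINEITEM: list[Row] = [
    {"l_orderkey": 1, "l_shipdate": "1994-06-01", "l_extendedprice": 100.0, "l_discount": 0.1},
    {"l_orderkey": 2, "l_shipdate": "1995-03-15", "l_extendedprice": 200.0, "l_discount": 0.0},
    {"l_orderkey": 3, "l_shipdate": "1996-12-31", "l_extendedprice": 50.0, "l_discount": 0.5},
    {"l_orderkey": 4, "l_shipdate": "1997-01-02", "l_extendedprice": 80.0, "l_discount": 0.25},
]


def in_ship_window(row: Row) -> bool:
    """Mirrors `l_shipdate BETWEEN '1995-01-01' AND '1996-12-31'` (inclusive)."""
    return "1995-01-01" <= row["l_shipdate"] <= "1996-12-31"


def to_volume(row: Row) -> Row:
    """Mirrors the projection `substr(l_shipdate, 1, 4)` and `l_extendedprice * (1 - l_discount)`."""
    return {
        "l_orderkey": row["l_orderkey"],
        "l_year": row["l_shipdate"][:4],
        "volume": row["l_extendedprice"] * (1 - row["l_discount"]),
    }


def scan_project(rows: Iterable[Row], project: Callable[[Row], Row]) -> Iterator[Row]:
    """Leaf stage without a fused filter: every scanned row is projected and sent upstream."""
    for row in rows:
        yield project(row)


def scan_filter_project(rows: Iterable[Row], predicate: Callable[[Row], bool],
                        project: Callable[[Row], Row]) -> Iterator[Row]:
    """Leaf stage with the filter fused into the scan: only matching rows leave the stage."""
    for row in rows:
        if predicate(row):
            yield project(row)


def rows_sent_upstream(stage: Iterator[Row]) -> int:
    """Count the rows a leaf stage hands to the next stage (e.g. an exchange or a join)."""
    return sum(1 for _ in stage)


if __name__ == "__main__":
    unfiltered = rows_sent_upstream(scan_project(SAMPLE_LINEITEM, to_volume))
    filtered = rows_sent_upstream(scan_filter_project(SAMPLE_LINEITEM, in_ship_window, to_volume))
    print(f"ScanProject sends {unfiltered} rows, ScanFilterProject sends {filtered}")
```

In the unfused case the predicate still has to be evaluated somewhere further up the plan, so the final result is the same; the difference is that every extra row crosses the stage boundary and is processed by the upstream operators, which matches the lower upstream CPU time observed in 0.206.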
References
- Explain analyze from 0.206 - https://gist.github.com/anoopj/40eea820c1c310dff72139d495ac98b0
- Explain analyze from 0.172 - https://gist.github.com/anoopj/01985fe0ad298dad4c22b1444e1f1e21
About this issue
- State: closed
- Created 6 years ago
- Comments: 39 (38 by maintainers)
I can submit a PR to add spill stats to the UI and to the EXPLAIN ANALYZE output as well.
@findepi, we should also add spill stats to the EXPLAIN ANALYZE output.
Created pull request https://github.com/prestodb/presto/pull/11910 to add the spill stats to the web UI and query summary.
@findepi
Piotr, I can rebase onto the current master; I've just been busy with other projects. I hope to send out a PR in the next few days.