presto: Planning is very slow on a partitioned table even when disable hive.table-statistics-enabled

My query is very simple: select * from a tab1 limit 10

The problem is tab1 has very huge # of partitions. It needs for about 6 minutes to do the planning before the real execution start. It works fine by adding some strict condition to eliminate partitions. However, when we loose it, it also becomes very slow. The query without any filtering shown in the top is just the worst case.

I tried to disable the hive statistics by setting SET SESSION hive.statistics_enabled=false; in my presto client connection and also disable is in the etc/catalog/hive.properties (hive.table-statistics-enabled=false). However, still not working.

Any ideas? Thanks!!!

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

@hustnn , in addition to what @kokosing you may try to use a profiler. Provided you can attach a profiler (in sampling mode) to the coordinator node and assuming your problematic query is the only thing run on the cluster at the time, it should clearly show where the code is spending time in. Or, if you cannot use a regular profiler, you can try “poor’s man profiler”, i.e. do several thread dumps (jstack) in an interval of ~20s. If same stacktrace appears in subsequent dumps (of the thread doing the actual planning), this is likely we’re spending time there. (Of course, regular profiler will be much more convenient to use.)

This approach has some added advantage over the optimizer rules that were considered so far – it should show the problem even if it’s not in some optimizer rule.

With iterative optimizer turned on, can you please check JMX stats for iterative optimizer rules in order to find a rule with a high number of time.count or any of time.p*?

You can access them with following

presto:current> use jmx.current;
USE
presto:current> select * from "com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=picktablelayoutforpredicate";

Below will tell you what rules there are

show tables from jmx.current

I would guess that picktablelayoutforpredicate is the culprit, but I might be wrong.