presto: Planning is very slow on a partitioned table even when disable hive.table-statistics-enabled
My query is very simple: select * from a tab1 limit 10
The problem is tab1 has very huge # of partitions. It needs for about 6 minutes to do the planning before the real execution start. It works fine by adding some strict condition to eliminate partitions. However, when we loose it, it also becomes very slow. The query without any filtering shown in the top is just the worst case.
I tried to disable the hive statistics by setting SET SESSION hive.statistics_enabled=false;
in my presto client connection and also disable is in the etc/catalog/hive.properties (hive.table-statistics-enabled=false). However, still not working.
Any ideas? Thanks!!!
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 17 (7 by maintainers)
@hustnn , in addition to what @kokosing you may try to use a profiler. Provided you can attach a profiler (in sampling mode) to the coordinator node and assuming your problematic query is the only thing run on the cluster at the time, it should clearly show where the code is spending time in. Or, if you cannot use a regular profiler, you can try “poor’s man profiler”, i.e. do several thread dumps (
jstack
) in an interval of ~20s. If same stacktrace appears in subsequent dumps (of the thread doing the actual planning), this is likely we’re spending time there. (Of course, regular profiler will be much more convenient to use.)This approach has some added advantage over the optimizer rules that were considered so far – it should show the problem even if it’s not in some optimizer rule.
With iterative optimizer turned on, can you please check JMX stats for iterative optimizer rules in order to find a rule with a high number of
time.count
or any oftime.p*
?You can access them with following
Below will tell you what rules there are
I would guess that
picktablelayoutforpredicate
is the culprit, but I might be wrong.