presto: Predicate pushdown may result in query failure
Projection IF (o.orderstatus = 'P', t.x[3], NULL) as y
combined with o.orderstatus = 'P' AND y IS NOT NULL
filter on top of a join causes pushdown of x[3] IS NOT NULL
predicate into t
. This may trigger Array subscript out of bounds
error if t
has records where x
has fewer than 3 elements. In case these records don’t survive the join, the overall query should pass, but it fails.
To reproduce, add the following to com.facebook.presto.hive.TestHiveIntegrationSmokeTest
(or execute queries manually):
@Test
public void test()
{
assertQuerySucceeds(getQueryRunner().getDefaultSession(),
"CREATE TABLE test AS " +
"SELECT\n" +
" l.linenumber,\n" +
" o.orderkey,\n" +
" o.orderstatus,\n" +
" IF (o.orderstatus = 'P', ARRAY[1, 2, 4], ARRAY[1]) AS x\n" +
"FROM lineitem l, orders o\n" +
"WHERE l.orderkey = o.orderkey");
assertQuerySucceeds(getQueryRunner().getDefaultSession(),
"SELECT *\n" +
"FROM (\n" +
" SELECT\n" +
" o.orderstatus,\n" +
" IF (o.orderstatus = 'P', x[3], NULL) AS y\n" +
" FROM test t, orders o\n" +
" WHERE t.orderkey = o.orderkey\n" +
")\n" +
"WHERE orderstatus = 'P' AND y IS NOT NULL");
assertUpdate("DROP TABLE test");
}
The stacktrace:
com.facebook.presto.spi.PrestoException: Array subscript out of bounds
at com.facebook.presto.operator.scalar.ArraySubscriptOperator.checkIndex(ArraySubscriptOperator.java:166)
at com.facebook.presto.operator.scalar.ArraySubscriptOperator.longSubscript(ArraySubscriptOperator.java:95)
at com.facebook.presto.$gen.PageFilter_20190625_013659_979.filter(Unknown Source)
at com.facebook.presto.$gen.PageFilter_20190625_013659_979.filter(Unknown Source)
at com.facebook.presto.operator.project.DictionaryAwarePageFilter.filter(DictionaryAwarePageFilter.java:83)
at com.facebook.presto.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
at com.facebook.presto.operator.project.PageProcessor.process(PageProcessor.java:101)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:287)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
com.facebook.presto.sql.planner.optimizations.PredicatePushDown.Rewriter#processInnerJoin
already makes sure not to push down non-deterministic predicates. It may need to block pushdown of predicates that may generate an error or wrap these in try
.
CC: @rongrong @highker @wenleix @arhimondr
P.S. The following fix introduced in 0.221 seems to expose this issue (e.g. queries that used to pass are now failing). It affects queries that use varchar columns. E.g. it would happen if type of orderstatus
column in the above repro was changed to varchar
.
aa6e60648d9d7d8ddedec70e500e89575d177707 is the first bad commit
commit aa6e60648d9d7d8ddedec70e500e89575d177707
Author: Andrii Rosa <andriirosa@fb.com>
Date: Wed Apr 24 14:21:27 2019 -0400
Fix equality inference for VARCHAR predicates
About this issue
- Original URL
- State: open
- Created 5 years ago
- Comments: 15 (14 by maintainers)
@rongrong @highker I don’t think this issue qualifies as a release blocker because it seems to exist for a long time. I also don’t think we should revert #12724. I suggest we remove release-blocker label and work on a fix on a timeline that makes sense.
Simpler self-contained test case:
Two suggestions of similar solutions we were talking about at lunch today were
try
) and then after the join we’d have the strict predicate that would throw an exception on errorsBoth of these would require some work on the execution side. The advantage of 1 is that it’s easier to implement. The disadvantage is that it could cause a performance regression to do the filter twice if the predicate is expensive to compute and adds more overhead for the 99% of queries that won’t have any errors.
I’ll take a look.
@wenleix Wenlei, as you pointed out #12724 only exposed the existing problem. Reverting it helps with some queries, but there can still be queries that incorrectly fail.