duckdb: Correlated SQL sub-queries crashing

I have been testing some complex SQL sub-queries on MonetDB before the Nov2019 release and they triggered crashes. Meanwhile I see the same happens on DuckDB.

I think the best plan should be fixing one query at the time, because some of the problems might be correlated altogether.

Bellow each query I output the outcome from MonetDB, but notice that it might differ on DuckDB depending where the correlation happens, either inner or outer query.

CREATE TABLE tbl_ProductSales (ColID int, Product_Category  varchar(64), Product_Name  varchar(64), TotalSales int); 
INSERT INTO tbl_ProductSales VALUES (1,'Game','Mobo Game',200),(2,'Game','PKO Game',400),(3,'Fashion','Shirt',500),(4,'Fashion','Shorts',100);
CREATE TABLE another_T (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT, col8 INT);
INSERT INTO another_T VALUES (1,2,3,4,5,6,7,8), (11,22,33,44,55,66,77,88), (111,222,333,444,555,666,777,888), (1111,2222,3333,4444,5555,6666,7777,8888);

SELECT col1 IN (SELECT ColID + col1 FROM tbl_ProductSales) FROM another_T GROUP BY col1; 
	-- False
	-- False
	-- False
	-- False

PS: I know these queries are very complex and barely anyone writes them. At the same time, I have a hard time figuring out Nielsโ€™ code ๐Ÿ™

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 22 (11 by maintainers)

Commits related to this issue

Most upvoted comments

HAVING without GROUP BY is now allowed #306.

Addressed in b077e6673ca2bf02dba2135f62790eedbe5a0a91, this was basically a combination of two problems:

(1) If a query had no aggregations but only groups, no error would be thrown if there were column references bound outside of the original grouping columns and the subsequent column resolution would fail.

(2) Binding to the group inside the subquery would not work because BindTableNames() was not called prior to performing the group lookup, leading to col1 being not equal to another_T.col1.

There was also an additional problem found relating to the BoundExpression that would make a composite grouping column not bind properly in subqueries, e.g. the following query would fail to bind:

SELECT (col1 + 1) IN (SELECT ColID + (col1 + 1) FROM tbl_ProductSales) FROM another_T GROUP BY (col1 + 1);

Note that this query actually also fails to bind in PostgreSQL, but SQLite correctly binds it. This I resolved by making the BoundExpression use the underlying ParsedExpression, so now this query also works correctly.

All in all three bugs found ๐Ÿ˜ƒ Thanks for the bug report! Let us know if you find any other issues.