sqlfluff: L042 should not remove queries to CTEs when the queries reference other tables in outer query.
Search before asking
- I searched the issues and found no similar issues.
Description
The rule L042 is a good idea I think but it does not support to config more advanced options where you - on purpose - might want to use a sub-query in a left join by using the language specific functions LATERAL (in PostgreSQL) or OUTER APPLY (in TSQL).
The Anti-pattern
select
a.x, a.y, b.z
from a
join (
select x, z from b
) using(x)
is good, but the following pattern might be wanted, as the subquery is directly joining to the outer query:
PostgreSQL
select
a.x, a.y, b.z
from a
join lateral (
select x, z
from b
where b.x = a.x
order by z
limit 1
) on true
TSQL
select
a.x, a.y, b.z
from a
outer apply (
select top 1 x, y
from b
where b.x = a.x
order by z
) as b
~I suggest to add a new config option join-lateral
to the existing options join
and from
.~
Moving these queries to CTEs without refactoring the SQL will break the SQL. I would suggest automatically excluding these subqueries from L042
Use case
No response
Dialect
see above
Are you willing to work on and submit a PR to address the issue?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (15 by maintainers)
OK let’s leave this issue as just the first one. We can open a second issue for even greater restrictions for
APPLY
andLATERAL
in future if needs be.I’ve updated the Postgres example to add a
where b.x = a.x
clause which I think makes it clearer and still keeps the intent. Let me know if that is not the case.My experience in rule writing is limited, and having a quick look at L042 it looks quite complicated. I’d guess we either want to trim the results of
_find_nested_subqueries
to exclude subqueries with external references, or update_calculate_fixes
to prevent those being surfaced as real lint errors. But a bit beyond me to be honest. @barrywhart may be best to advise here.BTW, as a short term fix you can always disable L042 in your config until this is fixed.
OK so I agree it shouldn’t pull out to a CTE if it refers to other tables (like https://github.com/sqlfluff/sqlfluff/issues/3572#issuecomment-1179708478 does for
pd
).For the query plan issue for unreferenced tables (if this even is an issue) I think that’s a harder one to solve, and maybe need to just inline ignore this rule in those few cases?
Would that solve all issues for this issue? If so can I rename the issue to “L042 should not remove queries referencing other tables to CTEs”? Note I don’t do to much work on rules so won’t be tackling this one myself.