sqlfluff: L042 should not remove queries to CTEs when the queries reference other tables in outer query.

Search before asking

  • I searched the issues and found no similar issues.

Description

The rule L042 is a good idea I think but it does not support to config more advanced options where you - on purpose - might want to use a sub-query in a left join by using the language specific functions LATERAL (in PostgreSQL) or OUTER APPLY (in TSQL).

The Anti-pattern

select
    a.x, a.y, b.z
from a
join (
    select x, z from b
) using(x)

is good, but the following pattern might be wanted, as the subquery is directly joining to the outer query:

PostgreSQL

select
    a.x, a.y, b.z
from a
join lateral (
    select x, z
    from b
    where b.x = a.x
    order by z
    limit 1
) on true

TSQL

select
    a.x, a.y, b.z
from a
outer apply (
    select top 1 x, y
    from b
    where b.x = a.x
    order by z
) as b

~I suggest to add a new config option join-lateral to the existing options join and from.~

Moving these queries to CTEs without refactoring the SQL will break the SQL. I would suggest automatically excluding these subqueries from L042

Use case

No response

Dialect

see above

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 17 (15 by maintainers)

Most upvoted comments

OK let’s leave this issue as just the first one. We can open a second issue for even greater restrictions for APPLY and LATERAL in future if needs be.

I’ve updated the Postgres example to add a where b.x = a.x clause which I think makes it clearer and still keeps the intent. Let me know if that is not the case.

My experience in rule writing is limited, and having a quick look at L042 it looks quite complicated. I’d guess we either want to trim the results of _find_nested_subqueries to exclude subqueries with external references, or update _calculate_fixes to prevent those being surfaced as real lint errors. But a bit beyond me to be honest. @barrywhart may be best to advise here.

BTW, as a short term fix you can always disable L042 in your config until this is fixed.

OK so I agree it shouldn’t pull out to a CTE if it refers to other tables (like https://github.com/sqlfluff/sqlfluff/issues/3572#issuecomment-1179708478 does for pd).

For the query plan issue for unreferenced tables (if this even is an issue) I think that’s a harder one to solve, and maybe need to just inline ignore this rule in those few cases?

Would that solve all issues for this issue? If so can I rename the issue to “L042 should not remove queries referencing other tables to CTEs”? Note I don’t do to much work on rules so won’t be tackling this one myself.