prql: Wrong or inconsitent behaviour with `take`

I suspect some minor problem in the parser in the case of take 1

A) (wrong)

from invoices
group [billing_city] (
  take 1
)
SELECT
  DISTINCT *
FROM
  invoices

-- Generated by PRQL compiler version:0.5.2 (https://prql-lang.org)

B) (correct?)

from invoices
group [billing_city] (
  take 2
)
WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY billing_city) AS _expr_0
  FROM
    invoices
)
SELECT
  *
FROM
  table_1
WHERE
  _expr_0 <= 2

-- Generated by PRQL compiler version:0.5.2 (https://prql-lang.org)

C) (very wrong)

from invoices
group [billing_city] (
  take 1
)
select [a, b, c]
SELECT
  DISTINCT a,
  b,
  c
FROM
  invoices

-- Generated by PRQL compiler version:0.5.2 (https://prql-lang.org)

D) (correct?)

from invoices
group [billing_city] (
  sort [-invoice_date]
  take 1
)
WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY billing_city
      ORDER BY
        invoice_date DESC
    ) AS _expr_0
  FROM
    invoices
)
SELECT
  *
FROM
  table_1
WHERE
  _expr_0 <= 1

-- Generated by PRQL compiler version:0.5.2 (https://prql-lang.org)

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 17 (10 by maintainers)

Most upvoted comments

Oh, yes I do remember šŸ˜„ It’s a thing that I was putting off for a long time.

https://github.com/PRQL/prql/issues/944

At the time it was not convenient to do, but now it may not be so anymore.

Unfortunately, I don’t see a way around this. There is a lot of tools in the toolbox, it’s the part of the job to know which tool to use. The best thing we can do here is lots of guidance šŸ˜„

I think with your guidance we’re 90% of the way there — this issue was a good case of that.

As you know, one of my priorities is to give you better leverage on your time, which we’d get from contributions requiring less guidance. Letting folks be independent would also be a better contribution experience.

It’s possible that we’ve done everything we can, and I don’t have specific examples for where we haven’t yet. But in general, I’ve found there’s often lots of potential to make things more local, requiring less context & guidance. RQ is a good example of that I reckon, even if we haven’t fully exploited it yet.

I’m going to try and pivot back to the compiler by fixing some of our bugs (like this) — I think we’re probably under-weighing how important these are for having the language feel reliable. While I’m doing that, I’ll keep an eye open for opportunities on this dimension.

Closed by #2109.

Thanks for the guidance @aljazerzen .

I’m also going to reflect on this as a case of #1840. The actual code was fairly easy. But:

  • knowing about determine_select_columns would have been hard without your guidance.
  • It’s also v helpful to look at the RQ — possible we try and make it so that’s the first place people look — lots of this is just RQ transformations, a much easier problem than understanding the full compiler.

So maybe we try and frame issues & tests as ā€œwe need to change this RQ into this RQā€. Does that resonate at all??

I had a quick look, and can look more tomorrow. This is code I’m hoping to get into myself more too (@aljazerzen wrote almost all of it).

FWIW the by is a remnant from us using by to specify the columns in a group — it just means the columns in the group.

FYI I added #2080 so I could add some dbg! statements and understand where the information was on the ctx variable.