Skip to content

The count function needs s"*" for the argument in current dev version #2717

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 tasks done
eitsupi opened this issue Jun 4, 2023 · 10 comments · Fixed by #2936
Closed
2 tasks done

The count function needs s"*" for the argument in current dev version #2717

eitsupi opened this issue Jun 4, 2023 · 10 comments · Fixed by #2936
Labels
compiler language-design Changes to PRQL-the-language
Milestone

Comments

@eitsupi
Copy link
Member

eitsupi commented Jun 4, 2023

What happened?

From #2710 (comment), related to #2713

We are currently using workarounds in the following following locations:

prql/README.md

Line 34 in 69571f1

count s"*", # Trailing commas are allowed

ct = count s"*",

PRQL input

from a
derive {cnt = count}

SQL output

NA

Expected SQL output

SELECT
  *,
  COUNT(*) AS cnt
FROM
  a

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

No response

@eitsupi eitsupi added the bug Invalid compiler output or panic label Jun 4, 2023
@eitsupi eitsupi changed the title The count function needs s"*" for the argument The count function needs s"*" for the argument in current dev version Jun 4, 2023
@eitsupi eitsupi added this to the 0.9 milestone Jun 4, 2023
@aljazerzen aljazerzen added language-design Changes to PRQL-the-language compiler and removed bug Invalid compiler output or panic labels Jun 6, 2023
This was referenced Jun 11, 2023
@aljazerzen
Copy link
Member

aljazerzen commented Jun 12, 2023

While working on the compiler, I had to change the rule which triggers the OVER clause for window functions to not look at the result type of the function, but instead look at the parameter type. This works well, but it has a problem: some window functions don't have any parameters:

  • count (there was non_null:, but that's a named param)
  • rank,
  • row_number,

For the sake of language design this should not matter, but it lead me to the following question:

How does the count function work, if it receives no parameters? What's the input of its computation? Does it just "pull from the relation"? How do rank and row_number know how many resulting rows should they output?

This was my justification for making them have a parameter. count now has a positional non_null, and rank and row_number have a column param that is not used in compilation.

This is obviously a bit inconvenient, so how could we design the language to work around this? To have the arguments as they were before, but don't use the "pull from the relation" bullshit justification?

I've hoped to find a quick fix, but then I found the similar issue of #2723 and wanted to clear that first, so the two do not get mixed up.

@max-sixty
Copy link
Member

How does the count function work, if it receives no parameters? What's the input of its computation? Does it just "pull from the relation"? How do rank and row_number know how many resulting rows should they output?

Yes, I think we can think about it as one of:

  • it looks at the whole relation, rather than just one column.
  • it looks any column (and at least in the case of count, also counts nulls, unlike when a column is selected, as you describe)

Is this a big problem, though? Is your point that it introduces a completely new type for just a couple of functions? For the types, I was thinking the type is int rather than T?

@aljazerzen
Copy link
Member

it looks at the whole relation, rather than just one column.

You mean like this?

from a = albums
aggregate {count a}

The problem is that this does not make any sense:

let x = row_number

... while this is completely fine:

let x = sum [1, 2, -6, 45]

So my question is: how do we want to describe the difference between these two functions?


My solution was to say "there is no difference, rank requires a column too".

let x = row_number [1, -2, 5]

assert x == [1, 2, 3]

@max-sixty
Copy link
Member

You mean like this?

from a = albums
aggregate {count a}

Yes!

The problem is that this does not make any sense:

let x = row_number

... while this is completely fine:

let x = sum [1, 2, -6, 45]

So my question is: how do we want to describe the difference between these two functions?

My mental model is that they have the relation as a parameter:

let x = row_number [{a=1, b='a'}, {a=2, b='a'}]

...so indeed let x = row_number isn't correct.

Does that make sense?

@aljazerzen
Copy link
Member

Makes sense, I fully agree.

Now, we just need a way to reference current relation in all contexts (without using _frame) and a solution for #2723.

@eitsupi
Copy link
Member Author

eitsupi commented Jun 18, 2023

Since this is seemingly related to the larger issue #2723, can we mark it as a known issue to Changelog, etc. and remove it from the 0.9 milestone and do a 0.9 release?

I believe that the syntax changes between the last release and the development version are currently causing confusion for new users, so I wonder if it might be worthwhile to expedite the 0.9 release, even if there are some flaws. (Of course, if we could solve these issues right away, that would be great.)

@max-sixty

This comment was marked as off-topic.

@eitsupi

This comment was marked as off-topic.

@eitsupi

This comment was marked as off-topic.

@eitsupi

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler language-design Changes to PRQL-the-language
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants