-
Notifications
You must be signed in to change notification settings - Fork 63
Adds RANK, DENSE_RANK, LAG, LEAD, & ROW_NUMBER Window Functions & Operator #1746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1746 +/- ##
=========================================
Coverage 80.03% 80.03%
Complexity 47 47
=========================================
Files 19 19
Lines 506 506
Branches 23 23
=========================================
Hits 405 405
Misses 88 88
Partials 13 13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
6111dc6
to
eb013ab
Compare
* Adds parsing, planning, and evaluation of INTERVAL * Adds several functions pertaining to INTERVAL * Adds coercions to/from INTERVAL
Updates pull request template
… clause Adds AST modeling for window functions Adds plan modeling for window functions Deprecates old window function AST APIs due to incompatibility Adds RANK, DENSE_RANK window functions Adds ROW_NUMBER window function Adds LAG and LEAD window functions Adds support for referencing window clause from window functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished Parser and Ast Package. As always, very delightful code : ).
One preliminary comment:
Perhaps it is worth to track the feature gap between components if anything exists.
i.e., if something is supported in parsing but not supported in planning, etc.
existingWindowName=symbolPrimitive? | ||
partition=windowPartitionClause? | ||
order=orderByClause? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting: you can further partition/order an existing window...
windowClause | ||
: WINDOW windowDefinition (COMMA windowDefinition)* | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a personal notes to remind myself when review further section:
It seems like we can parse statement like
SELECT w1 FROM employee AS t WINDOW w1 AS (PARTITION BY t.b ORDER BY t.c);
Curious if we support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do 😄 . That's how we can share the computation of the window across multiple functions.
@@ -49,61 +86,100 @@ public SFW(@NotNull Select select, @Nullable Exclude exclude, @NotNull From from | |||
this.where = where; | |||
this.groupBy = groupBy; | |||
this.having = having; | |||
this.windowClause = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backward compatible. Very nice : )
*/ | ||
@Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public static final class NoArg extends WindowFunctionType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, mind evaluating why the choice of NoArg
vs say Ranking
and RowNumber
.
So seemingly for one we are modeling using number of args, for the other we are modeling using "function characteristic" for lack of better words.
I understand that both Ranking
and RowNumber
must not contains argument... And no arg is, well, no arg... and you don't have to worry about argument type....
But perhaps using NoArg
is bit over-smart, in that it might cause difficulty when (especially a user is) constructing the ast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, well, a user can just use a factory method that we provide. We could do what you're proposing, but also the enum is also a small footprint. I'm not sold in either direction. If you feel strongly, I don't mind changing it to be that way.
* Represents a reference to a window. | ||
* @see ExprWindowFunction#getWindowReference() | ||
*/ | ||
public abstract class WindowReference extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the ast node? can we inline the two?
i.e., just use WindowSpecification
@Nullable
private final Identifier.Simple existingName;
@Nullable
private final WindowPartitionClause partitionClause;
@Nullable
private final WindowOrderClause orderClause;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure what you mean.
If you're asking why can't we just move all those attributes inside a non-abstract WindowReference, well they are syntactically different. So, we couldn't accurately unparse window references. Though, I do get what you mean, because the existingName, if not null, could just be the reference.
If it will never matter to unparse, then we can collapse. LMK your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see:
Currently:
rowNumber() OVER w1 // AST: window name
rowNumber() OVER (PARTITION BY ... ORDER BY ....) // AST: InLineSpecification
If we remove window reference and just use WindowSpecification
rowNumber() OVER w1 // windowSpec(existing window: w1, partitionBy: null, orderBy: null)
rowNumber() OVER (PARTITION BY ... ORDER BY ....) // AST: // windowSpec(existing window: null, partitionBy: ..., orderBy: ...)
Un-parsing:
rowNumber() OVER (w1) // extra parenthesis.
rowNumber() OVER (PARTITION BY ... ORDER BY ....) // AST: // windowSpec(existing window: null, partitionBy: ..., orderBy: ...)
Perhaps an more apparent example would be:
// Original input
rowNumber() OVER w1
rowNumber() OVER (w1)
// Un-parsing result:
rowNumber() OVER (w1)
rowNumber() OVER (w1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally lean over towards combining those for simplicity in the AST.
But I will leave the decision to other members who more actively working on the un-parsing use case...
val offset = ctx.offset.text.toLong() | ||
val default = ctx.default_.let { visitExpr(it) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val offset = ctx.offset.text.toLong() | |
val default = ctx.default_.let { visitExpr(it) } | |
val offset = ctx.offset?.text?.toLong() | |
val default = ctx.default_?.let { visitExpr(it) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch 👍
- Added parsing, planning, and execution of the interval type and interval values. | ||
- Added multiple operations involving datetime and interval values. | ||
- Added planning and evaluation of the LET clause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(side-comment) this PR probably needs to be rebased off of main
. currently, this PR is showing changes that I assume were not yet in main
(e.g. interval, let clause) at the time the PR was created, leading to a larger than expected diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some initial comments for the parsing + AST modeling. I'll need some more time to read through the rest of the PR
## [Unreleased](https://TODO.com) - YYYY-MM-DD | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need an update to the window function additions + new deprecations
/** | ||
* This test file tests Common Table Expressions. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | |
* This test file tests Common Table Expressions. | |
*/ | |
/** | |
* This test file tests window functions and window clause. | |
*/ |
*/ | ||
windowFunction | ||
: func=(LAG|LEAD) PAREN_LEFT expr ( COMMA expr (COMMA expr)?)? PAREN_RIGHT over #LagLeadFunction | ||
* RFC: https://github.com/partiql/partiql-docs/issues/31. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing some context. Why is this issue being linked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, is an updated/new RFC needed for window clause + window functions?
* EBNF 2023: | ||
* <window function> ::= <window function type> OVER <window name or specification> | ||
*/ | ||
windowFunction: funcType=windowFunctionType OVER spec=windowNameOrSpecification; | ||
|
||
/** | ||
* EBNF 2023: | ||
* <window function type> ::= | ||
* <rank function type> <left paren> <right paren> | ||
* | ROW_NUMBER <left paren> <right paren> | ||
* | <lead or lag function> | ||
* | and more... | ||
*/ | ||
windowFunctionType | ||
: rankFunctionType PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRank | ||
| ROW_NUMBER PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRowNumber | ||
| leadOrLagFunction # WindowFunctionTypeLeadOrLag | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than have the window function type be hard-coded, could we do something similar to what we do for scalar and aggregations (i.e. the modeling of functionCall
uses a qualifiedName
rather than any hard-coded names)? Doing so will make it a lot easier to add new window functions and also user-defined window functions in the future.
Relevant prior PRs to get rid of hard-coded scalar and aggregations from the antlr grammar
// TODO: In the future: @Nullable private final WindowFrameClause frameClause; | ||
|
||
@Nullable | ||
private final Identifier.Simple existingName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we just call this name
? The SQL2023 spec uses <existing window name>
but it just wraps a <window name>
/** | ||
* Represents a window specification's order by clause. | ||
* @see WindowSpecification#getOrderClause() | ||
*/ | ||
@Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public final class WindowOrderClause extends AstNode { | ||
@NotNull | ||
private final List<Sort> sorts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reuse OrderBy
here? Syntactically they seem the same.
@Nullable | ||
private final WindowPartitionClause partitionClause; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this additional WindowPartitionClause
is unnecessary and we could just have this be a NonNull
List<WindowPartition>
? I get that the SQL2023 spec defines the clause as optional under <window specification details>
, but if you'll notice the <window partition column reference list>
will always contain at least one <window partition column reference>
. That may be harder to enforce with this AST modeling since you could define a non-null partitionClause
that has an empty list of WindowPartition
s.
Also, it can be confusing on the proper way to define a WindowSpecification
without any WindowPartition
s with this modeling. Is the correct way
- a
WindowSpecification
with a nullpartitionClause
or - a
WindowSpecification
with a non-nullpartitionClause
that has an emptyWindowPartition
s list
/** | ||
* Represents a window function type. | ||
* @see ExprWindowFunction#getFunctionType() | ||
*/ | ||
public abstract class WindowFunctionType extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the parser comment, I'm wondering if we should change the window functions to not be as hard-coded within the AST modeling and follow something more like scalar and aggregate function calls. IMO this can make it a lot more straightforward to add future support for user-defined window functions.
Trino for example models window functions (along with scalars and aggregates) using FunctionsCall. For all of the function names, they just use a QualifiedName
rather than any sort of hard-coded names within their AST, leaving the validation to a later stage. I'm not sure if we should lump all three of those under ExprCall
but modeling window functions with just an identifier name rather than the WindowFunctionType
could be a cleaner modeling as more window functions are added over time.
*/ | ||
@Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public static final class InLineSpecification extends WindowReference { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me why we need a separate in-line window specification class. Perhaps a simpler modeling could follow what Trino does with their Window interface.
The Window
interface is implemented by two separate classes
Following a similar modeling, we could have a Window
abstract class that is used by the ExprWindowFunction
.
private final Expr extent; | ||
private final Long offset; | ||
private final Expr defaultValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If window functions were modeled more closely to ExprCall
, these could all be function arguments rather than hard-coded properties.
it.specification.existingName?.let { | ||
env.listener.report(PErrors.featureNotSupported("Window referencing other window")) | ||
} | ||
windows.addWindowDef(it.name.text, it.specification) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- In case of duplicated window name, the current behavior seems to be overriding, is this intensional?
_w AS (PARTITION BY ... ORDER BY ... ASC),
_w AS (PARTITION BY ... ORDER BY ... DESC)
- Is it safe to use the text representation instead of identifier, esp when considering case sensitivity...
_w AS (PARTITION BY ... ORDER BY ... ASC),
"_w" as (PARTITION BY ... ORDER BY ... DESC);
private val definitions = mutableMapOf<String, InternalWindowDefinition>() | ||
private val inLineWindowDefs = mutableListOf<InternalWindowDefinition>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NON-BLOCKING
For planning purpose: one potential optimization we can do is to attempts to assign/reuse window definition.
Consider:
LAG() OVER (PARTITION BY foo ORDER BY BAR)
LEAD() OVER(PARTITION BY foo ORDER BY BAR)
=>
LAG() OVER $_W1, LEAD() OVER $_W1
WINDOW $_W1 AS(PARTITION BY foo ORDER BY bar)
"lag", "lead" -> lagOrLead(n, args[0], args[1], args[2], ignoreNulls) | ||
else -> null | ||
} | ||
else -> throw IllegalArgumentException("Unknown window function: $name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why the above branches returns null
for func not found and this branch throw?
private const val FALLBACK: String = "UNKNOWN" | ||
|
||
@JvmStatic | ||
fun successTestCases() = listOf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | ||
* Reset the function to its initial state. | ||
*/ | ||
abstract fun reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate on why is this necessary?
Description
Deprecated
Window Design
WindowPartition
much easier (and fairly performant).WindowFunction
and the plan and evaluator's definition of aWindowFunctionNode
andWindowFunction
. This PR acts this way to maintain a separation of concerns. For now, we are only modeling the built-ins as provided by the database system, however, the APIs can be extended in the future to accept and converts SPI-provided window functions.Future Changes
RelWindow
,WindowFunctionNode
, andWindowFunction
will be need to be updated to support frames, which is totally fine. ForWindowFunction
, we'll likely need to add a new method that takes as input some frame information, which by default should call the existing (probably deprecated in the future)eval
method. This will not break the public API, nor would it break the existing customer flow.Other Information
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.