Skip to content

Adds RANK, DENSE_RANK, LAG, LEAD, & ROW_NUMBER Window Functions & Operator #1746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented Mar 10, 2025

Description

  • Adds parsing, planning, and evaluation support for windows and window functions
    • Parsing support for the window clause itself (and window references) has not been implemented, however, their modeling has been carved out for future support.
  • Adds end-to-end support for the window clause of a SFW query. Window functions can reference the user-defined window -- allowing them to share the computation of the windows across multiple window functions (bringing down the computational cost several-fold).
  • Adds RANK, DENSE_RANK, LAG, LEAD, & ROW_NUMBER window functions

Deprecated

  • Deprecated the previous AST modeling of window functions. We weren't using it, so it slipped by the 1.0 package audit. It does not follow the SQL-spec and is not able to be updated to support the SQL spec.

Window Design

  • Researched implementations previously implemented in PLK (WindowFunction, Window Plan Node), Trino (Plan WindowNode, Compiled WindowOperator, Compiled WindowFunction), and PostgreSQL (Compiled window functions, Compiled Lead/Lag func).
  • This PR, given optional ORDER BY and PARTITION BY clauses on a window specification, will inject a RelOpSort as input to the compiled RelOpWindow. The RelOpSort will sort by both the partition keys and sort keys -- which allows our implementation of the RelOpWindow to grab contiguous rows of a particular partition and peer group (sort group). This makes creation of a WindowPartition much easier (and fairly performant).
    • With the modeling exposed, we can always provide additional metadata to indicate whether some columns are pre-sorted. This may be added in the future for further optimization.
  • In the future, we may add a new WindowFunction API that takes in frame information as well.
  • This PR does NOT create a mechanism for loading SPI implementations of a window function. This PR intentionally creates public interfaces in the plan and evaluator. We may, in the future, add the ability to load SPI-provided window functions and perform a conversion between SPI's definition of a WindowFunction and the plan and evaluator's definition of a WindowFunctionNode and WindowFunction. This PR acts this way to maintain a separation of concerns. For now, we are only modeling the built-ins as provided by the database system, however, the APIs can be extended in the future to accept and converts SPI-provided window functions.
    • This was and still is my personal opinion on tables, scalar functions, and aggregate functions. It allows for a tiny SPI package in the future if we'd like.

Future Changes

  • I anticipate that the definition of a RelWindow, WindowFunctionNode, and WindowFunction will be need to be updated to support frames, which is totally fine. For WindowFunction, we'll likely need to add a new method that takes as input some frame information, which by default should call the existing (probably deprecated in the future) eval method. This will not break the public API, nor would it break the existing customer flow.

Other Information

  • Updated Unreleased Section in CHANGELOG: NO: Not yet
  • Any backward-incompatible changes? NO: Not technically.
  • Any new external dependencies? NO
  • Do your changes comply with the Contributing Guidelines and Code Style Guidelines? YES

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Mar 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.03%. Comparing base (4bf549b) to head (c506f02).
Report is 30 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1746   +/-   ##
=========================================
  Coverage     80.03%   80.03%           
  Complexity       47       47           
=========================================
  Files            19       19           
  Lines           506      506           
  Branches         23       23           
=========================================
  Hits            405      405           
  Misses           88       88           
  Partials         13       13           
Flag Coverage Δ
EXAMPLES 80.03% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@johnedquinn johnedquinn force-pushed the window branch 2 times, most recently from 6111dc6 to eb013ab Compare March 11, 2025 21:46
@johnedquinn johnedquinn changed the title [DRAFT] Adds RANK, DENSE_RANK, LAG, LEAD, & ROW_NUMBER Window Functions & Operator Adds RANK, DENSE_RANK, LAG, LEAD, & ROW_NUMBER Window Functions & Operator Mar 11, 2025
@johnedquinn johnedquinn marked this pull request as ready for review March 11, 2025 22:01
@johnedquinn johnedquinn requested review from jpschorr and yliuuuu March 11, 2025 22:02
johnedquinn and others added 3 commits March 12, 2025 08:09
* Adds parsing, planning, and evaluation of INTERVAL
* Adds several functions pertaining to INTERVAL
* Adds coercions to/from INTERVAL
Updates pull request template
… clause

Adds AST modeling for window functions

Adds plan modeling for window functions

Deprecates old window function AST APIs due to incompatibility

Adds RANK, DENSE_RANK window functions

Adds ROW_NUMBER window function

Adds LAG and LEAD window functions

Adds support for referencing window clause from window functions
Copy link
Contributor

@yliuuuu yliuuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished Parser and Ast Package. As always, very delightful code : ).

One preliminary comment:

Perhaps it is worth to track the feature gap between components if anything exists.

i.e., if something is supported in parsing but not supported in planning, etc.

Comment on lines +1061 to +1063
existingWindowName=symbolPrimitive?
partition=windowPartitionClause?
order=orderByClause?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting: you can further partition/order an existing window...

Comment on lines +212 to +214
windowClause
: WINDOW windowDefinition (COMMA windowDefinition)*
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a personal notes to remind myself when review further section:

It seems like we can parse statement like

SELECT w1 FROM employee AS t WINDOW w1 AS (PARTITION BY t.b ORDER BY t.c);

Curious if we support this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do 😄 . That's how we can share the computation of the window across multiple functions.

@@ -49,61 +86,100 @@ public SFW(@NotNull Select select, @Nullable Exclude exclude, @NotNull From from
this.where = where;
this.groupBy = groupBy;
this.having = having;
this.windowClause = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backward compatible. Very nice : )

*/
@Builder(builderClassName = "Builder")
@EqualsAndHashCode(callSuper = false)
public static final class NoArg extends WindowFunctionType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, mind evaluating why the choice of NoArg vs say Ranking and RowNumber.

So seemingly for one we are modeling using number of args, for the other we are modeling using "function characteristic" for lack of better words.

I understand that both Ranking and RowNumber must not contains argument... And no arg is, well, no arg... and you don't have to worry about argument type....

But perhaps using NoArg is bit over-smart, in that it might cause difficulty when (especially a user is) constructing the ast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, well, a user can just use a factory method that we provide. We could do what you're proposing, but also the enum is also a small footprint. I'm not sold in either direction. If you feel strongly, I don't mind changing it to be that way.

* Represents a reference to a window.
* @see ExprWindowFunction#getWindowReference()
*/
public abstract class WindowReference extends AstNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the ast node? can we inline the two?

i.e., just use WindowSpecification

    @Nullable
    private final Identifier.Simple existingName;

    @Nullable
    private final WindowPartitionClause partitionClause;

    @Nullable
    private final WindowOrderClause orderClause;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what you mean.

If you're asking why can't we just move all those attributes inside a non-abstract WindowReference, well they are syntactically different. So, we couldn't accurately unparse window references. Though, I do get what you mean, because the existingName, if not null, could just be the reference.

If it will never matter to unparse, then we can collapse. LMK your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see:

Currently:

rowNumber() OVER w1 // AST: window name
rowNumber() OVER (PARTITION BY ... ORDER BY ....) // AST: InLineSpecification

If we remove window reference and just use WindowSpecification

rowNumber() OVER w1 // windowSpec(existing window: w1, partitionBy: null, orderBy: null)
rowNumber() OVER (PARTITION BY ... ORDER BY ....) // AST: // windowSpec(existing window: null, partitionBy: ..., orderBy: ...)

Un-parsing:

rowNumber() OVER (w1) // extra parenthesis. 
rowNumber() OVER (PARTITION BY ... ORDER BY ....) // AST: // windowSpec(existing window: null, partitionBy: ..., orderBy: ...)

Perhaps an more apparent example would be:
// Original input

rowNumber() OVER w1
rowNumber() OVER (w1)

// Un-parsing result:

rowNumber() OVER (w1)
rowNumber() OVER (w1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally lean over towards combining those for simplicity in the AST.

But I will leave the decision to other members who more actively working on the un-parsing use case...

Comment on lines 2027 to 2028
val offset = ctx.offset.text.toLong()
val default = ctx.default_.let { visitExpr(it) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val offset = ctx.offset.text.toLong()
val default = ctx.default_.let { visitExpr(it) }
val offset = ctx.offset?.text?.toLong()
val default = ctx.default_?.let { visitExpr(it) }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch 👍

@johnedquinn
Copy link
Member Author

@yliuuuu added a commit to fix the nullable references to lag/lead arguments.

Regarding any potential feature gaps, I've cut: #1748. The only other thing not supported is the IGNORE/RESPECT NULLS for the LAG/LEAD functions. See the test in the planner -- that should be sufficient for tracking.

@johnedquinn johnedquinn requested a review from yliuuuu March 19, 2025 15:31
Comment on lines +46 to +48
- Added parsing, planning, and execution of the interval type and interval values.
- Added multiple operations involving datetime and interval values.
- Added planning and evaluation of the LET clause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(side-comment) this PR probably needs to be rebased off of main. currently, this PR is showing changes that I assume were not yet in main (e.g. interval, let clause) at the time the PR was created, leading to a larger than expected diff

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some initial comments for the parsing + AST modeling. I'll need some more time to read through the rest of the PR

Comment on lines +26 to +27
## [Unreleased](https://TODO.com) - YYYY-MM-DD

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need an update to the window function additions + new deprecations

Comment on lines +12 to +14
/**
* This test file tests Common Table Expressions.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* This test file tests Common Table Expressions.
*/
/**
* This test file tests window functions and window clause.
*/

*/
windowFunction
: func=(LAG|LEAD) PAREN_LEFT expr ( COMMA expr (COMMA expr)?)? PAREN_RIGHT over #LagLeadFunction
* RFC: https://github.com/partiql/partiql-docs/issues/31.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing some context. Why is this issue being linked?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, is an updated/new RFC needed for window clause + window functions?

Comment on lines 991 to 1008
* EBNF 2023:
* <window function> ::= <window function type> OVER <window name or specification>
*/
windowFunction: funcType=windowFunctionType OVER spec=windowNameOrSpecification;

/**
* EBNF 2023:
* <window function type> ::=
* <rank function type> <left paren> <right paren>
* | ROW_NUMBER <left paren> <right paren>
* | <lead or lag function>
* | and more...
*/
windowFunctionType
: rankFunctionType PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRank
| ROW_NUMBER PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRowNumber
| leadOrLagFunction # WindowFunctionTypeLeadOrLag
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than have the window function type be hard-coded, could we do something similar to what we do for scalar and aggregations (i.e. the modeling of functionCall uses a qualifiedName rather than any hard-coded names)? Doing so will make it a lot easier to add new window functions and also user-defined window functions in the future.

Relevant prior PRs to get rid of hard-coded scalar and aggregations from the antlr grammar

// TODO: In the future: @Nullable private final WindowFrameClause frameClause;

@Nullable
private final Identifier.Simple existingName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we just call this name? The SQL2023 spec uses <existing window name> but it just wraps a <window name>

Comment on lines +10 to +18
/**
* Represents a window specification's order by clause.
* @see WindowSpecification#getOrderClause()
*/
@Builder(builderClassName = "Builder")
@EqualsAndHashCode(callSuper = false)
public final class WindowOrderClause extends AstNode {
@NotNull
private final List<Sort> sorts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reuse OrderBy here? Syntactically they seem the same.

Comment on lines +24 to +25
@Nullable
private final WindowPartitionClause partitionClause;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this additional WindowPartitionClause is unnecessary and we could just have this be a NonNull List<WindowPartition>? I get that the SQL2023 spec defines the clause as optional under <window specification details>, but if you'll notice the <window partition column reference list> will always contain at least one <window partition column reference>. That may be harder to enforce with this AST modeling since you could define a non-null partitionClause that has an empty list of WindowPartitions.

Also, it can be confusing on the proper way to define a WindowSpecification without any WindowPartitions with this modeling. Is the correct way

  1. a WindowSpecification with a null partitionClause or
  2. a WindowSpecification with a non-null partitionClause that has an empty WindowPartitions list

Comment on lines +13 to +17
/**
* Represents a window function type.
* @see ExprWindowFunction#getFunctionType()
*/
public abstract class WindowFunctionType extends AstNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the parser comment, I'm wondering if we should change the window functions to not be as hard-coded within the AST modeling and follow something more like scalar and aggregate function calls. IMO this can make it a lot more straightforward to add future support for user-defined window functions.

Trino for example models window functions (along with scalars and aggregates) using FunctionsCall. For all of the function names, they just use a QualifiedName rather than any sort of hard-coded names within their AST, leaving the validation to a later stage. I'm not sure if we should lump all three of those under ExprCall but modeling window functions with just an identifier name rather than the WindowFunctionType could be a cleaner modeling as more window functions are added over time.

*/
@Builder(builderClassName = "Builder")
@EqualsAndHashCode(callSuper = false)
public static final class InLineSpecification extends WindowReference {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why we need a separate in-line window specification class. Perhaps a simpler modeling could follow what Trino does with their Window interface.

The Window interface is implemented by two separate classes

Following a similar modeling, we could have a Window abstract class that is used by the ExprWindowFunction.

Comment on lines +67 to +69
private final Expr extent;
private final Long offset;
private final Expr defaultValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If window functions were modeled more closely to ExprCall, these could all be function arguments rather than hard-coded properties.

it.specification.existingName?.let {
env.listener.report(PErrors.featureNotSupported("Window referencing other window"))
}
windows.addWindowDef(it.name.text, it.specification)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. In case of duplicated window name, the current behavior seems to be overriding, is this intensional?
_w AS (PARTITION BY ... ORDER BY ... ASC),
_w AS (PARTITION BY ... ORDER BY ... DESC)
  1. Is it safe to use the text representation instead of identifier, esp when considering case sensitivity...
_w AS (PARTITION BY ... ORDER BY ... ASC),
"_w" as (PARTITION BY ... ORDER BY ... DESC);

Comment on lines +584 to +585
private val definitions = mutableMapOf<String, InternalWindowDefinition>()
private val inLineWindowDefs = mutableListOf<InternalWindowDefinition>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NON-BLOCKING

For planning purpose: one potential optimization we can do is to attempts to assign/reuse window definition.

Consider:

LAG() OVER (PARTITION BY foo ORDER BY BAR) 
LEAD() OVER(PARTITION BY foo ORDER BY BAR)

=>

LAG() OVER $_W1, LEAD() OVER $_W1
WINDOW $_W1 AS(PARTITION BY foo ORDER BY bar)

"lag", "lead" -> lagOrLead(n, args[0], args[1], args[2], ignoreNulls)
else -> null
}
else -> throw IllegalArgumentException("Unknown window function: $name")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why the above branches returns null for func not found and this branch throw?

private const val FALLBACK: String = "UNKNOWN"

@JvmStatic
fun successTestCases() = listOf(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more test I think would be beneficial here:

Using window function with group by: ref

Using window function with subquery: ref

Window order does not affect outer order: ref

/**
* Reset the function to its initial state.
*/
abstract fun reset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate on why is this necessary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants