feat(expr): implement array_flatten #21640

Merged
merged 5 commits into from
May 6, 2025

Conversation

stdrc
Member

@stdrc stdrc commented Apr 29, 2025

Many query engines and databases provide a built-in function, array_flatten or flatten, that flattens an array of arrays. Our users have also requested it. This PR adds it to RisingWave.

For simplicity and clarity, the implementation follows Snowflake, Trino, and DuckDB: the function is non-recursive and flattens only one level of nesting.
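
To make "one level of nesting" concrete, here is a minimal sketch in plain Rust over `Vec`. It is not the code from this PR: `flatten_one_level` is a hypothetical helper, and NULL handling is not modeled.

```rust
/// Hypothetical helper (not RisingWave's implementation): flatten exactly one
/// level of nesting, so `[[a, b], [c]]` becomes `[a, b, c]`.
/// Any deeper nesting inside the elements is left untouched.
fn flatten_one_level<T: Clone>(outer: &[Vec<T>]) -> Vec<T> {
    outer
        .iter()
        // Walk each inner array in order and copy its elements out.
        .flat_map(|inner| inner.iter().cloned())
        .collect()
}
```

Applied to a three-level value such as `[[[1], [2, 3]], [[4]]]`, the sketch returns `[[1], [2, 3], [4]]`: only the outermost level is removed, which is the non-recursive behavior described above.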

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@stdrc stdrc changed the title implement array_flatten feat(expr): implement array_flatten Apr 29, 2025
Member Author

stdrc commented Apr 29, 2025

@stdrc stdrc marked this pull request as ready for review April 29, 2025 08:00
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements an array_flatten function for RisingWave, adding native support for flattening one level of nesting in arrays. The key changes include:

  • Updating optimizer and expression visitors to handle ARRAY_FLATTEN.
  • Adding a new builtin scalar function in the binder for array_flatten.
  • Implementing the array_flatten logic in a new scalar module and updating type handling and proto definitions accordingly.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/frontend/src/optimizer/plan_expr_visitor/strong.rs | Added ARRAY_FLATTEN support in the strong visitor. |
| src/frontend/src/expr/pure.rs | Added ARRAY_FLATTEN into the impure analyzer’s type list. |
| src/frontend/src/binder/expr/function/builtin_scalar.rs | Introduced the array_flatten function with appropriate type checks and error messages. |
| src/expr/impl/src/scalar/mod.rs | Included the new array_flatten module. |
| src/expr/impl/src/scalar/array_flatten.rs | New implementation for array_flatten following Snowflake/DuckDB behavior. |
| src/common/src/types/mod.rs | Added a new conversion method into_list_element_type for type consistency. |
| proto/expr.proto | Extended the enum to include ARRAY_FLATTEN. |
Comments suppressed due to low confidence (1)

src/common/src/types/mod.rs:513

  • Consider renaming the 'as_list' method or updating its usage so that the naming is consistent with 'into_list_element_type' elsewhere in the code, which would improve clarity for the list element extraction methods.
/// TODO(rc): rename to `as_list_element_type`
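
The type check mentioned in the file summary can be pictured with a toy type enum. This is a sketch under stated assumptions, not the binder code from this PR: the `DataType` enum here is a stand-in, `infer_array_flatten_type` is a hypothetical name, and only the method name `into_list_element_type` comes from the summary (its real signature may differ).

```rust
/// Toy stand-in for a SQL type enum; not RisingWave's real `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Varchar,
    List(Box<DataType>),
}

impl DataType {
    /// Consume a list type and return its element type, or `None` for a
    /// non-list type. The name follows the PR summary; the signature is assumed.
    fn into_list_element_type(self) -> Option<DataType> {
        match self {
            DataType::List(elem) => Some(*elem),
            _ => None,
        }
    }
}

/// Hypothetical return-type inference: the argument must be an array of
/// arrays, and the result has the type of the inner array.
fn infer_array_flatten_type(input: DataType) -> Result<DataType, String> {
    match input.into_list_element_type() {
        // list(list(T)) flattens to list(T).
        Some(inner @ DataType::List(_)) => Ok(inner),
        Some(other) => Err(format!(
            "array_flatten expects an array of arrays, got an array of {:?}",
            other
        )),
        None => Err("array_flatten expects an array argument".to_string()),
    }
}
```

Under these assumptions, `int[][]` maps to `int[]`, while `int[]` and `varchar` are rejected with an error message.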

@stdrc stdrc added the user-facing-changes Contains changes that are visible to users label Apr 29, 2025
Contributor

Hi, there.

📝 Telemetry Reminder:
If you're implementing this feature, please consider adding telemetry metrics to track its usage. This helps us understand how the feature is being used and improve it further.
You can find the function report_event of telemetry reporting in the following files. Feel free to ask questions if you need any guidance!

  • src/frontend/src/telemetry.rs
  • src/meta/src/telemetry.rs
  • src/stream/src/telemetry.rs
  • src/storage/compactor/src/telemetry.rs

Alternatively, call report_event_common (src/common/telemetry_event/src/lib.rs) if the functions above are hard to use.

✨ Thank you for your contribution to RisingWave! ✨

This is an automated comment created by the peaceiris/actions-label-commenter. Responding to the bot or mentioning it won't have any effect.

Member

@BugenZhao BugenZhao left a comment

LGTM

///
/// ```slt
/// query T
/// select array_flatten(array[array[1, 2], array[3, 4]]);
Contributor

Just realized array_flatten(arr) is equivalent to genuine_array_concat(VARIADIC arr). However:

  • This hypothetical genuine_array_concat is different from PostgreSQL/RisingWave array_cat, which has weird behavior on arrays of two or more dimensions due to its weak array typing (int[][] is the same as int[])
  • #14753 implements VARIADIC as separate exprs but in PostgreSQL there is a single concat_ws / jsonb_extract_path

Just sharing some related ideas. Nothing to change in this PR.
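
The equivalence noted in this comment can be sketched in plain Rust. Both functions below are illustrative assumptions: `genuine_array_concat` is the hypothetical function named in the comment, modeled here as taking a slice of arrays because Rust has no variadic functions, and `array_flatten` is a stand-in rather than the implementation from this PR.

```rust
/// Hypothetical variadic concat: takes any number of arrays (passed as a
/// slice, since Rust has no variadics) and appends them in argument order.
fn genuine_array_concat<T: Clone>(args: &[Vec<T>]) -> Vec<T> {
    let mut out = Vec::new();
    for arg in args {
        out.extend_from_slice(arg);
    }
    out
}

/// Stand-in for a one-level flatten, written independently with the
/// standard library's slice `concat`.
fn array_flatten<T: Clone>(arr: &[Vec<T>]) -> Vec<T> {
    arr.concat()
}
```

Passing the elements of `arr` as the arguments of the concat, which is what `VARIADIC arr` expresses in SQL, gives the same result as flattening `arr` by one level. Because these Rust element types are strongly typed, an array of 2-D arrays flattens to a 2-D array, unlike the weakly typed array_cat case mentioned above.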

stdrc added 3 commits May 6, 2025 14:14
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
@stdrc stdrc force-pushed the rc/array-flatten branch from 70976fa to de4869a Compare May 6, 2025 06:14
Signed-off-by: Richard Chien <[email protected]>
@stdrc stdrc enabled auto-merge May 6, 2025 07:01
@stdrc stdrc requested review from BugenZhao and xiangjinwu May 6, 2025 07:12
Signed-off-by: Richard Chien <[email protected]>
@stdrc stdrc added this pull request to the merge queue May 6, 2025
Merged via the queue into main with commit cce7b77 May 6, 2025
31 checks passed
@stdrc stdrc deleted the rc/array-flatten branch May 6, 2025 09:20
Labels
type/feature user-facing-changes Contains changes that are visible to users
3 participants