Skip to content

Use a set instead of vec to represent the group-by clause in the Agg operator #8388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chenzl25 opened this issue Mar 7, 2023 · 5 comments · Fixed by #9390
Closed

Use a set instead of vec to represent the group-by clause in the Agg operator #8388

chenzl25 opened this issue Mar 7, 2023 · 5 comments · Fixed by #9390
Assignees
Milestone

Comments

@chenzl25
Copy link
Contributor

chenzl25 commented Mar 7, 2023

          > > Theoretically, the ordering in group-by clause doesn't matter, however, in our LogicalAgg implementation, we store them in a vec so the ordering matter in terms of common sub-plan detection.

Does this mean that extracting common plan node with Eq is too strict? It'll be nice if we can share Agg in this case with a Project automatically inserted.

Actually, I prefer to replace the group-by vec with abitmap.

Originally posted by @chenzl25 in #8159 (comment)

The advantages of using a set:

  • Dedup group key automatically.
  • Provide more opportunities for common sub-plan sharing optimization.
@github-actions github-actions bot added this to the release-0.1.18 milestone Mar 7, 2023
@liurenjie1024
Copy link
Contributor

Or an ordered vec?

@chenzl25
Copy link
Contributor Author

chenzl25 commented Mar 7, 2023

Or an ordered vec?

The ordered vec might have duplicate keys.

@liurenjie1024
Copy link
Contributor

LGTM

@st1page
Copy link
Contributor

st1page commented Mar 7, 2023

Sometimes group keys matter e.g. we need to permute the group keys with watermarks #6452. or we need to break some assumption of the streaming executor( the group key is not the fitst N columns in the state table's primary key).

@chenzl25
Copy link
Contributor Author

chenzl25 commented Mar 7, 2023

Sometimes group keys matter e.g. we need to permute the group keys with watermarks #6452. or we need to break some assumption of the streaming executor( the group key is not the fitst N columns in the state table's primary key).

Yes, the group-by set defines the schema of the Agg operator, but we can choose to implement the state table with a primary key in any order you want e.g. based on the watermark information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants