Clean state in HashJoinExecutor by watermark #6268

Closed
Tracked by #6042
yuhao-su opened this issue Nov 9, 2022 · 3 comments · Fixed by #7379
Assignees
Labels
type/feature Type: New feature.

Comments

@yuhao-su
Contributor

yuhao-su commented Nov 9, 2022

as title

@yuhao-su yuhao-su added the type/feature Type: New feature. label Nov 9, 2022
@yuhao-su yuhao-su self-assigned this Nov 9, 2022
@github-actions github-actions bot added this to the release-0.1.14 milestone Nov 9, 2022
@yuhao-su yuhao-su mentioned this issue Nov 9, 2022
22 tasks
@jon-chuang
Contributor

Oh, actually, how would we do this? Since the join state key is a HashKey, one can't easily clean by scanning a watermark range.

@yuhao-su
Contributor Author

> Oh, actually, how would we do this? Since the join state key is a HashKey, one can't easily clean by scanning a watermark range.

You are right. Cleaning the state in storage is easier.

However, to clean the operator cache, we should do more:

  1. We are using a hash table with the join key as the key; we should use an ordered data structure instead.
  2. We are using HashKey as the key in JoinHashMap. We should make sure the event-time order is consistent with the HashKey order (not hard to do).
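
The two points above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual `JoinHashMap` code: `OrderedJoinKey`, `OrderedJoinCache`, and `clean_by_watermark` are hypothetical names, and the cache is modelled with a plain `BTreeMap` whose key puts the event time first so that key order follows event-time order.

```rust
use std::collections::BTreeMap;

/// Hypothetical cache key (not the real HashKey): the event time comes
/// first, so the derived ordering follows event-time order, and the
/// remaining join key bytes only break ties.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct OrderedJoinKey {
    event_time: i64,
    rest: Vec<u8>,
}

/// Hypothetical ordered stand-in for the operator cache.
struct OrderedJoinCache {
    entries: BTreeMap<OrderedJoinKey, Vec<String>>,
}

impl OrderedJoinCache {
    fn new() -> Self {
        Self { entries: BTreeMap::new() }
    }

    /// Append a row under the given join key.
    fn insert(&mut self, event_time: i64, rest: &[u8], row: &str) {
        let key = OrderedJoinKey { event_time, rest: rest.to_vec() };
        self.entries.entry(key).or_default().push(row.to_string());
    }

    /// Drop every key whose event time is strictly below `watermark`.
    /// Returns the number of keys removed.
    fn clean_by_watermark(&mut self, watermark: i64) -> usize {
        // Smallest possible key with event_time == watermark.
        let bound = OrderedJoinKey { event_time: watermark, rest: Vec::new() };
        // `split_off` returns all keys >= bound; what stays behind is expired.
        let kept = self.entries.split_off(&bound);
        let removed = self.entries.len();
        self.entries = kept;
        removed
    }
}
```

With an ordered map, expiring state is one range split at the watermark. With a hash table, the same cleanup would require visiting every entry.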

@jon-chuang
Contributor

jon-chuang commented Nov 18, 2022

> Cleaning the state in storage is easier.

Oh yes, indeed, we will use the join key columns rather than their hash as the key for the StateTable. So if the first column is the watermark column, we can clean the StateTable.
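
As a rough illustration of why this works (a toy model, not the real StateTable API): if the storage key starts with an order-preserving encoding of the watermark column, every expired row sits in one contiguous key range. `ToyStateTable`, `encode_i64`, `storage_key`, and `clean_below` are hypothetical names, and the sign-bit-flip encoding is an assumption chosen for the sketch.

```rust
use std::collections::BTreeMap;

/// Encode an i64 so that byte-wise order equals numeric order:
/// flip the sign bit, then write big-endian. (Assumed encoding.)
fn encode_i64(v: i64) -> [u8; 8] {
    ((v as u64) ^ (1u64 << 63)).to_be_bytes()
}

/// Hypothetical storage key: watermark column first, then the bytes
/// of the remaining join key columns.
fn storage_key(watermark_col: i64, other_cols: &[u8]) -> Vec<u8> {
    let mut key = encode_i64(watermark_col).to_vec();
    key.extend_from_slice(other_cols);
    key
}

/// Toy stand-in for a state table: a sorted key-value store.
struct ToyStateTable {
    rows: BTreeMap<Vec<u8>, String>,
}

impl ToyStateTable {
    fn new() -> Self {
        Self { rows: BTreeMap::new() }
    }

    fn insert(&mut self, watermark_col: i64, other_cols: &[u8], row: &str) {
        self.rows.insert(storage_key(watermark_col, other_cols), row.to_string());
    }

    /// Delete all rows whose first key column is strictly below `watermark`.
    /// Returns the number of rows removed.
    fn clean_below(&mut self, watermark: i64) -> usize {
        // Every key with first column >= watermark sorts at or after this prefix.
        let bound = encode_i64(watermark).to_vec();
        let kept = self.rows.split_off(&bound);
        let removed = self.rows.len();
        self.rows = kept;
        removed
    }
}
```

If the key were a hash of the join columns instead, rows below the watermark would be scattered across the key space and no single range delete could remove them.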

Well, perhaps we do not need to worry about the hash join operator cache, as LRU eviction can take care of the state cleaning in this case.
