CHIP : Incremental feature aggregation #979
Conversation
accuracy=Accuracy.SNAPSHOT
)
```
To compute above groupBy incrementally
An alternative approach is to store an intermediate 'tiled' representation: each day, store the aggregate for just that day, then compute the longer windows from those intermediates.
e.g. for the above example, store the count of inp_col each day; your 3 and 10 day windows then just need to sum those intermediate counts to get the final values.
The benefit here is that it works for almost any kind of aggregation, including max, min, etc.
I'm fairly sure this is how the 'tiled architecture' works for the online flow.
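To make the tiling idea concrete, here is a minimal Python sketch, not the project's implementation. The helper names `daily_counts` and `window_count`, the sample events, and the convention that a window includes its end day are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(events):
    """Build one tile per day: the number of non-null inp_col values seen that day."""
    tiles = Counter()
    for day, value in events:
        if value is not None:
            tiles[day] += 1
    return tiles


def window_count(tiles, end_day, window_days):
    """Sum the daily tiles for the window_days days ending at end_day (inclusive, by assumption)."""
    return sum(tiles.get(end_day - timedelta(days=i), 0) for i in range(window_days))


# Hypothetical (day, inp_col) events; the null value is not counted.
events = [
    (date(2025, 1, 1), "a"),
    (date(2025, 1, 8), "b"),
    (date(2025, 1, 9), "c"),
    (date(2025, 1, 10), None),
    (date(2025, 1, 10), "d"),
]

tiles = daily_counts(events)
count_3d = window_count(tiles, date(2025, 1, 10), 3)    # 3
count_10d = window_count(tiles, date(2025, 1, 10), 10)  # 4
```

Both windows read the same stored tiles, so adding a new day only requires computing that day's tile. For max or min, the same shape should work with a per-day max/min tile and `max`/`min` in place of `sum`.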
@blrnw3 Yes. I am going to change the architecture: compute the daily aggregations and store them in a table. The only change would be the way we store the IRs. For example, for avg, we need to store both the sum and the count.
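A small sketch of why avg needs both fields in its IR. The `AvgIR` class and the sample values below are hypothetical and only illustrate the idea; they are not the project's IR format.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AvgIR:
    """Hypothetical intermediate representation for avg: a running sum and a count."""
    total: float = 0.0
    count: int = 0

    def update(self, value):
        # Fold one raw value into the IR.
        return AvgIR(self.total + value, self.count + 1)

    def merge(self, other):
        # Combine two IRs, e.g. two daily tiles.
        return AvgIR(self.total + other.total, self.count + other.count)

    def finalize(self):
        # Produce the final average; None when no values were seen.
        return self.total / self.count if self.count else None


def build_daily_ir(values):
    """Aggregate one day's raw values into a single IR."""
    return reduce(lambda ir, v: ir.update(v), values, AvgIR())


# Hypothetical raw values for three consecutive days (the second day is empty).
daily_irs = [build_daily_ir(vs) for vs in ([2.0, 4.0], [], [9.0])]

# The 3 day window merges the stored daily IRs and finalizes once at the end.
merged = reduce(AvgIR.merge, daily_irs, AvgIR())
window_avg = merged.finalize()  # 15.0 / 3 = 5.0
```

Storing only each day's finalized average would not be enough: averaging the daily averages 3.0 and 9.0 gives 6.0, while the true window average is 5.0.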
Summary
Proposal to support incremental aggregations.
Why / Goal
Test Plan
Checklist
Reviewers