-
Notifications
You must be signed in to change notification settings - Fork 0
Add additional sub partitions to wait for in query API #752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces support for specifying additional partition dependencies in Airflow by extending the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant PythonAPI
participant ThriftAPI
participant AirflowHelpers
User->>PythonAPI: Call Query(..., sub_partitions_to_wait_for)
PythonAPI->>ThriftAPI: Construct Query with subPartitionsToWaitFor
PythonAPI->>AirflowHelpers: Pass additional_partitions to create_airflow_dependency
AirflowHelpers-->>AirflowHelpers: Append additional_partitions to dependency spec
Poem
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (10)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
5feadc3
to
2936e3d
Compare
2936e3d
to
a4a32d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
api/python/ai/chronon/airflow_helpers.py (2)
51-56
: Simplify function name for consistency.Current name is unnecessarily verbose.
-def _get_additional_subPartitionsToWaitFor_from_query(query): +def _get_sub_partitions_from_query(query): """Gets additional subPartitionsToWaitFor from query if available""" if query: return query.subPartitionsToWaitFor return None
75-77
: Improve tuple assignment readability.The current format packs unrelated values together in a confusing way.
# For the events case: - source_partition_column, additional_partitions = ( - _get_partition_col_from_query(source.events.query) or partition_column, _get_additional_subPartitionsToWaitFor_from_query(source.events.query) - ) + source_partition_column = _get_partition_col_from_query(source.events.query) or partition_column + additional_partitions = _get_additional_subPartitionsToWaitFor_from_query(source.events.query) # And similar change for the entities caseAlso applies to: 84-86
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
api/python/ai/chronon/airflow_helpers.py
(4 hunks)api/python/ai/chronon/group_by.py
(2 hunks)api/python/ai/chronon/query.py
(2 hunks)api/thrift/api.thrift
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
api/python/ai/chronon/airflow_helpers.py (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
query
(382-390)
🪛 Ruff (0.8.2)
api/python/ai/chronon/group_by.py
21-21: ai.chronon.airflow_helpers
imported but unused
Remove unused import: ai.chronon.airflow_helpers
(F401)
🔇 Additional comments (4)
api/thrift/api.thrift (1)
43-47
: New field added for additional partition wait conditions.The new
subPartitionsToWaitFor
field allows specifying multiple partition dependencies.api/python/ai/chronon/query.py (2)
31-31
: New parameter added to support sub-partitions.This parameter correctly exposes the new Thrift field.
85-95
: Improved function call with named parameters.Converting to named parameters enhances readability.
api/python/ai/chronon/airflow_helpers.py (1)
8-42
: Enhanced dependency creation with additional partitions.Function now supports multiple partition dependencies.
:param sub_partitions_to_wait_for: | ||
Additional partitions to be used in sensing that the source data has landed. Should be a full partition string, such as `hr=23:00' | ||
:type partition_column: List[str], optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix parameter type documentation.
The type annotation references the wrong parameter.
:param sub_partitions_to_wait_for:
Additional partitions to be used in sensing that the source data has landed. Should be a full partition string, such as `hr=23:00'
-:type partition_column: List[str], optional
+:type sub_partitions_to_wait_for: List[str], optional
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
:param sub_partitions_to_wait_for: | |
Additional partitions to be used in sensing that the source data has landed. Should be a full partition string, such as `hr=23:00' | |
:type partition_column: List[str], optional | |
:param sub_partitions_to_wait_for: | |
Additional partitions to be used in sensing that the source data has landed. Should be a full partition string, such as `hr=23:00' | |
:type sub_partitions_to_wait_for: List[str], optional |
## Summary Add additional sub partitions to wait for in Query to ultimately compile into GroupBy and Join ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added support for specifying additional sub-partitions when defining data dependencies, allowing for more granular detection of data arrival. - **Documentation** - Updated function and parameter documentation to reflect new options for specifying sub-partitions in queries. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: ezvz <[email protected]>
## Summary Add additional sub partitions to wait for in Query to ultimately compile into GroupBy and Join ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added support for specifying additional sub-partitions when defining data dependencies, allowing for more granular detection of data arrival. - **Documentation** - Updated function and parameter documentation to reflect new options for specifying sub-partitions in queries. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: ezvz <[email protected]>
## Summary Add additional sub partitions to wait for in Query to ultimately compile into GroupBy and Join ## Cheour clientslist - [ ] Added Unit Tests - [ ] Covered by existing CI - [x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added support for specifying additional sub-partitions when defining data dependencies, allowing for more granular detection of data arrival. - **Documentation** - Updated function and parameter documentation to reflect new options for specifying sub-partitions in queries. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: ezvz <[email protected]>
Summary
Add additional sub partitions to wait for in Query to ultimately compile into GroupBy and Join
Checklist
Summary by CodeRabbit