Skip to content

feat: compute tableDependencies #582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Conversation

tchow-zlai
Copy link
Collaborator

@tchow-zlai tchow-zlai commented Apr 3, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Enhanced the data processing workflow by integrating a new step that captures and manages dependencies and metadata before finalizing data outputs. This update ensures proper handling of interrelated data elements.
    • Introduced new functions to manage table dependencies and metadata for various node types, improving the robustness of data processing.
  • Bug Fixes

    • Improved the join operation by defaulting missing inputs to safe values. This change prevents potential errors during data aggregation and enhances overall system reliability.

Copy link

coderabbitai bot commented Apr 3, 2025

Walkthrough

This pull request enhances the Join function by adding default initialization for certain parameters to prevent errors. It also modifies the object writing process by integrating a dependency management step before JSON serialization. A new module, dag_builder.py, has been created to manage table metadata and dependencies, introducing several functions with error handling for various node types.

Changes

File(s) Change Summary
api/python/ai/chronon/join.py Updated Join function: added default initialization for bootstrap_parts, online_external_parts, and right_parts (set to empty lists when None), retaining deep copy logic and events validation.
api/python/ai/chronon/repo/compile.py Added add_dependencies call in _write_obj before JSON serialization.
api/python/ai/chronon/repo/dag_builder.py Introduced functions for managing table metadata and dependencies: add_table_info, add_dependencies, to_dependency, with error handling for unsupported node types.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Writer as _write_obj
    participant Dep as add_dependencies
    participant JSON as _write_obj_as_json

    Client->>Writer: Call _write_obj(obj)
    Writer->>Dep: add_dependencies(obj)
    Dep-->>Writer: Dependencies processed
    Writer->>JSON: _write_obj_as_json(obj)
    JSON-->>Writer: JSON written
Loading

Poem

In our code, a change unfolds,
Defaults now shine as bright as gold.
Dependencies join the rhythmic flow,
As new helpers make the data glow.
A module born, with functions neat—
Code sings in harmony, oh so sweet! 🚀

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f1709a4 and c5ec3c9.

📒 Files selected for processing (1)
  • api/python/ai/chronon/repo/dag_builder.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/python/ai/chronon/repo/dag_builder.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@tchow-zlai tchow-zlai changed the title wip feat: compute tableDependencies Apr 3, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
api/python/ai/chronon/group_by.py (1)

343-343: Debug print statement should use logging.

Replace print with proper logging to follow best practices.

-            print(agg, group_by.metaData.name)
+            LOGGER.debug(f"Validating aggregation {agg} for group_by {group_by.metaData.name}")
api/python/ai/chronon/dag_builder.py (1)

1-1: Unused import.

List is imported but never used.

-from typing import List, Union
+from typing import Union
🧰 Tools
🪛 Ruff (0.8.2)

1-1: typing.List imported but unused

Remove unused import: typing.List

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee9fc1 and 3e5bca5.

📒 Files selected for processing (3)
  • api/python/ai/chronon/dag_builder.py (1 hunks)
  • api/python/ai/chronon/group_by.py (4 hunks)
  • api/python/ai/chronon/join.py (3 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
api/python/ai/chronon/group_by.py (1)
api/python/ai/chronon/dag_builder.py (1)
  • to_dependency (11-54)
api/python/ai/chronon/dag_builder.py (2)
api/python/ai/chronon/group_by.py (3)
  • Window (245-246)
  • TimeUnit (179-181)
  • GroupBy (431-678)
api/python/ai/chronon/join.py (2)
  • JoinPart (31-109)
  • BootstrapPart (360-395)
🪛 Ruff (0.8.2)
api/python/ai/chronon/dag_builder.py

1-1: typing.List imported but unused

Remove unused import: typing.List

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (13)
api/python/ai/chronon/join.py (4)

26-26: Added import for dependency tracking.

Good addition of dag_builder module for table dependency tracking.


515-520: Adds robust default handling for parameters.

Properly initializes empty lists when parameters are None, preventing potential NoneType errors.


573-574: Creates table dependencies for tracking.

Correctly constructs dependency list by applying to_dependency to left, right_parts, and bootstrap_parts.


582-582: Adds table dependencies to ExecutionInfo.

Properly integrates dependency information into the execution context.

api/python/ai/chronon/group_by.py (3)

25-25: Added import for dependency tracking.

Good addition for consistency with join.py changes.


635-635: Creates table dependencies from sources.

Good implementation of dependency tracking consistent with join.py.


646-646: Adds table dependencies to ExecutionInfo.

Properly integrates dependency information into the execution context.

api/python/ai/chronon/dag_builder.py (6)

2-6: Good imports and organization.

Properly imports necessary types and utilities.


8-10: Clear docstring.

Concise explanation of the function's purpose.


11-15: Handles JoinPart type recursively.

Correctly extracts GroupBy dependency from JoinPart.


15-24: Handles GroupBy type.

Properly constructs TableDependency with metaData.executionInfo.outputTableInfo.


25-38: Handles BootstrapPart type.

Correctly extracts table and partition information to build TableDependency.


39-54: Handles Source type.

Uses utility functions to extract table information and builds dependency correctly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
api/python/ai/chronon/dag_builder.py (3)

5-5: Remove unused import

JoinSource is imported but never used in the code.

-from ai.chronon.api.ttypes import BootstrapPart, GroupBy, JoinPart, Source, JoinSource
+from ai.chronon.api.ttypes import BootstrapPart, GroupBy, JoinPart, Source
🧰 Tools
🪛 Ruff (0.8.2)

5-5: ai.chronon.api.ttypes.JoinSource imported but unused

Remove unused import: ai.chronon.api.ttypes.JoinSource

(F401)


10-10: Unused parameter in function signature

The lag parameter is defined but never used in function body.

Either utilize the lag parameter or remove it from signature if not needed.


40-57: Source node dependency handling lacks explicit type check

Assumes 'else' case is a Source without verifying type.

-    else: # When type of node is a Source
+    elif isinstance(node, Source):

Add an else clause to handle unexpected types:

+    else:
+        raise TypeError(f"Expected Source, GroupBy, JoinPart, or BootstrapPart, got {type(node).__name__}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 3e5bca5 and eb204d3.

📒 Files selected for processing (3)
  • api/python/ai/chronon/dag_builder.py (1 hunks)
  • api/python/ai/chronon/group_by.py (3 hunks)
  • api/python/ai/chronon/join.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/python/ai/chronon/group_by.py
🧰 Additional context used
🧬 Code Definitions (2)
api/python/ai/chronon/dag_builder.py (2)
api/python/ai/chronon/group_by.py (3)
  • TimeUnit (179-181)
  • Window (245-246)
  • GroupBy (430-677)
api/python/ai/chronon/join.py (2)
  • BootstrapPart (360-395)
  • JoinPart (31-109)
api/python/ai/chronon/join.py (1)
api/python/ai/chronon/dag_builder.py (1)
  • to_dependency (10-57)
🪛 Ruff (0.8.2)
api/python/ai/chronon/dag_builder.py

5-5: ai.chronon.api.ttypes.JoinSource imported but unused

Remove unused import: ai.chronon.api.ttypes.JoinSource

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (8)
api/python/ai/chronon/join.py (4)

24-24: Import module added for dependency generation

The import adds dag_builder module which provides the to_dependency function used for creating table dependencies.


515-520: Good defensive programming with null parameter handling

Ensures None parameters have proper defaults, preventing potential NullPointerExceptions.


574-576: Adds table dependency tracking

Creates dependency list from left source, right parts, and bootstrap parts for improved dependency management.


584-584: Dependency propagation to ExecutionInfo

Adds table dependencies to the execution info object to propagate dependency information.

api/python/ai/chronon/dag_builder.py (4)

1-9: New module for DAG dependency conversion

Introduces utilities for converting various node types to TableDependency objects.

🧰 Tools
🪛 Ruff (0.8.2)

5-5: ai.chronon.api.ttypes.JoinSource imported but unused

Remove unused import: ai.chronon.api.ttypes.JoinSource

(F401)


10-14: JoinPart dependency handling

Correctly recursively resolves JoinPart dependencies to their underlying GroupBy.


14-23: GroupBy dependency handling

Creates TableDependency with appropriate metadata from the GroupBy node.


24-39: BootstrapPart dependency handling

Properly extracts partition information from bootstrap query when available.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
api/python/ai/chronon/dag_builder.py (4)

8-11: Expand docstring

Docstring needs parameter and return type descriptions.


25-40: Consider refactoring overlapping logic

BootstrapPart and Source handling contain similar code patterns.


49-49: Redundant null check

Query already obtained on line 43, null check unnecessary.

-        start_cutoff = query.startPartition if query is not None else None
+        start_cutoff = query.startPartition if hasattr(query, 'startPartition') else None

50-50: Redundant null check

Query already obtained on line 43, null check unnecessary.

-        end_cutoff = query.endPartition if query is not None else None
+        end_cutoff = query.endPartition if hasattr(query, 'endPartition') else None
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between eb204d3 and d4dd8a5.

📒 Files selected for processing (3)
  • api/python/ai/chronon/dag_builder.py (1 hunks)
  • api/python/ai/chronon/group_by.py (3 hunks)
  • api/python/ai/chronon/join.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/python/ai/chronon/join.py
  • api/python/ai/chronon/group_by.py
🧰 Additional context used
🧬 Code Definitions (1)
api/python/ai/chronon/dag_builder.py (2)
api/python/ai/chronon/group_by.py (3)
  • TimeUnit (179-181)
  • Window (245-246)
  • GroupBy (430-676)
api/python/ai/chronon/join.py (1)
  • BootstrapPart (360-395)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (1)
api/python/ai/chronon/dag_builder.py (1)

1-58: Add tests for new function

Add unit tests to verify all node type conversions.

Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai force-pushed the tchow/table-dependencies branch from fbe31ca to 30ca1a4 Compare April 3, 2025 01:31
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
api/python/ai/chronon/dag_builder.py (3)

42-42: Add type verification for Source

The else block assumes the node is a Source without explicit verification.

-    else:  # When type of node is a Source
+    elif isinstance(node, Source):
+    else:
+        raise TypeError(f"Unexpected node type: {type(node).__name__}. Expected Source, GroupBy, JoinPart, or BootstrapPart.")

1-60: Missing comprehensive docstring

The function would benefit from a detailed docstring explaining parameters, return values, and behavior.

 def to_dependency(node: Union[Source, GroupBy, JoinPart, BootstrapPart], lag: int = 0) -> TableDependency:
+    """
+    Converts a node into a TableDependency object.
+
+    Parameters:
+    - node: A node of type Source, GroupBy, JoinPart, or BootstrapPart.
+    - lag: An integer representing the lag (default is 0).
+
+    Returns:
+    - A TableDependency object representing the dependency of the provided node.
+
+    Raises:
+    - TypeError: If the node is not one of the expected types.
+    """

17-25: Consider parameterizing forceCompute

The forceCompute flag is hardcoded to False. Consider making it a parameter if different use cases need different values.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between d4dd8a5 and 30ca1a4.

📒 Files selected for processing (3)
  • api/python/ai/chronon/dag_builder.py (1 hunks)
  • api/python/ai/chronon/group_by.py (3 hunks)
  • api/python/ai/chronon/join.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/python/ai/chronon/group_by.py
🧰 Additional context used
🧬 Code Definitions (2)
api/python/ai/chronon/join.py (1)
api/python/ai/chronon/dag_builder.py (1)
  • to_dependency (12-59)
api/python/ai/chronon/dag_builder.py (2)
api/python/ai/chronon/group_by.py (3)
  • TimeUnit (179-181)
  • Window (245-246)
  • GroupBy (430-676)
api/python/ai/chronon/join.py (2)
  • BootstrapPart (360-395)
  • JoinPart (31-109)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (6)
api/python/ai/chronon/join.py (4)

24-24: Good addition: Import dag_builder module

This import is needed for the new table dependency functionality.


515-520: Defensive initialization of parameters

Adding default empty lists for null parameters prevents potential null pointer errors.


573-575: New table dependency tracking implementation

Properly collects dependency information from left source, right parts, and bootstrap parts.


583-583: Dependency injection into ExecutionInfo

Successfully passes table dependencies to the execution info object.

api/python/ai/chronon/dag_builder.py (2)

12-12: Parameter use corrected

The lag parameter is now properly utilized, addressing a previous review comment.


14-14: Properly propagate lag parameter

Excellent - the lag parameter is now correctly passed to the recursive call.

tchow-zlai and others added 4 commits April 2, 2025 20:35
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
api/python/ai/chronon/repo/compile.py (1)

276-278: Handle unexpected node types safely.
Calling add_dependencies(obj) might fail if obj is not GroupBy, Join, or JoinPart. Consider pre-checking object type or catching the ValueError.

api/python/ai/chronon/repo/dag_builder.py (2)

33-46: Watch for circular references.
add_dependencies recurses for JoinPart, which might risk cycles in complex graphs.


48-101: Ensure robust metadata checks.
node.metaData.executionInfo is assumed non-null. Add safeguards or validations if that's not guaranteed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 30ca1a4 and 99ef105.

📒 Files selected for processing (3)
  • api/python/ai/chronon/join.py (1 hunks)
  • api/python/ai/chronon/repo/compile.py (2 hunks)
  • api/python/ai/chronon/repo/dag_builder.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/python/ai/chronon/join.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (1)
api/python/ai/chronon/repo/compile.py (1)

33-33: No concerns with the new import.

Comment on lines 12 to 31
def add_table_info(node: Union[GroupBy, Join, BootstrapPart]):
table_info = None
if isinstance(node, GroupBy):
table_info = TableInfo(
table=c_utils.group_by_output_table_name(node),
)
elif isinstance(node, Join):
table_info = TableInfo(
table=c_utils.get_join_output_table_name(node),
)
elif isinstance(node, BootstrapPart):
part_col = node.query.partitionColumn
table_info = TableInfo(
table=c_utils.get_table,
partitionColumn=part_col
)
else:
return
node.metaData.executionInfo.outputTableInfo = table_info

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Possible function call omission.
Line 25 references c_utils.get_table instead of calling it. Likely intended:

-            table=c_utils.get_table,
+            table=c_utils.get_table(node),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def add_table_info(node: Union[GroupBy, Join, BootstrapPart]):
table_info = None
if isinstance(node, GroupBy):
table_info = TableInfo(
table=c_utils.group_by_output_table_name(node),
)
elif isinstance(node, Join):
table_info = TableInfo(
table=c_utils.get_join_output_table_name(node),
)
elif isinstance(node, BootstrapPart):
part_col = node.query.partitionColumn
table_info = TableInfo(
table=c_utils.get_table,
partitionColumn=part_col
)
else:
return
node.metaData.executionInfo.outputTableInfo = table_info
def add_table_info(node: Union[GroupBy, Join, BootstrapPart]):
table_info = None
if isinstance(node, GroupBy):
table_info = TableInfo(
table=c_utils.group_by_output_table_name(node),
)
elif isinstance(node, Join):
table_info = TableInfo(
table=c_utils.get_join_output_table_name(node),
)
elif isinstance(node, BootstrapPart):
part_col = node.query.partitionColumn
table_info = TableInfo(
table=c_utils.get_table(node),
partitionColumn=part_col
)
else:
return
node.metaData.executionInfo.outputTableInfo = table_info

Co-authored-by: Thomas Chow <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
api/python/ai/chronon/repo/dag_builder.py (1)

25-25: ⚠️ Potential issue

Fix function reference passed without invocation.

Function reference is passed instead of result.

-            table=c_utils.get_table,
+            table=c_utils.get_table(node),
🧹 Nitpick comments (5)
api/python/ai/chronon/repo/dag_builder.py (5)

45-45: Improve error message for unsupported node types.

Current error message shows object instance, not type.

-        raise ValueError(f"Obj is of type: {node}")
+        raise ValueError(f"Unsupported node type: {type(node).__name__}")

23-24: Add null check for node.query.

Similar to other nodes, add protection against NoneType errors.

-        part_col = node.query.partitionColumn
+        part_col = node.query.partitionColumn if node.query is not None else None

12-31: Check if metaData and executionInfo exist before assignment.

Add initialization if fields don't exist.

def add_table_info(node: Union[GroupBy, Join, BootstrapPart]):
    table_info = None
    if isinstance(node, GroupBy):
        table_info = TableInfo(
            table=c_utils.group_by_output_table_name(node),
        )
    elif isinstance(node, Join):
        table_info = TableInfo(
            table=c_utils.get_join_output_table_name(node),
        )
    elif isinstance(node, BootstrapPart):
        part_col = node.query.partitionColumn if node.query is not None else None
        table_info = TableInfo(
            table=c_utils.get_table(node),
            partitionColumn=part_col
        )
    else:
        return
+    if not hasattr(node, 'metaData'):
+        node.metaData = type('obj', (object,), {})
+    if not hasattr(node.metaData, 'executionInfo'):
+        node.metaData.executionInfo = type('obj', (object,), {})
    node.metaData.executionInfo.outputTableInfo = table_info

33-46: Add similar metaData initialization for add_dependencies.

Consistent with previous suggestion.

def add_dependencies(node: Union[GroupBy, Join, JoinPart]):
    if isinstance(node, GroupBy):
        raw_deps = node.sources
        table_deps = [to_dependency(s) for s in raw_deps]
+        if not hasattr(node, 'metaData'):
+            node.metaData = type('obj', (object,), {})
+        if not hasattr(node.metaData, 'executionInfo'):
+            node.metaData.executionInfo = type('obj', (object,), {})
        node.metaData.executionInfo.tableDependencies=table_deps
    elif isinstance(node, Join):
        raw_deps = node.bootstrapParts + node.joinParts + [node.left]
        table_deps = [to_dependency(up) for up in raw_deps]
+        if not hasattr(node, 'metaData'):
+            node.metaData = type('obj', (object,), {})
+        if not hasattr(node.metaData, 'executionInfo'):
+            node.metaData.executionInfo = type('obj', (object,), {})
        node.metaData.executionInfo.tableDependencies=table_deps
    elif isinstance(node, JoinPart):
        add_dependencies(node.groupBy)
    else:
        raise ValueError(f"Unsupported node type: {type(node).__name__}")

7-10: Consider expanding module docstring.

Add more details about purpose and usage.

"""
-Given a node that represents an upstream, turn it into a TableDependency.
+Module for building table dependency graphs.
+
+Provides functions to extract table information and dependencies
+from different node types and convert them to TableDependency objects.
"""
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 99ef105 and 0ec4d5b.

📒 Files selected for processing (1)
  • api/python/ai/chronon/repo/dag_builder.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (3)
api/python/ai/chronon/repo/dag_builder.py (3)

1-6: LGTM! Imports look good.

Required imports are properly organized.


93-94: ⚠️ Potential issue

Check if query exists before accessing properties.

Current code will raise exception if query is None.

-        start_cutoff = query.startPartition if query is not None else None
-        end_cutoff = query.endPartition if query is not None else None
+        start_cutoff = None
+        end_cutoff = None
+        if query is not None:
+            start_cutoff = query.startPartition
+            end_cutoff = query.endPartition

Likely an incorrect or invalid review comment.


74-75: ⚠️ Potential issue

Check if query exists before accessing properties.

Current code will raise exception if query is None.

-        start_cutoff = node.query.startPartition if node.query is not None else None
-        end_cutoff = node.query.endPartition if node.query is not None else None
+        start_cutoff = None
+        end_cutoff = None
+        if node.query is not None:
+            start_cutoff = node.query.startPartition
+            end_cutoff = node.query.endPartition

Likely an incorrect or invalid review comment.

Co-authored-by: Thomas Chow <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (7)
api/python/ai/chronon/repo/dag_builder.py (7)

24-26: ⚠️ Potential issue

Fix function call in BootstrapPart table_info.

c_utils.get_table reference needs to be called with node parameter.

-            table=c_utils.get_table,
+            table=c_utils.get_table(node),

58-60: ⚠️ Potential issue

Fix casing inconsistency - endCutOff.

Variable name should be consistent.

-            if source_dep.endCutOff is not None:
-                if min_end_cutoff is None or min_end_cutoff > source_dep.endCutOff:
-                    min_end_cutoff = source_dep.endCutOff
+            if source_dep.endCutoff is not None:
+                if min_end_cutoff is None or min_end_cutoff > source_dep.endCutoff:
+                    min_end_cutoff = source_dep.endCutoff

61-63: ⚠️ Potential issue

Fix casing inconsistency - startCutOff.

Variable name should be consistent.

-            if source_dep.startCutOff is not None:
-                if max_start_cutoff is None or max_start_cutoff < source_dep.startCutOff:
-                    max_start_cutoff = source_dep.startCutOff
+            if source_dep.startCutoff is not None:
+                if max_start_cutoff is None or max_start_cutoff < source_dep.startCutoff:
+                    max_start_cutoff = source_dep.startCutoff

69-70: ⚠️ Potential issue

Fix variable casing in constructor.

Use consistent variable naming.

-            startCutOff=max_start_cutoff,
-            endCutOff=min_end_cutoff,
+            startCutoff=max_start_cutoff,
+            endCutoff=min_end_cutoff,

81-82: ⚠️ Potential issue

Fix variable casing in constructor.

Use consistent variable naming.

-            startCutOff=start_cutoff,
-            endCutOff=end_cutoff,
+            startCutoff=start_cutoff,
+            endCutoff=end_cutoff,

87-88: ⚠️ Potential issue

Add null check for query.

Prevent NoneType exception.

        query = c_utils.get_query(node)
-        partition_column = query.partitionColumn
+        partition_column = query.partitionColumn if query is not None else None

99-100: ⚠️ Potential issue

Fix variable casing in constructor.

Use consistent variable naming.

-            startCutOff=start_cutoff,
-            endCutOff=end_cutoff,
+            startCutoff=start_cutoff,
+            endCutoff=end_cutoff,
🧹 Nitpick comments (1)
api/python/ai/chronon/repo/dag_builder.py (1)

7-10: Docstring could be more comprehensive.

Consider expanding to describe all three functions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0ec4d5b and f1709a4.

📒 Files selected for processing (1)
  • api/python/ai/chronon/repo/dag_builder.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (2)
api/python/ai/chronon/repo/dag_builder.py (2)

1-6: Imports look good.

Appropriate typing and module imports for the functionality.


33-46: Clean implementation of dependency addition.

Good error handling for unsupported types.

Co-authored-by: Thomas Chow <[email protected]>
raise ValueError(f"Obj is of type: {node}")


def to_dependency(node: Union[Source, GroupBy, JoinPart, BootstrapPart], lag: int = 0) -> TableDependency:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi lag not incorporated yet .

@tchow-zlai tchow-zlai closed this Apr 4, 2025
@tchow-zlai tchow-zlai deleted the tchow/table-dependencies branch April 4, 2025 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant