-
Notifications
You must be signed in to change notification settings - Fork 0
stub/part_1: physical graph for workflow submission + column lineage #493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request removes the Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant BF as Backfill Command
participant CP as Compile Function
participant PI as PhysicalIndex
participant PG as Graph Functions
U->>BF: Invoke backfill(conf, start, end)
BF->>CP: Call compile(conf)
CP-->>BF: Return compile results
BF->>PI: Create PhysicalIndex from results
BF->>PG: Request backfill physical graph
PG-->>BF: Return physical graph
BF->>PG: Submit physical graph
PG-->>BF: Confirm submission
Possibly related PRs
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
api/py/ai/chronon/cli/entrypoint.py (1)
23-23
: Consider renaming the function.Shadowing Python’s built-in
compile
might cause confusion.- def compile() -> Dict[ConfigType, CompileResult]: + def compile_configs() -> Dict[ConfigType, CompileResult]:api/py/ai/chronon/cli/plan/physical_index.py (4)
10-22
:PhysicalNode
outline is good.
construct_lineage()
is a stub; fill it once lineage logic is known.
24-25
:get_physical_nodes
is a placeholder.Implement logic to build nodes from compiled objects.
43-48
:from_compiled_obj
needs an implementation.Currently just a stub; confirm usage before merging.
50-53
:get_backfill_physical_graph
is empty.Add logic to shape the physical graph for backfill.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (4)
api/py/ai/chronon/cli/entrypoint.py
(3 hunks)api/py/ai/chronon/cli/plan/physical_index.py
(1 hunks)api/thrift/api.thrift
(2 hunks)api/thrift/lineage.thrift
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/entrypoint.py
1-1: dataclasses.dataclass
imported but unused
Remove unused import: dataclasses.dataclass
(F401)
2-2: typing.List
imported but unused
Remove unused import: typing.List
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
🔇 Additional comments (16)
api/thrift/api.thrift (2)
6-6
: Looks good.No immediate issues with the new include statement.
261-262
: Ensure null-safety for the new field.If
stageColumnLineages
is unset or empty, confirm that consumers handle it gracefully.api/thrift/lineage.thrift (6)
1-2
: Namespaces look fine.No concerns here.
4-8
: New structColumn
is clear.No issues found in these fields.
10-15
: EnumColumnLineageStage
is concise.Seems sufficient for known stages.
18-42
: EnumColumnLineageType
is well-structured.Covers typical lineage scenarios effectively.
44-49
:ColumnLineage
usage is logical.The fields capture lineage details clearly.
51-54
:StageColumnLineage
is straightforward.Nicely groups multiple lineage entries by stage.
api/py/ai/chronon/cli/entrypoint.py (3)
6-12
: Import statements look fine otherwise.No further concerns here.
42-43
: Thebranch
parameter is not consumed.Either remove the option or handle it in the function.
44-50
: Backfill logic looks acceptable.Appropriately leverages compiled results and physical graph.
api/py/ai/chronon/cli/plan/physical_index.py (5)
1-8
: Imports are overall fine.No immediate issues.
28-34
:PhysicalGraph
structure is fine.Captures node, dependencies, and date range clearly.
36-42
:PhysicalIndex
is well-defined.Storing table-to-node and column lineage is appropriate.
56-57
:get_deploy_physical_graph
is also a stub.Implementation pending.
60-64
:submit_physical_graph
skeleton is fine.Implement submission once node resolution is clear.
api/thrift/lineage.thrift
Outdated
4: optional ColumnLineageType lineageType // not present means direct | ||
} | ||
|
||
struct StageColumnLineage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, since we always know the list of stages for a given node, should we just make structs for each node. I.e.
struct GroupByColumnLineageStages {
1. list<ColumnLineage> selectLineage
2. list<ColumnLineage> whereLineage
3. list<ColumnLineage> aggLineage
...
}
instead of doing StageColumnLineage
with a ColumnLineageStage
property (which is a bit confusing naming at the very least)?
Basically these stages are always fixed per node type, so might as well reflect that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, classic tradeoff between generality and type safely. on the other suggestion about naming, i was wondering how to make it less confusing, open to suggestions.
making the types specific wouldn't change the producer code too much, but will blow up the consumer side by an unreasonable amount
- all consumers of lineage will need to if/else
- the lineage thrift that is supposed to live in metadata now has knowledge of conf types - before it was agnostic
- any changes or new stages or nodes etc will require thrift changes to lineage and consumer changes. we want the lineage system to work with as little effort as possible when we add models, embeddings etc. in theory none of the consumer code should change when we make these changes.
also staging query wont fit a static structure like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah that all makes sense, let's leave it alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought on renaming this: StageWithLineage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped a few minor comments. I'm a bit fuzzy at a high level on how some of this meshes exactly with concrete staging query -> groupby -> joins. Could we drop a readme / detailed example in the thrift?
(I'll let Varant / Kumar stamp as they're directly integrating with this)
f"Backfilling with scope {scope} for config {conf} from {start_date} to {end_date}" | ||
@click.option("--branch", default=get_current_branch, help="Branch to sync") | ||
def backfill(conf: str, start_date: str, end_date: str): | ||
# compile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: redundant comment? (could expand a bit more if you think that's helpful)
api/thrift/api.thrift
Outdated
@@ -257,6 +258,8 @@ struct MetaData { | |||
// column -> tag_key -> tag_value | |||
7: optional map<string, map<string, string>> columnTags | |||
|
|||
8: optional list<lineage.StageColumnLineage> stageColumnLineages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop a comment on what this captures at a high level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought on renaming this: stagesWithLineage
api/thrift/lineage.thrift
Outdated
|
||
struct Column { | ||
1: optional string logicalNodeName | ||
2: optional string stageName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to confirm my understanding - is stageName going to be something like 'Join' or 'GroupBy'? or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good call, @nikhil-zlai I think you can use the enum here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change
api/thrift/lineage.thrift
Outdated
TIMESTAMP, | ||
DIRECT | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdyt of moving this to the top of the enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
/** | ||
* select x+y as a, c, json_extract(payload, "$.meta.user") as user where b = 10 | ||
* group_by user | ||
* agg - count(a, window=7d, bucket=c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we include timestamp in this example too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question - i think all input columns in a events source and temporal entities will automatically need time as a indirect lineage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
api/py/ai/chronon/cli/plan/physical_graph.py (1)
14-23
: Consider cycle detection for dependencies.
If there’s a loop, flatten() could recurse infinitely.api/py/ai/chronon/cli/plan/controller_iface.py (2)
4-4
: Remove unused import.
It’s not referenced anywhere else.-from ai.chronon.cli.plan.physical_index import PhysicalNode
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
ai.chronon.cli.plan.physical_index.PhysicalNode
imported but unusedRemove unused import:
ai.chronon.cli.plan.physical_index.PhysicalNode
(F401)
8-9
: Empty constructor in an ABC.
Consider removing or making it abstract if needed.🧰 Tools
🪛 Ruff (0.8.2)
8-9:
OrchestratorIface.__init__
is an empty method in an abstract base class, but has no abstract decorator(B027)
api/py/ai/chronon/cli/plan/physical_index.py (1)
62-66
: Add error handling.
upload_conf
may fail; consider exception handling or retries.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (6)
api/py/ai/chronon/cli/plan/controller_iface.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_graph.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_index.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_node.py
(1 hunks)api/thrift/common.thrift
(2 hunks)api/thrift/orchestration.thrift
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/plan/controller_iface.py
4-4: ai.chronon.cli.plan.physical_index.PhysicalNode
imported but unused
Remove unused import: ai.chronon.cli.plan.physical_index.PhysicalNode
(F401)
8-9: OrchestratorIface.__init__
is an empty method in an abstract base class, but has no abstract decorator
(B027)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (9)
api/py/ai/chronon/cli/plan/physical_graph.py (2)
1-2
: All good.
4-4
: False positive from static analysis.
Import is clearly used for type annotation.api/thrift/common.thrift (3)
46-51
: Clearer dependency logic.
Documentation looks good.
58-58
: Additional comments appreciated.
These details will help clarify partition enumeration.
100-101
: Good explanation for cron usage.
No issues here.api/py/ai/chronon/cli/plan/controller_iface.py (2)
1-2
: Neat imports.
No concerns.
11-17
: Methods declared properly.
Interface design looks solid.api/thrift/orchestration.thrift (2)
34-40
: New enum value looks fine.
STAGING_QUERY addition is straightforward.
181-231
: Workflow docs are clear.
Nice high-level overview of the triggering logic.
@dataclass | ||
class PhysicalNode: | ||
name: str | ||
node_type: str # NodeType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use a typed enum.
String usage for node_type
is fragile. Consider an actual enum for stronger type safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
api/py/ai/chronon/cli/entrypoint.py (2)
22-22
: Consider renaming to avoid overshadowing built-in 'compile'.
25-26
: Address the TODO on column lineage.Would you like help creating a stub implementation or opening a new issue?
api/py/ai/chronon/cli/plan/physical_index.py (2)
20-31
: Custom init overrides dataclass defaults.Consider simpler dataclass usage or remove @DataClass for clarity.
49-66
: Provide error handling if uploads fail.Consider try/except or status checks after controller uploads.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (3)
api/py/ai/chronon/cli/entrypoint.py
(3 hunks)api/py/ai/chronon/cli/plan/physical_index.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_node.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- api/py/ai/chronon/cli/plan/physical_node.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
🔇 Additional comments (10)
api/py/ai/chronon/cli/entrypoint.py (3)
1-1
: No issues with the new import.
5-11
: Imports are used properly.
41-42
: 'branch' parameter is unused.Confirm if this is intentional or remove it to keep the function concise.
api/py/ai/chronon/cli/plan/physical_index.py (7)
1-2
: Mixing @DataClass with manual init can be confusing.Ensure this approach won't conflict with generated dataclass features.
4-9
: No concerns with these imports.
12-19
: Fields set by dataclass but also overridden in init.Decide if you really need a custom init with dataclass.
32-34
: Method remains unimplemented.
36-41
: Method remains unimplemented.
42-46
: Method remains unimplemented.
47-48
: Method remains unimplemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
api/py/ai/chronon/cli/plan/controller_iface.py (3)
6-7
: Clean up extra blank line.from ai.chronon.cli.plan.physical_graph import PhysicalGraph from ai.chronon.cli.plan.physical_index import PhysicalNode - class ControllerIface(ABC):
14-17
: Add docstring forfetch_missing_confs
.Missing documentation for this method. Add a clear docstring like the other methods have.
@abstractmethod def fetch_missing_confs(self, node_to_hash: Dict[str, str]) -> List[str]: + """ + Get names of configurations that need to be uploaded to the orchestrator. + """ pass
18-21
: Add docstring forupload_conf
.Missing documentation for this method. Add a clear docstring like the other methods have.
@abstractmethod def upload_conf(self, name: str, hash: str, content: str) -> None: + """ + Upload a configuration to the orchestrator. + """ pass
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
.github/workflows/test_python.yaml
(1 hunks)api/py/ai/chronon/cli/plan/controller_iface.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/plan/controller_iface.py
5-5: ai.chronon.cli.plan.physical_index.PhysicalNode
imported but unused
Remove unused import: ai.chronon.cli.plan.physical_index.PhysicalNode
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
.github/workflows/test_python.yaml (2)
63-63
: Lineage thrift command added.
Concise and clear; verify thatapi/thrift/lineage.thrift
exists and is current.
64-64
: Orchestration thrift command added.
Concise and clear; confirm thatapi/thrift/orchestration.thrift
is up-to-date and reflects intended changes.api/py/ai/chronon/cli/plan/controller_iface.py (1)
8-46
: Overall interface design is sound.This abstract interface provides a good abstraction layer for orchestrator interactions. Strongly typed parameters and return values enhance clarity.
from abc import ABC, abstractmethod | ||
from typing import Dict, List, Optional | ||
|
||
from ai.chronon.cli.plan.physical_graph import PhysicalGraph | ||
from ai.chronon.cli.plan.physical_index import PhysicalNode | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import.
Static analysis correctly identified that PhysicalNode
is imported but never used.
from abc import ABC, abstractmethod
from typing import Dict, List, Optional
from ai.chronon.cli.plan.physical_graph import PhysicalGraph
-from ai.chronon.cli.plan.physical_index import PhysicalNode
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from abc import ABC, abstractmethod | |
from typing import Dict, List, Optional | |
from ai.chronon.cli.plan.physical_graph import PhysicalGraph | |
from ai.chronon.cli.plan.physical_index import PhysicalNode | |
from abc import ABC, abstractmethod | |
from typing import Dict, List, Optional | |
from ai.chronon.cli.plan.physical_graph import PhysicalGraph |
🧰 Tools
🪛 Ruff (0.8.2)
5-5: ai.chronon.cli.plan.physical_index.PhysicalNode
imported but unused
Remove unused import: ai.chronon.cli.plan.physical_index.PhysicalNode
(F401)
@@ -257,6 +258,8 @@ struct MetaData { | |||
// column -> tag_key -> tag_value | |||
7: optional map<string, map<string, string>> columnTags | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// A stage is a "sub-transformation" of a given node. For example a `GroupBy` can consist of selects (with SQL expressions), filters (in the form of where clauses), followed by aggregations defined in the Zipline DSL. | |
// Each of this is a `stage` with its own column level lineage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🔭 Outside diff range comments (1)
api/py/ai/chronon/cli/plan/physical_node.py (1)
1-24
: 🛠️ Refactor suggestionComplete file restructuring needed.
Based on imports and function signatures, a proper class structure is needed:
from typing import List from ai.chronon.api.ttypes import GroupBy, Join, Model, StagingQuery from ai.chronon.orchestration.ttypes import PhysicalNode class PhysicalNodeFactory: @classmethod def from_group_by(cls, group_by: GroupBy) -> List[PhysicalNode]: raise NotImplementedError("Method not yet implemented") @classmethod def from_join(cls, join: Join) -> List[PhysicalNode]: raise NotImplementedError("Method not yet implemented") @classmethod def from_staging_query(cls, staging_query: StagingQuery) -> List[PhysicalNode]: raise NotImplementedError("Method not yet implemented") @classmethod def from_model(cls, model: Model) -> List[PhysicalNode]: raise NotImplementedError("Method not yet implemented")🧰 Tools
🪛 Ruff (0.8.2)
1-1:
dataclasses.dataclass
imported but unusedRemove unused import:
dataclasses.dataclass
(F401)
4-4:
ai.chronon.api.common.ttypes.TableDependency
imported but unusedRemove unused import:
ai.chronon.api.common.ttypes.TableDependency
(F401)
6-6:
ai.chronon.cli.compile.compile_context.CompiledObj
imported but unusedRemove unused import:
ai.chronon.cli.compile.compile_context.CompiledObj
(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
api/py/ai/chronon/cli/plan/physical_node.py
(1 hunks)api/thrift/orchestration.thrift
(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/plan/physical_node.py
1-1: dataclasses.dataclass
imported but unused
Remove unused import: dataclasses.dataclass
(F401)
4-4: ai.chronon.api.common.ttypes.TableDependency
imported but unused
Remove unused import: ai.chronon.api.common.ttypes.TableDependency
(F401)
6-6: ai.chronon.cli.compile.compile_context.CompiledObj
imported but unused
Remove unused import: ai.chronon.cli.compile.compile_context.CompiledObj
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
api/thrift/orchestration.thrift (2)
33-40
: New enum valueSTAGING_QUERY
added.Added STAGING_QUERY=3 to LogicalType enum.
195-244
: Extensive documentation added for workflow trigger process.Documentation explains physical node workflow, dependency handling, and scheduling logic.
def from_group_by(cls, group_by: GroupBy) -> List[PhysicalNode]: | ||
raise NotImplementedError("Method not yet implemented") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing class definition for methods.
Functions have cls
parameter indicating they should be class methods.
-def from_group_by(cls, group_by: GroupBy) -> List[PhysicalNode]:
+@classmethod
+def from_group_by(cls, group_by: GroupBy) -> List[PhysicalNode]:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def from_group_by(cls, group_by: GroupBy) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") | |
@classmethod | |
def from_group_by(cls, group_by: GroupBy) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") |
def from_model(cls, model: Model) -> List[PhysicalNode]: | ||
raise NotImplementedError("Method not yet implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing class definition for methods (4).
Same issue as previous method.
-def from_model(cls, model: Model) -> List[PhysicalNode]:
+@classmethod
+def from_model(cls, model: Model) -> List[PhysicalNode]:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def from_model(cls, model: Model) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") | |
@classmethod | |
def from_model(cls, model: Model) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") |
from dataclasses import dataclass | ||
from typing import List | ||
|
||
from ai.chronon.api.common.ttypes import TableDependency | ||
from ai.chronon.api.ttypes import GroupBy, Join, Model, StagingQuery | ||
from ai.chronon.cli.compile.compile_context import CompiledObj | ||
from ai.chronon.orchestration.ttypes import PhysicalNode | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports detected.
Several imports are not used in the code.
-from dataclasses import dataclass
from typing import List
-from ai.chronon.api.common.ttypes import TableDependency
from ai.chronon.api.ttypes import GroupBy, Join, Model, StagingQuery
-from ai.chronon.cli.compile.compile_context import CompiledObj
from ai.chronon.orchestration.ttypes import PhysicalNode
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from dataclasses import dataclass | |
from typing import List | |
from ai.chronon.api.common.ttypes import TableDependency | |
from ai.chronon.api.ttypes import GroupBy, Join, Model, StagingQuery | |
from ai.chronon.cli.compile.compile_context import CompiledObj | |
from ai.chronon.orchestration.ttypes import PhysicalNode | |
from typing import List | |
from ai.chronon.api.ttypes import GroupBy, Join, Model, StagingQuery | |
from ai.chronon.orchestration.ttypes import PhysicalNode |
🧰 Tools
🪛 Ruff (0.8.2)
1-1: dataclasses.dataclass
imported but unused
Remove unused import: dataclasses.dataclass
(F401)
4-4: ai.chronon.api.common.ttypes.TableDependency
imported but unused
Remove unused import: ai.chronon.api.common.ttypes.TableDependency
(F401)
6-6: ai.chronon.cli.compile.compile_context.CompiledObj
imported but unused
Remove unused import: ai.chronon.cli.compile.compile_context.CompiledObj
(F401)
def from_join(cls, join: Join) -> List[PhysicalNode]: | ||
raise NotImplementedError("Method not yet implemented") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing class definition for methods (2).
Same issue as previous method.
-def from_join(cls, join: Join) -> List[PhysicalNode]:
+@classmethod
+def from_join(cls, join: Join) -> List[PhysicalNode]:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def from_join(cls, join: Join) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") | |
@classmethod | |
def from_join(cls, join: Join) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") |
def from_staging_query(cls, staging_query: StagingQuery) -> List[PhysicalNode]: | ||
raise NotImplementedError("Method not yet implemented") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing class definition for methods (3).
Same issue as previous method.
-def from_staging_query(cls, staging_query: StagingQuery) -> List[PhysicalNode]:
+@classmethod
+def from_staging_query(cls, staging_query: StagingQuery) -> List[PhysicalNode]:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def from_staging_query(cls, staging_query: StagingQuery) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") | |
@classmethod | |
def from_staging_query(cls, staging_query: StagingQuery) -> List[PhysicalNode]: | |
raise NotImplementedError("Method not yet implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
api/thrift/lineage.thrift (3)
10-15
: Consider using this enum for stageName.Based on past review comments, stageName in Column struct should use this enum instead of string.
struct Column { - 2: optional string stageName + 2: optional ColumnLineageStage stageName }
18-42
: Nice documentation with examples.Consider moving DIRECT to the top of the enum as suggested in past reviews.
enum ColumnLineageType { + DIRECT, WHERE, JOIN_KEY, GROUP_BY_KEY, DERIVE, TIMESTAMP, - DIRECT
33-33
: Consider including timestamp in the example.As mentioned in past review comments, temporal data would have timestamp as indirect lineage.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
api/thrift/api.thrift
(2 hunks)api/thrift/lineage.thrift
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (6)
api/thrift/api.thrift (2)
6-6
: Good addition of the lineage include.
261-263
: Excellent documentation for stagesWithLineage field.The comment clearly explains the concept of stages as sub-transformations with their own column level lineage.
api/thrift/lineage.thrift (4)
1-2
: LGTM: Proper namespace definition.
4-8
: Column struct looks good.Provides necessary fields to identify a column within the lineage context.
44-49
: ColumnLineage struct looks good.Captures all necessary information for tracking column transformations.
51-54
: StageWithLineage struct is well-defined.Proper structure for grouping related column lineage information by stage.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
ea35bab
to
27785ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
api/py/ai/chronon/cli/entrypoint.py (2)
25-25
: Address the TODO.
Consider implementing or tracking "column lineage."
43-49
: Add error handling.
Check compile_result or catch exceptions before proceeding.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (10)
.github/workflows/test_python.yaml
(1 hunks)api/py/ai/chronon/cli/entrypoint.py
(3 hunks)api/py/ai/chronon/cli/plan/controller_iface.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_graph.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_index.py
(1 hunks)api/py/ai/chronon/cli/plan/physical_node.py
(1 hunks)api/thrift/api.thrift
(2 hunks)api/thrift/common.thrift
(2 hunks)api/thrift/lineage.thrift
(1 hunks)api/thrift/orchestration.thrift
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- api/py/ai/chronon/cli/plan/physical_graph.py
- .github/workflows/test_python.yaml
- api/thrift/common.thrift
- api/thrift/api.thrift
- api/thrift/lineage.thrift
- api/thrift/orchestration.thrift
- api/py/ai/chronon/cli/plan/physical_index.py
🧰 Additional context used
🧬 Code Definitions (1)
api/py/ai/chronon/cli/plan/controller_iface.py (1)
api/py/ai/chronon/cli/plan/physical_graph.py (1) (1)
PhysicalGraph
(8-23)
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/plan/physical_node.py
1-1: dataclasses.dataclass
imported but unused
Remove unused import: dataclasses.dataclass
(F401)
4-4: ai.chronon.api.common.ttypes.TableDependency
imported but unused
Remove unused import: ai.chronon.api.common.ttypes.TableDependency
(F401)
6-6: ai.chronon.cli.compile.compile_context.CompiledObj
imported but unused
Remove unused import: ai.chronon.cli.compile.compile_context.CompiledObj
(F401)
api/py/ai/chronon/cli/plan/controller_iface.py
5-5: ai.chronon.cli.plan.physical_index.PhysicalNode
imported but unused
Remove unused import: ai.chronon.cli.plan.physical_index.PhysicalNode
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (6)
api/py/ai/chronon/cli/plan/controller_iface.py (1)
5-5
: Remove unused import.
PhysicalNode is unused.🧰 Tools
🪛 Ruff (0.8.2)
5-5:
ai.chronon.cli.plan.physical_index.PhysicalNode
imported but unusedRemove unused import:
ai.chronon.cli.plan.physical_index.PhysicalNode
(F401)
api/py/ai/chronon/cli/plan/physical_node.py (5)
1-1
: Remove unused imports.Also applies to: 4-4, 6-6
🧰 Tools
🪛 Ruff (0.8.2)
1-1:
dataclasses.dataclass
imported but unusedRemove unused import:
dataclasses.dataclass
(F401)
10-12
: Add @classmethod.
Functions with acls
parameter should be class methods.
14-16
: Add @classmethod.
Same issue as above.
18-20
: Add @classmethod.
Same issue as above.
22-23
: Add @classmethod.
Same issue as above.
@click.option("--branch", default=get_current_branch, help="Branch to sync") | ||
def backfill(conf: str, start_date: str, end_date: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix mismatch in function signature.
@click.option provides branch
but the function has no branch parameter.
Apply this diff:
-def backfill(conf: str, start_date: str, end_date: str):
+def backfill(conf: str, start_date: str, end_date: str, branch: str):
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@click.option("--branch", default=get_current_branch, help="Branch to sync") | |
def backfill(conf: str, start_date: str, end_date: str): | |
@click.option("--branch", default=get_current_branch, help="Branch to sync") | |
def backfill(conf: str, start_date: str, end_date: str, branch: str): |
…493) Adding function stubs and thrift schemas for orchestration with column level lineage. - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> - **New Features** - Streamlined CLI operations with a simplified backfill command and removal of an obsolete synchronization command. - Enhanced configuration processing that now automatically compiles and returns results. - Expanded physical planning capabilities with new mechanisms for managing computation graphs. - Extended tracking of data transformations through improved column lineage support in metadata. - Introduction of new classes and methods for managing physical nodes and graphs, enhancing data processing workflows. - New interface for orchestrators to handle configuration fetching and uploading. - New structures for detailed column lineage tracking throughout data processing stages. - Added support for generating Python code from new Thrift definitions in the testing workflow. - **Bug Fixes** - Improved documentation for clarity on the functionality and logic of certain fields. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ezvz <[email protected]>
…493) ## Summary Adding function stubs and thrift schemas for orchestration with column level lineage. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Streamlined CLI operations with a simplified backfill command and removal of an obsolete synchronization command. - Enhanced configuration processing that now automatically compiles and returns results. - Expanded physical planning capabilities with new mechanisms for managing computation graphs. - Extended tracking of data transformations through improved column lineage support in metadata. - Introduction of new classes and methods for managing physical nodes and graphs, enhancing data processing workflows. - New interface for orchestrators to handle configuration fetching and uploading. - New structures for detailed column lineage tracking throughout data processing stages. - Added support for generating Python code from new Thrift definitions in the testing workflow. - **Bug Fixes** - Improved documentation for clarity on the functionality and logic of certain fields. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ezvz <[email protected]>
…493) ## Summary Adding function stubs and thrift schemas for orchestration with column level lineage. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Streamlined CLI operations with a simplified backfill command and removal of an obsolete synchronization command. - Enhanced configuration processing that now automatically compiles and returns results. - Expanded physical planning capabilities with new mechanisms for managing computation graphs. - Extended tracking of data transformations through improved column lineage support in metadata. - Introduction of new classes and methods for managing physical nodes and graphs, enhancing data processing workflows. - New interface for orchestrators to handle configuration fetching and uploading. - New structures for detailed column lineage tracking throughout data processing stages. - Added support for generating Python code from new Thrift definitions in the testing workflow. - **Bug Fixes** - Improved documentation for clarity on the functionality and logic of certain fields. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ezvz <[email protected]>
…493) ## Summary Adding function stubs and thrift schemas for orchestration with column level lineage. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Streamlined CLI operations with a simplified backfill command and removal of an obsolete synchronization command. - Enhanced configuration processing that now automatically compiles and returns results. - Expanded physical planning capabilities with new mechanisms for managing computation graphs. - Extended tracking of data transformations through improved column lineage support in metadata. - Introduction of new classes and methods for managing physical nodes and graphs, enhancing data processing workflows. - New interface for orchestrators to handle configuration fetching and uploading. - New structures for detailed column lineage tracking throughout data processing stages. - Added support for generating Python code from new Thrift definitions in the testing workflow. - **Bug Fixes** - Improved documentation for clarity on the functionality and logic of certain fields. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ezvz <[email protected]>
…493) ## Summary Adding function stubs and thrift schemas for orchestration with column level lineage. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Streamlined CLI operations with a simplified backfill command and removal of an obsolete synchronization command. - Enhanced configuration processing that now automatically compiles and returns results. - Expanded physical planning capabilities with new mechanisms for managing computation graphs. - Extended tracking of data transformations through improved column lineage support in metadata. - Introduction of new classes and methods for managing physical nodes and graphs, enhancing data processing workflows. - New interface for orchestrators to handle configuration fetching and uploading. - New structures for detailed column lineage tracking throughout data processing stages. - Added support for generating Python code from new Thrift definitions in the testing workflow. - **Bug Fixes** - Improved documentation for clarity on the functionality and logic of certain fields. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ezvz <[email protected]>
…493) ## Summary Adding function stubs and thrift schemas for orchestration with column level lineage. ## Cheour clientslist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Streamlined CLI operations with a simplified baour clientsfill command and removal of an obsolete synchronization command. - Enhanced configuration processing that now automatically compiles and returns results. - Expanded physical planning capabilities with new mechanisms for managing computation graphs. - Extended traour clientsing of data transformations through improved column lineage support in metadata. - Introduction of new classes and methods for managing physical nodes and graphs, enhancing data processing workflows. - New interface for orchestrators to handle configuration fetching and uploading. - New structures for detailed column lineage traour clientsing throughout data processing stages. - Added support for generating Python code from new Thrift definitions in the testing workflow. - **Bug Fixes** - Improved documentation for clarity on the functionality and logic of certain fields. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ezvz <[email protected]>
Summary
Checklist
Summary by CodeRabbit
New Features
Bug Fixes