Branch versioning logic #163

nikhil-zlai · 2024-12-28T20:15:51Z

Summary

Writing down the cases & logic behind branching. Will continue writing about how these are internally represented.

Checklist

Added Unit Tests
Covered by existing CI
Integration tested
Documentation update

Summary by CodeRabbit

Release Notes

Bug Fixes
- Corrected a typo in TableDependency struct, renaming forceComputae to forceCompute.
New Features
- Added support for branch-based workflows in data orchestration.
- Introduced RepoIndex for managing repository data and versions.
- Created SequenceMap utility for managing unique value sequences.
- Added TablePrinter for formatted data output.
- Introduced VersionUpdate class for tracking version changes.
- Added StringExtensions for MD5 hashing of strings.
Refactoring
- Removed LineageIndex and LogicalSet classes.
- Updated naming conventions for logical nodes.
- Restructured repository parsing and version tracking mechanisms.
Documentation
- Added comprehensive README for branch workflow management.
- Expanded documentation for global planning and batch workload processing.
Chores
- Updated logging dependencies.
- Modified build configuration.
- Added new test specifications.

coderabbitai · 2024-12-28T20:15:58Z

Walkthrough

This pull request introduces corrections and enhancements to the Chronon orchestration system. Key changes include fixing a typographical error in the orchestration.thrift file, the addition of documentation regarding branch support in the README.md, and the introduction of new classes such as RepoIndex, RepoNodes, and VersionUpdate. Additionally, several legacy classes were removed, and naming conventions for logical nodes were standardized.

Changes

File	Change Summary
`api/thrift/orchestration.thrift`	Fixed typo: Renamed `forceComputae` to `forceCompute`
`orchestration/README.md`	Added documentation on branch support and pipeline management
`orchestration/src/main/scala/ai/chronon/orchestration/LineageIndex.scala`	Removed `LineageIndex` class and related functionality
`orchestration/src/main/scala/ai/chronon/orchestration/LogicalSet.scala`	Removed `LogicalSet` class
`orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala`	New implementation for repository data management
`orchestration/src/main/scala/ai/chronon/orchestration/RepoNodes.scala`	New class for managing repository nodes
`orchestration/src/main/scala/ai/chronon/orchestration/logical/*NodeImpl.scala`	Updated `name` methods with namespace prefixes
`orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala`	Restructured methods for file handling and hash computation
`orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala`	New class for tracking version updates
`orchestration/src/main/scala/ai/chronon/orchestration/utils/*`	Introduced utility classes for string extensions and sequence mapping
`build.sbt`	Updated logging dependencies and resource configurations
`orchestration/src/test/scala/ai/chronon/orchestration/test/*Spec.scala`	Added tests for `RepoIndex` and `SequenceMap`

Possibly related PRs

feat: use log4j2 everywhere in spark consistently #99: The changes in this PR involve modifications to the build.sbt file, which is also relevant to the main PR's context of ensuring consistent logging practices across the codebase.

Suggested reviewers

piyush-zlai

Poem

🌟 Code's symphony, a refactored dance,
Nodes renamed with elegant stance,
Repositories bloom, versions take flight,
Orchestration's magic shines so bright! 🎵

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 9c58f34 and de72d1f.

📒 Files selected for processing (1)

build.sbt (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

build.sbt

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: join_spark_tests
GitHub Check: other_spark_tests
GitHub Check: table_utils_delta_format_spark_tests
GitHub Check: fetcher_spark_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: mutation_spark_tests
GitHub Check: no_spark_scala_tests

Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

orchestration/README.md (1)

14-16: Fix abbreviations in examples

Use proper abbreviations: "e.g.," instead of "eg.," and add period after "etc."

🧰 Tools

🪛 LanguageTool

[uncategorized] ~14-~14: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...tic updates - that change output data - eg., logic within where-s, selects, aggrega...

(E_G)

[style] ~14-~14: In American English, abbreviations like “etc.” require a period.
Context: ...c within where-s, selects, aggregations etc - make non-semantic updates - that don'...

(ETC_PERIOD)

[uncategorized] ~15-~15: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...dates - that don't change output data - eg., spark exec memory, # of pods / workers...

(E_G)

[style] ~15-~15: In American English, abbreviations like “etc.” require a period.
Context: ... spark exec memory, # of pods / workers etc - *note that everything that is store...

(ETC_PERIOD)

[uncategorized] ~16-~16: Did you mean the plural noun “APIs” instead of the possessive?
Context: ... is stored in metadata field within our API's are non-semantic fields* - add new node...

(APOS_ARE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3b6ca and 76d15ee.

📒 Files selected for processing (2)

api/thrift/orchestration.thrift (1 hunks)
orchestration/README.md (1 hunks)

✅ Files skipped from review due to trivial changes (1)

api/thrift/orchestration.thrift

🧰 Additional context used

🪛 LanguageTool

orchestration/README.md

[uncategorized] ~14-~14: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...tic updates - that change output data - eg., logic within where-s, selects, aggrega...

(E_G)

[style] ~14-~14: In American English, abbreviations like “etc.” require a period.
Context: ...c within where-s, selects, aggregations etc - make non-semantic updates - that don'...

(ETC_PERIOD)

[uncategorized] ~15-~15: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...dates - that don't change output data - eg., spark exec memory, # of pods / workers...

(E_G)

[style] ~15-~15: In American English, abbreviations like “etc.” require a period.
Context: ... spark exec memory, # of pods / workers etc - *note that everything that is store...

(ETC_PERIOD)

[uncategorized] ~16-~16: Did you mean the plural noun “APIs” instead of the possessive?
Context: ... is stored in metadata field within our API's are non-semantic fields* - add new node...

(APOS_ARE)

[style] ~19-~19: Consider shortening or rephrasing this to strengthen your wording.
Context: ...add new nodes - delete existing nodes - make changes to several compute nodes at once - decide ...

(MAKE_CHANGES)

[uncategorized] ~36-~36: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...anges to node notated using a plus "+". Eg., Join J3 becomes J3+ Non-Semantic ...

(E_G)

[grammar] ~85-~85: It seems that a verb or a comma is missing.
Context: ...ory of the user is behind remote, > we will a lot more changes than the user intend...

(MD_DT_JJ)

[style] ~87-~87: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ... mitigate this is to, only make the CLI only pick up changes to files listed as edit...

(ADVERB_REPETITION_PREMIUM)

[uncategorized] ~90-~90: Possible missing article found.
Context: ... branch. Another approach is to force user to rebase on any change to the repo. Ho...

(AI_HYDRA_LEO_MISSING_THE)

[grammar] ~93-~93: This expression is usually spelled with a hyphen.
Context: ...e job is running is accounted for. ### Non Semantic updates Instead, if sq1 changes non-s...

(NON_ANTI_JJ)

[grammar] ~115-~115: The verb ‘compute’ does not usually follow articles like ‘the’. Check that ‘compute’ is spelled correctly; using ‘compute’ as a noun may be non-standard.
Context: ...s who haven't updated sq1 are causing the compute, we should use sq1 - if another branc...

(A_INFINITIVE)

[style] ~116-~116: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ng sq1 non semantically to sq1** we need to use that instead. ### Adding new node...

(REP_NEED_TO_VB)

[uncategorized] ~167-~167: Possible missing article found.
Context: ...his is not a leaf node, absorbing it as same as a leaf node change would be the rig...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~191-~191: The adjective or adverb “straightforward” (= easy, precise) is spelled as one word.
Context: ... existing nodes Deleting leaf nodes is straight forward. We just need to program a cleanup mech...

(STRAIGHT_FORWARD_COMPOUND)

[formatting] ~234-~234: Consider inserting a comma after an introductory phrase for better readability.
Context: ... node - or the compilation would fail. In these cases it would be ideal to garbage collect up...

(IN_THAT_CASE_COMMA)

[uncategorized] ~234-~234: Possible missing article found.
Context: ...es it would be ideal to garbage collect upstream chain of the deleted node. ```mermaid ...

(AI_HYDRA_LEO_MISSING_THE)

[typographical] ~260-~260: Consider inserting a comma for improved readability.
Context: ...e development on a branch is in progress we need to create temporary data assets fo...

(MISSING_COMMAS)

🪛 Markdownlint (0.37.0)

orchestration/README.md

28-28: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (5)

orchestration/README.md (5)

84-92: Elaborate on synchronization solutions

The synchronization approaches need more detail:

How will the CLI detect file changes from git?
What happens if remote changes occur during job execution?

Would you like me to help draft a more detailed synchronization strategy?

🧰 Tools

🪛 LanguageTool

[grammar] ~85-~85: It seems that a verb or a comma is missing.
Context: ...ory of the user is behind remote, > we will a lot more changes than the user intend...

(MD_DT_JJ)

[style] ~87-~87: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ... mitigate this is to, only make the CLI only pick up changes to files listed as edit...

(ADVERB_REPETITION_PREMIUM)

[uncategorized] ~90-~90: Possible missing article found.
Context: ... branch. Another approach is to force user to rebase on any change to the repo. Ho...

(AI_HYDRA_LEO_MISSING_THE)

113-117: Clarify version conflict resolution

Add details about:

Priority order when multiple branches modify the same node
Conflict resolution strategy

🧰 Tools

🪛 LanguageTool

[grammar] ~115-~115: The verb ‘compute’ does not usually follow articles like ‘the’. Check that ‘compute’ is spelled correctly; using ‘compute’ as a noun may be non-standard.
Context: ...s who haven't updated sq1 are causing the compute, we should use sq1 - if another branc...

(A_INFINITIVE)

[style] ~116-~116: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ng sq1 non semantically to sq1** we need to use that instead. ### Adding new node...

(REP_NEED_TO_VB)

166-169: Expand on SQL migration scenario

Explain:

Why is this case treated differently?
What are the implications for data consistency?

🧰 Tools

🪛 LanguageTool

[uncategorized] ~167-~167: Possible missing article found.
Context: ...his is not a leaf node, absorbing it as same as a leaf node change would be the rig...

(AI_HYDRA_LEO_MISSING_THE)

191-193: Detail cleanup mechanism

Document:

Cleanup timing and triggers
Data retention policy
Pipeline removal process

🧰 Tools

🪛 LanguageTool

[uncategorized] ~191-~191: The adjective or adverb “straightforward” (= easy, precise) is spelled as one word.
Context: ... existing nodes Deleting leaf nodes is straight forward. We just need to program a cleanup mech...

(STRAIGHT_FORWARD_COMPOUND)

273-278: Expand merge documentation

Add critical details about:

Merge conflict resolution
Data migration process
Validation steps

Would you like me to help draft a comprehensive merge process guide?

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (9)

orchestration/src/main/scala/ai/chronon/orchestration/utils/SequenceMap.scala (3)

5-8: Doc is brief. Maybe clarify thread safety.

9-23: In-memory maps not thread-safe.

51-59: Potential index for missing keys could surprise callers.

orchestration/src/test/scala/ai/chronon/orchestration/test/SequenceMapSpec.scala (1)

7-134: Unit tests are thorough. Consider concurrency tests if needed.

orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala (2)

20-25: Ensure resource closure on exception.
Consider a try-finally or using loanPattern for safer file reads.

43-50: Implementation is fine.
Could benefit from unit tests verifying partial/empty names.

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1)

20-104: Method too large.
Consider splitting into smaller functions for clarity.

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1)

19-35: Processor logic is partial

Implement parse(...) or document why it's unimplemented.

orchestration/README.md (1)

1-290: Minor grammar & styling suggestions

• Use “e.g.” (with periods) and “etc.” in American English.
• “straightforward” is one word.
Consider small refinements to ensure clarity.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~14-~14: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...tic updates - that change output data - eg., logic within where-s, selects, aggrega...

(E_G)

[style] ~14-~14: In American English, abbreviations like “etc.” require a period.
Context: ...c within where-s, selects, aggregations etc - make non-semantic updates - that don'...

(ETC_PERIOD)

[uncategorized] ~15-~15: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...dates - that don't change output data - eg., spark exec memory, # of pods / workers...

(E_G)

[style] ~15-~15: In American English, abbreviations like “etc.” require a period.
Context: ... spark exec memory, # of pods / workers etc - *note that everything that is store...

(ETC_PERIOD)

[uncategorized] ~16-~16: Did you mean the plural noun “APIs” instead of the possessive?
Context: ... is stored in metadata field within our API's are non-semantic fields* - add new node...

(APOS_ARE)

[style] ~19-~19: Consider shortening or rephrasing this to strengthen your wording.
Context: ...add new nodes - delete existing nodes - make changes to several compute nodes at once - decide ...

(MAKE_CHANGES)

[uncategorized] ~36-~36: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...anges to node notated using a plus "+". Eg., Join J3 becomes J3+ Non-Semantic ...

(E_G)

[grammar] ~85-~85: It seems that a verb or a comma is missing.
Context: ...ory of the user is behind remote, > we will a lot more changes than the user intend...

(MD_DT_JJ)

[style] ~87-~87: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ... mitigate this is to, only make the CLI only pick up changes to files listed as edit...

(ADVERB_REPETITION_PREMIUM)

[grammar] ~93-~93: This expression is usually spelled with a hyphen.
Context: ...e job is running is accounted for. ### Non Semantic updates Instead, if sq1 changes non-s...

(NON_ANTI_JJ)

[grammar] ~115-~115: The verb ‘compute’ does not usually follow articles like ‘the’. Check that ‘compute’ is spelled correctly; using ‘compute’ as a noun may be non-standard.
Context: ...s who haven't updated sq1 are causing the compute, we should use sq1 - if another branc...

(A_INFINITIVE)

[style] ~116-~116: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ng sq1 non semantically to sq1** we need to use that instead. ### Adding new node...

(REP_NEED_TO_VB)

[uncategorized] ~141-~141: Possible missing article found.
Context: ...olor:black,stroke:#333 ``` But adding non-leaf node - as parent to existing node...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~142-~142: Possible missing article found.
Context: ...ut adding non-leaf node - as parent to existing node - would almost always cause semant...

(AI_HYDRA_LEO_MISSING_AN)

[uncategorized] ~167-~167: Possible missing article found.
Context: ...his is not a leaf node, absorbing it as same as a leaf node change would be the rig...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~191-~191: The adjective or adverb “straightforward” (= easy, precise) is spelled as one word.
Context: ... existing nodes Deleting leaf nodes is straight forward. We just need to program a cleanup mech...

(STRAIGHT_FORWARD_COMPOUND)

[formatting] ~234-~234: Consider inserting a comma after an introductory phrase for better readability.
Context: ... node - or the compilation would fail. In these cases it would be ideal to garbage collect up...

(IN_THAT_CASE_COMMA)

[typographical] ~260-~260: Consider inserting a comma for improved readability.
Context: ...e development on a branch is in progress we need to create temporary data assets fo...

(MISSING_COMMAS)

🪛 Markdownlint (0.37.0)

28-28: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

273-273: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 76d15ee and ad0c7c4.

📒 Files selected for processing (18)

orchestration/README.md (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/LineageIndex.scala (0 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/LogicalSet.scala (0 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/README.md (0 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoNodes.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala (2 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/logical/GroupByNodeImpl.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/logical/JoinNodeImpl.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/logical/LogicalNodeImpl.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/logical/ModelNodeImpl.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/logical/TabularDataNodeImpl.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/utils/SequenceMap.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/utils/TablePrinter.scala (1 hunks)
orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1 hunks)
orchestration/src/test/scala/ai/chronon/orchestration/test/SequenceMapSpec.scala (1 hunks)

💤 Files with no reviewable changes (3)

orchestration/src/main/scala/ai/chronon/orchestration/README.md
orchestration/src/main/scala/ai/chronon/orchestration/LineageIndex.scala
orchestration/src/main/scala/ai/chronon/orchestration/LogicalSet.scala

🧰 Additional context used

🪛 LanguageTool

orchestration/README.md

[uncategorized] ~14-~14: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...tic updates - that change output data - eg., logic within where-s, selects, aggrega...

(E_G)

[style] ~14-~14: In American English, abbreviations like “etc.” require a period.
Context: ...c within where-s, selects, aggregations etc - make non-semantic updates - that don'...

(ETC_PERIOD)

[uncategorized] ~15-~15: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...dates - that don't change output data - eg., spark exec memory, # of pods / workers...

(E_G)

[style] ~15-~15: In American English, abbreviations like “etc.” require a period.
Context: ... spark exec memory, # of pods / workers etc - *note that everything that is store...

(ETC_PERIOD)

[uncategorized] ~16-~16: Did you mean the plural noun “APIs” instead of the possessive?
Context: ... is stored in metadata field within our API's are non-semantic fields* - add new node...

(APOS_ARE)

[style] ~19-~19: Consider shortening or rephrasing this to strengthen your wording.
Context: ...add new nodes - delete existing nodes - make changes to several compute nodes at once - decide ...

(MAKE_CHANGES)

[uncategorized] ~36-~36: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...anges to node notated using a plus "+". Eg., Join J3 becomes J3+ Non-Semantic ...

(E_G)

[grammar] ~85-~85: It seems that a verb or a comma is missing.
Context: ...ory of the user is behind remote, > we will a lot more changes than the user intend...

(MD_DT_JJ)

[style] ~87-~87: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ... mitigate this is to, only make the CLI only pick up changes to files listed as edit...

(ADVERB_REPETITION_PREMIUM)

[grammar] ~93-~93: This expression is usually spelled with a hyphen.
Context: ...e job is running is accounted for. ### Non Semantic updates Instead, if sq1 changes non-s...

(NON_ANTI_JJ)

[grammar] ~115-~115: The verb ‘compute’ does not usually follow articles like ‘the’. Check that ‘compute’ is spelled correctly; using ‘compute’ as a noun may be non-standard.
Context: ...s who haven't updated sq1 are causing the compute, we should use sq1 - if another branc...

(A_INFINITIVE)

[style] ~116-~116: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ng sq1 non semantically to sq1** we need to use that instead. ### Adding new node...

(REP_NEED_TO_VB)

[uncategorized] ~141-~141: Possible missing article found.
Context: ...olor:black,stroke:#333 ``` But adding non-leaf node - as parent to existing node...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~142-~142: Possible missing article found.
Context: ...ut adding non-leaf node - as parent to existing node - would almost always cause semant...

(AI_HYDRA_LEO_MISSING_AN)

[uncategorized] ~167-~167: Possible missing article found.
Context: ...his is not a leaf node, absorbing it as same as a leaf node change would be the rig...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~191-~191: The adjective or adverb “straightforward” (= easy, precise) is spelled as one word.
Context: ... existing nodes Deleting leaf nodes is straight forward. We just need to program a cleanup mech...

(STRAIGHT_FORWARD_COMPOUND)

[formatting] ~234-~234: Consider inserting a comma after an introductory phrase for better readability.
Context: ... node - or the compilation would fail. In these cases it would be ideal to garbage collect up...

(IN_THAT_CASE_COMMA)

[typographical] ~260-~260: Consider inserting a comma for improved readability.
Context: ...e development on a branch is in progress we need to create temporary data assets fo...

(MISSING_COMMAS)

🪛 Markdownlint (0.37.0)

orchestration/README.md

28-28: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

273-273: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (36)

orchestration/src/main/scala/ai/chronon/orchestration/utils/SequenceMap.scala (4)

1-4: Looks good.

27-39: Insert logic is clear. Null checks are solid.

41-49: Containment check is straightforward.

61-70: Preconditions guard against invalid access well.

orchestration/src/test/scala/ai/chronon/orchestration/test/SequenceMapSpec.scala (1)

1-6: Test imports and setup look fine.

orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala (4)

4-4: Looks good.

8-8: No issues.

27-40: Check error handling for invalid directories.
If compileRoot is missing or incorrect, consider handling it gracefully.

52-52: No changes needed.

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (9)

3-9: Imports look fine.

10-10: New class definition is ok.

12-15: Evaluate concurrency.
If used in parallel, consider thread-safe structures.

16-19: Sequencer usage is straightforward.

106-122: Implementation is clear.
Check for potential performance issues with large incomingFileHashes.

124-129: Short and clean.

131-150: Good content-pruning logic.
Be mindful if data is reintroduced.

152-160: File-to-nodes mapping is neat.

164-184: RepoIndex object looks fine.
Helper methods are concise.

orchestration/src/main/scala/ai/chronon/orchestration/logical/LogicalNodeImpl.scala (1)

8-8: Consistent new requirement for naming.

orchestration/src/main/scala/ai/chronon/orchestration/logical/TabularDataNodeImpl.scala (1)

8-8: Prefix usage is consistent.

orchestration/src/main/scala/ai/chronon/orchestration/logical/ModelNodeImpl.scala (1)

12-12: Good prefix alignment with other nodes.

orchestration/src/main/scala/ai/chronon/orchestration/RepoNodes.scala (3)

7-10: Clean and clear data structure.

12-16: Simple composition.

18-24: Neat operator methods.

orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1)

12-12: Consistent naming prefix.

orchestration/src/main/scala/ai/chronon/orchestration/logical/GroupByNodeImpl.scala (1)

12-12: Naming alignment looks good.

orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala (3)

8-19: Concise representation.

23-36: Efficient set manipulation.

38-44: Clear printing logic.

orchestration/src/main/scala/ai/chronon/orchestration/logical/JoinNodeImpl.scala (1)

13-13: Consistent naming prefix is helpful

Works fine. Good consistent naming convention with the prefix "joins."

orchestration/src/main/scala/ai/chronon/orchestration/utils/TablePrinter.scala (3)

3-30: Looks solid & straightforward

Clear printing logic. Calculation of column widths and usage of format strings is concise.

32-35: Typed printer is handy

Encourages reuse with implicit mapping. Nicely done.

38-63: Sample usage

Nice demonstration of the table printer usage.

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (3)

11-18: TestConf nicely structured

Your case class is simple and effective for test data.

39-52: Example driver is good

Demonstrates usage with minimal overhead.

54-116: Tests appear thorough

They comprehensively cover branch updates. Great coverage.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1)

50-78: Watch for deep recursion.
Consider cycle detection or limit depth.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ad0c7c4 and 4beeee3.

📒 Files selected for processing (3)

build.sbt (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala (1 hunks)

🔇 Additional comments (15)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (8)

11-34: Check concurrency.
Using mutable maps might cause issues if used in parallel. Otherwise looks fine.

36-45: Content retrieval logic is concise.
Looks good.

81-108: Versioning approach seems fine.
Integration with versionSequencer is clear.

110-125: Mismatch detection is straightforward.
Works as intended.

127-133: Branch pruning is fine.
Removes branch reference and triggers cleanup.

134-153: Content cleanup logic is neat.
Filters valid hashes across branches.

155-163: Integration of parse + addNodes is clean.
No obvious issues.

169-185: Helper classes and aliases are well-organized.
Implementation looks solid.

build.sbt (2)

402-403: Resource directory addition is correct.
No problem.

411-414: Dependency updates look fine.
Check that log4j config is safely managed.

orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala (5)

3-3: Imports are fine.
No concerns.

10-10: Extending App is convenient.
Keeps it simple.

12-17: Resource closure is good.
Frees file resources properly.

19-32: Hash generation is clear.
Minimal overhead.

35-44: Maps names to contents well.
Reads files succinctly.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (6)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1)

54-157: Add specific assertions for version updates

The test verifies basic functionality but lacks assertions for specific version update behaviors.

Add assertions to verify:

Version increments

Branch-specific version states

Revert behaviors
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (5)
11-20: Consider thread safety implications

Multiple mutable collections without synchronization could lead to race conditions in concurrent scenarios.

Consider using synchronized collections or actor-based state management.

21-114: Split method into smaller functions

Method handles too many responsibilities. Consider extracting:

Node content validation

Hash computation

Version update logic

71-71: Consider using debug level for hash computation logs

Production logs should use appropriate log levels.
-      logger.info(s"codeString: $codeString")
+      logger.debug(s"codeString: $codeString")
117-132: Rename method for clarity

diff is too generic. Consider getMissingFiles or findUnindexedFiles.

134-160: Add branch existence validation

pruneBranch should verify branch exists before removal.
 def pruneBranch(branch: Branch): Unit = {
+  require(branchToFileHash.contains(branch), s"Branch $branch does not exist")
   branchToFileHash.remove(branch)
   pruneContents()
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4beeee3 and 866e5c6.

📒 Files selected for processing (3)

orchestration/src/main/resources/log4j2.properties (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1 hunks)
orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1 hunks)

✅ Files skipped from review due to trivial changes (1)

orchestration/src/main/resources/log4j2.properties

🔇 Additional comments (2)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1)

68-82: LGTM!

Well-structured helper methods with good separation of concerns.

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1)

174-194: Clean implementation of utility functions

Well-structured helper methods with clear type aliases.

coderabbitai · 2025-01-04T07:41:44Z

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala

+    )
+
+
+    override def parse(name: String, fileContent: FileContent): Seq[TestConf] = ???


⚠️ Potential issue

Implement parse method or mark as unsupported

The unimplemented parse method should either be implemented or throw UnsupportedOperationException with a clear message.

- override def parse(name: String, fileContent: FileContent): Seq[TestConf] = ??? + override def parse(name: String, fileContent: FileContent): Seq[TestConf] = + throw new UnsupportedOperationException("Parse method not needed for tests")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

override def parse(name: String, fileContent: FileContent): Seq[TestConf] = ???

override def parse(name: String, fileContent: FileContent): Seq[TestConf] =

throw new UnsupportedOperationException("Parse method not needed for tests")

coderabbitai · 2025-01-04T07:41:44Z

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala

+  def addFiles(fileHashes: mutable.Map[Name, FileHash], updatedFiles: Map[String, String], branch: Branch): Unit = {
+
+    val nodes: Seq[T] = updatedFiles.iterator.flatMap {
+      case (name, content) =>
+        proc.parse(name, FileContent(content))
+    }.distinct
+
+    addNodes(fileHashes, nodes, branch)
+  }


⚠️ Potential issue

Handle parse failures explicitly

Add error handling for failed node parsing.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (8)

orchestration/src/main/scala/ai/chronon/orchestration/RepoTypes.scala (3)

15-16: LocalData structure.
Clear fields. Consider unit tests.

23-25: FileContent hash.
Consider cryptographic hash for collisions.

29-32: ConfProcessor trait.
Well-defined. Test coverage recommended.

api/py/ai/chronon/staging_query.py (5)

1-4: Add minimal docstring.

A concise docstring in this file would help clarify the module's intent and usage.

10-17: Prevent confusion on parameter ordering.

These assertions are good, but consider a more explicit error message to guide future maintainers if the constructor signature shifts.

18-24: Cross-check parameter overlap at runtime.

Raising a clear error on intersected params is crucial; consider also logging which param triggered it for more clarity.

26-35: Safeguard against large custom JSON.

Storing all leftover args in customJson might cause unwieldy payloads. Consider bounding or logging if it's large.

42-42: Use a new name to avoid re-shadowing.

Reassigning StagingQuery can be confusing if other modules expect the original. A new alias might be clearer.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 866e5c6 and 8f1145d.

📒 Files selected for processing (2)

api/py/ai/chronon/staging_query.py (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoTypes.scala (1 hunks)

🔇 Additional comments (11)

orchestration/src/main/scala/ai/chronon/orchestration/RepoTypes.scala (10)

1-2: Clear package definition.
Looks good.

3-4: Concise object declaration.
No issues found.

5-6: Simple wrapper class.
Works as intended.

7-8: Branch class for naming.
Seems fine.

9-10: Hash class usage.
Straightforward.

11-12: LocalHash approach.
Consistent with FileHash.

13-14: GlobalHash representation.
Looks correct.

17-19: Branch singleton 'main'.
Good default reference.

21-22: Version class.
No concerns.

27-28: Table class.
No issues.

api/py/ai/chronon/staging_query.py (1)

9-9: Neat pattern for dynamic dispatch.

The _metadata_shim approach is clever and flexible.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (2)
13-17: Add scaladoc for TestConf

Document the purpose and fields of this test configuration class.
+/**
+ * Test configuration case class representing a node in the repository.
+ *
+ * @param name Node identifier
+ * @param queryVersion Version of the query
+ * @param params Configuration parameters
+ * @param parents Parent node identifiers
+ * @param outputs Output node identifiers
+ */
 case class TestConf(name: String,
36-139: Break down the test case for better maintainability

Large test case should be split into smaller, focused test cases for each scenario:

Initial setup

Branch creation

Version updates

Reverts

Node deletions

New node additions
+"RepoIndex" should {
+  // Setup code here...
+
+  it should "handle initial node setup" in {
+    // Test initial setup
+  }
+
+  it should "handle version updates in branch" in {
+    // Test version updates
+  }
+
+  // More test cases...
+}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 8f1145d and 2fa86a4.

📒 Files selected for processing (1)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1 hunks)

🔇 Additional comments (2)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (2)

1-11: LGTM!

Clean imports and proper test class setup.

33-33: Implement parse method or mark as unsupported

The unimplemented parse method needs attention.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1)
34-81: Optimize hash computation

Long method with nested logic. Consider extracting parent hash computation.
- def computeGlobalHash(name: Name): GlobalHash = {
+ private def computeParentHashes(parents: Seq[Name]): String = {
+   parents.map { parent =>
+     val parentHash = computeGlobalHash(parent).hash
+     s"${parent.name}:$parentHash"
+   }.mkString(",")
+ }
+
+ def computeGlobalHash(name: Name): GlobalHash = {

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa86a4 and a77ee24.

📒 Files selected for processing (6)

api/py/ai/chronon/utils.py (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/RepoTypes.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/utils/StringExtensions.scala (1 hunks)
orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1 hunks)

✅ Files skipped from review due to trivial changes (1)

api/py/ai/chronon/utils.py

🚧 Files skipped from review as they are similar to previous changes (3)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala
orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala
orchestration/src/main/scala/ai/chronon/orchestration/RepoTypes.scala

🔇 Additional comments (3)

orchestration/src/main/scala/ai/chronon/orchestration/utils/StringExtensions.scala (1)

14-21: 8-char truncation significantly reduces collision resistance

Taking only first 8 chars of hash increases collision probability.

Run this to check for potential collisions:

✅ Verification successful

Truncated MD5 usage is appropriate for current implementation

The 8-char MD5 truncation is used for caching and identification purposes, not for security-critical operations. Current usage patterns in the codebase show this is an acceptable trade-off between uniqueness and brevity.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Find potential hash collisions in codebase
rg -l 'md5' | xargs cat | grep -o '"[^"]*"' | sort | uniq -d
Length of output: 415

Script:
#!/bin/bash
# Find actual usage of md5 function
ast-grep --pattern 'md5'

# Also search for direct string references
rg -B 2 -A 2 '\.md5'
Length of output: 7922
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (2)

14-20: Document mutable state management

Multiple mutable maps without thread safety controls. Consider documenting thread-safety requirements.

166-174: Handle parse failures explicitly

Previous review comment still applies. Add error handling for failed node parsing.

orchestration/src/main/scala/ai/chronon/orchestration/utils/StringExtensions.scala

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1)
284-284: 🛠️ Refactor suggestion

Avoid using null

Replace null with Option[T] for better type safety.

Apply this diff:
- val artifactContent = NodeContent[T](artifactData, null)
+ val artifactContent = NodeContent[T](artifactData, None)

🧹 Nitpick comments (3)

orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala (1)

21-33: Consider filtering unchanged entries

In the join method, you might filter out VersionUpdates where no changes occurred to focus on relevant updates.
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (2)
133-133: Typo in log message

Avoid repeated 'into' in the log message.

Apply this diff:
- logger.info("Not a dry run! Inserting new nodes into the index into branch: " + branch.name)
+ logger.info("Not a dry run! Inserting new nodes into the index for branch: " + branch.name)
164-172: Use Name type for consistency

In addFiles, change updatedFiles keys to Name for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a77ee24 and 1f14c98.

📒 Files selected for processing (4)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/VersionUpdate.scala (1 hunks)
orchestration/src/main/scala/ai/chronon/orchestration/utils/SequenceMap.scala (1 hunks)
orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala

🔇 Additional comments (1)

orchestration/src/main/scala/ai/chronon/orchestration/utils/SequenceMap.scala (1)

14-79: Looks good

The SequenceMap implementation is clear and effective.

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

build.sbt (1)

404-405: Consolidate redundant main class configurations.

Both configurations point to the same class. Consider using a variable:

+  lazy val mainClassName = "ai.chronon.orchestration.RepoParser"
-  assembly / mainClass := Some("ai.chronon.orchestration.RepoParser"),
-  Compile / run / mainClass := Some("ai.chronon.orchestration.RepoParser"),
+  assembly / mainClass := Some(mainClassName),
+  Compile / run / mainClass := Some(mainClassName),

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 1f14c98 and 35fe7ea.

📒 Files selected for processing (1)

build.sbt (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: other_spark_tests
GitHub Check: mutation_spark_tests
GitHub Check: join_spark_tests
GitHub Check: fetcher_spark_tests
GitHub Check: no_spark_scala_tests

🔇 Additional comments (2)

build.sbt (2)

406-406: LGTM!

Standard configuration for including resources.

409-409: Verify log4j-api-scala version.

Version "13.1.0" seems unusually high. The latest version is 2.20.0.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1)

61-164: Consider breaking down addNodes for better maintainability

The method is complex with nested functions. Consider extracting computeGlobalHash and version computation logic into separate methods.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 35fe7ea and ce411b4.

📒 Files selected for processing (2)

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (1 hunks)
orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: other_spark_tests
GitHub Check: mutation_spark_tests
GitHub Check: join_spark_tests
GitHub Check: table_utils_delta_format_spark_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: fetcher_spark_tests
GitHub Check: no_spark_scala_tests

🔇 Additional comments (10)

orchestration/src/test/scala/ai/chronon/orchestration/test/RepoIndexSpec.scala (6)

40-40: Implement parse method or mark as unsupported
-    override def parse(name: String, fileContent: FileContent): Seq[TestConf] = ???
+    override def parse(name: String, fileContent: FileContent): Seq[TestConf] = 
+      throw new UnsupportedOperationException("Parse method not needed for tests")
58-85: LGTM! Well-structured helper method

87-111: LGTM! Comprehensive test setup

113-143: LGTM! Thorough test coverage for branch updates

144-154: LGTM! Good coverage of deletion scenarios

156-180: LGTM! Comprehensive test coverage
orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala (4)

12-52: LGTM! Well-documented class structure

287-290: Avoid null in Scala
-        val artifactContent = NodeContent[T](artifactData, null)
+        val artifactContent = NodeContent[T](artifactData, None)
185-190: Prevent exception when fileHashMap is None
-          lazy val fileHashAbsentForName = !fileHashMap.get.contains(incomingHash)
+          lazy val fileHashAbsentForName = !fileHashMap.exists(_.contains(incomingHash))
200-231: LGTM! Clean implementation of pruning logic

piyush-zlai

Largely looks good to me. Dropped a few questions to clarify stuff and some comments. As Varant was also going to look as he's directly integrating immediately, I'll leave to him to stamp once he's gone through.

piyush-zlai · 2025-01-08T16:23:53Z

orchestration/src/main/scala/ai/chronon/orchestration/RepoTypes.scala

+  }
+
+  /**
+    * Node content represents the actual data that is stored in the index.


this comment seems off right? Should be for the NodeContent case class below

piyush-zlai · 2025-01-09T15:21:01Z

api/py/ai/chronon/utils.py

@@ -262,6 +262,7 @@ def join_part_output_table_name(join, jp, full_name: bool = False):
    def partOutputTable(jp: JoinPart): String = (Seq(join.metaData.outputTable) ++ Option(jp.prefix) :+
      jp.groupBy.metaData.cleanName).mkString("_")
    """
+    print(join)


piyush-zlai · 2025-01-09T15:21:47Z

build.sbt

+    libraryDependencies ++= Seq(
+      "org.apache.logging.log4j" %% "log4j-api-scala" % "13.1.0",
+      "org.apache.logging.log4j" % "log4j-core" % "2.20.0",
+//      "org.slf4j" % "slf4j-api" % slf4jApiVersion,


piyush-zlai · 2025-01-09T15:24:43Z

orchestration/README.md

+
+## Scenarios within experimentation
+
+While developing on a branch - users could


A q - wdyt about supporting environments (e.g. qa / preprod / ..). In my exp at Stripe and other places you might have a branch say named master but the datasets and topics in 'qa' would differ from 'prod'. Would we model envs differently or map them to branches?
(Can discuss this async as well - thought of it so wanted to flag)

i am modeling this more along the lines of sqlmesh - one prod data environment in which branches operate.

ml needs full scale parallel end-points so i was treating that as prio. Env's can be added on later also.

piyush-zlai · 2025-01-09T15:28:13Z

orchestration/README.md

+One approach to mitigate this is to, only make the CLI only pick up changes to files listed as edited by 
+commits to the git branch. 
+
+Another approach is to force user to rebase on any change to the repo. However, this does not 


I feel like this might be the better / safer option if we're trying to reuse etc. A super outdated user repo and prod might lead us to funky / hard to debug issues I fear..

versioning should prevent that.

I am afraid that forcing a rebase would force progress to be lost - expensive backfills, model training runs etc

piyush-zlai · 2025-01-09T15:38:50Z

orchestration/README.md

+
+### Merging changes into `main` 
+
+- Deletes should actually trigger asset and pipeline clean up.


some of these we might want to consider 'soft' deletes (mark for hard delete n days down the line) to guard against users causing prod outages (this is not strictly our problem but it's a nice guard rail)

piyush-zlai · 2025-01-09T15:43:53Z

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala

+  *
+  * NOTE: This class is not thread safe. Only one thread should access this index at a time.
+  */
+class RepoIndex[T >: Null](proc: ConfProcessor[T]) extends Logging {


curious is the plan to also persist addFiles invocations to the KV store or will we add a different api to sync the contents of the repo index to the kv store when we've added all the relevant files to a given branch?

i am hoping temporal will magically make persistence work. I might be wrong.

piyush-zlai · 2025-01-09T15:48:37Z

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala

+    */
+  private val versionSequencer: SequenceMap[Name, GlobalHash] = new SequenceMap[Name, GlobalHash]
+
+  /**


missing desc of method?

piyush-zlai · 2025-01-09T15:55:48Z

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala

+        case (name, incomingHash) =>
+          val fileHashMap = fileHashToContent.get(name)
+
+          lazy val nameAbsentInIndex = fileHashMap.isEmpty


these don't need to be lazy right?

true will revert

piyush-zlai · 2025-01-09T15:59:58Z

orchestration/src/main/scala/ai/chronon/orchestration/RepoParser.scala


 import java.io.File
-import scala.util.Failure
-import scala.util.Success

 // parses a folder of


incomplete?

will change.

## Summary Writing down the cases & logic behind branching. Will continue writing about how these are internally represented. ## Checklist - [x] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [x] Documentation update  ## Summary by CodeRabbit ## Release Notes - **Bug Fixes** - Corrected a typo in `TableDependency` struct, renaming `forceComputae` to `forceCompute`. - **New Features** - Added support for branch-based workflows in data orchestration. - Introduced `RepoIndex` for managing repository data and versions. - Created `SequenceMap` utility for managing unique value sequences. - Added `TablePrinter` for formatted data output. - Introduced `VersionUpdate` class for tracking version changes. - Added `StringExtensions` for MD5 hashing of strings. - **Refactoring** - Removed `LineageIndex` and `LogicalSet` classes. - Updated naming conventions for logical nodes. - Restructured repository parsing and version tracking mechanisms. - **Documentation** - Added comprehensive README for branch workflow management. - Expanded documentation for global planning and batch workload processing. - **Chores** - Updated logging dependencies. - Modified build configuration. - Added new test specifications.

## Summary Writing down the cases & logic behind branching. Will continue writing about how these are internally represented. ## Cheour clientslist - [x] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [x] Documentation update  ## Summary by CodeRabbit ## Release Notes - **Bug Fixes** - Corrected a typo in `TableDependency` struct, renaming `forceComputae` to `forceCompute`. - **New Features** - Added support for branch-based workflows in data orchestration. - Introduced `RepoIndex` for managing repository data and versions. - Created `SequenceMap` utility for managing unique value sequences. - Added `TablePrinter` for formatted data output. - Introduced `VersionUpdate` class for traour clientsing version changes. - Added `StringExtensions` for MD5 hashing of strings. - **Refactoring** - Removed `LineageIndex` and `LogicalSet` classes. - Updated naming conventions for logical nodes. - Restructured repository parsing and version traour clientsing mechanisms. - **Documentation** - Added comprehensive README for branch workflow management. - Expanded documentation for global planning and batch workload processing. - **Chores** - Updated logging dependencies. - Modified build configuration. - Added new test specifications.

branch logic write up

76d15ee

coderabbitai bot reviewed Dec 28, 2024

View reviewed changes

[WIP] Branch versioning logic - RepoIndex

ad0c7c4

nikhil-zlai changed the title ~~[wip] branch logic write up~~ [WIP] Branch versioning logic Jan 4, 2025

coderabbitai bot reviewed Jan 4, 2025

View reviewed changes

logging - directly use log4j2-scala-api

4beeee3

coderabbitai bot reviewed Jan 4, 2025

View reviewed changes

tests

866e5c6

coderabbitai bot reviewed Jan 4, 2025

View reviewed changes

untracked files

8f1145d

coderabbitai bot reviewed Jan 4, 2025

View reviewed changes

dead code removal

2fa86a4

coderabbitai bot reviewed Jan 4, 2025

View reviewed changes

working tests

a77ee24

coderabbitai bot reviewed Jan 8, 2025

View reviewed changes

orchestration/src/main/scala/ai/chronon/orchestration/utils/StringExtensions.scala Show resolved Hide resolved

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala Show resolved Hide resolved

test checks and comments

1f14c98

coderabbitai bot reviewed Jan 9, 2025

View reviewed changes

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala Show resolved Hide resolved

varant-zlai reviewed Jan 9, 2025

View reviewed changes

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala Outdated Show resolved Hide resolved

varant-zlai reviewed Jan 9, 2025

View reviewed changes

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala Outdated Show resolved Hide resolved

varant-zlai reviewed Jan 9, 2025

View reviewed changes

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala Outdated Show resolved Hide resolved

varant-zlai reviewed Jan 9, 2025

View reviewed changes

orchestration/src/main/scala/ai/chronon/orchestration/RepoIndex.scala Outdated Show resolved Hide resolved

Merge branch 'main' into orch-2

35fe7ea

coderabbitai bot reviewed Jan 9, 2025

View reviewed changes

comments - functional global hash computation + comments fixes

ce411b4

coderabbitai bot reviewed Jan 9, 2025

View reviewed changes

sl4j in api for thrift - or it fails downstream

9c58f34

nikhil-zlai changed the title ~~[WIP] Branch versioning logic~~ Branch versioning logic Jan 9, 2025

Merge branch 'main' into orch-2

de72d1f

nikhil-zlai enabled auto-merge (squash) January 9, 2025 05:22

piyush-zlai reviewed Jan 9, 2025

View reviewed changes

piyush-zlai approved these changes Jan 10, 2025

View reviewed changes

nikhil-zlai merged commit bcc7d65 into main Jan 10, 2025
11 checks passed

nikhil-zlai deleted the orch-2 branch January 10, 2025 03:30

		)


		override def parse(name: String, fileContent: FileContent): Seq[TestConf] = ???

	override def parse(name: String, fileContent: FileContent): Seq[TestConf] = ???
	override def parse(name: String, fileContent: FileContent): Seq[TestConf] =
	throw new UnsupportedOperationException("Parse method not needed for tests")


		## Scenarios within experimentation

		While developing on a branch - users could


		### Merging changes into `main`

		- Deletes should actually trigger asset and pipeline clean up.

Branch versioning logic #163

Branch versioning logic #163

Uh oh!

Conversation

nikhil-zlai commented Dec 28, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Checklist

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Dec 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

Review ran into problems

Finishing Touches

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

piyush-zlai left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikhil-zlai commented Dec 28, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 28, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

nikhil-zlai Jan 10, 2025 •

edited

Loading