-
Notifications
You must be signed in to change notification settings - Fork 2
chore: move modular architecture pieces into dedicated module #590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Thomas Chow <[email protected]>
WalkthroughThis pull request introduces a new Changes
Possibly related PRs
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms (14)
🔇 Additional comments (3)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Co-authored-by: Thomas Chow <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
78-78
: TODO note: Found a TODO regarding DF, time range, and bloom filter args. Please track or resolve soon.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (13)
.github/workflows/test_scala_2_12_spark.yaml
(1 hunks)spark/BUILD.bazel
(2 hunks)spark/src/main/scala/ai/chronon/spark/Driver.scala
(2 hunks)spark/src/main/scala/ai/chronon/spark/Join.scala
(1 hunks)spark/src/main/scala/ai/chronon/spark/JoinBase.scala
(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/BootstrapJob.scala
(2 hunks)spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
(2 hunks)spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala
(1 hunks)spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala
(1 hunks)spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (8)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
spark/src/main/scala/ai/chronon/spark/batch/BootstrapJob.scala (1)
BootstrapJob
(25-187)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (2)
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
main
(1152-1191)spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1)
main
(133-144)
spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (2)
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
main
(1152-1191)spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1)
main
(133-144)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (2)
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
main
(1152-1191)spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1)
main
(133-144)
spark/src/main/scala/ai/chronon/spark/batch/BootstrapJob.scala (2)
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
main
(1152-1191)spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1)
main
(133-144)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
Join
(69-545)
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (2)
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
main
(1152-1191)spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1)
main
(133-144)
spark/src/main/scala/ai/chronon/spark/Driver.scala (2)
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (1)
SourceJob
(19-85)spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1)
MergeJob
(24-144)
🪛 actionlint (1.7.4)
.github/workflows/test_scala_2_12_spark.yaml
48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🔇 Additional comments (32)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (2)
1-1
: Package update: New package declaration aligns with dedicated batch module.
10-10
: Import additions: Added GroupBy, JoinUtils, and TableUtils import; looks good.spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (2)
1-1
: Package update: Package declaration updated for batch module.
9-9
: Import additions: New import for JoinUtils and TableUtils added; consistent change.spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
173-174
: BootstrapJob import: Imported BootstrapJob from batch module—ensure call sites are updated accordingly.spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (2)
1-1
: Package update: Package declaration now uses the dedicated batch namespace.
9-9
: Import addition: Added import for TableUtils to support shared dependencies; looks fine.spark/src/main/scala/ai/chronon/spark/Join.scala (1)
253-253
: BootstrapJob import: New import for BootstrapJob from batch module added for computeRange; approved..github/workflows/test_scala_2_12_spark.yaml (2)
47-71
: New batch_tests job looks good!Adds a dedicated job for testing the batch module components.
🧰 Tools
🪛 actionlint (1.7.4)
48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
48-48
:✅ Verification successful
Runner label might need verification.
The runner label
ubuntu-8_cores-32_gb
is flagged by static analysis but appears to be a custom self-hosted runner used throughout the workflow.
🏁 Script executed:
#!/bin/bash grep -n "runs-on:" .github/workflows/*.yaml | grep "ubuntu-8_cores"Length of output: 1629
Runner label confirmed. The
ubuntu-8_cores-32_gb
label is consistently used in multiple workflows (e.g., in bothtest_scala_2_12_spark.yaml
andtest_scala_2_13_spark.yaml
), so no change is needed.🧰 Tools
🪛 actionlint (1.7.4)
48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
spark/src/main/scala/ai/chronon/spark/batch/BootstrapJob.scala (2)
1-1
: Package restructuring LGTM.Package declaration changed to dedicated batch module.
10-10
: Import adjustment for refactored modules.Properly imports needed classes that remain in the original package.
spark/BUILD.bazel (2)
91-100
: New batch_test suite setup looks good.Creates a dedicated test suite for the batch module that includes tests from the batch directory.
186-186
: Fixed snakeyaml input artifact.Restored correct dependency reference.
spark/src/main/scala/ai/chronon/spark/Driver.scala (2)
836-836
: Import updated for relocated SourceJob class.Properly adjusts import to use the class from its new package location.
960-960
: Import updated for relocated MergeJob class.Properly adjusts import to use the class from its new package location.
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (4)
1-2
: Pkg update. Package renamed to new batch module.
4-4
: Import order. Adjusted Extensions import as intended.
6-6
: Constants import. Import of Constants and DateRange is proper.
10-10
: TableUtils import. New dependency correctly added.spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (6)
1-1
: Pkg update. Package updated to reflect test.batch grouping.
4-4
: API import. Direct import is fine.
5-5
: Extensions import. Looks correct and concise.
7-7
: Spark Extensions. Import is updated appropriately.
8-8
: Test utils. Import for DataFrameGen and TableTestUtils is proper.
10-10
: SparkSession. Import simplified as desired.spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (3)
17-17
: Pkg update. Package now reflects the new batch module.
22-22
: Import update. Added PartitionRange with EngineType and ParametricMacro.
24-24
: Spark utilities. New imports (Args, SparkSessionBuilder, TableUtils) are correct.spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala (3)
1-1
: Pkg update. Package updated to test.batch.
7-13
: Import consolidation. Grouped join node imports for brevity.
16-16
: Test utils. Import for DataFrameGen and TableTestUtils is correctly refined.
@@ -833,6 +833,7 @@ object Driver { | |||
} | |||
|
|||
def run(args: Args): Unit = { | |||
import spark.src.main.scala.ai.chronon.spark.batch.SourceJob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
@@ -1,4 +1,4 @@ | |||
package ai.chronon.spark | |||
package spark.src.main.scala.ai.chronon.spark.batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesnt seem right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops fixed! Intellij got confused on the refactor.
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
## Summary - Make this a dedicated module, can test this separate from other things. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a dedicated CI workflow and test suite for validating batch processing operations. - **Refactor** - Reorganized package structures and streamlined import statements across batch components to enhance code clarity and maintainability. - **Chores** - Updated build configurations to restore essential dependencies for shaded libraries. - **Tests** - Realigned test packages and consolidated import statements, improving the structure and consistency of batch-related tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: Thomas Chow <[email protected]>
## Summary - Make this a dedicated module, can test this separate from other things. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a dedicated CI workflow and test suite for validating batch processing operations. - **Refactor** - Reorganized package structures and streamlined import statements across batch components to enhance code clarity and maintainability. - **Chores** - Updated build configurations to restore essential dependencies for shaded libraries. - **Tests** - Realigned test packages and consolidated import statements, improving the structure and consistency of batch-related tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: Thomas Chow <[email protected]>
## Summary - Make this a dedicated module, can test this separate from other things. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a dedicated CI workflow and test suite for validating batch processing operations. - **Refactor** - Reorganized package structures and streamlined import statements across batch components to enhance code clarity and maintainability. - **Chores** - Updated build configurations to restore essential dependencies for shaded libraries. - **Tests** - Realigned test packages and consolidated import statements, improving the structure and consistency of batch-related tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: Thomas Chow <[email protected]>
## Summary - Make this a dedicated module, can test this separate from other things. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a dedicated CI workflow and test suite for validating batch processing operations. - **Refactor** - Reorganized package structures and streamlined import statements across batch components to enhance code clarity and maintainability. - **Chores** - Updated build configurations to restore essential dependencies for shaded libraries. - **Tests** - Realigned test packages and consolidated import statements, improving the structure and consistency of batch-related tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: Thomas Chow <[email protected]>
## Summary - Make this a dedicated module, can test this separate from other things. ## Cheour clientslist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a dedicated CI workflow and test suite for validating batch processing operations. - **Refactor** - Reorganized paour clientsage structures and streamlined import statements across batch components to enhance code clarity and maintainability. - **Chores** - Updated build configurations to restore essential dependencies for shaded libraries. - **Tests** - Realigned test paour clientsages and consolidated import statements, improving the structure and consistency of batch-related tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- av pr metadata This information is embedded by the av CLI when creating PRs to traour clients the status of staour clientss when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: Thomas Chow <[email protected]>
Summary
Checklist
Summary by CodeRabbit
New Features
Refactor
Chores
Tests