chore: split out expensive spark tests to parallelize #382

tchow-zlai · 2025-02-14T04:57:33Z

Summary

Checklist

Added Unit Tests
Covered by existing CI
Integration tested
Documentation update

Summary by CodeRabbit

New Features
- Introduced dedicated testing workflows covering multiple system components to enhance overall reliability.
- Added new test suites for various components to enhance testing granularity.
Refactor
- Streamlined code organization with improved package structures and consolidated imports across test modules.
Chores
- Upgraded automated testing configurations with optimized resource settings for improved performance and stability.

coderabbitai · 2025-02-14T04:57:39Z

Walkthrough

This pull request adds new GitHub Actions jobs and several granular Scala test suites for Spark components. It also reorganizes package declarations and consolidates import statements across test files. The Bazel build file is updated to reflect these new test suites and modified visibility settings. Additionally, a new utility object for fetching tests has been introduced, and method references in streaming tests are adjusted.

Changes

File(s)	Change Summary
`.github/workflows/test_scala_spark.yaml`	Added new CI jobs (fetcher_tests, join_tests, groupby_tests, analyzer_tests, streaming_tests) with updated memory allocation settings.
`spark/BUILD.bazel`	Updated Scala test suites; removed test_lib visibility; added new test suites for fetcher, groupby, join, analyzer, and streaming tests.
`spark/src/test/scala/ai/chronon/spark/test/fetcher/{ChainingFetcherTest.scala, FetcherTest.scala, FetcherTestUtil.scala, JavaFetcherTest.java}`	Reorganized package declarations, consolidated imports; added FetcherTestUtil with joinResponses; updated resource handling.
`spark/src/test/scala/ai/chronon/spark/test/groupby/{GroupByTest.scala, GroupByUploadTest.scala}`	Updated package names and consolidated imports for groupby functionality tests.
`spark/src/test/scala/ai/chronon/spark/test/join/{FeatureWithLabelJoinTest.scala, JoinTest.scala, JoinUtilsTest.scala, LabelJoinTest.scala}`	Reorganized packages, updated imports, and added TestUtils; enhanced label join test cases.
`spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala`	Updated package declaration and consolidated imports in the analyzer test.
`spark/src/test/scala/ai/chronon/spark/test/streaming/{AvroTest.scala, KafkaStreamBuilderTest.scala, MutationsTest.scala, StreamingTest.scala}`	Reorganized package declarations; consolidated imports; updated method references for streaming tests.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as GitHub Runner
    participant Workflow as GitHub Actions
    participant Bazel as Bazel Test Runner

    Runner->>Workflow: Trigger workflow
    Workflow->>Bazel: Run spark_tests job
    par Parallel Jobs
        Workflow->>Bazel: Run fetcher_tests
        Workflow->>Bazel: Run join_tests
        Workflow->>Bazel: Run groupby_tests
        Workflow->>Bazel: Run analyzer_tests
        Workflow->>Bazel: Run streaming_tests
    end
    Bazel-->>Workflow: Return test results
    Workflow-->>Runner: Publish results

Possibly related PRs

Add delta lake integration #51: Enhancements to the GitHub Actions workflow by adding new testing jobs for various Spark components.
fix: Address flaky spark tests #72: Introduction of new testing jobs for Spark components, related to reintroducing a test method in the FeatureWithLabelJoinTest class.
Modify spark test options with higher timeout and java heap size #323: Adjustments to Java options for memory allocation during tests, enhancing the testing framework for Spark.

Suggested reviewers

nikhil-zlai
piyush-zlai
varant-zlai

Poem

In the realm of code so bright,
New tests align to make it right.
Imports refined and jobs in flow,
Packages now in tidy row.
Spark lights up with joyful might 🚀.

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 73fc6e0 and 59bc5dc.

📒 Files selected for processing (1)

.github/workflows/test_scala_spark.yaml (1 hunks)

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: streaming_tests
GitHub Check: join_tests
GitHub Check: analyzer_tests
GitHub Check: groupby_tests
GitHub Check: fetcher_tests
GitHub Check: spark_tests
GitHub Check: scala_compile_fmt_fix

🔇 Additional comments (2)

.github/workflows/test_scala_spark.yaml (2)

23-23: Verify custom runner label configuration.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner. Ensure it's properly configured in actionlint.yaml for self-hosted runners.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🧰 Tools

🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

176-176: Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

.github/workflows/test_scala_spark.yaml (1)
23-23: 💡 Verification agent
🧩 Analysis chain

Verify runner label.

The label "ubuntu-8_cores-32_gb" is not a standard GitHub-hosted runner label.

Also applies to: 49-49

🏁 Script executed:
#!/bin/bash
# Check if this is a self-hosted runner
gh api repos/{owner}/{repo}/actions/runners -q '.runners[].labels[].name' | grep 'ubuntu-8_cores-32_gb' || echo "Label not found"
Length of output: 200
Action Required: Address Workflow Runner and Duplication Issues

Runner Label: The label "ubuntu-8_cores-32_gb" isn’t found via the API, suggesting it’s not a recognized GitHub-hosted runner. Confirm if this custom self-hosted runner is intended or update to a valid label.

Memory Allocation: The increased memory settings in the new job (i.e., -Xmx16G -Xms8G) differ noticeably from the existing job. Ensure these settings are deliberate.

Code Duplication: The new fetcher_tests job duplicates much of the existing setup. Consider refactoring to share common configuration.

Missing Newline: Add a newline at the end of the file.

🧰 Tools

🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🧹 Nitpick comments (7)

spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (3)
19-19: Consider explicit imports over wildcard.

Wildcard imports can make it harder to track dependencies and may lead to naming conflicts.
-import ai.chronon.api._
+import ai.chronon.api.{Accuracy, Builders, Constants, Operation, TimeUnit, Window}
182-227: Add test data cleanup.

Missing cleanup after test execution could affect subsequent test runs.

Add cleanup in a finally block or use a test fixture:
+  override def afterEach(): Unit = {
+    super.afterEach()
+    tableUtils.dropTable(s"${namespace}.${tableName}")
+  }
434-455: Add ScalaDoc for helper methods.

Document parameters and return values for better maintainability.
+  /**
+   * Creates a test label join configuration.
+   * @param startOffset The left start offset
+   * @param endOffset The left end offset
+   * @param groupBys The group by configurations
+   * @return The label parts configuration
+   */
   def createTestLabelJoin(startOffset: Int,
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (2)
28-29: Convert magic number to named constant.

The chunkSize value of 100 should be defined as a named constant for better maintainability.
+  private val DefaultChunkSize = 100
-    val chunkSize = 100
+    val chunkSize = DefaultChunkSize
79-81: Improve comment clarity.

The comment about overwhelming the profiler needs more context about its purpose.
-    // to overwhelm the profiler with fetching code path
-    // so as to make it prominent in the flamegraph & collect enough stats
+    // Run multiple iterations to generate sufficient profiling data
+    // This ensures the fetching code path is prominent in the flamegraph
+    // and provides statistically significant performance metrics
.github/workflows/test_scala_spark.yaml (2)
48-72: Consider parameterizing job config.

Duplicate setup could be refactored into reusable workflow.
 jobs:
+  .test_base: &test_base
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

   spark_tests:
-    runs-on: ubuntu-8_cores-32_gb
-    container: ...
+    <<: *test_base
     steps:
-      - uses: actions/checkout@v4
-      - name: Setup Bazel cache credentials ...
       - name: Run Spark tests
         run: |
           bazel test \
             --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
             --google_credentials=bazel-cache-key.json \
             --test_env=JAVA_OPTS="-Xmx8G -Xms2G" \
             //spark:tests

   fetcher_tests:
-    runs-on: ubuntu-8_cores-32_gb
-    container: ...
+    <<: *test_base
     steps:
-      - uses: actions/checkout@v4
-      - name: Setup Bazel cache credentials ...
       - name: Run Spark tests
         run: |
           bazel test \
             --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
             --google_credentials=bazel-cache-key.json \
             --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
             //spark:fetcher_test
🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 72-72: no new line character at the end of file

(new-line-at-end-of-file)

72-72: Add newline at EOF.

Add missing newline character.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 72-72: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f3e4971 and c07dbbd.

📒 Files selected for processing (14)

.github/workflows/test_scala_spark.yaml (1 hunks)
spark/BUILD.bazel (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/MutationsTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/StreamingTest.scala (3 hunks)

✅ Files skipped from review due to trivial changes (8)

spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java
spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala
spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
spark/src/test/scala/ai/chronon/spark/test/join/MutationsTest.scala
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala
spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

.github/workflows/test_scala_spark.yaml

[error] 72-72: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: scala_compile_fmt_fix
GitHub Check: fetcher_tests
GitHub Check: spark_tests
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (9)

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (2)

1-19: LGTM! Well-organized imports and logger declaration.

20-27: LGTM! Clear method signature with well-defined parameters.

spark/src/test/scala/ai/chronon/spark/test/join/StreamingTest.scala (2)

17-25: LGTM! Good package organization.

Package and import restructuring improves code organization.

54-54: LGTM! Improved method call clarity.

Using qualified name StreamingTest.buildInMemoryKvStore makes the method origin explicit.

Also applies to: 113-113
spark/BUILD.bazel (2)

96-203: LGTM! Well-structured test suite organization.

The new test suites are properly organized by functionality with consistent configuration.

89-94:
❓ Verification inconclusive

Verify impact of removed visibility.

The removal of visibility from test_lib might affect other packages that depend on it.

🏁 Script executed:
#!/bin/bash
# Search for references to the test_lib target
rg -l "//spark:test_lib" | grep BUILD
Length of output: 37
Re-check impact of removed visibility on test_lib

Automated search for references to //spark:test_lib returned no results. However, please manually verify that no other packages depend on this library and need explicit visibility settings.
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2)

17-41: LGTM! Improved code organization.

Package relocation and import consolidation enhance code structure.

68-68: LGTM! Better resource loading.

Using Source.fromResource is more reliable than Source.fromFile for test resources.
.github/workflows/test_scala_spark.yaml (1)

71-71:
❓ Verification inconclusive

Verify memory requirements.

Fetcher tests require 2x more memory. Ensure this is necessary.

🏁 Script executed:
#!/bin/bash
# Check memory usage in fetcher tests
rg -A 5 "spark.driver.memory|spark.executor.memory" spark/src/test/scala/ai/chronon/spark/test/fetcher/
Length of output: 105
Please verify memory settings manually.
The regex search in fetcher tests yielded no memory configuration entries, so it's unclear if fetcher tests truly need 2× the memory. Confirm that --test_env=JAVA_OPTS="-Xmx16G -Xms8G" is correct or adjust as needed.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (5)

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (3)
7-7: Fix wildcard import.

Replace import ai.chronon.spark.{Join => _, _} with specific imports to improve code clarity.

82-83: Initialize variables with empty values.

Initialize loggedValues and result with empty values instead of null.
-var loggedValues: Seq[LoggableResponseBase64] = null
-var result: List[Response] = null
+var loggedValues: Seq[LoggableResponseBase64] = Seq.empty
+var result: List[Response] = List.empty
26-26: Make chunkSize configurable.

Consider making chunkSize a parameter for better flexibility.
.github/workflows/test_scala_spark.yaml (2)

48-150: Reduce job configuration duplication.

Consider using job templates or reusable workflows to avoid duplicating configuration.

🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

150-150: Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c07dbbd and 7803143.

📒 Files selected for processing (4)

.github/workflows/test_scala_spark.yaml (1 hunks)
spark/BUILD.bazel (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (1 hunks)

✅ Files skipped from review due to trivial changes (1)

spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala

🚧 Files skipped from review as they are similar to previous changes (1)

spark/BUILD.bazel

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

.github/workflows/test_scala_spark.yaml

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: join_tests
GitHub Check: groupby_tests
GitHub Check: analyzer_tests
GitHub Check: fetcher_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: spark_tests
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (1)

.github/workflows/test_scala_spark.yaml (1)

49-49: Verify custom runner label.

The label "ubuntu-8_cores-32_gb" is not a standard GitHub-hosted runner label. If this is a self-hosted runner, update actionlint.yaml config.

Also applies to: 75-75, 101-101, 127-127

🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

coderabbitai · 2025-02-14T06:09:20Z

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala

+          System.currentTimeMillis() -> fixedResponses
+        }
+        .flatMap { case (start, future) =>
+          val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis


⚠️ Potential issue

Fix hardcoded timeout value.

The timeout duration is set to 10000 seconds with a TODO comment. Consider making this configurable.

-val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis +val result = Await.result(future, Duration(10, SECONDS))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis

val result = Await.result(future, Duration(10, SECONDS))

⚠️ Potential issue

Fix timeout duration unit.

Change SECONDS to MILLISECONDS as indicated in the TODO comment.

-val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis +val result = Await.result(future, Duration(10000, MILLISECONDS))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis

val result = Await.result(future, Duration(10000, MILLISECONDS))

coderabbitai · 2025-02-14T06:09:21Z

.github/workflows/test_scala_spark.yaml

+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Fetcher tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:fetcher_test
+
+  join_tests:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Join tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:join_test
+
+  groupby_tests:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Join tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:groupby_test
+
+  analyzer_tests:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Join tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:analyzer_test


⚠️ Potential issue

Refactor duplicate job configurations.

The new jobs have identical configurations except for the test target. Consider using a reusable workflow or job template.

Fix incorrect step names.

The step names for groupby_tests and analyzer_tests incorrectly say "Run Join tests".

- - name: Run Join tests + - name: Run Groupby tests

- - name: Run Join tests + - name: Run Analyzer tests

Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

//spark:analyzer_test +

Verify custom runner label.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner. Ensure it's properly configured for self-hosted runners.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fetcher_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Fetcher tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:fetcher_test

join_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:join_test

groupby_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:groupby_test

analyzer_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:analyzer_test

fetcher_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Fetcher tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:fetcher_test

join_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:join_test

groupby_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Groupby tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:groupby_test

analyzer_tests:

runs-on: ubuntu-8_cores-32_gb

container:

image: ghcr.io/${{ github.repository }}-ci:latest

credentials:

username: ${{ github.actor }}

password: ${{ secrets.GITHUB_TOKEN }}

defaults:

run:

working-directory: ${{ github.workspace }}

steps:

- uses: actions/checkout@v4

- name: Setup Bazel cache credentials

run: |

echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Analyzer tests

run: |

bazel test \

--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \

--google_credentials=bazel-cache-key.json \

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \

//spark:analyzer_test

🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

coderabbitai · 2025-02-14T06:09:21Z

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala

+    def fetchOnce = {
+      var latencySum: Long = 0
+      var latencyCount = 0
+      val blockStart = System.currentTimeMillis()
+      val result = requests.iterator
+        .grouped(chunkSize)
+        .map { oldReqs =>
+          // deliberately mis-type a few keys
+          val r = oldReqs
+            .map(r =>
+              r.copy(keys = r.keys.mapValues { v =>
+                if (v.isInstanceOf[java.lang.Long]) v.toString else v
+              }.toMap))
+          val responses = if (useJavaFetcher) {
+            // Converting to java request and using the toScalaRequest functionality to test conversion
+            val convertedJavaRequests = r.map(new JavaRequest(_)).toJava
+            val javaResponse = javaFetcher.fetchJoin(convertedJavaRequests)
+            FutureConverters
+              .toScala(javaResponse)
+              .map(
+                _.toScala.map(jres =>
+                  Response(
+                    Request(jres.request.name, jres.request.keys.toScala.toMap, Option(jres.request.atMillis)),
+                    jres.values.toScala.map(_.toScala)
+                  )))
+          } else {
+            fetcher.fetchJoin(r)
+          }
+
+          // fix mis-typed keys in the request
+          val fixedResponses =
+            responses.map(resps => resps.zip(oldReqs).map { case (resp, req) => resp.copy(request = req) })
+          System.currentTimeMillis() -> fixedResponses
+        }
+        .flatMap { case (start, future) =>
+          val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
+          val latency = System.currentTimeMillis() - start
+          latencySum += latency
+          latencyCount += 1
+          result
+        }
+        .toList
+      val latencyMillis = latencySum.toFloat / latencyCount.toFloat
+      val qps = (requests.length * 1000.0) / (System.currentTimeMillis() - blockStart).toFloat
+      (latencyMillis, qps, result)
+    }


🛠️ Refactor suggestion

Add error handling for failed futures.

The fetchOnce function should handle failed futures gracefully.

def fetchOnce = { + try { var latencySum: Long = 0 // ... existing code ... (latencyMillis, qps, result) + } catch { + case e: Exception => + logger.error(s"Failed to fetch responses: ${e.getMessage}") + throw e + } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def fetchOnce = {

var latencySum: Long = 0

var latencyCount = 0

val blockStart = System.currentTimeMillis()

val result = requests.iterator

.grouped(chunkSize)

.map { oldReqs =>

// deliberately mis-type a few keys

val r = oldReqs

.map(r =>

r.copy(keys = r.keys.mapValues { v =>

if (v.isInstanceOf[java.lang.Long]) v.toString else v

}.toMap))

val responses = if (useJavaFetcher) {

// Converting to java request and using the toScalaRequest functionality to test conversion

val convertedJavaRequests = r.map(new JavaRequest(_)).toJava

val javaResponse = javaFetcher.fetchJoin(convertedJavaRequests)

FutureConverters

.toScala(javaResponse)

.map(

_.toScala.map(jres =>

Response(

Request(jres.request.name, jres.request.keys.toScala.toMap, Option(jres.request.atMillis)),

jres.values.toScala.map(_.toScala)

)))

} else {

fetcher.fetchJoin(r)

}

// fix mis-typed keys in the request

val fixedResponses =

responses.map(resps => resps.zip(oldReqs).map { case (resp, req) => resp.copy(request = req) })

System.currentTimeMillis() -> fixedResponses

}

.flatMap { case (start, future) =>

val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis

val latency = System.currentTimeMillis() - start

latencySum += latency

latencyCount += 1

result

}

.toList

val latencyMillis = latencySum.toFloat / latencyCount.toFloat

val qps = (requests.length * 1000.0) / (System.currentTimeMillis() - blockStart).toFloat

(latencyMillis, qps, result)

}

def fetchOnce = {

try {

var latencySum: Long = 0

var latencyCount = 0

val blockStart = System.currentTimeMillis()

val result = requests.iterator

.grouped(chunkSize)

.map { oldReqs =>

// deliberately mis-type a few keys

val r = oldReqs

.map(r =>

r.copy(keys = r.keys.mapValues { v =>

if (v.isInstanceOf[java.lang.Long]) v.toString else v

}.toMap))

val responses = if (useJavaFetcher) {

// Converting to java request and using the toScalaRequest functionality to test conversion

val convertedJavaRequests = r.map(new JavaRequest(_)).toJava

val javaResponse = javaFetcher.fetchJoin(convertedJavaRequests)

FutureConverters

.toScala(javaResponse)

.map(

_.toScala.map(jres =>

Response(

Request(jres.request.name, jres.request.keys.toScala.toMap, Option(jres.request.atMillis)),

jres.values.toScala.map(_.toScala)

)))

} else {

fetcher.fetchJoin(r)

}

// fix mis-typed keys in the request

val fixedResponses =

responses.map(resps => resps.zip(oldReqs).map { case (resp, req) => resp.copy(request = req) })

System.currentTimeMillis() -> fixedResponses

}

.flatMap { case (start, future) =>

val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis

val latency = System.currentTimeMillis() - start

latencySum += latency

latencyCount += 1

result

}

.toList

val latencyMillis = latencySum.toFloat / latencyCount.toFloat

val qps = (requests.length * 1000.0) / (System.currentTimeMillis() - blockStart).toFloat

(latencyMillis, qps, result)

} catch {

case e: Exception =>

logger.error(s"Failed to fetch responses: ${e.getMessage}")

throw e

}

}

.github/workflows/test_scala_spark.yaml

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7803143 and 4453931.

📒 Files selected for processing (6)

.github/workflows/test_scala_spark.yaml (1 hunks)
spark/BUILD.bazel (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (3 hunks)

✅ Files skipped from review due to trivial changes (3)

spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala
spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala
spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (9)

GitHub Check: streamimg_tests
GitHub Check: join_tests
GitHub Check: groupby_tests
GitHub Check: fetcher_tests
GitHub Check: analyzer_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: spark_tests
GitHub Check: enforce_triggered_workflows
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (6)

spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (2)

17-17: LGTM! Package and import organization improved.

Package structure and imports are better organized.

Also applies to: 22-22, 25-25

54-54: LGTM! Method references properly encapsulated.

Proper object references improve code organization.

Also applies to: 113-113

spark/BUILD.bazel (2)

98-106: LGTM! Test suites properly organized.

Test files are correctly categorized in their respective directories.

109-157: LGTM! New test suites properly configured.

Test suites are well-structured for parallel execution.
.github/workflows/test_scala_spark.yaml (2)

23-23: Verify custom runner configuration.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner. Ensure it's properly configured in your self-hosted runners.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🧰 Tools

🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

48-176: Consider using reusable workflows.

The jobs have identical configurations except for the test target. Consider using a reusable workflow to reduce duplication.

Also, add a newline at the end of file.
            //spark:stream_test
+
🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

coderabbitai · 2025-02-14T06:23:46Z

.github/workflows/test_scala_spark.yaml

+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Analyzer tests


⚠️ Potential issue

Fix incorrect step name.

Change "Run Analyzer tests" to "Run Streaming tests".

- - name: Run Analyzer tests + - name: Run Streaming tests

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Run Analyzer tests

- name: Run Streaming tests

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

.github/workflows/test_scala_spark.yaml (1)
48-176: 🛠️ Refactor suggestion

Refactor duplicate job configurations using reusable workflow.

Create a reusable workflow to eliminate duplication:
# .github/workflows/spark-test-job.yaml
+name: Spark Test Job
+
+on:
+  workflow_call:
+    inputs:
+      test_target:
+        required: true
+        type: string
+      memory_opts:
+        required: true
+        type: string
+      step_name:
+        required: true
+        type: string
+
+jobs:
+  run_test:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+      - name: ${{ inputs.step_name }}
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="${{ inputs.memory_opts }}" \
+            ${{ inputs.test_target }}
Then use it in the main workflow:
  fetcher_tests:
-    runs-on: ubuntu-8_cores-32_gb
-    container:
-      image: ghcr.io/${{ github.repository }}-ci:latest
-      # ... rest of duplicate config
+    uses: ./.github/workflows/spark-test-job.yaml
+    with:
+      test_target: //spark:fetcher_test
+      memory_opts: "-Xmx16G -Xms8G"
+      step_name: "Run Fetcher tests"
🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

🧹 Nitpick comments (1)

.github/workflows/test_scala_spark.yaml (1)

45-45: Document memory allocation differences.

Consider adding a comment explaining why new jobs need more memory (-Xmx16G) compared to spark_tests (-Xmx8G).

Also applies to: 71-71, 97-97, 123-123, 149-149, 175-175

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4453931 and 1685bed.

📒 Files selected for processing (1)

.github/workflows/test_scala_spark.yaml (1 hunks)

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: streamimg_tests
GitHub Check: spark_tests
GitHub Check: groupby_tests
GitHub Check: join_tests
GitHub Check: fetcher_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: enforce_triggered_workflows
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (2)

.github/workflows/test_scala_spark.yaml (2)

176-176: Add newline at end of file.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

23-23:
❓ Verification inconclusive

Verify custom runner label configuration.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🏁 Script executed:
#!/bin/bash
# Check if self-hosted runner is configured
gh api -X GET repos/{owner}/{repo}/actions/runners | jq '.runners[] | select(.labels[].name == "ubuntu-8_cores-32_gb")'
Length of output: 250
Action: Verify Custom Runner Label Configuration

The custom runner label ubuntu-8_cores-32_gb isn’t a standard GitHub-hosted runner label. The automated check failed (repository not detected), so please manually verify that a self-hosted runner with this label is configured. This applies to the workflow file at:

.github/workflows/test_scala_spark.yaml (lines 23, 49, 75, 101, 127, 153)

🧰 Tools

🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

.github/workflows/test_scala_spark.yaml (1)
176-176: ⚠️ Potential issue

Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.
            //spark:streaming_test
+
🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

🧹 Nitpick comments (4)

spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (4)
42-76: Consider parameterizing test data.

The hardcoded test data and SQL query could be moved to test fixtures for better maintainability.

92-114: Consider breaking down complex SQL query.

The SQL query is complex and hard to maintain. Consider breaking it into CTEs or helper methods.

246-248: Add specific error messages to assertions.

The intercept blocks should verify specific error messages to ensure correct error handling.
-    intercept[AssertionError] {
+    val error = intercept[AssertionError] {
       new LabelJoin(invalidJoinConf, tableUtils, labelDS).computeLabelJoin()
     }
+    assert(error.getMessage.contains("expected error message"))
Also applies to: 281-283, 316-318

434-455: Add ScalaDoc to helper methods.

Document parameters and return values for better code maintainability.
+  /**
+   * Creates a test label join configuration.
+   * @param startOffset The start offset for the join window
+   * @param endOffset The end offset for the join window
+   * @param groupBys The group by configurations to join
+   * @return The label join configuration
+   */
   def createTestLabelJoin(startOffset: Int,
                          endOffset: Int,
                          groupBys: Seq[ai.chronon.api.GroupBy]): ai.chronon.api.LabelParts = {

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0e449db and 73fc6e0.

📒 Files selected for processing (18)

.github/workflows/test_scala_spark.yaml (1 hunks)
spark/BUILD.bazel (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (15)

spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java
spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala
spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala
spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala
spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala
spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala
spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala
spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala
spark/BUILD.bazel

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: streamimg_tests
GitHub Check: spark_tests
GitHub Check: join_tests
GitHub Check: groupby_tests
GitHub Check: fetcher_tests
GitHub Check: analyzer_tests
GitHub Check: scala_compile_fmt_fix

🔇 Additional comments (5)

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (3)

17-41: LGTM! Package move and import consolidation.

The package move and import consolidation improve code organization and readability.

68-68: LGTM! Improved resource handling.

Using Source.fromResource is safer for accessing resources in both IDE and packaged environments.

715-766: LGTM! Comprehensive test coverage.

Test cases cover various scenarios and are well-structured.

spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1)

17-22: LGTM! Package organization and import consolidation.

The package move to test.join and import consolidation improves code organization.

.github/workflows/test_scala_spark.yaml (1)

23-23: Verify custom runner label configuration.

The runner label "ubuntu-8_cores-32_gb" is not a standard GitHub-hosted runner. Ensure it's properly configured in self-hosted runners.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🧰 Tools

🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

coderabbitai · 2025-02-14T07:28:19Z

.github/workflows/test_scala_spark.yaml

+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Join tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:join_test
+
+  groupby_tests:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run GroupBy tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:groupby_test
+
+  analyzer_tests:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Analyzer tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:analyzer_test
+
+  streamimg_tests:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+
+      - name: Run Streaming tests
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:streaming_test


🛠️ Refactor suggestion

Refactor duplicate job configurations using a reusable workflow.

Extract common configuration into a reusable workflow to reduce duplication.

Create .github/workflows/spark-test-job.yaml:

name: Spark Test Job on: workflow_call: inputs: test_target: required: true type: string memory_opts: required: true type: string step_name: required: true type: string jobs: test: runs-on: ubuntu-8_cores-32_gb container: image: ghcr.io/${{ github.repository }}-ci:latest credentials: username: ${{ github.actor }} password: ${{ secrets.GITHUB_TOKEN }} defaults: run: working-directory: ${{ github.workspace }} steps: - uses: actions/checkout@v4 - name: Setup Bazel cache credentials run: | echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json - name: ${{ inputs.step_name }} run: | bazel test \ --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \ --google_credentials=bazel-cache-key.json \ --test_env=JAVA_OPTS="${{ inputs.memory_opts }}" \ ${{ inputs.test_target }}

Then simplify the main workflow:

jobs: spark_tests: uses: ./.github/workflows/spark-test-job.yaml with: test_target: //spark:tests memory_opts: "-Xmx8G -Xms2G" step_name: "Run Spark tests" fetcher_tests: uses: ./.github/workflows/spark-test-job.yaml with: test_target: //spark:fetcher_test memory_opts: "-Xmx16G -Xms8G" step_name: "Run Fetcher tests" # ... repeat for other jobs

🧰 Tools

🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

coderabbitai · 2025-02-14T07:28:19Z

.github/workflows/test_scala_spark.yaml

+            --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
+            //spark:analyzer_test
+
+  streamimg_tests:


⚠️ Potential issue

Fix typo in job name.

"streamimg_tests" should be "streaming_tests".

- streamimg_tests: + streaming_tests:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

streamimg_tests:

streaming_tests:

Co-authored-by: Thomas Chow <[email protected]>

## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced dedicated testing workflows covering multiple system components to enhance overall reliability. - Added new test suites for various components to enhance testing granularity. - **Refactor** - Streamlined code organization with improved package structures and consolidated imports across test modules. - **Chores** - Upgraded automated testing configurations with optimized resource settings for improved performance and stability.   --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary ## Cheour clientslist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced dedicated testing workflows covering multiple system components to enhance overall reliability. - Added new test suites for various components to enhance testing granularity. - **Refactor** - Streamlined code organization with improved paour clientsage structures and consolidated imports across test modules. - **Chores** - Upgraded automated testing configurations with optimized resource settings for improved performance and stability.   --------- Co-authored-by: Thomas Chow <[email protected]>

Base automatically changed from tchow/clean-up-spark-tests to main February 14, 2025 05:24

tchow-zlai force-pushed the tchow/split-up-spark-tests branch from 0a2f504 to f9ebf17 Compare February 14, 2025 05:27

coderabbitai bot reviewed Feb 14, 2025

View reviewed changes

tchow-zlai changed the title ~~wip~~ chore: split out expensive spark tests to parallelize Feb 14, 2025

tchow-zlai requested review from nikhil-zlai, varant-zlai, piyush-zlai, chewy-zlai and david-zlai February 14, 2025 05:49

coderabbitai bot reviewed Feb 14, 2025

View reviewed changes

nikhil-zlai approved these changes Feb 14, 2025

View reviewed changes

tchow-zlai and others added 14 commits February 13, 2025 23:25

wip

bd866a6

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

wip

e84fb9a

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

break out more modules

d8d809f

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

relocate

5380ed0

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

typo

de3fbd4

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

refactor

8e1a5c7

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

typo

dd668d6

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

update

63d00c2

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

analyzer

d338913

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

upadte

c969745

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

update

b9dcc12

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

move another

cb77e8e

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

large

b896e03

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

large tests

73fc6e0

Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>

tchow-zlai force-pushed the tchow/split-up-spark-tests branch from 0e449db to 73fc6e0 Compare February 14, 2025 07:25

coderabbitai bot reviewed Feb 14, 2025

View reviewed changes

typo

59bc5dc

Co-authored-by: Thomas Chow <[email protected]>

tchow-zlai merged commit 277f040 into main Feb 14, 2025
10 of 13 checks passed

tchow-zlai deleted the tchow/split-up-spark-tests branch February 14, 2025 08:08

This was referenced Feb 21, 2025

fix: Fix approx_histogram_k aggregations #411

Merged

chore: moving join codec from fetcher into metadata store #401

Merged

refactor: split fetcher logic into multiple files #425

Merged

coderabbitai bot mentioned this pull request Apr 5, 2025

chore: move modular architecture pieces into dedicated module #590

Merged

4 tasks

	val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
	val result = Await.result(future, Duration(10, SECONDS))

chore: split out expensive spark tests to parallelize #382

chore: split out expensive spark tests to parallelize #382

Uh oh!

Conversation

tchow-zlai commented Feb 14, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested reviewers

Poem

Review ran into problems

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tchow-zlai commented Feb 14, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 14, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)