Skip to content

chore: split out expensive spark tests to parallelize #382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Feb 14, 2025

Conversation

tchow-zlai
Copy link
Collaborator

@tchow-zlai tchow-zlai commented Feb 14, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features
    • Introduced dedicated testing workflows covering multiple system components to enhance overall reliability.
    • Added new test suites for various components to enhance testing granularity.
  • Refactor
    • Streamlined code organization with improved package structures and consolidated imports across test modules.
  • Chores
    • Upgraded automated testing configurations with optimized resource settings for improved performance and stability.

Copy link

coderabbitai bot commented Feb 14, 2025

Walkthrough

This pull request adds new GitHub Actions jobs and several granular Scala test suites for Spark components. It also reorganizes package declarations and consolidates import statements across test files. The Bazel build file is updated to reflect these new test suites and modified visibility settings. Additionally, a new utility object for fetching tests has been introduced, and method references in streaming tests are adjusted.

Changes

File(s) Change Summary
.github/workflows/test_scala_spark.yaml Added new CI jobs (fetcher_tests, join_tests, groupby_tests, analyzer_tests, streaming_tests) with updated memory allocation settings.
spark/BUILD.bazel Updated Scala test suites; removed test_lib visibility; added new test suites for fetcher, groupby, join, analyzer, and streaming tests.
spark/src/test/scala/ai/chronon/spark/test/fetcher/{ChainingFetcherTest.scala, FetcherTest.scala, FetcherTestUtil.scala, JavaFetcherTest.java} Reorganized package declarations, consolidated imports; added FetcherTestUtil with joinResponses; updated resource handling.
spark/src/test/scala/ai/chronon/spark/test/groupby/{GroupByTest.scala, GroupByUploadTest.scala} Updated package names and consolidated imports for groupby functionality tests.
spark/src/test/scala/ai/chronon/spark/test/join/{FeatureWithLabelJoinTest.scala, JoinTest.scala, JoinUtilsTest.scala, LabelJoinTest.scala} Reorganized packages, updated imports, and added TestUtils; enhanced label join test cases.
spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala Updated package declaration and consolidated imports in the analyzer test.
spark/src/test/scala/ai/chronon/spark/test/streaming/{AvroTest.scala, KafkaStreamBuilderTest.scala, MutationsTest.scala, StreamingTest.scala} Reorganized package declarations; consolidated imports; updated method references for streaming tests.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as GitHub Runner
    participant Workflow as GitHub Actions
    participant Bazel as Bazel Test Runner

    Runner->>Workflow: Trigger workflow
    Workflow->>Bazel: Run spark_tests job
    par Parallel Jobs
        Workflow->>Bazel: Run fetcher_tests
        Workflow->>Bazel: Run join_tests
        Workflow->>Bazel: Run groupby_tests
        Workflow->>Bazel: Run analyzer_tests
        Workflow->>Bazel: Run streaming_tests
    end
    Bazel-->>Workflow: Return test results
    Workflow-->>Runner: Publish results
Loading

Possibly related PRs

Suggested reviewers

  • nikhil-zlai
  • piyush-zlai
  • varant-zlai

Poem

In the realm of code so bright,
New tests align to make it right.
Imports refined and jobs in flow,
Packages now in tidy row.
Spark lights up with joyful might 🚀.

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 73fc6e0 and 59bc5dc.

📒 Files selected for processing (1)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
.github/workflows/test_scala_spark.yaml (2)

23-23: Verify custom runner label configuration.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner. Ensure it's properly configured in actionlint.yaml for self-hosted runners.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🧰 Tools
🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


176-176: Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Base automatically changed from tchow/clean-up-spark-tests to main February 14, 2025 05:24
@tchow-zlai tchow-zlai force-pushed the tchow/split-up-spark-tests branch from 0a2f504 to f9ebf17 Compare February 14, 2025 05:27
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
.github/workflows/test_scala_spark.yaml (1)

23-23: 💡 Verification agent

🧩 Analysis chain

Verify runner label.

The label "ubuntu-8_cores-32_gb" is not a standard GitHub-hosted runner label.

Also applies to: 49-49


🏁 Script executed:

#!/bin/bash
# Check if this is a self-hosted runner
gh api repos/{owner}/{repo}/actions/runners -q '.runners[].labels[].name' | grep 'ubuntu-8_cores-32_gb' || echo "Label not found"

Length of output: 200


Action Required: Address Workflow Runner and Duplication Issues

  • Runner Label: The label "ubuntu-8_cores-32_gb" isn’t found via the API, suggesting it’s not a recognized GitHub-hosted runner. Confirm if this custom self-hosted runner is intended or update to a valid label.
  • Memory Allocation: The increased memory settings in the new job (i.e., -Xmx16G -Xms8G) differ noticeably from the existing job. Ensure these settings are deliberate.
  • Code Duplication: The new fetcher_tests job duplicates much of the existing setup. Consider refactoring to share common configuration.
  • Missing Newline: Add a newline at the end of the file.
🧰 Tools
🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🧹 Nitpick comments (7)
spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (3)

19-19: Consider explicit imports over wildcard.

Wildcard imports can make it harder to track dependencies and may lead to naming conflicts.

-import ai.chronon.api._
+import ai.chronon.api.{Accuracy, Builders, Constants, Operation, TimeUnit, Window}

182-227: Add test data cleanup.

Missing cleanup after test execution could affect subsequent test runs.

Add cleanup in a finally block or use a test fixture:

+  override def afterEach(): Unit = {
+    super.afterEach()
+    tableUtils.dropTable(s"${namespace}.${tableName}")
+  }

434-455: Add ScalaDoc for helper methods.

Document parameters and return values for better maintainability.

+  /**
+   * Creates a test label join configuration.
+   * @param startOffset The left start offset
+   * @param endOffset The left end offset
+   * @param groupBys The group by configurations
+   * @return The label parts configuration
+   */
   def createTestLabelJoin(startOffset: Int,
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (2)

28-29: Convert magic number to named constant.

The chunkSize value of 100 should be defined as a named constant for better maintainability.

+  private val DefaultChunkSize = 100
-    val chunkSize = 100
+    val chunkSize = DefaultChunkSize

79-81: Improve comment clarity.

The comment about overwhelming the profiler needs more context about its purpose.

-    // to overwhelm the profiler with fetching code path
-    // so as to make it prominent in the flamegraph & collect enough stats
+    // Run multiple iterations to generate sufficient profiling data
+    // This ensures the fetching code path is prominent in the flamegraph
+    // and provides statistically significant performance metrics
.github/workflows/test_scala_spark.yaml (2)

48-72: Consider parameterizing job config.

Duplicate setup could be refactored into reusable workflow.

 jobs:
+  .test_base: &test_base
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

   spark_tests:
-    runs-on: ubuntu-8_cores-32_gb
-    container: ...
+    <<: *test_base
     steps:
-      - uses: actions/checkout@v4
-      - name: Setup Bazel cache credentials ...
       - name: Run Spark tests
         run: |
           bazel test \
             --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
             --google_credentials=bazel-cache-key.json \
             --test_env=JAVA_OPTS="-Xmx8G -Xms2G" \
             //spark:tests

   fetcher_tests:
-    runs-on: ubuntu-8_cores-32_gb
-    container: ...
+    <<: *test_base
     steps:
-      - uses: actions/checkout@v4
-      - name: Setup Bazel cache credentials ...
       - name: Run Spark tests
         run: |
           bazel test \
             --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
             --google_credentials=bazel-cache-key.json \
             --test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
             //spark:fetcher_test
🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 72-72: no new line character at the end of file

(new-line-at-end-of-file)


72-72: Add newline at EOF.

Add missing newline character.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 72-72: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f3e4971 and c07dbbd.

📒 Files selected for processing (14)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • spark/BUILD.bazel (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/MutationsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/StreamingTest.scala (3 hunks)
✅ Files skipped from review due to trivial changes (8)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/MutationsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 72-72: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (9)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (2)

1-19: LGTM! Well-organized imports and logger declaration.


20-27: LGTM! Clear method signature with well-defined parameters.

spark/src/test/scala/ai/chronon/spark/test/join/StreamingTest.scala (2)

17-25: LGTM! Good package organization.

Package and import restructuring improves code organization.


54-54: LGTM! Improved method call clarity.

Using qualified name StreamingTest.buildInMemoryKvStore makes the method origin explicit.

Also applies to: 113-113

spark/BUILD.bazel (2)

96-203: LGTM! Well-structured test suite organization.

The new test suites are properly organized by functionality with consistent configuration.


89-94:

❓ Verification inconclusive

Verify impact of removed visibility.

The removal of visibility from test_lib might affect other packages that depend on it.


🏁 Script executed:

#!/bin/bash
# Search for references to the test_lib target
rg -l "//spark:test_lib" | grep BUILD

Length of output: 37


Re-check impact of removed visibility on test_lib

Automated search for references to //spark:test_lib returned no results. However, please manually verify that no other packages depend on this library and need explicit visibility settings.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2)

17-41: LGTM! Improved code organization.

Package relocation and import consolidation enhance code structure.


68-68: LGTM! Better resource loading.

Using Source.fromResource is more reliable than Source.fromFile for test resources.

.github/workflows/test_scala_spark.yaml (1)

71-71:

❓ Verification inconclusive

Verify memory requirements.

Fetcher tests require 2x more memory. Ensure this is necessary.


🏁 Script executed:

#!/bin/bash
# Check memory usage in fetcher tests
rg -A 5 "spark.driver.memory|spark.executor.memory" spark/src/test/scala/ai/chronon/spark/test/fetcher/

Length of output: 105


Please verify memory settings manually.
The regex search in fetcher tests yielded no memory configuration entries, so it's unclear if fetcher tests truly need 2× the memory. Confirm that --test_env=JAVA_OPTS="-Xmx16G -Xms8G" is correct or adjust as needed.

@tchow-zlai tchow-zlai changed the title wip chore: split out expensive spark tests to parallelize Feb 14, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (5)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (3)

7-7: Fix wildcard import.

Replace import ai.chronon.spark.{Join => _, _} with specific imports to improve code clarity.


82-83: Initialize variables with empty values.

Initialize loggedValues and result with empty values instead of null.

-var loggedValues: Seq[LoggableResponseBase64] = null
-var result: List[Response] = null
+var loggedValues: Seq[LoggableResponseBase64] = Seq.empty
+var result: List[Response] = List.empty

26-26: Make chunkSize configurable.

Consider making chunkSize a parameter for better flexibility.

.github/workflows/test_scala_spark.yaml (2)

48-150: Reduce job configuration duplication.

Consider using job templates or reusable workflows to avoid duplicating configuration.

🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)


150-150: Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c07dbbd and 7803143.

📒 Files selected for processing (4)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • spark/BUILD.bazel (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/BUILD.bazel
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
.github/workflows/test_scala_spark.yaml (1)

49-49: Verify custom runner label.

The label "ubuntu-8_cores-32_gb" is not a standard GitHub-hosted runner label. If this is a self-hosted runner, update actionlint.yaml config.

Also applies to: 75-75, 101-101, 127-127

🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

System.currentTimeMillis() -> fixedResponses
}
.flatMap { case (start, future) =>
val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix hardcoded timeout value.

The timeout duration is set to 10000 seconds with a TODO comment. Consider making this configurable.

-val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
+val result = Await.result(future, Duration(10, SECONDS))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
val result = Await.result(future, Duration(10, SECONDS))

⚠️ Potential issue

Fix timeout duration unit.

Change SECONDS to MILLISECONDS as indicated in the TODO comment.

-val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
+val result = Await.result(future, Duration(10000, MILLISECONDS))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
val result = Await.result(future, Duration(10000, MILLISECONDS))

Comment on lines 48 to 150
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Fetcher tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:fetcher_test

join_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:join_test

groupby_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:groupby_test

analyzer_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:analyzer_test
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Refactor duplicate job configurations.

The new jobs have identical configurations except for the test target. Consider using a reusable workflow or job template.

Fix incorrect step names.

The step names for groupby_tests and analyzer_tests incorrectly say "Run Join tests".

-      - name: Run Join tests
+      - name: Run Groupby tests
-      - name: Run Join tests
+      - name: Run Analyzer tests

Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

            //spark:analyzer_test
+

Verify custom runner label.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner. Ensure it's properly configured for self-hosted runners.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fetcher_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Fetcher tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:fetcher_test
join_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:join_test
groupby_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:groupby_test
analyzer_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:analyzer_test
fetcher_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Fetcher tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:fetcher_test
join_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:join_test
groupby_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Groupby tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:groupby_test
analyzer_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
- name: Run Analyzer tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:analyzer_test
🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 150-150: no new line character at the end of file

(new-line-at-end-of-file)

Comment on lines +30 to +75
def fetchOnce = {
var latencySum: Long = 0
var latencyCount = 0
val blockStart = System.currentTimeMillis()
val result = requests.iterator
.grouped(chunkSize)
.map { oldReqs =>
// deliberately mis-type a few keys
val r = oldReqs
.map(r =>
r.copy(keys = r.keys.mapValues { v =>
if (v.isInstanceOf[java.lang.Long]) v.toString else v
}.toMap))
val responses = if (useJavaFetcher) {
// Converting to java request and using the toScalaRequest functionality to test conversion
val convertedJavaRequests = r.map(new JavaRequest(_)).toJava
val javaResponse = javaFetcher.fetchJoin(convertedJavaRequests)
FutureConverters
.toScala(javaResponse)
.map(
_.toScala.map(jres =>
Response(
Request(jres.request.name, jres.request.keys.toScala.toMap, Option(jres.request.atMillis)),
jres.values.toScala.map(_.toScala)
)))
} else {
fetcher.fetchJoin(r)
}

// fix mis-typed keys in the request
val fixedResponses =
responses.map(resps => resps.zip(oldReqs).map { case (resp, req) => resp.copy(request = req) })
System.currentTimeMillis() -> fixedResponses
}
.flatMap { case (start, future) =>
val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
val latency = System.currentTimeMillis() - start
latencySum += latency
latencyCount += 1
result
}
.toList
val latencyMillis = latencySum.toFloat / latencyCount.toFloat
val qps = (requests.length * 1000.0) / (System.currentTimeMillis() - blockStart).toFloat
(latencyMillis, qps, result)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for failed futures.

The fetchOnce function should handle failed futures gracefully.

 def fetchOnce = {
+  try {
     var latencySum: Long = 0
     // ... existing code ...
     (latencyMillis, qps, result)
+  } catch {
+    case e: Exception =>
+      logger.error(s"Failed to fetch responses: ${e.getMessage}")
+      throw e
+  }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def fetchOnce = {
var latencySum: Long = 0
var latencyCount = 0
val blockStart = System.currentTimeMillis()
val result = requests.iterator
.grouped(chunkSize)
.map { oldReqs =>
// deliberately mis-type a few keys
val r = oldReqs
.map(r =>
r.copy(keys = r.keys.mapValues { v =>
if (v.isInstanceOf[java.lang.Long]) v.toString else v
}.toMap))
val responses = if (useJavaFetcher) {
// Converting to java request and using the toScalaRequest functionality to test conversion
val convertedJavaRequests = r.map(new JavaRequest(_)).toJava
val javaResponse = javaFetcher.fetchJoin(convertedJavaRequests)
FutureConverters
.toScala(javaResponse)
.map(
_.toScala.map(jres =>
Response(
Request(jres.request.name, jres.request.keys.toScala.toMap, Option(jres.request.atMillis)),
jres.values.toScala.map(_.toScala)
)))
} else {
fetcher.fetchJoin(r)
}
// fix mis-typed keys in the request
val fixedResponses =
responses.map(resps => resps.zip(oldReqs).map { case (resp, req) => resp.copy(request = req) })
System.currentTimeMillis() -> fixedResponses
}
.flatMap { case (start, future) =>
val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
val latency = System.currentTimeMillis() - start
latencySum += latency
latencyCount += 1
result
}
.toList
val latencyMillis = latencySum.toFloat / latencyCount.toFloat
val qps = (requests.length * 1000.0) / (System.currentTimeMillis() - blockStart).toFloat
(latencyMillis, qps, result)
}
def fetchOnce = {
try {
var latencySum: Long = 0
var latencyCount = 0
val blockStart = System.currentTimeMillis()
val result = requests.iterator
.grouped(chunkSize)
.map { oldReqs =>
// deliberately mis-type a few keys
val r = oldReqs
.map(r =>
r.copy(keys = r.keys.mapValues { v =>
if (v.isInstanceOf[java.lang.Long]) v.toString else v
}.toMap))
val responses = if (useJavaFetcher) {
// Converting to java request and using the toScalaRequest functionality to test conversion
val convertedJavaRequests = r.map(new JavaRequest(_)).toJava
val javaResponse = javaFetcher.fetchJoin(convertedJavaRequests)
FutureConverters
.toScala(javaResponse)
.map(
_.toScala.map(jres =>
Response(
Request(jres.request.name, jres.request.keys.toScala.toMap, Option(jres.request.atMillis)),
jres.values.toScala.map(_.toScala)
)))
} else {
fetcher.fetchJoin(r)
}
// fix mis-typed keys in the request
val fixedResponses =
responses.map(resps => resps.zip(oldReqs).map { case (resp, req) => resp.copy(request = req) })
System.currentTimeMillis() -> fixedResponses
}
.flatMap { case (start, future) =>
val result = Await.result(future, Duration(10000, SECONDS)) // todo: change back to millis
val latency = System.currentTimeMillis() - start
latencySum += latency
latencyCount += 1
result
}
.toList
val latencyMillis = latencySum.toFloat / latencyCount.toFloat
val qps = (requests.length * 1000.0) / (System.currentTimeMillis() - blockStart).toFloat
(latencyMillis, qps, result)
} catch {
case e: Exception =>
logger.error(s"Failed to fetch responses: ${e.getMessage}")
throw e
}
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7803143 and 4453931.

📒 Files selected for processing (6)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • spark/BUILD.bazel (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (3 hunks)
✅ Files skipped from review due to trivial changes (3)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streamimg_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (6)
spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (2)

17-17: LGTM! Package and import organization improved.

Package structure and imports are better organized.

Also applies to: 22-22, 25-25


54-54: LGTM! Method references properly encapsulated.

Proper object references improve code organization.

Also applies to: 113-113

spark/BUILD.bazel (2)

98-106: LGTM! Test suites properly organized.

Test files are correctly categorized in their respective directories.


109-157: LGTM! New test suites properly configured.

Test suites are well-structured for parallel execution.

.github/workflows/test_scala_spark.yaml (2)

23-23: Verify custom runner configuration.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner. Ensure it's properly configured in your self-hosted runners.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🧰 Tools
🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


48-176: Consider using reusable workflows.

The jobs have identical configurations except for the test target. Consider using a reusable workflow to reduce duplication.

Also, add a newline at the end of file.

            //spark:stream_test
+
🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Analyzer tests
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect step name.

Change "Run Analyzer tests" to "Run Streaming tests".

-      - name: Run Analyzer tests
+      - name: Run Streaming tests
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Run Analyzer tests
- name: Run Streaming tests

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/test_scala_spark.yaml (1)

48-176: 🛠️ Refactor suggestion

Refactor duplicate job configurations using reusable workflow.

Create a reusable workflow to eliminate duplication:

# .github/workflows/spark-test-job.yaml
+name: Spark Test Job
+
+on:
+  workflow_call:
+    inputs:
+      test_target:
+        required: true
+        type: string
+      memory_opts:
+        required: true
+        type: string
+      step_name:
+        required: true
+        type: string
+
+jobs:
+  run_test:
+    runs-on: ubuntu-8_cores-32_gb
+    container:
+      image: ghcr.io/${{ github.repository }}-ci:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup Bazel cache credentials
+        run: |
+          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+      - name: ${{ inputs.step_name }}
+        run: |
+          bazel test \
+            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
+            --google_credentials=bazel-cache-key.json \
+            --test_env=JAVA_OPTS="${{ inputs.memory_opts }}" \
+            ${{ inputs.test_target }}

Then use it in the main workflow:

  fetcher_tests:
-    runs-on: ubuntu-8_cores-32_gb
-    container:
-      image: ghcr.io/${{ github.repository }}-ci:latest
-      # ... rest of duplicate config
+    uses: ./.github/workflows/spark-test-job.yaml
+    with:
+      test_target: //spark:fetcher_test
+      memory_opts: "-Xmx16G -Xms8G"
+      step_name: "Run Fetcher tests"
🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

🧹 Nitpick comments (1)
.github/workflows/test_scala_spark.yaml (1)

45-45: Document memory allocation differences.

Consider adding a comment explaining why new jobs need more memory (-Xmx16G) compared to spark_tests (-Xmx8G).

Also applies to: 71-71, 97-97, 123-123, 149-149, 175-175

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4453931 and 1685bed.

📒 Files selected for processing (1)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: streamimg_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
.github/workflows/test_scala_spark.yaml (2)

176-176: Add newline at end of file.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)


23-23:

❓ Verification inconclusive

Verify custom runner label configuration.

The runner label ubuntu-8_cores-32_gb is not a standard GitHub-hosted runner.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153


🏁 Script executed:

#!/bin/bash
# Check if self-hosted runner is configured
gh api -X GET repos/{owner}/{repo}/actions/runners | jq '.runners[] | select(.labels[].name == "ubuntu-8_cores-32_gb")'

Length of output: 250


Action: Verify Custom Runner Label Configuration

The custom runner label ubuntu-8_cores-32_gb isn’t a standard GitHub-hosted runner label. The automated check failed (repository not detected), so please manually verify that a self-hosted runner with this label is configured. This applies to the workflow file at:

  • .github/workflows/test_scala_spark.yaml (lines 23, 49, 75, 101, 127, 153)
🧰 Tools
🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

tchow-zlai and others added 14 commits February 13, 2025 23:25
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>

Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai force-pushed the tchow/split-up-spark-tests branch from 0e449db to 73fc6e0 Compare February 14, 2025 07:25
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
.github/workflows/test_scala_spark.yaml (1)

176-176: ⚠️ Potential issue

Add newline at end of file.

Add a newline character at the end of the file to comply with YAML standards.

            //spark:streaming_test
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

🧹 Nitpick comments (4)
spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (4)

42-76: Consider parameterizing test data.

The hardcoded test data and SQL query could be moved to test fixtures for better maintainability.


92-114: Consider breaking down complex SQL query.

The SQL query is complex and hard to maintain. Consider breaking it into CTEs or helper methods.


246-248: Add specific error messages to assertions.

The intercept blocks should verify specific error messages to ensure correct error handling.

-    intercept[AssertionError] {
+    val error = intercept[AssertionError] {
       new LabelJoin(invalidJoinConf, tableUtils, labelDS).computeLabelJoin()
     }
+    assert(error.getMessage.contains("expected error message"))

Also applies to: 281-283, 316-318


434-455: Add ScalaDoc to helper methods.

Document parameters and return values for better code maintainability.

+  /**
+   * Creates a test label join configuration.
+   * @param startOffset The start offset for the join window
+   * @param endOffset The end offset for the join window
+   * @param groupBys The group by configurations to join
+   * @return The label join configuration
+   */
   def createTestLabelJoin(startOffset: Int,
                          endOffset: Int,
                          groupBys: Seq[ai.chronon.api.GroupBy]): ai.chronon.api.LabelParts = {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0e449db and 73fc6e0.

📒 Files selected for processing (18)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • spark/BUILD.bazel (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetcherTest.java
  • spark/src/test/scala/ai/chronon/spark/test/streaming/KafkaStreamBuilderTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/AvroTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala
  • spark/BUILD.bazel
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_spark.yaml

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: streamimg_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (5)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (3)

17-41: LGTM! Package move and import consolidation.

The package move and import consolidation improve code organization and readability.


68-68: LGTM! Improved resource handling.

Using Source.fromResource is safer for accessing resources in both IDE and packaged environments.


715-766: LGTM! Comprehensive test coverage.

Test cases cover various scenarios and are well-structured.

spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1)

17-22: LGTM! Package organization and import consolidation.

The package move to test.join and import consolidation improves code organization.

.github/workflows/test_scala_spark.yaml (1)

23-23: Verify custom runner label configuration.

The runner label "ubuntu-8_cores-32_gb" is not a standard GitHub-hosted runner. Ensure it's properly configured in self-hosted runners.

Also applies to: 49-49, 75-75, 101-101, 127-127, 153-153

🧰 Tools
🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

Comment on lines 48 to 176
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Join tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:join_test

groupby_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run GroupBy tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:groupby_test

analyzer_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Analyzer tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:analyzer_test

streamimg_tests:
runs-on: ubuntu-8_cores-32_gb
container:
image: ghcr.io/${{ github.repository }}-ci:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
defaults:
run:
working-directory: ${{ github.workspace }}

steps:
- uses: actions/checkout@v4

- name: Setup Bazel cache credentials
run: |
echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json

- name: Run Streaming tests
run: |
bazel test \
--remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
--google_credentials=bazel-cache-key.json \
--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:streaming_test
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor duplicate job configurations using a reusable workflow.

Extract common configuration into a reusable workflow to reduce duplication.

Create .github/workflows/spark-test-job.yaml:

name: Spark Test Job
on:
  workflow_call:
    inputs:
      test_target:
        required: true
        type: string
      memory_opts:
        required: true
        type: string
      step_name:
        required: true
        type: string

jobs:
  test:
    runs-on: ubuntu-8_cores-32_gb
    container:
      image: ghcr.io/${{ github.repository }}-ci:latest
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    defaults:
      run:
        working-directory: ${{ github.workspace }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Bazel cache credentials
        run: |
          echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
      - name: ${{ inputs.step_name }}
        run: |
          bazel test \
            --remote_cache=https://storage.googleapis.com/zipline-bazel-cache \
            --google_credentials=bazel-cache-key.json \
            --test_env=JAVA_OPTS="${{ inputs.memory_opts }}" \
            ${{ inputs.test_target }}

Then simplify the main workflow:

jobs:
  spark_tests:
    uses: ./.github/workflows/spark-test-job.yaml
    with:
      test_target: //spark:tests
      memory_opts: "-Xmx8G -Xms2G"
      step_name: "Run Spark tests"
  fetcher_tests:
    uses: ./.github/workflows/spark-test-job.yaml
    with:
      test_target: //spark:fetcher_test
      memory_opts: "-Xmx16G -Xms8G"
      step_name: "Run Fetcher tests"
  # ... repeat for other jobs
🧰 Tools
🪛 actionlint (1.7.4)

49-49: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


75-75: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


101-101: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


127-127: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


153-153: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)

[error] 176-176: no new line character at the end of file

(new-line-at-end-of-file)

--test_env=JAVA_OPTS="-Xmx16G -Xms8G" \
//spark:analyzer_test

streamimg_tests:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in job name.

"streamimg_tests" should be "streaming_tests".

-  streamimg_tests:
+  streaming_tests:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
streamimg_tests:
streaming_tests:

Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai merged commit 277f040 into main Feb 14, 2025
10 of 13 checks passed
@tchow-zlai tchow-zlai deleted the tchow/split-up-spark-tests branch February 14, 2025 08:08
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Introduced dedicated testing workflows covering multiple system
components to enhance overall reliability.
- Added new test suites for various components to enhance testing
granularity.
- **Refactor**
- Streamlined code organization with improved package structures and
consolidated imports across test modules.
- **Chores**
- Upgraded automated testing configurations with optimized resource
settings for improved performance and stability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Introduced dedicated testing workflows covering multiple system
components to enhance overall reliability.
- Added new test suites for various components to enhance testing
granularity.
- **Refactor**
- Streamlined code organization with improved package structures and
consolidated imports across test modules.
- **Chores**
- Upgraded automated testing configurations with optimized resource
settings for improved performance and stability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Introduced dedicated testing workflows covering multiple system
components to enhance overall reliability.
- Added new test suites for various components to enhance testing
granularity.
- **Refactor**
- Streamlined code organization with improved package structures and
consolidated imports across test modules.
- **Chores**
- Upgraded automated testing configurations with optimized resource
settings for improved performance and stability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Introduced dedicated testing workflows covering multiple system
components to enhance overall reliability.
- Added new test suites for various components to enhance testing
granularity.
- **Refactor**
- Streamlined code organization with improved package structures and
consolidated imports across test modules.
- **Chores**
- Upgraded automated testing configurations with optimized resource
settings for improved performance and stability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
## Summary

## Cheour clientslist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Introduced dedicated testing workflows covering multiple system
components to enhance overall reliability.
- Added new test suites for various components to enhance testing
granularity.
- **Refactor**
- Streamlined code organization with improved paour clientsage structures and
consolidated imports across test modules.
- **Chores**
- Upgraded automated testing configurations with optimized resource
settings for improved performance and stability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to traour clients
the status of staour clientss when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants