Skip to content

wip: run join tests in parallel #79

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

tchow-zlai
Copy link
Collaborator

@tchow-zlai tchow-zlai commented Nov 22, 2024

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Tests
    • Enhanced test structure by implementing fixture parameters for improved isolation.
    • Updated test methods to utilize unique database namespaces, preventing conflicts.
    • Minor adjustments made to expected outputs and assertions for better accuracy.
    • Updated the command for executing specific tests in the CI workflow.

Copy link

coderabbitai bot commented Nov 22, 2024

Walkthrough

The changes in JoinTest.scala involve a comprehensive restructuring of the test framework by integrating fixture parameters. The JoinTest class now extends FixtureAnyFunSuite and implements ParallelTestExecution, allowing for dynamic namespace generation based on test names. This modification enhances test isolation through unique database namespaces for each test case. Additionally, several test methods have been updated to include a namespace parameter, ensuring consistent metadata handling. Overall, the adjustments refine the organization and execution of tests without altering their fundamental purposes.

Changes

File Path Change Summary
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala - Class signature updated to extend FixtureAnyFunSuite and ParallelTestExecution.
- Test methods modified to accept a namespace parameter for unique database namespaces.
- Metadata calls updated to convert namespace to string.
- Minor logic adjustments in several tests.
.github/workflows/test_scala_and_python.yaml - Job command updated in join_spark_tests to execute JoinTest with new parameters.

Possibly related PRs

Suggested reviewers

  • varant-zlai
  • piyush-zlai
  • nikhil-zlai

Poem

🐇 In the land of code where rabbits play,
Tests now run in a clearer way.
With namespaces bright, each test a delight,
Isolation’s the key, shining so bright!
Hopping through logic, with fixtures we cheer,
JoinTest is ready, let’s give a big cheer! 🎉


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (20)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (20)

59-66: Consider cleaning up test databases after execution

The withFixture method creates a new database for each test but does not clean it up afterwards. This could lead to an accumulation of test databases over time.

You might want to modify the withFixture method to drop the database after the test runs:

 override def withFixture(test: OneArgTest) = {

   val namespace: FixtureParam = f"test_namespace_jointest__${test.name.split(" ").mkString("_")}"
   tableUtils.createDatabase(namespace)
+  try {
     test(namespace)
+  } finally {
+    tableUtils.dropDatabase(namespace)
+  }
 }

129-129: Unnecessary toString call on namespace

The namespace parameter is already a String. Calling namespace.toString is redundant.

Apply this diff to simplify the code:

           Builders.MetaData(name = "unit_test.user_transactions", namespace = namespace.toString, team = "chronon")
+          Builders.MetaData(name = "unit_test.user_transactions", namespace = namespace, team = "chronon")

157-157: Unnecessary toString call on namespace

The namespace is a String; calling toString is unnecessary.

Apply this diff:

           Builders.MetaData(name = "test.user_transaction_features", namespace = namespace.toString, team = "chronon")
+          Builders.MetaData(name = "test.user_transaction_features", namespace = namespace, team = "chronon")

296-296: Unnecessary toString call on namespace

Avoid redundant toString on a String variable.

Apply this diff:

         metaData = Builders.MetaData(name = "unit_test.country_weights", namespace = namespace.toString)
+        metaData = Builders.MetaData(name = "unit_test.country_weights", namespace = namespace)

315-315: Unnecessary toString call on namespace

The namespace variable is already a String.

Apply this diff:

         metaData = Builders.MetaData(name = "unit_test.country_heights", namespace = namespace.toString)
+        metaData = Builders.MetaData(name = "unit_test.country_heights", namespace = namespace)

328-328: Unnecessary toString call on namespace

Simplify the namespace assignment.

Apply this diff:

         metaData = Builders.MetaData(name = "test.country_features", namespace = namespace.toString, team = "chronon")
+        metaData = Builders.MetaData(name = "test.country_features", namespace = namespace, team = "chronon")

413-413: Unnecessary toString call on namespace

Remove the redundant toString method.

Apply this diff:

          metaData = Builders.MetaData(name = "test.country_weights_no_backfill", namespace = namespace.toString)
+         metaData = Builders.MetaData(name = "test.country_weights_no_backfill", namespace = namespace)

427-427: Unnecessary toString on namespace

Apply this diff:

                                       namespace = namespace.toString,
+                                      namespace = namespace,

468-468: Unnecessary toString call on namespace

Simplify the metaData construction.

Apply this diff:

          metaData = Builders.MetaData(name = "unit_test.item_views", namespace = namespace.toString),
+         metaData = Builders.MetaData(name = "unit_test.item_views", namespace = namespace),

485-485: Unnecessary toString call on namespace

Apply this diff:

            Builders.MetaData(name = "test.item_snapshot_features_2", namespace = namespace.toString, team = "chronon")
+           Builders.MetaData(name = "test.item_snapshot_features_2", namespace = namespace, team = "chronon")

542-542: Unnecessary toString call on namespace

Apply this diff:

          metaData = Builders.MetaData(name = "unit_test.item_views", namespace = namespace.toString)
+         metaData = Builders.MetaData(name = "unit_test.item_views", namespace = namespace)

732-732: Unnecessary toString call on namespace

Apply this diff:

          metaData = Builders.MetaData(name = "test.user_features", namespace = namespace.toString, team = "chronon")
+         metaData = Builders.MetaData(name = "test.user_features", namespace = namespace, team = "chronon")

985-986: Unnecessary toString call on namespace

Apply this diff:

          metaData =
            Builders.MetaData(name = "test.item_snapshot_bloom_test", namespace = namespace.toString, team = "chronon")
+         metaData =
+           Builders.MetaData(name = "test.item_snapshot_bloom_test", namespace = namespace, team = "chronon")

1035-1037: Unnecessary toString call on namespace

Apply this diff:

         metaData = Builders.MetaData(name = s"unit_test.item_views_$nameSuffix",
                                      namespace = namespace.toString,
                                      team = "item_team"),
+        metaData = Builders.MetaData(name = s"unit_test.item_views_$nameSuffix",
+                                     namespace = namespace,
+                                     team = "item_team"),

1044-1046: Unnecessary toString call on namespace

Apply this diff:

         metaData = Builders.MetaData(name = s"test.item_temporal_features$nameSuffix",
                                      namespace = namespace.toString,
                                      team = "item_team")
+        metaData = Builders.MetaData(name = s"test.item_temporal_features$nameSuffix",
+                                     namespace = namespace,
+                                     team = "item_team")

1144-1146: Unnecessary toString call on namespace

Apply this diff:

            Builders.MetaData(name = "unit_test.key_overlap.user_features",
                              namespace = namespace.toString,
                              team = "chronon")
+           Builders.MetaData(name = "unit_test.key_overlap.user_features",
+                             namespace = namespace,
+                             team = "chronon")

1198-1198: Unnecessary toString call on namespace

Apply this diff:

          metaData = Builders.MetaData(name = "unit_test.item_views_selected_join_parts_1",
                                       namespace = namespace.toString,
                                       team = "item_team"),
+         metaData = Builders.MetaData(name = "unit_test.item_views_selected_join_parts_1",
+                                      namespace = namespace,
+                                      team = "item_team"),

1214-1214: Unnecessary toString call on namespace

Apply this diff:

          metaData = Builders.MetaData(name = "unit_test.item_views_selected_join_parts_2",
                                       namespace = namespace.toString,
                                       team = "item_team"),
+         metaData = Builders.MetaData(name = "unit_test.item_views_selected_join_parts_2",
+                                      namespace = namespace,
+                                      team = "item_team"),

1230-1230: Unnecessary toString call on namespace

Apply this diff:

          metaData = Builders.MetaData(name = "unit_test.item_views_selected_join_parts_3",
                                       namespace = namespace.toString,
                                       team = "item_team"),
+         metaData = Builders.MetaData(name = "unit_test.item_views_selected_join_parts_3",
+                                      namespace = namespace,
+                                      team = "item_team"),

1244-1246: Unnecessary toString call on namespace

Apply this diff:

          metaData = Builders.MetaData(name = "unit_test.item_temporal_features.selected_join_parts",
                                       namespace = namespace.toString,
                                       team = "item_team",
                                       online = true)
+         metaData = Builders.MetaData(name = "unit_test.item_temporal_features.selected_join_parts",
+                                      namespace = namespace,
+                                      team = "item_team",
+                                      online = true)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c7afd1c and 17474eb.

📒 Files selected for processing (1)
  • spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (36 hunks)
🔇 Additional comments (7)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (7)

42-43: Imports added for fixture support and parallel execution

The added imports for FixtureAnyFunSuite and ParallelTestExecution are necessary for the updated test structure.


49-49: Ensure thread safety with ParallelTestExecution

By extending ParallelTestExecution, tests will run in parallel. Please verify that shared resources, such as databases or file systems, are properly synchronized to avoid race conditions or data inconsistencies.


70-70: Update test methods to accept namespace parameter

The test method now accepts a namespace parameter, enhancing test isolation and allowing dynamic namespace creation for each test case.


907-907: Update method to accept namespace parameter

The getEventsEventsTemporal method now accepts a namespace parameter, allowing for dynamic namespace usage within the method.


868-868: Update method to accept namespace parameter

The getViewsGroupBy method now includes a namespace parameter, enhancing flexibility in specifying namespaces.


651-651: Update method to accept namespace parameter

Method getGroupByForIncrementalSourceTest now accepts namespace as a parameter, allowing for better control over test namespaces.


927-927: Update test method to accept namespace parameter

The test method now uses the namespace parameter, ensuring consistency across test cases.

Comment on lines +1052 to +1055
test("test migration") { _ =>
// Left

val namespace = "test_namespace_jointest"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Inconsistent use of namespace parameter

In the test("test migration") method, the namespace parameter is ignored ({ _ =>) and a local variable namespace is hardcoded. For consistency and to leverage the dynamic namespace generation, consider using the fixture parameter.

Apply this diff to use the fixture parameter:

-    test("test migration") { _ =>
-
-        val namespace = "test_namespace_jointest"
+    test("test migration") { namespace =>
+
+        // `namespace` is provided by the fixture
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test("test migration") { _ =>
// Left
val namespace = "test_namespace_jointest"
test("test migration") { namespace =>
// `namespace` is provided by the fixture

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (10)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (10)

126-127: Redundant .toString on namespace

The namespace variable is already of type String; calling .toString is unnecessary.

Apply this diff:

-          Builders.MetaData(name = "unit_test.user_transactions", namespace = namespace.toString, team = "chronon")
+          Builders.MetaData(name = "unit_test.user_transactions", namespace = namespace, team = "chronon")

154-155: Redundant .toString on namespace

Apply this diff:

-          Builders.MetaData(name = "test.user_transaction_features", namespace = namespace.toString, team = "chronon")
+          Builders.MetaData(name = "test.user_transaction_features", namespace = namespace, team = "chronon")

293-294: Redundant .toString on namespace

Apply this diff:

-          metaData = Builders.MetaData(name = "unit_test.country_weights", namespace = namespace.toString)
+          metaData = Builders.MetaData(name = "unit_test.country_weights", namespace = namespace)

312-313: Redundant .toString on namespace

Apply this diff:

-          metaData = Builders.MetaData(name = "unit_test.country_heights", namespace = namespace.toString)
+          metaData = Builders.MetaData(name = "unit_test.country_heights", namespace = namespace)

425-425: Redundant .toString on namespace

Apply this diff:

-          metaData = Builders.MetaData(name = "test.country_no_historical_backfill",
-                                       namespace = namespace.toString,
-                                       team = "chronon",
-                                       historicalBackfill = false)
+          metaData = Builders.MetaData(name = "test.country_no_historical_backfill",
+                                       namespace = namespace,
+                                       team = "chronon",
+                                       historicalBackfill = false)

483-483: Redundant .toString on namespace

Apply this diff:

-            Builders.MetaData(name = "test.item_snapshot_features_2", namespace = namespace.toString, team = "chronon")
+            Builders.MetaData(name = "test.item_snapshot_features_2", namespace = namespace, team = "chronon")

540-540: Redundant .toString on namespace

Apply this diff:

-          metaData = Builders.MetaData(name = "unit_test.item_views", namespace = namespace.toString)
+          metaData = Builders.MetaData(name = "unit_test.item_views", namespace = namespace)

594-594: Redundant .toString on namespace in Method Call

In the method call getViewsGroupBy(namespace.toString, suffix = "cumulative", makeCumulative = true), namespace is already a String, so calling .toString is unnecessary.

Apply this diff:

-        val viewsGroupBy = getViewsGroupBy(namespace.toString, suffix = "cumulative", makeCumulative = true)
+        val viewsGroupBy = getViewsGroupBy(namespace, suffix = "cumulative", makeCumulative = true)

1033-1035: Redundant .toString on namespace

Apply this diff:

-          metaData = Builders.MetaData(name = s"unit_test.item_views_$nameSuffix",
-                                       namespace = namespace.toString,
-                                       team = "item_team")
+          metaData = Builders.MetaData(name = s"unit_test.item_views_$nameSuffix",
+                                       namespace = namespace,
+                                       team = "item_team")

1142-1144: Redundant .toString on namespace

Apply this diff:

-          metaData = Builders.MetaData(name = "unit_test.key_overlap.user_features",
-                                       namespace = namespace.toString,
-                                       team = "chronon")
+          metaData = Builders.MetaData(name = "unit_test.key_overlap.user_features",
+                                       namespace = namespace,
+                                       team = "chronon")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 17474eb and 41353dd.

📒 Files selected for processing (2)
  • .github/workflows/test_scala_and_python.yaml (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (36 hunks)
🔇 Additional comments (3)
.github/workflows/test_scala_and_python.yaml (1)

87-87: Verify parallel test execution configuration

The change introduces parallel test execution with 8 threads (-P8). While this can improve test execution speed, please ensure:

  1. The tests are truly independent and don't share mutable state
  2. The memory settings (-Xmx8G) are sufficient for parallel execution
  3. The CI environment can handle the increased load

Run this script to check for potential test dependencies and shared resources:

Consider the following improvements:

  1. Scale the number of parallel threads (-P8) based on the CI environment's CPU cores
  2. Add memory monitoring to detect potential OOM issues
  3. Consider moving this job to ubuntu-8_cores-32_gb runner like other_spark_tests if memory becomes a constraint
✅ Verification successful

Based on the results, I can now analyze the test isolation patterns. Let me check the specific implementation of JoinTest to verify thread safety:


Based on the code analysis, I can now provide a definitive response about the parallel test execution safety:

Parallel test execution is safe with current implementation

The JoinTest implementation is thread-safe for parallel execution because:

  1. It extends ParallelTestExecution trait and uses FixtureAnyFunSuite with proper test isolation:

    • Each test gets its own namespace via test_namespace_jointest__${test.name}
    • Database and tables are created fresh for each test via withFixture
  2. No shared mutable state between tests:

    • SparkSession is immutable and thread-safe
    • TableUtils is effectively immutable
    • Each test operates in its own database namespace
    • Test data is generated uniquely per test
  3. Resource isolation:

    • Each test creates and manages its own tables in isolated namespaces
    • The -Xmx8G heap setting is sufficient for parallel execution with 8 threads given the test data volumes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential shared resources in JoinTest

# Look for shared resource patterns in JoinTest
ast-grep --pattern 'object $OBJECT {
  $$$
  val $FIELD = $$$
  $$$
}' | grep -B2 -A2 'JoinTest'

# Check for static/shared database connections
rg -A 5 'Connection|DataSource|Database' 

Length of output: 34305


Script:

#!/bin/bash
# Check JoinTest implementation for thread safety patterns
ast-grep --pattern 'class JoinTest extends $_ {
  $$$
}'

# Look for shared state patterns in test setup
rg -A 10 "def beforeEach|def beforeAll|@Before|@BeforeClass" spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala

Length of output: 100378

spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (2)

1050-1053: Inconsistent use of namespace parameter


63-63: ⚠️ Potential issue

Syntax Error: Incorrect String Interpolation

In Scala, string interpolation uses the s prefix instead of f. Replace f with s to fix the syntax error.

Apply this diff to correct the syntax:

- val namespace: FixtureParam = f"test_namespace_jointest__${test.name.split(" ").mkString("_")}"
+ val namespace: FixtureParam = s"test_namespace_jointest__${test.name.split(" ").mkString("_")}"

Likely invalid or redundant comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (1)

126-127: Redundant .toString Call on String Variable namespace

The namespace variable is already of type String, so calling .toString on it is unnecessary. This occurs in multiple places. Consider removing .toString to improve code clarity.

Apply this change across the affected lines:

- namespace = namespace.toString
+ namespace = namespace

Also applies to: 154-155, 273-274, 294-294, 313-313, 326-326, 411-411, 425-425, 466-466, 482-483, 540-540, 594-594, 596-596, 730-730, 771-771, 796-797, 918-918, 926-926, 968-968, 983-984, 1033-1035, 1042-1044, 1142-1144, 1196-1196, 1212-1212, 1228-1228, 1242-1245

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 41353dd and 4dd00dd.

📒 Files selected for processing (1)
  • spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (36 hunks)
🔇 Additional comments (2)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (2)

49-49: Ensure Thread Safety with Parallel Test Execution

By extending ParallelTestExecution, tests will run concurrently. Please verify that shared resources like spark and tableUtils are thread-safe or appropriately synchronized to prevent race conditions.


1050-1053: Inconsistent Use of namespace Parameter

This issue has been previously raised: In the test("test migration") method, the namespace parameter is ignored ({ _ =>), and a local variable namespace is hardcoded. For consistency and to leverage the dynamic namespace generation, consider using the fixture parameter.

Comment on lines 51 to 52
val spark: SparkSession = SparkSessionBuilder.build("JoinTest", local = true)
private implicit val tableUtils = TableUtils(spark)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential Thread Safety Issue with Shared spark and tableUtils Instances

The spark and tableUtils instances are shared across all tests. Since tests now run in parallel, this could lead to concurrency issues if these instances are not thread-safe. Consider instantiating spark and tableUtils within each test or ensuring they are safe for concurrent use.

Apply this diff to instantiate spark and tableUtils within each test:

- val spark: SparkSession = SparkSessionBuilder.build("JoinTest", local = true)
- private implicit val tableUtils = TableUtils(spark)
+ // Remove shared instances; create per-test instances in 'withFixture' or individual tests.

Committable suggestion skipped: line range outside the PR's diff.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (3)

60-67: Consider adding cleanup for test namespaces.

The fixture implementation correctly creates unique namespaces for test isolation. However, there's no cleanup of these namespaces after test execution, which could lead to accumulation of test databases over time.

Consider adding cleanup in the fixture:

 override def withFixture(test: OneArgTest): Outcome = {
   val namespace: FixtureParam = f"test_namespace_jointest__${test.name.split(" ").mkString("_")}"
   tableUtils.createDatabase(namespace)
-  test(namespace)
+  try {
+    test(namespace)
+  } finally {
+    tableUtils.dropDatabase(namespace, cascade = true)
+  }
 }

Line range hint 69-1051: Consider extracting common test patterns into helper methods.

The test methods follow similar patterns for setup and verification. Consider extracting these common patterns into helper methods to improve maintainability and reduce code duplication:

  1. Table setup with DataFrameGen
  2. Join configuration creation
  3. Result verification with diff comparison

Example helper method:

private def verifyJoinResults(computed: DataFrame, expected: DataFrame, keyColumns: List[String]): Unit = {
  val diff = Comparison.sideBySide(computed, expected, keyColumns)
  if (diff.count() > 0) {
    println(s"Actual count: ${computed.count()}")
    println(s"Expected count: ${expected.count()}")
    println(s"Diff count: ${diff.count()}")
    diff.show()
  }
  assertEquals(0, diff.count())
}

Also applies to: 1051-1100


Line range hint 42-1275: Good implementation of parallel test execution with proper test isolation.

The changes successfully achieve the PR objective of running join tests in parallel while maintaining test isolation through:

  1. Proper use of ScalaTest's parallel execution support
  2. Test isolation using unique namespaces per test
  3. Consistent fixture parameter usage

Some architectural considerations:

  1. The parallel execution might increase resource usage during test runs
  2. Test isolation through namespaces provides good separation but requires proper cleanup
  3. Consider monitoring test execution times to ensure parallel execution provides expected benefits
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4dd00dd and 35f77fe.

📒 Files selected for processing (2)
  • .github/workflows/test_scala_and_python.yaml (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (36 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_scala_and_python.yaml
🔇 Additional comments (2)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (2)

42-50: LGTM! Good implementation of parallel test execution.

The changes correctly implement parallel test execution by:

  1. Adding necessary imports for parallel testing and fixture support
  2. Extending FixtureAnyFunSuite and mixing in ParallelTestExecution

Line range hint 867-922: LGTM! Well-structured helper methods.

The helper methods are well-implemented with:

  1. Clear separation of concerns
  2. Proper parameter handling
  3. Consistent namespace usage

Also applies to: 906-922


import scala.collection.JavaConverters._
import scala.util.ScalaJavaConversions.ListOps

// Run as follows: sbt "spark/testOnly -- -n jointest"
class JoinTest extends AnyFunSuite with TaggedFilterSuite {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the challenge with yanking the tagged filter suite iirc is that this test will get triggered as part of the other spark tests step ('sbt "spark/testOnly"'). For this one we don't want the dual run right?

@@ -84,7 +84,7 @@ jobs:
- name: Run other spark tests
run: |
export SBT_OPTS="-Xmx8G -Xms2G --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
sbt "spark/testOnly -- -n jointest"
sbt "spark/testOnly ai.chronon.spark.test.JoinTest -- -P8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the runs-on: ubuntu-latest single core?

@tchow-zlai tchow-zlai closed this Dec 23, 2024
@tchow-zlai tchow-zlai deleted the tchow/jointest-parallel branch December 23, 2024 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants