
unifying all tests into flatspec except vertx ones #212


Merged: 15 commits, Jan 15, 2025

Conversation


@nikhil-zlai (Contributor) commented Jan 14, 2025

Why?

We have tests written in multiple flavors:

  • extending JUnit's TestCase class
  • using the JUnit @Test annotation
  • using AnyFunSuite
  • using AnyFlatSpec
  • using the Vert.x JUnit runner

Bazel silently fails to run tests when the styles are not uniform!
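For illustration, a check along these lines can flag files that mix test styles before they silently drop out of the Bazel run. This is not part of the PR; the file glob and regex markers below are assumptions, not the project's actual tooling:

```python
import re
from pathlib import Path

# Illustrative markers for each test flavor mentioned above.
STYLE_MARKERS = {
    "junit_testcase": re.compile(r"extends\s+TestCase\b"),
    "junit_annotation": re.compile(r"@Test\b"),
    "funsuite": re.compile(r"extends\s+AnyFunSuite\b"),
    "flatspec": re.compile(r"extends\s+AnyFlatSpec\b"),
}

def detect_styles(source: str) -> set:
    """Return the set of test styles found in one file's source text."""
    return {name for name, rx in STYLE_MARKERS.items() if rx.search(source)}

def scan(root: str) -> dict:
    """Map each *Test.scala file under root to the styles it uses."""
    return {
        str(p): detect_styles(p.read_text())
        for p in Path(root).rglob("*Test.scala")
    }
```

A file reporting more than one style (or a non-FlatSpec style) would be a candidate for conversion.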

How?

This PR converts almost all of the tests to AnyFlatSpec, except the Vert.x tests.
The conversion was largely automated with the script at scripts/codemod/test_replace.py.
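A minimal sketch of the core rewrite such a codemod performs is below; the regex and helper names are illustrative assumptions, not the real contents of scripts/codemod/test_replace.py (which also handles imports, class headers, and other cases):

```python
import re

# Matches '@Test' followed by 'def testFoo(): Unit = {' and captures
# the indentation and the CamelCase test name.
TEST_DEF = re.compile(r"@Test\s*\n(\s*)def test(\w+)\(\)\s*:\s*Unit\s*=\s*\{")

def camel_to_words(name: str) -> str:
    """Turn 'SnapshotEntities' into 'snapshot entities'."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", name).lower()

def convert(source: str) -> str:
    """Rewrite JUnit-style test defs into ScalaTest FlatSpec style."""
    def repl(m: re.Match) -> str:
        indent, name = m.group(1), m.group(2)
        return f'{indent}it should "{camel_to_words(name)}" in {{'
    return TEST_DEF.sub(repl, source)
```

For example, `@Test def testSnapshotEntities(): Unit = {` becomes `it should "snapshot entities" in {`.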

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

Release Notes

  • Testing Framework Update

    • Transitioned from JUnit and AnyFunSuite to ScalaTest's AnyFlatSpec across multiple test classes.
    • Improved test readability and expressiveness.
    • Maintained existing test logic and functionality.
    • Updated test method syntax from @Test annotations to the "it should" format.
  • Affected Components

    • Aggregator test suite
    • API test suite
    • Cloud AWS/GCP test suites
    • Flink test suite
    • Online test suite
    • Spark test suite
    • New Python script added for automated conversion of tests to AnyFlatSpec format.
  • Benefits

    • More descriptive test case definitions.
    • Enhanced test documentation.
    • Consistent testing approach across the project.


coderabbitai bot commented Jan 14, 2025

Walkthrough

This pull request introduces a comprehensive migration of test cases across multiple Scala modules from JUnit to ScalaTest's AnyFlatSpec framework. The changes primarily involve restructuring test method definitions, updating import statements, and adopting a more descriptive, behavior-driven testing style. A new Python script test_replace.py has been added to facilitate this conversion process, demonstrating an automated approach to framework migration.

Changes

File Change Summary
.github/workflows/test_scala_no_spark.yaml Updated SBT memory allocation and added Catalyst Util test step
aggregator/test/scala/*Test.scala Migrated from JUnit to ScalaTest AnyFlatSpec
api/test/scala/*Test.scala Converted test cases to ScalaTest style
build.sbt Updated project dependency declaration
spark/test/scala/*Test.scala Transitioned to ScalaTest framework
scripts/codemod/test_replace.py New script for automated test framework conversion

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Script as test_replace.py
    participant Tests as Scala Test Files
    
    Dev->>Script: Run conversion script
    Script->>Tests: Analyze test files
    Script->>Tests: Convert JUnit annotations
    Script->>Tests: Rewrite test method signatures
    Script->>Tests: Update import statements
    Script-->>Dev: Provide conversion results

Possibly related PRs

Suggested Reviewers

  • piyush-zlai
  • varant-zlai
  • david-zlai

Poem

🧪 Test cases dance and sway,
From JUnit's rigid display,
ScalaTest brings grace anew,
Descriptive tests shine through!
🚀 Code evolves, frameworks align! 🌟

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.



@coderabbitai bot left a comment


Actionable comments posted: 6

🔭 Outside diff range comments (4)
online/src/test/scala/ai/chronon/online/test/TagsTest.scala (1)

Line range hint 48-85: Replace JUnit assertions with ScalaTest style.

Test converted well to BDD style, but still uses JUnit's Assert.

-import org.junit.Assert.assertEquals
+import org.scalatest.matchers.should.Matchers

-class TagsTest extends AnyFlatSpec {
+class TagsTest extends AnyFlatSpec with Matchers {

-    assertEquals(cache.cMap.size(), 2)
+    cache.cMap.size() shouldBe 2

-    assertEquals(slowTags, fastTags)
+    fastTags shouldBe slowTags
spark/src/test/scala/ai/chronon/spark/test/GroupByTest.scala (1)

Line range hint 89-89: Update assertions to ScalaTest style.

Replace JUnit assertions with ScalaTest's matchers for consistency.

-assertEquals(0, diff.count())
+diff.count() shouldBe 0
-assertTrue(aggregationsMetadata.length == 2)
+aggregationsMetadata should have length 2

Also applies to: 141-141, 273-273, 294-294, 322-322, 352-352, 491-491, 517-517, 544-544, 618-618, 656-656

spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala (1)

Line range hint 584-611: Convert remaining test methods to ScalaTest's syntax.

These methods should follow the "it should" pattern like other tests in the file.

-  def testWhereClauseShouldFilterEventOut(): Unit = {
+  it should "filter event out with where clause" in {
     val selects = Map(
       "id" -> "key",
       "created" -> "created_ts",
       "score" -> "CAST(get_json_object(json_prediction, '$.score') as Double)"
     ).toSeq
     val wheres = Seq("tag = 'inexistent'")
     val cu = new CatalystUtil(inputEventStruct, selects, wheres)
     val res = cu.performSql(inputEventRow)
     assertTrue(res.isEmpty)
   }

-  def testJsonInSelectAndValidWhereClause(): Unit = {
+  it should "handle json in select with valid where clause" in {
     val selects = Map(
       "id" -> "key",
       "created" -> "created_ts",
       "score" -> "CAST(get_json_object(json_prediction, '$.score') as Double)"
     ).toSeq
     val wheres = Seq("tag = 'v1.0'")
     val cu = new CatalystUtil(inputEventStruct, selects, wheres)
     val res = cu.performSql(inputEventRow)
     assertTrue(res.get.size == 3)
     assertTrue(res.get("id") == "unique_key")
     assertTrue(res.get("created") == 1000L)
     assertTrue(res.get("score") == 0.5)
   }
aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (1)

Line range hint 59-64: Fix syntax error in test name.

Remove the extra ": unit = {" from the test name.

-  it should "basic percentiles: unit = {" in {
+  it should "basic percentiles" in {
🧹 Nitpick comments (58)
spark/src/test/scala/ai/chronon/spark/test/AvroTest.scala (1)

37-37: Use a more descriptive test name.

"decimal" doesn't clearly describe what's being tested. Consider a more descriptive name like "should handle decimal types in Avro schema correctly".

-  it should "decimal" in {
+  it should "handle decimal types in Avro schema correctly" in {
spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (1)

55-55: Consider improving test description.

The "struct in streaming" description could be more descriptive of the test's purpose.

-  it should "struct in streaming" in {
+  it should "handle structured data in streaming operations correctly" in {
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)

95-95: Consider improving test descriptions.

The test descriptions could be more descriptive of the test scenarios.

-  it should "tile codec ir ser round trip" in {
+  it should "successfully serialize and deserialize tile intermediate results" in {
-  it should "tile codec ir ser round trip_with buckets" in {
+  it should "successfully serialize and deserialize bucketed tile intermediate results" in {

Also applies to: 129-129

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GCSFormatTest.scala (1)

Line range hint 23-90: Consider adding more edge cases.

While the existing tests cover basic scenarios, consider adding tests for:

  • Invalid date formats
  • Malformed partition structures
  • Very large partition sets
online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (2)

27-27: Consider removing JUnit Assert import.

Replace JUnit assertions with ScalaTest matchers for consistency.

-import org.junit.Assert.assertEquals
+import org.scalatest.matchers.should.Matchers

31-31: Make test description more descriptive.

Current "decoding" is vague.

-  it should "decoding" in {
+  it should "decode Thrift objects into expected JSON format" in {

Also applies to: 33-33

spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala (1)

51-51: Consider adding a more descriptive test name.

"bootstrap" is vague. Consider something like "should correctly bootstrap log data with schema evolution".

spark/src/test/scala/ai/chronon/spark/test/EditDistanceTest.scala (1)

Line range hint 28-42: Consider using ScalaTest assertions.

Replace JUnit's assertEquals with ScalaTest's assertion style for better consistency.

-    assertEquals(0, of(null, null).total)
+    of(null, null).total should be (0)
online/src/test/scala/ai/chronon/online/test/JoinCodecTest.scala (1)

Line range hint 42-43: Consider using ScalaTest assertions.

Replace JUnit's assertEquals with ScalaTest's matchers for consistency.

-    assertEquals(expected, result)
+    result should equal(expected)
spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (2)

67-67: Consider enhancing the test description.

The current "metadata export" is too vague.

-  it should "metadata export" in {
+  it should "successfully export metadata to the expected JSON format" in {

Line range hint 67-93: Improve test structure with BeforeAndAfter.

Long setup code should be moved to a setup method.

import org.scalatest.BeforeAndAfter

class MetadataExporterTest extends AnyFlatSpec with BeforeAndAfter {
  private var tmpDir: File = _
  private var sampleTable: String = _

  before {
    val namespace = "example_namespace"
    val tablename = "table"
    tableUtils.createDatabase(namespace)
    sampleTable = s"$namespace.$tablename"
    tmpDir = Files.createTempDir()
    // ... rest of setup
  }

  after {
    tmpDir.delete()
  }
spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1)

32-33: Consider breaking down the test for better maintainability.

Framework migration looks good, but the test could be split into smaller, focused scenarios:

  • External source fetching
  • Schema validation
  • Error handling
api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (1)

Line range hint 36-157: Test conversions look good, minor style suggestion.

Clean conversion to "it should" style while preserving test logic.

Consider using consistent verb tense in test descriptions:

  • "sub partition filters" → "handle sub partition filters"
  • "owning team" → "determine owning team"
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

Line range hint 46-58: Consider ScalaTest idioms and assertions.

  1. Use ScalaTest assertions instead of JUnit's assertEquals
  2. Simplify test descriptions to be more idiomatic
-  it should "load data file as table should be correct" in {
+  it should "load CSV data as a table" in {
     // ...
-    assertEquals(3, loadedDataDf.count())
+    loadedDataDf.count() shouldBe 3

-  it should "load data recursively should be correct" in {
+  it should "recursively load CSV data from directory" in {
     // ...
-    assertEquals(3, loadedDataDf.count())
+    loadedDataDf.count() shouldBe 3

Also applies to: 60-71

flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2)

21-21: LGTM with minor style suggestion

Consider "successfully write to store" for more natural BDD phrasing.


43-43: Consider incorporating comment into test name

Suggest: "should continue processing after encountering poison pill writes"

spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala (1)

Line range hint 43-286: Consider using ScalaTest matchers.

Replace JUnit assertions with ScalaTest's matchers for better readability:

-assertEquals(0, diff.count())
+diff.count() shouldBe 0
online/src/test/scala/ai/chronon/online/test/LRUCacheTest.scala (1)

Line range hint 23-32: Consider expanding eviction test coverage.

Add test cases for multiple entries to verify LRU behavior.

it should "evict least recently used entry when capacity is exceeded" in {
  testCache.put("key1", "value1")
  testCache.put("key2", "value2")
  testCache.getIfPresent("key1") // access key1 to make it more recent
  testCache.put("key3", "value3") // should evict key2
  assert(testCache.getIfPresent("key1") != null)
  assert(testCache.getIfPresent("key2") == null)
  assert(testCache.getIfPresent("key3") != null)
}
spark/src/test/scala/ai/chronon/spark/test/GroupByTest.scala (2)

51-51: Clean up JUnit imports.

The class has been migrated to ScalaTest's AnyFlatSpec, but JUnit imports are still present.

-import org.junit.Assert._

Also applies to: 55-55


61-61: Improve test descriptions.

The test descriptions could be more descriptive. Consider using BDD-style descriptions that explain the expected behavior.

Examples:

-  it should "snapshot entities" in {
+  it should "correctly aggregate entity snapshots with time windows" in {
-  it should "snapshot events" in {
+  it should "correctly aggregate event snapshots with multiple aggregations" in {

Also applies to: 94-94, 145-145, 214-214, 278-278, 297-297, 325-325, 355-355, 471-471, 494-494, 547-547, 562-562, 621-621

spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (2)

34-34: Migrate fully to ScalaTest style

Replace JUnit assertions/annotations with ScalaTest equivalents for consistency.

-import org.junit.Assert.assertFalse
-import org.junit.Assert.assertTrue
-import org.junit.Before
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.matchers.should.Matchers

-class ResultValidationAbilityTest extends AnyFlatSpec {
+class ResultValidationAbilityTest extends AnyFlatSpec with BeforeAndAfterEach with Matchers {

-  @Before
-  def setup(): Unit = {
+  override def beforeEach(): Unit = {

Also applies to: 36-36


Line range hint 54-93: Improve test descriptions for better readability

Test cases are well structured but descriptions could be more idiomatic.

-  it should "should not validate when comparison table is not specified" in {
+  it should "skip validation when comparison table is not specified" in {

-  it should "should validate when comparison table is specified" in {
+  it should "perform validation when comparison table is specified" in {

-  it should "successful validation" in {
+  it should "pass validation when data matches" in {

-  it should "failed validation" in {
+  it should "fail validation when data mismatches" in {
spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (2)

56-56: Consider parameterizing the basic test.

The test data could be shared across test cases using ScalaTest's property-based testing.

it should behave like compareTest(
  leftData,
  rightData,
  leftColumns,
  leftColumns,
  Seq("keyId", "ts", "ds")
)

Also applies to: 87-87


123-123: Consider using ScalaTest's error checking.

Replace try-catch with ScalaTest's assertThrows.

assertThrows[AssertionError] {
  CompareBaseJob.compare(leftDf, rightDf, keys, tableUtils, mapping = mapping)
}

Also applies to: 138-138, 160-160, 182-182

spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (3)

35-37: Consider enhancing test description.

"testing dynamic classloading" could be more descriptive, e.g., "verify default format provider initialization".


Line range hint 39-73: Consider extracting common test setup.

Three test methods share similar setup code. Extract database/table creation into a helper method.

private def createTestTable(format: String): String = {
  val dbName = s"db_${System.currentTimeMillis()}"
  val tableName = s"$dbName.test_table_${System.nanoTime()}_$format"
  spark.sql(s"CREATE DATABASE IF NOT EXISTS $dbName")
  tableName
}

Also applies to: 74-108, 109-144


Line range hint 146-209: Consider breaking down testInsertPartitions.

Long helper method could be split into smaller, focused functions for better maintainability:

  • validateSchemaChanges
  • insertAndValidateFormat
  • validatePartitions
online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (1)

Line range hint 78-128: Add assertions for PSI metric.

PSI metric is calculated but never verified in tests.

   it should "Low drift - similar distributions" in {
     val drifts = compareDistributions(meanA = 100.0, varianceA = 225.0, meanB = 101.0, varianceB = 225.0)
 
     // JSD assertions
     val (jsdPercentile, jsdHistogram) = drifts(DriftMetric.JENSEN_SHANNON)
     jsdPercentile should be < 0.05
     jsdHistogram should be < 0.05
 
     // Hellinger assertions
     val (hellingerPercentile, hellingerHisto) = drifts(DriftMetric.HELLINGER)
     hellingerPercentile should be < 0.05
     hellingerHisto should be < 0.05
+
+    // PSI assertions
+    val (psiPercentile, psiHisto) = drifts(DriftMetric.PSI)
+    psiPercentile should be < 0.05
+    psiHisto should be < 0.05
   }
spark/src/test/scala/ai/chronon/spark/test/JoinUtilsTest.scala (2)

38-38: Consider using ScalaTest assertions.

Replace JUnit assertions with ScalaTest's for consistency.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers
+
-class JoinUtilsTest extends AnyFlatSpec {
+class JoinUtilsTest extends AnyFlatSpec with Matchers {

Also applies to: 43-43


Line range hint 120-129: Enhance assertion messages.

Add descriptive messages to assertTrue calls for better failure reporting.

-      assertTrue(df.isFailure)
+      assertTrue("Expected join operation to fail", df.isFailure)
-      assertTrue(df.isSuccess)
+      assertTrue("Expected join operation to succeed", df.isSuccess)
spark/src/test/scala/ai/chronon/spark/test/KafkaStreamBuilderTest.scala (1)

30-30: Extract test constants.

Move hardcoded topic info to constants for better maintainability.

+  private val TestTopic = "test_topic"
+  private val TestSchema = "my_schema"
+  private val TestTopicInfo = TopicInfo.parse(s"kafka://$TestTopic/schema=$TestSchema/host=X/port=Y")
-    val topicInfo = TopicInfo.parse("kafka://test_topic/schema=my_schema/host=X/port=Y")
+    val topicInfo = TestTopicInfo
spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (1)

Line range hint 84-110: Consider using ScalaTest's assertThrows.

Replace implicit assertion with explicit exception checking for better test clarity.

-    analyzer.analyzeJoin(joinConf, validationAssert = true)
+    assertThrows[AssertionError] {
+      analyzer.analyzeJoin(joinConf, validationAssert = true)
+    }
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

Line range hint 41-44: Migrate setup to ScalaTest style.

Replace JUnit @Rule and @Before with ScalaTest's BeforeAndAfter trait.

+import org.scalatest.BeforeAndAfter
-@Rule
-def bigtableEmulator: BigtableEmulatorRule = _bigtableEmulator
+class BigTableKVStoreTest extends AnyFlatSpec with BeforeAndAfter {
+  private val bigtableEmulator = BigtableEmulatorRule.create()
+  before {
+    setup()
+  }

Also applies to: 50-50

online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (1)

Line range hint 199-229: Consider using ScalaTest assertions

Replace JUnit assertions with ScalaTest style:

-    assertFalse(fetcherBaseWithFlagStore.isCachingEnabled(Builders.GroupBy()))
-    assertFalse(fetcherBaseWithFlagStore.isCachingEnabled(buildGroupByWithCustomJson("test_groupby_2")))
-    assertTrue(fetcherBaseWithFlagStore.isCachingEnabled(buildGroupByWithCustomJson("test_groupby_3")))
+    fetcherBaseWithFlagStore.isCachingEnabled(Builders.GroupBy()) shouldBe false
+    fetcherBaseWithFlagStore.isCachingEnabled(buildGroupByWithCustomJson("test_groupby_2")) shouldBe false
+    fetcherBaseWithFlagStore.isCachingEnabled(buildGroupByWithCustomJson("test_groupby_3")) shouldBe true
spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (3)

37-37: Consider enhancing test readability with additional ScalaTest features.

Add with Matchers to use more expressive assertions.

-import org.scalatest.flatspec.AnyFlatSpec
+import org.scalatest.flatspec.AnyFlatSpec
+import org.scalatest.matchers.should.Matchers

-class TableUtilsTest extends AnyFlatSpec {
+class TableUtilsTest extends AnyFlatSpec with Matchers {

Also applies to: 47-47


Line range hint 52-76: Use ScalaTest matchers for more readable assertions.

-    assertEquals(expected, columns.sorted)
+    columns.sorted should contain theSameElementsAs expected

Line range hint 257-298: Reduce test setup duplication using ScalaTest fixtures.

Multiple test methods share similar table setup logic. Consider using BeforeAndAfterEach or fixture.FlatSpec.

+import org.scalatest.BeforeAndAfterEach

-class TableUtilsTest extends AnyFlatSpec with Matchers {
+class TableUtilsTest extends AnyFlatSpec with Matchers with BeforeAndAfterEach {
+  override def beforeEach(): Unit = {
+    super.beforeEach()
+    spark.sql("CREATE DATABASE IF NOT EXISTS db")
+  }

Also applies to: 422-460

spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala (3)

Line range hint 49-294: Consider breaking down the test into smaller, focused test cases.

The test covers multiple scenarios and could be more maintainable if split into separate test cases for:

  • Basic derivations
  • External derivations
  • Contextual derivations

368-372: Consider more descriptive test names.

Rename to clarify the test scenarios:

  • "should log all fields when using wildcard"
  • "should log only specified fields without wildcard"

Line range hint 626-661: Consider extracting SQL queries to constants.

Move the SQL queries to class-level constants to improve maintainability.

online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala (2)

Line range hint 42-324: Consider grouping related tests using behavior-driven development (BDD) style.

The tests are well-written but could be better organized. Consider using ScalaTest's behavior syntax to group related tests:

+ behavior of "BatchIrCache"
  it should "correctly cache batch irs" in { ... }
  it should "correctly cache map response" in { ... }
  it should "compare keys by equality" in { ... }

+ behavior of "CachedRequests"
  it should "return correct cached data when cache is enabled" in { ... }
  it should "not cache when cache is disabled for group by" in { ... }

+ behavior of "BatchBytes"
  it should "return latest timed value bytes if greater than batch end" in { ... }
  it should "return null if latest timed value timestamp is less than batch end" in { ... }

Line range hint 200-297: Extract setup code into helper methods to improve readability.

Long test methods could be more maintainable. Consider extracting common setup:

+ private def createMockedServingInfo(dataset: String, batchEndTs: Long): GroupByServingInfoParsed = {
+   val servingInfo = mock[GroupByServingInfoParsed]
+   val groupByOps = mock[GroupByOps]
+   when(servingInfo.groupByOps).thenReturn(groupByOps)
+   when(groupByOps.batchDataset).thenReturn(dataset)
+   when(servingInfo.batchEndTsMillis).thenReturn(batchEndTs)
+   servingInfo
+ }
spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (1)

Line range hint 1-213: Consider removing unused JUnit imports.

The file still imports JUnit Assert methods but only uses a few. Consider importing only what's needed.

-import org.junit.Assert.assertEquals
-import org.junit.Assert.assertFalse
+import org.junit.Assert.{assertEquals, assertFalse}
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (2)

Line range hint 15-44: Extract hardcoded test values as constants.

Move test values like "mock-job-id" and "test-project" to class-level constants.


46-48: Add assertion to verify compatibility check.

The test should verify the outcome of validateScalaVersionCompatibility().

online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1)

Line range hint 9-22: Consider using ScalaTest assertions.

Replace JUnit's assertEquals with ScalaTest's assert for consistency:

-    assertEquals(res.get.size, 2)
-    assertEquals(res.get("a"), Int.MaxValue - 1)
-    assertEquals(res.get("b"), "hello123")
+    assert(res.get.size === 2)
+    assert(res.get("a") === Int.MaxValue - 1)
+    assert(res.get("b") === "hello123")
spark/src/test/scala/ai/chronon/spark/test/DataRangeTest.scala (1)

Line range hint 31-35: Consider using ScalaTest assertions.

Replace JUnit's assertEquals with ScalaTest's assert for consistency:

-    assertEquals(range2, range1.intersect(range2))
+    assert(range1.intersect(range2) === range2)
aggregator/src/test/scala/ai/chronon/aggregator/test/MinHeapTest.scala (1)

Line range hint 27-49: Consider using ScalaTest assertions.

Replace JUnit's assertArrayEquals with ScalaTest's array comparison:

-    assertArrayEquals(arr1.asScala.toArray, Array(9, 4, 5, -1))
+    assert(arr1.asScala.toArray === Array(9, 4, 5, -1))

-    assertArrayEquals(arr2.asScala.toArray, Array(5, 4, 2, 1))
+    assert(arr2.asScala.toArray === Array(5, 4, 2, 1))

-    assertArrayEquals(arr1.asScala.toArray, Array(4, 2, 1, -1))
+    assert(arr1.asScala.toArray === Array(4, 2, 1, -1))
api/src/test/scala/ai/chronon/api/test/ParametricMacroTest.scala (1)

21-21: Consider using ScalaTest assertions

While the migration to AnyFlatSpec is correct, replace JUnit's assertEquals with ScalaTest's matchers for better integration.

-import org.junit.Assert.assertEquals
+import org.scalatest.matchers.should.Matchers

-class ParametricMacroTest extends AnyFlatSpec {
+class ParametricMacroTest extends AnyFlatSpec with Matchers {
   it should "substitution" in {
     // ... test code ...
-    assertEquals(expected, replaced)
+    replaced shouldBe expected

Also applies to: 23-24

aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (1)

21-21: Consider using ScalaTest assertions

Replace JUnit's assertTrue with ScalaTest's matchers for better integration.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers

-class VarianceTest extends AnyFlatSpec {
+class VarianceTest extends AnyFlatSpec with Matchers {
   // ... in compare method ...
-  assertTrue((naiveResult - welfordResult) / naiveResult < 0.0000001)
+  ((naiveResult - welfordResult) / naiveResult) should be < 0.0000001

Also applies to: 25-25

aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala (1)

7-7: Consider using ScalaTest assertions

Replace JUnit's assertEquals with ScalaTest's matchers for better integration.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers

-class MomentTest extends AnyFlatSpec {
+class MomentTest extends AnyFlatSpec with Matchers {
   // ... in assertUpdate method ...
-  assertEquals(expected(values), agg.finalize(ir), 0.1)
+  agg.finalize(ir) shouldBe expected(values) +- 0.1

Also applies to: 9-9

aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxDistinctTest.scala (1)

21-21: Consider using ScalaTest assertions

Replace JUnit's assertTrue with ScalaTest's matchers for better integration.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers

-class ApproxDistinctTest extends AnyFlatSpec {
+class ApproxDistinctTest extends AnyFlatSpec with Matchers {
   // ... in testErrorBound method ...
-  assertTrue(Math.abs(estimated - uniques) < errorBound)
+  Math.abs(estimated - uniques) should be < errorBound

Also applies to: 23-23

aggregator/src/test/scala/ai/chronon/aggregator/test/RowAggregatorTest.scala (1)

21-22: Consider using ScalaTest assertions.

Replace JUnit's Assert with ScalaTest's matchers for better integration.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers
aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala (1)

28-29: Standardize assertion style.

Align with ScalaTest's assertion methods for consistency.

-import org.junit.Assert.assertEquals
+import org.scalatest.matchers.should.Matchers
aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (1)

Line range hint 42-44: Track the TODO comment about benchmarks.

The TODO comment suggests implementing proper benchmarks.

Would you like me to create a GitHub issue to track the implementation of proper benchmarks?

spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (1)

1195-1195: Add ScalaDoc for helper methods.

Consider adding ScalaDoc comments to document the purpose and parameters of helper methods like `test selected join parts`.

spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (1)

Line range hint 107-123: Consider parameterizing column selection.

The hardcoded column list "item, ts, ds" could be moved to a constant or configuration.

aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (1)

Line range hint 95-102: Remove commented-out code.

Dead code should be removed rather than commented out.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 919f3dd and 4291b99.

📒 Files selected for processing (67)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxDistinctTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxHistogramTest.scala (8 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (6 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MinHeapTest.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/RowAggregatorTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/DataTypeConversionTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (8 hunks)
  • api/src/test/scala/ai/chronon/api/test/ParametricMacroTest.scala (1 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (7 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (4 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (12 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (3 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GCSFormatTest.scala (3 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (3 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/KeySelectorTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (23 hunks)
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (9 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala (12 hunks)
  • online/src/test/scala/ai/chronon/online/test/JoinCodecTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/LRUCacheTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TagsTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (11 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/AvroTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ChainingFetcherTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/DataRangeTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/EditDistanceTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/GroupByTest.scala (13 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/GroupByUploadTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (17 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/JoinUtilsTest.scala (13 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/KafkaStreamBuilderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LabelJoinTest.scala (11 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalExportTableAbilityTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala (10 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (16 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (3 hunks)
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (2)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-11-12T09:38:33.532Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:82-89
Timestamp: 2024-11-12T09:38:33.532Z
Learning: DynamoDB does not create duplicate tables when attempting to create a table that already exists, so checking for duplicates in tests is redundant.
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: mutation_spark_tests
🔇 Additional comments (116)
flink/src/test/scala/ai/chronon/flink/test/window/KeySelectorTest.scala (4)

5-5: LGTM: ScalaTest import added.


7-7: LGTM: Class now extends AnyFlatSpec.


Line range hint 8-24: Test method converted correctly to FlatSpec style.

The behavior description is clear and test logic remains unchanged.


Line range hint 26-52: Test method converted correctly to FlatSpec style.

The behavior description and test logic are preserved.

spark/src/test/scala/ai/chronon/spark/test/AvroTest.scala (2)

28-28: LGTM!

Required import for ScalaTest.


30-30: LGTM!

Correct base class for ScalaTest.

spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (2)

33-33: LGTM!

Import aligns with the transition to ScalaTest.


45-45: LGTM!

Class extension change matches PR objective.

flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (5)

6-6: LGTM: Required ScalaTest import added.


11-11: LGTM: Correctly extends AnyFlatSpec.


Line range hint 53-93: LGTM: Test converted to ScalaTest style while preserving logic.

Successfully migrated from JUnit's @Test to ScalaTest's it should.


Line range hint 95-157: LGTM: Complex merge test preserved during framework migration.


Line range hint 159-205: LGTM: Order-sensitivity test maintained in new format.

spark/src/test/scala/ai/chronon/spark/test/LocalExportTableAbilityTest.scala (2)

34-34: LGTM! Clean transition to AnyFlatSpec.

The import and class signature changes correctly implement the ScalaTest framework.

Also applies to: 36-36


50-53: LGTM! Well-structured test cases.

Test cases are properly converted to ScalaTest's "it should" syntax with clear, descriptive names.

Also applies to: 56-60, 63-90

online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (2)

23-23: LGTM!

Required import for ScalaTest's AnyFlatSpec.


29-29: LGTM!

Clean conversion to ScalaTest style.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GCSFormatTest.scala (2)

12-12: LGTM: Clean transition to AnyFlatSpec.

Proper import and class declaration changes.

Also applies to: 16-16


23-23: LGTM: Well-structured BDD style test cases.

Test cases maintain functionality while improving readability with descriptive "it should" statements.

Also applies to: 43-43, 63-63

spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala (2)

35-35: LGTM! Import aligns with AnyFlatSpec migration.


42-42: LGTM! Class now extends AnyFlatSpec as intended.

spark/src/test/scala/ai/chronon/spark/test/EditDistanceTest.scala (2)

21-23: LGTM! Successful transition to AnyFlatSpec.


25-25: LGTM! Test method properly converted to FlatSpec style.

spark/src/test/scala/ai/chronon/spark/test/ChainingFetcherTest.scala (4)

39-39: LGTM!

Required import for ScalaTest migration.


49-49: LGTM!

Clean transition from JUnit TestCase to ScalaTest AnyFlatSpec.


321-326: LGTM!

Test method successfully converted to ScalaTest's behavior-driven style.


Line range hint 328-335: LGTM!

Test method successfully converted to ScalaTest's behavior-driven style.

online/src/test/scala/ai/chronon/online/test/JoinCodecTest.scala (2)

21-21: LGTM! Successfully migrated to AnyFlatSpec.

Also applies to: 23-23


24-24: LGTM! Clear test description using ScalaTest's syntax.

online/src/test/scala/ai/chronon/online/test/TagsTest.scala (1)

25-25: LGTM! Good transition to AnyFlatSpec.

The class now correctly extends AnyFlatSpec.

Also applies to: 27-27

spark/src/test/scala/ai/chronon/spark/test/GroupByUploadTest.scala (3)

33-33: LGTM: Required import for ScalaTest.

The import is correctly added while maintaining JUnit assertions.


40-40: LGTM: Proper extension of AnyFlatSpec.

The class correctly extends AnyFlatSpec for flat specification style testing.


47-47: LGTM: Test methods properly converted to ScalaTest style.

All test methods are correctly converted to use ScalaTest's "it should" syntax with clear descriptions.

Also applies to: 75-75, 116-116, 150-150

spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (2)

30-30: LGTM: ScalaTest import added.


37-37: LGTM: Class correctly extends AnyFlatSpec.

spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1)

24-24: LGTM!

Required import for ScalaTest framework.

api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (1)

30-30: Framework transition looks good!

Clean transition to AnyFlatSpec.

Also applies to: 34-34

spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

44-44: LGTM! Successful transition to AnyFlatSpec.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (3)

13-13: LGTM: Test style standardization

Also applies to: 16-16


33-45: Nice conversion to behavior-driven style

Test descriptions are now more readable.


Line range hint 52-89: Track ignored integration tests

Consider creating GitHub issues to track these ignored tests and the local stubbing requirement.

Want me to create GitHub issues for:

  1. Implementing local BigQuery stubbing
  2. Re-enabling integration tests once stubbing is available?
spark/src/test/scala/ai/chronon/spark/test/LabelJoinTest.scala (2)

29-29: LGTM! Clean migration to AnyFlatSpec.

The import and class declaration changes correctly implement the standardization to ScalaTest's AnyFlatSpec.

Also applies to: 32-32


47-47: Well-structured conversion to AnyFlatSpec syntax.

Test methods have been consistently converted while preserving their original behavior and maintaining clear, descriptive names.

Also applies to: 374-374

spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala (2)

41-41: LGTM! Test framework standardization.

The change from AnyFunSuite to AnyFlatSpec aligns with the standardization effort.

Also applies to: 51-51


452-452: Test cases successfully converted to AnyFlatSpec style.

The conversion maintains test logic while improving readability with descriptive "it should" statements.

Also applies to: 510-510, 561-561, 618-618, 687-687, 758-758, 806-806, 846-846

flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2)

11-12: LGTM: Required imports for ScalaTest transition


14-14: LGTM: Correct transition to AnyFlatSpec

spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (2)

69-69: LGTM: Class declaration updated to AnyFlatSpec.

The change aligns with the standardization objective.


724-727: LGTM: Test methods converted to AnyFlatSpec style.

Successfully migrated from test() to it should ... in syntax while preserving test logic.

Also applies to: 730-736, 740-744, 747-771

spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala (3)

29-29: LGTM!

Clean import addition for AnyFlatSpec.


33-33: LGTM!

Clean transition to AnyFlatSpec.


43-43: LGTM!

Clean conversion to ScalaTest's "it should" style with descriptive test names.

Also applies to: 112-112, 187-187, 238-238

online/src/test/scala/ai/chronon/online/test/LRUCacheTest.scala (3)

5-5: Clean migration to ScalaTest.

Framework migration is properly implemented with correct imports and class extension.

Also applies to: 8-8


11-14: Well-structured empty cache test.

Clear description and comprehensive assertions.


16-21: Thorough cache operations test.

Complete verification of put/get operations.

spark/src/test/scala/ai/chronon/spark/test/GroupByTest.scala (1)

Line range hint 413-469: LGTM!

Helper methods are well-structured and properly encapsulated.

spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (1)

Line range hint 37-53: Clean test setup and inner class implementation

Well-structured mock setup and argument handling.

spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (1)

27-27: LGTM: Clean transition to AnyFlatSpec.

Appropriate import and class extension for ScalaTest migration.

Also applies to: 31-31

spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (1)

22-22: LGTM: Clean transition to AnyFlatSpec.

The framework change aligns with PR objectives.

Also applies to: 26-26

online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (3)

7-7: LGTM!

Import aligns with the transition to FlatSpec.


10-10: LGTM!

Class declaration correctly updated to FlatSpec style.


Line range hint 78-128: LGTM!

Tests successfully converted to FlatSpec style with improved readability.

cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (7)

15-15: LGTM: Required ScalaTest imports added.


35-35: LGTM: Class correctly extends AnyFlatSpec.


Line range hint 84-89: LGTM: Test properly converted to AnyFlatSpec style.

Note: Duplicate table creation test is intentionally kept as per previous feedback.


Line range hint 97-104: LGTM: Test properly converted to AnyFlatSpec style.


Line range hint 110-134: LGTM: Test properly converted to AnyFlatSpec style.


Line range hint 140-168: LGTM: Test properly converted to AnyFlatSpec style.


Line range hint 172-191: LGTM: Test properly converted to AnyFlatSpec style.

Note: Hardcoded timestamps are acceptable as per previous feedback.

spark/src/test/scala/ai/chronon/spark/test/JoinUtilsTest.scala (2)

49-49: Well-structured test descriptions!

The BDD-style test names are clear and descriptive.

Also applies to: 82-82, 131-131, 145-145, 159-159, 174-174, 189-189, 206-206, 221-221, 236-236, 274-274, 319-319, 326-326, 342-342


Line range hint 1-380: Comprehensive test coverage!

Tests cover success cases, edge cases, and error conditions thoroughly.

spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (4)

31-31: LGTM: Clean transition to ScalaTest.

The change from JUnit to AnyFlatSpec is implemented correctly.

Also applies to: 35-35


Line range hint 49-82: LGTM: Test conversion maintains original logic.

The test case has been correctly converted to FlatSpec style while preserving the validation logic.


Line range hint 112-341: LGTM: Consistent test style conversion.

All test methods have been correctly converted to FlatSpec style.


Line range hint 359-450: LGTM: Helper methods remain well-structured.

The utility methods are properly maintained during the framework transition.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

Line range hint 74-321: Well-structured test suite!

Good job on the comprehensive test coverage and descriptive test names.

online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (4)

38-39: LGTM: Clean transition to ScalaTest style

Class signature properly extends AnyFlatSpec and BeforeAndAfterAll.

Also applies to: 51-51


Line range hint 61-68: LGTM: Clean migration of setup code

Setup logic correctly moved to beforeAll.


Line range hint 70-97: LGTM: Well-structured test conversion

Test properly converted to ScalaTest's behavior-driven style.


Line range hint 98-130: LGTM: Complex test scenarios well-preserved

Tests maintain their comprehensive coverage while adopting ScalaTest conventions.

Also applies to: 131-159, 160-176, 177-198

spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (1)

Line range hint 1-545: LGTM! Test coverage is comprehensive.

The transition to AnyFlatSpec maintains test quality while improving readability.

spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala (3)

35-35: LGTM: Clean transition to ScalaTest.

The migration from JUnit to ScalaTest's AnyFlatSpec is well-executed.

Also applies to: 42-42


Line range hint 295-367: LGTM: Well-structured test case.

Clear setup, execution, and verification steps.


Line range hint 500-625: LGTM: Well-organized test with clear scenarios.

Good use of helper methods and explicit test cases for different derivation scenarios.

online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala (3)

20-20: LGTM: ScalaTest imports added correctly.

The required ScalaTest imports are properly added.


36-36: LGTM: Class correctly extends AnyFlatSpec.

The test class properly extends both AnyFlatSpec and MockitoHelper.


Line range hint 1-324: LGTM: Comprehensive test coverage with good error handling.

The test suite thoroughly covers success and failure scenarios, properly handles edge cases, and maintains good test isolation.

spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (2)

31-31: LGTM: ScalaTest import added.

Clean import addition for AnyFlatSpec.


35-35: LGTM: Class extends AnyFlatSpec.

Correctly implements ScalaTest's AnyFlatSpec style.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (2)

10-13: LGTM! Correct transition to AnyFlatSpec.


Line range hint 50-83: Remove sensitive data from test code.

Replace GCS paths and configuration with mock values or load from test configuration file.

online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1)

7-7: LGTM!

Successfully transitioned to AnyFlatSpec while maintaining the necessary SparkSQL test structures.

flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (1)

Line range hint 11-39: LGTM!

Clean transition to AnyFlatSpec with proper ScalaTest assertions.

spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (1)

37-37: LGTM! Clean transition to ScalaTest.

Test cases effectively validate table export functionality with proper cleanup.

Also applies to: 85-85

spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (1)

32-34: LGTM! Well-structured test class.

Good use of logging and lazy initialization for SparkSession.

online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)

169-169: LGTM!

The test class is properly converted to ScalaTest with comprehensive test coverage.

api/src/test/scala/ai/chronon/api/test/DataTypeConversionTest.scala (1)

23-23: Clean conversion to AnyFlatSpec.

Test logic preserved while standardizing framework.

Also applies to: 27-27, 29-29

spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1)

26-26: Well-structured conversion to AnyFlatSpec.

Clear test descriptions with preserved functionality.

Also applies to: 33-33, 45-45, 52-52, 64-64

spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (3)

41-41: Framework transition looks good!

Clean transition from JUnit to ScalaTest's AnyFlatSpec.

Also applies to: 76-76


446-453: Test method conversion is well-structured!

Good use of ScalaTest's behavior-driven syntax with clear test descriptions.


319-320: Verify the assertion change impact.

Changing from assert to require modifies error behavior from AssertionError to IllegalArgumentException.

✅ Verification successful

Change from assert to require is appropriate

The modification aligns with the codebase's pattern where require is used for input validation and preconditions, while assert is primarily used for test assertions. No other tests depend on this specific assertion pattern.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar assertion patterns in other test files
rg "assert.*metaData\.name.*==" -A 2

Length of output: 38


Script:

#!/bin/bash
# Find test files
fd "Test.scala$"

# Search for assert and require patterns in the codebase
rg "assert\(" -A 1
rg "require\(" -A 1

Length of output: 50320

spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (2)

42-42: LGTM! Class transition to AnyFlatSpec.

The changes correctly implement the transition from AnyFunSuite to AnyFlatSpec.

Also applies to: 57-57


Line range hint 812-862: Address the commented test case.

The commented test case for verifying frontfill behavior should be revisited.

Run this script to check if there are other similar TODOs:

spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (4)

31-31: LGTM: Required import for AnyFlatSpec.


33-33: LGTM: Class now extends AnyFlatSpec.


Line range hint 98-105: LGTM: Test converted to FlatSpec style.

Migrated from @Test to it should syntax while preserving test logic.


Line range hint 134-150: Verify date range consistency.

The staging query filters for monthAgo to today, but the comparison job uses ninetyDaysAgo. This mismatch appears intentional as it tests "less data" scenario, but should be documented.

Add a comment explaining the intentional date range mismatch:

+    // Intentionally using different date ranges to test data comparison with partial overlap
     val stagingQueryConf = Builders.StagingQuery(
✅ Verification successful

Add a concise comment explaining the date range mismatch.

+    // Testing comparison with partial data overlap
     val stagingQueryConf = Builders.StagingQuery(
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for test files using these date variables
rg -l "monthAgo|ninetyDaysAgo|today" --type scala "src/test"

# Look for the definition of these date variables
rg "(val|def)\s+(monthAgo|ninetyDaysAgo|today)\s*=" --type scala "src/test"

# Find other test cases with "migrate" and "compare" to see similar patterns
rg "should.*migrate.*compare" --type scala "src/test"

Length of output: 337


Script:

#!/bin/bash
# Find the test file and related ones
fd -e scala . spark/src/test

# Look for date variable definitions with context
rg -B 2 -A 2 "(val|def)\s+(monthAgo|ninetyDaysAgo|today)\s*=" spark/src/test

# Find similar test patterns with context
rg -B 2 -A 2 "should.*migrate.*compare" spark/src/test

Length of output: 17930

online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (5)

34-34: LGTM: Required ScalaTest import added.


38-38: LGTM: Correct extension of AnyFlatSpec.


Line range hint 50-63: LGTM: Clean conversion to ScalaTest style.

Test logic preserved while adopting the more readable it should syntax.


Line range hint 66-71: LGTM: Proper conversion of topic info parsing test.

Test cases maintained while transitioning to ScalaTest format.


Line range hint 34-71: Verify test execution in CI pipeline.

Ensure the converted tests run successfully in the CI environment.

✅ Verification successful

Test execution is properly configured in CI pipeline

The test is configured to run in the CI environment with:

  • Dedicated non-spark test workflow
  • Proper memory allocation (8GB heap)
  • Correct test framework (ScalaTest 3.2.19)
  • Standard local Spark session setup
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check CI logs for test execution status
gh api repos/zipline-ai/chronon/actions/runs \
  --jq '.workflow_runs[] | select(.head_sha=="$GITHUB_SHA") | .conclusion'

Length of output: 349


Script:

#!/bin/bash
# Find build and CI config files
echo "=== Build Files ==="
fd -e sbt -e scala -e xml "build|pom" --type f

echo -e "\n=== CI Workflow Files ==="
fd -e yml -e yaml ".github/workflows" --type f

echo -e "\n=== Test Framework Config ==="
rg -A 5 "scalatest" --type scala --type sbt

Length of output: 1176


Script:

#!/bin/bash
# Check build config and CI files
echo "=== Build Config ==="
cat build.sbt

echo -e "\n=== CI Workflows ==="
fd . '.github/workflows' --type f --exec cat

echo -e "\n=== Test Dependencies ==="
rg "libraryDependencies" build.sbt -A 10

Length of output: 44949

aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (2)

23-23: LGTM: Clean migration to ScalaTest.

Correctly retains JUnit assertions while adopting ScalaTest structure.

Also applies to: 29-29


Line range hint 77-95: LGTM: Clean test conversion.

aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)

8-8: LGTM: Excellent ScalaTest migration.

Clean and consistent conversion of all test methods.

Also applies to: 13-13, 14-121

aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxHistogramTest.scala (1)

6-6: LGTM: Perfect ScalaTest migration.

Excellent test naming and consistent style throughout.

Also applies to: 11-11, 12-132

@@ -85,8 +84,7 @@ class CompareTest {
}
}

@Test
def mappingTest(): Unit = {
it should "mapping" in {

🛠️ Refactor suggestion

DRY: Extract common test logic.

The mapping test duplicates verification logic from the basic test.

def verifyResults(result: DataMetrics, leftData: Seq[_], rightData: Seq[_], leftColumns: Seq[String], keys: Seq[String]): Unit = {
  assert(result.series.length == 4, "Invalid result length")
  for {
    rowIndex <- 0 until leftData.length
    (colName, index) <- leftColumns.zipWithIndex
    if !keys.contains(colName)
  } {
    val actualMismatchCount = result.series(rowIndex)._2(s"${colName}_mismatch_sum")
    val expected = if (leftData(rowIndex).productElement(index) == rightData(rowIndex).productElement(index)) 0 else 1
    assert(actualMismatchCount == expected, s"Expected ${if (expected == 0) "no" else "one"} mismatch")
  }
}

Also applies to: 123-123

Comment on lines 29 to 32
it should "kafka stream does not exist" in {
val topicInfo = TopicInfo.parse("kafka://test_topic/schema=my_schema/host=X/port=Y")
KafkaStreamBuilder.from(topicInfo)(spark, Map.empty)
}

⚠️ Potential issue

Add proper exception assertion.

The test expects a RuntimeException but doesn't assert it. Use ScalaTest's exception testing syntax.

-  it should "kafka stream does not exist" in {
-    val topicInfo = TopicInfo.parse("kafka://test_topic/schema=my_schema/host=X/port=Y")
-    KafkaStreamBuilder.from(topicInfo)(spark, Map.empty)
+  it should "throw RuntimeException when kafka stream does not exist" in {
+    val topicInfo = TopicInfo.parse("kafka://test_topic/schema=my_schema/host=X/port=Y")
+    assertThrows[RuntimeException] {
+      KafkaStreamBuilder.from(topicInfo)(spark, Map.empty)
+    }
   }

Comment on lines 19 to 21
import org.junit.Before
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

🛠️ Refactor suggestion

Remove JUnit remnants.

Clean up redundant JUnit imports and @RunWith annotation after ScalaTest transition.

-import org.junit.runner.RunWith
-import org.junit.runners.JUnit4
-@RunWith(classOf[JUnit4])

Also applies to: 35-35

@@ -77,8 +77,7 @@ class TableBootstrapTest {
(bootstrapPart, bootstrapDf)
}

@Test
def testBootstrap(): Unit = {
it should "bootstrap" in {

💡 Codebase verification

Migration to ScalaTest is incomplete: JUnit assertions still in use

Replace JUnit assertions with ScalaTest style:

  • import org.junit.Assert.assertEquals → Use shouldEqual or should equal
  • import org.junit.Assert.assertFalse → Use shouldBe false
🔗 Analysis chain

LGTM: Test methods converted to FlatSpec style.

Successfully migrated from JUnit's @Test to ScalaTest's "it should" syntax.

Let's verify no JUnit annotations remain:

Also applies to: 163-163

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining JUnit annotations
rg "@Test" spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala

Length of output: 88


Script:

#!/bin/bash
# Check for JUnit imports and annotations
rg -i "junit" spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala

# Check for ScalaTest imports and structure
rg "org.scalatest" spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala

Length of output: 303

@@ -60,7 +60,7 @@ class VarianceTest extends TestCase {
assertTrue((naiveResult - welfordResult) / naiveResult < 0.0000001)
}

def testVariance: Unit = {
it should "variance: unit = {" in {

⚠️ Potential issue

Fix syntax error in test description

The test description contains invalid syntax.

-  it should "variance: unit = {" in {
+  it should "test variance" in {
📝 Committable suggestion


Suggested change
it should "variance: unit = {" in {
it should "test variance" in {

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (1)

Line range hint 81-91: Consider removing duplicate table creation test.

Based on the retrieved learning from PR #33, DynamoDB inherently prevents duplicate table creation, making this test redundant.

-    // try another create for an existing table, should not fail
-    kvStore.create(dataset, props)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

Line range hint 41-44: Consider ScalaTest alternative for BigtableEmulatorRule.

Replace JUnit's @Rule with ScalaTest's withFixture or a custom trait for better integration.

flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (1)

59-62: Add parentheses to afterAll method.

Method should be defined as afterAll() for consistency.

-  override def afterAll: Unit = {
+  override def afterAll(): Unit = {
scripts/codemod/test_replace.py (2)

48-53: Add error handling for malformed test descriptions.

The quote extraction could fail if the test description is malformed.

     if ("ignore(" in line or "test(" in line) and "{" in line:
         start = line.find('"')
         end = line.find('"', start + 1)
-        if start != -1 and end != -1:
+        if start == -1 or end == -1:
+            print(f"Warning: Malformed test description in line: {line.strip()}")
+            modified_lines.append(line)
+            continue
+        else:
             test_desc = line[start + 1 : end]  # Get description without quotes

199-204: Specify file encoding explicitly.

Add explicit UTF-8 encoding to file operations.

-    with open(file_path, "r") as file:
+    with open(file_path, "r", encoding="utf-8") as file:
         lines = file.readlines()
         converted = handler(lines, test_name)

-    with open(file_path, "w") as file:
+    with open(file_path, "w", encoding="utf-8") as file:
         file.write(converted)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4291b99 and 6840dc9.

📒 Files selected for processing (5)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (8 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (13 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (3 hunks)
  • scripts/codemod/test_replace.py (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (3)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:82-89
Timestamp: 2024-11-12T09:38:33.532Z
Learning: DynamoDB does not create duplicate tables when attempting to create a table that already exists, so checking for duplicates in tests is redundant.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-11-12T09:38:33.532Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
🪛 Ruff (0.8.2)
scripts/codemod/test_replace.py

23-23: Found useless expression. Either assign it to a variable or remove it.

(B018)


23-23: Undefined name zz

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: mutation_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: other_spark_tests
🔇 Additional comments (12)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (4)

13-14: LGTM: ScalaTest imports added correctly.

The new imports support the transition to ScalaTest's AnyFlatSpec style.


34-34: LGTM: Class properly extends ScalaTest traits.

The class correctly extends both AnyFlatSpec and BeforeAndAfterAll.


Line range hint 58-78: LGTM: Setup/teardown methods properly converted.

The JUnit @Before/@After methods are correctly converted to ScalaTest's beforeAll/afterAll.


Line range hint 169-189: LGTM: Hardcoded timestamps are acceptable.

Based on the retrieved learning from PR #33, using hardcoded timestamps in this test is acceptable since data generation and querying occur within the same time range.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2)

19-20: Remove JUnit remnants.

Clean up redundant JUnit imports and @RunWith annotation after ScalaTest transition.

Also applies to: 35-35


Line range hint 73-300: Test conversion looks good!

The conversion to ScalaTest's it should style is clean and maintains the original test semantics.

flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (3)

16-17: LGTM! Correct trait choice.

BeforeAndAfterAll is the appropriate trait for the beforeAll/afterAll methods used.


22-22: LGTM! Proper trait composition.

Correctly extends AnyFlatSpec with BeforeAndAfterAll.


Line range hint 64-100: LGTM! Clean conversion to ScalaTest style.

Tests properly converted to "it should" syntax while preserving functionality.

scripts/codemod/test_replace.py (3)

24-29: LGTM!


80-111: LGTM!

Handles edge cases well, including acronyms in camel case.
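The acronym handling referenced here can be sketched with a single regex (a re-implementation for illustration; the script's actual split_camel_case may differ):

```python
import re

# Split a camelCase method name into words, keeping runs of capitals
# (acronyms) together: a run of capitals is cut just before the capital
# that starts a new lowercase word.
def split_camel_case(name: str) -> list:
    return re.findall(r"[A-Z]+(?=[A-Z][a-z]|\b)|[A-Z]?[a-z0-9]+", name)
```

For example, "testKVStoreIO" splits into ["test", "KV", "Store", "IO"] rather than one letter per word.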


179-183: Consider additional test method patterns.

The function only converts methods starting with "def test". Consider handling other test method patterns.
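A broader detector might accept several naming conventions; the pattern set below is illustrative, not exhaustive:

```python
import re

# Hypothetical broader matcher for Scala test methods: accepts
# "def testFoo", "def fooTest", and "def shouldFoo" style names.
TEST_METHOD_RE = re.compile(r"^\s*def\s+(test\w*|\w*Test|should\w+)\s*\(")

def is_test_method(line: str) -> bool:
    return TEST_METHOD_RE.match(line) is not None
```

Definitions prefixed with modifiers such as private def or override def deliberately fail the match, since the pattern anchors on def at the start of the line.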



if __name__ == "__main__":
test_files = glob.glob("**/*Test.scala", recursive=True)

⚠️ Potential issue

Exclude Vertx test directories.

As per PR objectives, Vertx tests should be excluded from conversion.

-    test_files = glob.glob("**/*Test.scala", recursive=True)
+    test_files = [f for f in glob.glob("**/*Test.scala", recursive=True)
+                  if "vertx" not in f.lower()]
📝 Committable suggestion


Suggested change
test_files = glob.glob("**/*Test.scala", recursive=True)
test_files = [f for f in glob.glob("**/*Test.scala", recursive=True)
if "vertx" not in f.lower()]


USAGE: python3 scripts/codemod/test_replace.py
"""
zz

⚠️ Potential issue

Remove invalid code.

The line contains an undefined variable zz that serves no purpose.

-zz
🧰 Tools
🪛 Ruff (0.8.2)

23-23: Found useless expression. Either assign it to a variable or remove it.

(B018)


23-23: Undefined name zz

(F821)

Comment on lines +120 to +122
if "import org.junit.Test" in line:
modified_lines.append("import org.scalatest.flatspec.AnyFlatSpec\n")
continue

💡 Codebase verification

Additional JUnit imports require conversion to ScalaTest

The codebase contains multiple JUnit imports that need ScalaTest equivalents:

  • org.junit.Assert.* → ScalaTest matchers
  • org.junit.Before/After → beforeEach/afterEach
  • org.junit.runner.* → ScalaTest runner configuration
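A table-driven pass could cover these in one place. The ScalaTest equivalents on the right-hand side are assumed counterparts, not verified one-to-one replacements:

```python
# Sketch: map whole JUnit import lines to ScalaTest counterparts.
# The right-hand sides are assumptions and should be checked against
# what each converted test actually uses.
JUNIT_IMPORT_MAP = {
    "import org.junit.Test": "import org.scalatest.flatspec.AnyFlatSpec",
    "import org.junit.Before": "import org.scalatest.BeforeAndAfterEach",
    "import org.junit.After": "import org.scalatest.BeforeAndAfterEach",
    "import org.junit.Assert._": "import org.scalatest.matchers.should.Matchers._",
}

def replace_imports(lines):
    out = []
    for line in lines:
        replacement = JUNIT_IMPORT_MAP.get(line.strip())
        out.append(replacement + "\n" if replacement else line)
    return out
```

Since Before and After map to the same trait import, a real implementation would also deduplicate the rewritten lines.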
🔗 Analysis chain

Consider handling additional JUnit imports.

The function only replaces org.junit.Test. Consider handling other JUnit imports like Assert, Before, After, etc.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other JUnit imports that might need conversion
rg "import org.junit" | grep -v "import org.junit.Test"

Length of output: 10247

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

Line range hint 41-44: Convert JUnit Rule to ScalaTest.

Replace JUnit's @Rule with ScalaTest's withFixture for managing the BigTable emulator.

-  // ugly hacks to wire up the emulator
-  val _bigtableEmulator: BigtableEmulatorRule = BigtableEmulatorRule.create()
-  @Rule
-  def bigtableEmulator: BigtableEmulatorRule = _bigtableEmulator
+  private val bigtableEmulator: BigtableEmulatorRule = BigtableEmulatorRule.create()
🧹 Nitpick comments (4)
flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (2)

22-22: Consider using BeforeAndAfter instead of BeforeAndAfterAll.

BeforeAndAfter is more appropriate for per-test setup/teardown.

-class FlinkJobIntegrationTest extends AnyFlatSpec with BeforeAndAfterAll {
+class FlinkJobIntegrationTest extends AnyFlatSpec with BeforeAndAfter {

64-64: Make test descriptions more descriptive.

Current descriptions are too terse. Consider more descriptive behavior specifications.

-  it should "flink job end to end" in {
+  it should "process events through the complete Flink pipeline" in {

-  it should "tiled flink job end to end" in {
+  it should "correctly aggregate events using tiled processing" in {

Also applies to: 100-100

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

240-240: Improve test descriptions.

Replace underscores with spaces in test descriptions for better readability.

-  it should "time series query_multiple days" in {
+  it should "handle time series query for multiple days" in {
-  it should "multiple dataset time series query_one day" in {
+  it should "handle multiple dataset time series query for one day" in {
-  it should "multiple dataset time series query_same day" in {
+  it should "handle multiple dataset time series query for same day" in {
-  it should "multiple dataset time series query_days without data" in {
+  it should "handle multiple dataset time series query for days without data" in {

Also applies to: 260-260, 280-280, 300-300
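This cleanup could also be folded into the codemod as a post-processing step on generated descriptions (hypothetical helper, not in test_replace.py):

```python
# Replace underscores left over from method names with spaces and
# collapse any resulting runs of whitespace.
def normalize_description(desc: str) -> str:
    return " ".join(desc.replace("_", " ").split())
```

Applied to the generated strings above, "time series query_multiple days" becomes "time series query multiple days".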

scripts/codemod/test_replace.py (1)

232-274: Add dry-run mode and progress tracking.

Add options to preview changes and track conversion progress.

+import argparse
+
 if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--dry-run", action="store_true", help="Preview changes without modifying files")
+    args = parser.parse_args()
+
     test_files = glob.glob("**/*Test.scala", recursive=True)
+    total_files = len(test_files)
+    converted = 0
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 4291b99 and 6840dc9.

📒 Files selected for processing (5)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (8 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (13 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (3 hunks)
  • scripts/codemod/test_replace.py (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
🧰 Additional context used
🪛 Ruff (0.8.2)
scripts/codemod/test_replace.py

23-23: Found useless expression. Either assign it to a variable or remove it.

(B018)


23-23: Undefined name zz

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
🔇 Additional comments (6)
flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (2)

16-17: LGTM: ScalaTest imports added correctly.


54-62: Use before/after instead of beforeAll/afterAll.

Match the lifecycle hooks with BeforeAndAfter trait.

-  override def beforeAll(): Unit = {
+  before {
     flinkCluster.before()
     CollectSink.values.clear()
-  }
+  }

-  override def afterAll: Unit = {
+  after {
     flinkCluster.after()
     CollectSink.values.clear()
-  }
+  }
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2)

18-19: Remove JUnit remnants.

Clean up redundant JUnit imports and @RunWith annotation after ScalaTest transition.

-import org.junit.runner.RunWith
-import org.junit.runners.JUnit4
-@RunWith(classOf[JUnit4])

Also applies to: 35-35


Line range hint 321-360: LGTM!

Helper methods are well-structured and properly use ScalaTest assertions.

scripts/codemod/test_replace.py (2)

1-22: LGTM! Clear docstring explaining the purpose.

The docstring effectively communicates the script's purpose and usage instructions.


31-77: Add error handling for malformed test descriptions.

The string parsing logic assumes quotes are present and well-formed.

     if ("ignore(" in line or "test(" in line) and "{" in line:
         start = line.find('"')
         end = line.find('"', start + 1)
-        if start != -1 and end != -1:
+        if start == -1 or end == -1:
+            print(f"Warning: Malformed test description in line: {line.strip()}")
+            modified_lines.append(line)
+            continue
+        try:
             test_desc = line[start + 1 : end]  # Get description without quotes
             words = test_desc.split()
✅ Verification successful

Approve error handling addition for robustness

The suggested error handling is a good defensive programming practice for the codemod tool, even though no malformed test descriptions were found in the current codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Find test files using FunSuite
rg -l "extends AnyFunSuite"

# Look for test/ignore patterns to analyze description formats
rg 'test\(|ignore\(' -A 1

# Find similar codemod scripts
fd 'test.*\.py' scripts/codemod/

Length of output: 1652


USAGE: python3 scripts/codemod/test_replace.py
"""
zz

⚠️ Potential issue

Remove invalid code.

The line contains an undefined variable 'zz' that serves no purpose.

-zz
🧰 Tools
🪛 Ruff (0.8.2)

23-23: Found useless expression. Either assign it to a variable or remove it.

(B018)


23-23: Undefined name zz

(F821)

Comment on lines +197 to +207
def convert(handler, file_path):
test_name = get_test_class_name(file_path)
with open(file_path, "r") as file:
lines = file.readlines()
converted = handler(lines, test_name)

with open(file_path, "w") as file:
file.write(converted)

print(f"Converted {file_path}")


🛠️ Refactor suggestion

Add backup before file modification.

Create a backup before overwriting files to prevent data loss.

 def convert(handler, file_path):
     test_name = get_test_class_name(file_path)
     with open(file_path, "r") as file:
         lines = file.readlines()
         converted = handler(lines, test_name)
 
+    # Create backup
+    import shutil
+    backup_path = file_path + ".bak"
+    shutil.copy2(file_path, backup_path)
+
     with open(file_path, "w") as file:
         file.write(converted)
📝 Committable suggestion


Suggested change
def convert(handler, file_path):
test_name = get_test_class_name(file_path)
with open(file_path, "r") as file:
lines = file.readlines()
converted = handler(lines, test_name)
with open(file_path, "w") as file:
file.write(converted)
print(f"Converted {file_path}")
def convert(handler, file_path):
test_name = get_test_class_name(file_path)
with open(file_path, "r") as file:
lines = file.readlines()
converted = handler(lines, test_name)
# Create backup
import shutil
backup_path = file_path + ".bak"
shutil.copy2(file_path, backup_path)
with open(file_path, "w") as file:
file.write(converted)
print(f"Converted {file_path}")

Comment on lines +113 to +156
def convert_junit_to_flatspec(lines, test_name):
modified_lines = []
is_test_method = False
class_modified = False

for line in lines:
# Replace JUnit import with FlatSpec import
if "import org.junit.Test" in line:
modified_lines.append("import org.scalatest.flatspec.AnyFlatSpec\n")
continue

# Handle class definition
if "class" in line and "Test" in line and (not class_modified):
class_modified = True
class_name = line.split("class")[1].split("{")[0].strip()
modified_lines.append(f"class {class_name} extends AnyFlatSpec {{\n")
continue

# Mark start of a test method
if "@Test" in line:
is_test_method = True
continue

# Convert only test methods marked with @Test and not private
if (
is_test_method
and "def " in line
and "private" not in line
and (("(): Unit" in line) or ("): Unit" not in line))
):
is_test_method = False

method_name = line.split("def ")[1].split("(")[0]

test_description = " ".join(split_camel_case(method_name))

modified_lines.append(f' it should "{test_description}" in {{\n')
continue

is_test_method = False
modified_lines.append(line)

return "".join(modified_lines)


💡 Codebase verification

Simplify return type check while adding modifier guards

         if (
             is_test_method
             and "def " in line
             and "private" not in line
-            and (("(): Unit" in line) or ("): Unit" not in line))
+            and "): Unit" in line
+            and not any(x in line.lower() for x in ["abstract", "override"])
         ):
  • Adding abstract/override checks is good defensive programming
  • Simplified Unit return type check matches codebase convention
🔗 Analysis chain

Handle potential edge cases in JUnit conversion.

The method assumes specific format for test methods.

         if (
             is_test_method
             and "def " in line
             and "private" not in line
-            and (("(): Unit" in line) or ("): Unit" not in line))
+            and not any(x in line.lower() for x in ["abstract", "override"])
         ):
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Find files that might be JUnit tests
echo "=== Looking for potential JUnit test files ==="
rg -l "org.junit.Test"

echo -e "\n=== Checking for test methods with abstract/override modifiers ==="
rg "@Test" -A 2 | rg -i "abstract|override"

echo -e "\n=== Analyzing test method signatures ==="
rg "@Test" -A 2 | rg "def.*\("

Length of output: 5643

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1)

Line range hint 125-159: Parameterize hardcoded values.

Consider making table, topic, window size, and aggregation operation configurable.

-  def makeGroupBy(keyColumns: Seq[String], filters: Seq[String] = Seq.empty): GroupBy =
+  def makeGroupBy(
+    keyColumns: Seq[String],
+    filters: Seq[String] = Seq.empty,
+    table: String = "events.my_stream_raw",
+    topic: String = "events.my_stream",
+    windowDays: Int = 1,
+    operation: Operation = Operation.SUM
+  ): GroupBy =
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 6840dc9 and 346b38d.

📒 Files selected for processing (6)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (8 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (13 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (3 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: mutation_spark_tests
🔇 Additional comments (1)
flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1)

Line range hint 125-159: LGTM!

Clean integration with existing test utilities.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_scala_no_spark.yaml (1)

51-51: Document the need for --add-opens.

The JVM flag suggests reflection access to NIO channels. Add a comment explaining why this is needed.

-          export SBT_OPTS="-Xmx24G -Xms4G --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
+          # Required for Flink's NIO operations in newer Java versions
+          export SBT_OPTS="-Xmx24G -Xms4G --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 346b38d and 0ea7842.

📒 Files selected for processing (3)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (2 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (1)
.github/workflows/test_scala_no_spark.yaml (1)

51-51: Verify memory requirements for Flink tests.

3x memory increase for Flink tests. Ensure this is necessary and won't strain CI resources.

✅ Verification successful

Memory allocation aligns with established patterns.

The 24G memory setting matches the allocation used for other resource-intensive jobs across workflows, including Spark. This is consistent with the project's resource management strategy.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check historical test runs for OOM issues
gh run list --workflow=test_scala_no_spark.yaml --json conclusion,name,url --jq '.[] | select(.conclusion=="failure") | select(.name | contains("Flink"))' | head -n 5

Length of output: 310


Script:

#!/bin/bash
# Check memory settings across workflow files
fd -e yaml -e yml workflow .github/workflows --exec grep -l "SBT_OPTS.*-Xmx" {} \; -exec echo "=== {}" \; -exec grep "SBT_OPTS.*-Xmx" {} \;

# Check Flink-specific memory settings
fd build.sbt -x grep -l "flink" {} \; -exec echo "=== {}" \; -exec cat {} \;

Length of output: 933


Script:

#!/bin/bash
# Find workflow files with memory settings
find .github/workflows -type f -name "*.y*ml" -exec grep -l "SBT_OPTS.*Xmx" {} \;

# Show memory settings in found files
find .github/workflows -type f -name "*.y*ml" -exec sh -c 'echo "=== $1"; grep "SBT_OPTS.*Xmx" "$1"' sh {} \;

# Find and show Flink-related build settings
find . -name "build.sbt" -type f -exec sh -c 'echo "=== $1"; cat "$1"' sh {} \;

Length of output: 21129

Collaborator
@tchow-zlai tchow-zlai left a comment

LGTM

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (1)

Add test coverage for skewDetection functionality.

While skewDetection is enabled in several tests, there's no dedicated test verifying the heavy hitter analysis output. Add tests to validate data skew detection using the heavyHittersWithTsAndCount method.

🔗 Analysis chain

Line range hint 35-366: Add test coverage for skewDetection.

While the parameter is used in multiple tests, there's no explicit test verifying its functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if skewDetection is tested elsewhere
rg -l "skewDetection.*=.*true" --type scala

Length of output: 107


Script:

#!/bin/bash
# Find skewDetection implementation
ast-grep --pattern 'class Analyzer {
  $$$
  def analyzeJoin($$$) {
    $$$
  }
  $$$
}'

Length of output: 90


Script:

#!/bin/bash
# Find skewDetection implementation details
rg -A 10 "skewDetection" --type scala

Length of output: 10361

🧹 Nitpick comments (2)
spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (2)

49-49: Consider standardizing test descriptions.

The test descriptions vary in style:

  • Some start with verbs: "throw on validation failure"
  • Some are more descriptive: "join analyzer check timestamp has values"

Consider adopting a consistent style across all tests.

Also applies to: 84-84, 114-114, 152-152, 218-218, 249-249, 281-281, 313-313, 331-331, 350-350


73-73: Standardize skewDetection parameter usage.

The skewDetection parameter is used inconsistently across Analyzer instantiations. Some calls include it while others don't.

Also applies to: 109-109, 147-147, 244-244, 276-276, 308-308, 345-345, 364-364

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0ea7842 and 0fef86d.

📒 Files selected for processing (5)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (13 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/KafkaStreamBuilderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LabelJoinTest.scala (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/test/scala/ai/chronon/spark/test/LabelJoinTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (6)
spark/src/test/scala/ai/chronon/spark/test/KafkaStreamBuilderTest.scala (2)

23-23: LGTM! Clean transition to AnyFlatSpec.

Proper import and class extension for ScalaTest.

Also applies to: 25-25


29-33: LGTM! Proper exception testing.

Clean implementation using ScalaTest's intercept pattern.

spark/src/main/scala/ai/chronon/spark/Analyzer.scala (2)

90-90: LGTM! Parameter rename improves clarity.

The rename from enableHitter to skewDetection better describes the parameter's purpose.


531-531: Method rename and visibility change improves encapsulation.

Good changes:

  • Making the method private improves encapsulation
  • Camel case naming follows Scala conventions
  • Null handling is more idiomatic using Option

Also applies to: 594-603

spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala (1)

101-101: Verify impact of disabled skew detection.

The parameter rename from enableHitter to skewDetection is more descriptive. However, let's verify:

  1. Impact of disabling skew detection
  2. Consistency with other Analyzer usages
✅ Verification successful

Parameter rename improves clarity, disabling skew detection is appropriate.

The change from enableHitter to skewDetection makes the parameter more descriptive. Disabling it in CompareJob is correct as:

  • Schema validation doesn't require skew detection
  • No impact on core validation functionality
  • Improves performance by skipping unnecessary analysis
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check other Analyzer instantiations
ast-grep --pattern 'new Analyzer($$$, skewDetection = $_)'

Length of output: 1201


Script:

#!/bin/bash
# Check Analyzer class implementation
ast-grep --pattern 'class Analyzer($$$) {
  $$$
}'

Length of output: 55253

spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (1)

31-31: LGTM! Clean transition to ScalaTest.

The import and class extension changes align with the PR objective of standardizing tests to AnyFlatSpec.

Also applies to: 35-35

Comment on lines 613 to 619
+        analyzeJoin(joinConf, enableHitter = skewDetection)
       } else if (confPath.contains("/group_bys/")) {
         val groupByConf = parseConf[api.GroupBy](confPath)
-        analyzeGroupBy(groupByConf, enableHitter = enableHitter)
+        analyzeGroupBy(groupByConf, enableHitter = skewDetection)
       }
-    case groupByConf: api.GroupBy => analyzeGroupBy(groupByConf, enableHitter = enableHitter)
-    case joinConf: api.Join => analyzeJoin(joinConf, enableHitter = enableHitter)
+    case groupByConf: api.GroupBy => analyzeGroupBy(groupByConf, enableHitter = skewDetection)
+    case joinConf: api.Join => analyzeJoin(joinConf, enableHitter = skewDetection)

💡 Codebase verification

Parameter rename is inconsistently applied

The rename from enableHitter to skewDetection is incomplete. The parameter is still referenced as enableHitter in multiple locations:

  • Driver.scala: Parameter definition and usage
  • Analyzer.scala: Multiple method signatures and internal logic
🔗 Analysis chain

Verify parameter rename impact.

The parameter rename from enableHitter to skewDetection is consistently applied across all method calls.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify no remaining references to enableHitter
rg "enableHitter" --type scala

Length of output: 1532


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 54d4eec and 3ba567c.

📒 Files selected for processing (1)
  • flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: other_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: join_spark_tests
🔇 Additional comments (2)
flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (2)

10-14: Clean transition to AnyFlatSpec.


29-31: Verify if classloader check disable is necessary.

Why is the classloader check being disabled?

Comment on lines 42 to 43
sparkExprEval.close()
env.close()

💡 Codebase verification

Resource cleanup inconsistency detected

Other test files using StreamExecutionEnvironment don't properly close their resources:

  • FlinkJobIntegrationTest.scala
  • AsyncKVStoreWriterTest.scala
🔗 Analysis chain

Good resource cleanup.

Proper cleanup of resources.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if other test files follow similar cleanup pattern
rg -l "extends AnyFlatSpec" | xargs rg "env\.close\(\)"

Length of output: 148


Script:

#!/bin/bash
# Find test files using StreamExecutionEnvironment
rg "StreamExecutionEnvironment" "src/test"

# Check sparkExprEval cleanup pattern
rg "sparkExprEval.*close\(\)" "src/test"

Length of output: 179


Script:

#!/bin/bash
# Find test files using StreamExecutionEnvironment
rg "StreamExecutionEnvironment" "flink/src/test"

# Check sparkExprEval cleanup pattern
rg "sparkExprEval.*close\(\)" "flink/src/test"

Length of output: 1856
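If the cleanup inconsistency flagged above is addressed, one conventional fix is ScalaTest's BeforeAndAfterAll trait, which closes suite-level resources exactly once after all tests run. A sketch — a closeable stream stands in for the StreamExecutionEnvironment, and the names are illustrative:

```scala
import org.scalatest.BeforeAndAfterAll
import org.scalatest.flatspec.AnyFlatSpec

class FlinkJobSpec extends AnyFlatSpec with BeforeAndAfterAll {
  // stand-in for a shared resource such as StreamExecutionEnvironment
  private val resource = new java.io.ByteArrayOutputStream()

  override def afterAll(): Unit = {
    resource.close() // runs once, after every test in the suite
    super.afterAll()
  }

  it should "use the shared resource" in {
    resource.write(1)
    assert(resource.size() == 1)
  }
}
```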


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

379-391: Consider using StringBuilder for better memory efficiency

Replace StringWriter with StringBuilder to reduce memory overhead in stack trace processing.

 private def cleanStackTrace(throwable: Throwable): String = {
-    val sw = new StringWriter()
-    val pw = new PrintWriter(sw)
-    throwable.printStackTrace(pw)
-    val stackTraceString = sw.toString
+    val sb = new StringBuilder()
+    throwable.getStackTrace
+      .filter(_.getClassName.contains("chronon"))
+      .map(_.toString
+        .replace("ai.chronon.spark.test.", "")
+        .replace("ai.chronon.spark.", ""))
+      .foreach(line => sb.append("    ").append(line).append('\n'))
-    "    " + stackTraceString
-      .split("\n")
-      .filter(_.contains("chronon"))
-      .map(_.replace("at ai.chronon.spark.test.", "").replace("at ai.chronon.spark.", "").stripLeading())
-      .mkString("\n    ")
+    sb.toString()
 }
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (3)

Line range hint 210-257: Improve logging readability.

Consider extracting the repeated logging patterns into helper methods to reduce duplication and improve maintainability.

+  private def logAnalysisResult(name: String, analysis: String, keySchema: Seq[String], outputSchema: String): Unit = {
+    logger.info(s"""
+         |----- ANALYSIS ($name) -----
+         |$analysis
+         |----- KEY SCHEMA -----
+         |${keySchema.mkString("\n")}
+         |----- OUTPUT SCHEMA -----
+         |$outputSchema
+         |------ END --------------
+         |""".stripMargin)
+  }

Line range hint 278-400: Consolidate validation error handling.

The validation error handling logic is scattered and repetitive. Consider extracting it into a dedicated method.

+  private def handleValidationErrors(
+    keysWithError: Seq[(String, String)],
+    noAccessTables: Set[String],
+    dataAvailabilityErrors: Seq[(String, String, String)],
+    validationAssert: Boolean,
+    hasBootstrapParts: Boolean
+  ): Unit = {
+    if (keysWithError.isEmpty && noAccessTables.isEmpty && dataAvailabilityErrors.isEmpty) {
+      logger.info("-- Backfill validation completed. No errors found. --".green)
+    }
+    
+    if (validationAssert) {
+      if (hasBootstrapParts) {
+        assert(
+          keysWithError.isEmpty && noAccessTables.isEmpty,
+          "ERROR: Join validation failed. Please check error message for details."
+        )
+      } else {
+        assert(
+          keysWithError.isEmpty && noAccessTables.isEmpty && dataAvailabilityErrors.isEmpty,
+          "ERROR: Join validation failed. Please check error message for details."
+        )
+      }
+    }
+  }

465-465: Improve error message formatting.

Consider using a template string for consistent error message formatting.

-logger.info(s"Table $table startPartition ${startOpt.getOrElse("empty")} endPartition ${endOpt.getOrElse("empty")}")
+logger.info(f"Table $table%s: startPartition=${startOpt.getOrElse("empty")}%s, endPartition=${endOpt.getOrElse("empty")}%s")

Also applies to: 485-485, 492-492, 502-502, 510-510

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4707538 and 31f24a7.

📒 Files selected for processing (11)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (4 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (16 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (10 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (17 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (12 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (19)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

128-131: LGTM! Improved logging in tableReachable

Changed to info level and added cleaned stack trace for better debugging.


Line range hint 392-404: LGTM! Enhanced SQL query logging

Added clear section markers and execution path tracking.

spark/src/main/scala/ai/chronon/spark/Driver.scala (4)

434-434: LGTM! More precise terminology.

The description now uses the more technically accurate term "skewed keys".


436-442: LGTM! Better variable naming.

Renamed from count to skewKeyCount for better clarity and consistency with skew detection terminology.


447-452: LGTM! Improved clarity and added performance context.

The rename to skewDetection and updated description better communicates the functionality and its performance implications.


464-466: LGTM! Consistent variable usage.

Variable references correctly updated to match the new names.

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)

359-361: LGTM! Good defensive programming.

Deep copying the groupBy object prevents concurrent modification issues.


396-397: LGTM! Enhanced observability.

Added logging improves debugging capabilities.

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (2)

47-47: LGTM! Consistent renaming.

The parameter renaming from joinConf to joinConfCloned and its usage in assertions and metrics initialization is consistent.

Also applies to: 55-57


60-65: LGTM! Consistent property access.

Property access through joinConfCloned is correctly updated.

spark/src/main/scala/ai/chronon/spark/Join.scala (3)

75-76: LGTM! Good defensive programming.

Creating a deep copy of joinConf in the inheritance chain prevents unintended modifications during unit tests.


459-463: LGTM! Consistent usage.

References to joinConfCloned in derivation handling are correctly updated.


496-496: LGTM! Consistent validation.

References to joinConfCloned in validation and bootstrap parts handling are correctly updated.

Also applies to: 502-502

spark/src/main/scala/ai/chronon/spark/Analyzer.scala (6)

90-90: Parameter rename is inconsistently applied.

The rename from enableHitter to skewDetection is incomplete. The parameter is still referenced as enableHitter in multiple locations.


424-426: LGTM! Schema validation signature update.

The parameter rename is correctly applied here.


447-448: LGTM! Table permission validation signature update.

The parameter rename is correctly applied here.


534-534: LGTM! Method name update.

The method name change from dataFrameToMap to dataframeToMap is correctly applied here.


597-597: LGTM! Method visibility and name update.

The method visibility change to private and name change to dataframeToMap is correctly applied here.


616-616: LGTM! Parameter rename in run method.

The parameter rename from enableHitter to skewDetection is correctly applied here.

Also applies to: 619-619, 621-622


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (3)
flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (1)

Line range hint 29-37: Add env.close() after test execution.

StreamExecutionEnvironment needs cleanup.

    val result = new DataStreamUtils(sparkExprEvalDS).collect.toSeq
    // let's check the size
    assert(result.size == elements.size, "Expect result sets to include all 3 rows")
    // let's check the id field
    assert(result.map(_.apply("id")).toSet == Set("test1", "test2", "test3"))
+   env.close()
  }
flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1)

Line range hint 125-162: Parameterize hardcoded values.

Hardcoded table name, topic, and partition date limit reusability.

   def makeGroupBy(keyColumns: Seq[String], filters: Seq[String] = Seq.empty): GroupBy =
+    makeGroupBy(
+      keyColumns,
+      filters,
+      table = "events.my_stream_raw",
+      topic = "events.my_stream",
+      startPartition = "20231106"
+    )
+
+  def makeGroupBy(
+    keyColumns: Seq[String],
+    filters: Seq[String], // no default here: only one overload may declare default arguments
+    table: String,
+    topic: String,
+    startPartition: String
+  ): GroupBy =
     Builders.GroupBy(
       sources = Seq(
         Builders.Source.events(
-          table = "events.my_stream_raw",
-          topic = "events.my_stream",
+          table = table,
+          topic = topic,
           query = Builders.Query(
             selects = Map(
               "id" -> "id",
               "int_val" -> "int_val",
               "double_val" -> "double_val"
             ),
             wheres = filters,
             timeColumn = "created",
-            startPartition = "20231106"
+            startPartition = startPartition
           )
         )
       ),
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

Line range hint 37-41: Migrate to ScalaTest cleanup

Replace JUnit's @AfterClass with ScalaTest's cleanup hooks.

-  @AfterClass
-  def teardown(): Unit = {
-    FileUtils.deleteDirectory(tmpDir)
-  }
+  override def afterAll(): Unit = {
+    super.afterAll()
+    FileUtils.deleteDirectory(tmpDir)
+  }
🧹 Nitpick comments (38)
flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2)

Line range hint 21-37: Consider using ScalaTest matchers

Simple suggestion: Replace assert(result.nonEmpty) with more readable result should not be empty


Line range hint 43-60: Consider using ScalaTest matchers

Same suggestion: Replace assert(result.nonEmpty) with more readable result should not be empty
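For concreteness, the matcher style suggested here requires mixing the Matchers trait into the spec. A sketch — the class name and data are illustrative:

```scala
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class KvStoreWriterSpec extends AnyFlatSpec with Matchers {
  it should "report written records" in {
    val result = Seq("record-1", "record-2")
    result should not be empty   // reads better than assert(result.nonEmpty)
    result should have size 2    // reads better than assertEquals(2, result.size)
  }
}
```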

flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (2)

Line range hint 125-162: Add method documentation.

Missing scaladoc for public method parameters and return value.

+  /**
+   * Creates a GroupBy configuration for testing.
+   *
+   * @param keyColumns columns to group by
+   * @param filters optional filter conditions
+   * @return GroupBy configuration with daily SUM aggregation
+   */
   def makeGroupBy(keyColumns: Seq[String], filters: Seq[String] = Seq.empty): GroupBy =

Line range hint 125-162: Make aggregation configurable.

The fixed SUM operation and daily window limit the test scenarios that can be covered.

   def makeGroupBy(
     keyColumns: Seq[String],
     filters: Seq[String] = Seq.empty,
+    aggregations: Seq[Builders.Aggregation] = Seq(
+      Builders.Aggregation(
+        operation = Operation.SUM,
+        inputColumn = "double_val",
+        windows = Seq(new Window(1, TimeUnit.DAYS))
+      )
+    ),
     table: String,
     topic: String,
     startPartition: String
   ): GroupBy =
flink/src/test/scala/ai/chronon/flink/test/window/KeySelectorTest.scala (2)

Line range hint 8-25: Consider enhancing assertion messages

Add descriptive messages to assertions for better failure reporting.

-    assert(
-      keyFunctionOne(sampleSparkExprEvalOutput) == List(4242)
-    )
+    assert(
+      keyFunctionOne(sampleSparkExprEvalOutput) == List(4242),
+      "Single key selection should extract 'number' value"
+    )

Line range hint 26-53: Consider splitting into focused test cases

Break down into separate test cases for single key, multiple keys, and null handling.

it should "handle single key hashing" in {
  // Single key test
}

it should "handle multiple key hashing" in {
  // Multiple key test
}

it should "handle null values in keys" in {
  // Null value test
}
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)

113-113: Typo in comment

Change 'a Array' to 'an Array'.

-// frame. The result is a Array of tuples of (column names, array of tuples of (frequent keys, counts))
+// frame. The result is an Array of tuples of (column names, array of tuples of (frequent keys, counts))
online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1)

9-9: Fix test description

Correct the description to 'Hive UDFs via setups should work'.

-it should "hive ud fs via setups should work" in {
+it should "Hive UDFs via setups should work" in {
spark/src/test/scala/ai/chronon/spark/test/DataRangeTest.scala (1)

25-25: Consider using ScalaTest assertions

Replace JUnit's assertEquals with ScalaTest's assert(range1.intersect(range2) === range2).

Also applies to: 27-27, 31-35

aggregator/src/test/scala/ai/chronon/aggregator/test/MinHeapTest.scala (1)

21-21: Consider using ScalaTest matchers

Replace assertArrayEquals with ScalaTest's should contain theSameElementsInOrderAs.

Also applies to: 26-49

online/src/test/scala/ai/chronon/online/test/JoinCodecTest.scala (1)

21-21: Use ScalaTest matchers for Map comparison

Replace assertEquals with result should contain theSameElementsAs expected for better diff messages on failure.

Also applies to: 23-44

api/src/test/scala/ai/chronon/api/test/ParametricMacroTest.scala (1)

21-21: Migrate to ScalaTest assertions

Replace JUnit assertEquals with ScalaTest's should matchers for consistency.

-import org.junit.Assert.assertEquals
+import org.scalatest.matchers.should.Matchers
-class ParametricMacroTest extends AnyFlatSpec {
+class ParametricMacroTest extends AnyFlatSpec with Matchers {

Also applies to: 23-24

aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (1)

Line range hint 29-45: Make helper methods private

Helper methods should be marked as private for better encapsulation.

-  def mean(elems: Seq[Double]): Double = 
+  private def mean(elems: Seq[Double]): Double = 
-  def naive(elems: Seq[Double]): Double =
+  private def naive(elems: Seq[Double]): Double =
-  def sumOfSquares(elems: Seq[Double]): Double =
+  private def sumOfSquares(elems: Seq[Double]): Double =
-  def welford(elems: Seq[Double], chunkSize: Int = 10): Double =
+  private def welford(elems: Seq[Double], chunkSize: Int = 10): Double =
-  def compare(cardinality: Int, min: Double = 0.0, max: Double = 1000000.0): Unit =
+  private def compare(cardinality: Int, min: Double = 0.0, max: Double = 1000000.0): Unit =
aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala (2)

7-7: Consider using ScalaTest assertions

Replace JUnit assertions with ScalaTest's matchers for consistency.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers
 import org.scalatest.flatspec.AnyFlatSpec

-class MomentTest extends AnyFlatSpec {
+class MomentTest extends AnyFlatSpec with Matchers {

Also applies to: 9-9


39-39: Improve test descriptions

Test descriptions should be more descriptive of the behavior being tested.

-  it should "update" in {
+  it should "correctly calculate skew and kurtosis when updating values" in {

-  it should "insufficient sizes" in {
+  it should "return NaN for insufficient sample sizes" in {

-  it should "no variance" in {
+  it should "return NaN when all values are identical" in {

-  it should "merge" in {
+  it should "correctly merge two sets of values for skew and kurtosis" in {

-  it should "normalize" in {
+  it should "preserve values through normalization and denormalization" in {

Also applies to: 45-45, 51-51, 57-57, 64-64

aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxDistinctTest.scala (1)

53-53: Improve test descriptions and assertions

Test descriptions should better describe the test scenarios.

-  it should "error bounds" in {
+  it should "maintain error bounds within expected range for different sample sizes" in {

-  it should "merging error bounds" in {
+  it should "maintain error bounds when merging multiple sketches" in {

Also applies to: 59-59

online/src/test/scala/ai/chronon/online/test/TagsTest.scala (1)

Line range hint 48-83: Refactor test setup

Extract complex object construction to helper methods for better readability.

+  def createTestContext(join: String = "join1"): Metrics.Context = {
+    Metrics.Context(
+      Environment.JoinOffline,
+      Builders.Join(
+        metaData = Builders.MetaData(join, team = "team1"),
+        left = Builders.Source.events(query = null, table = "table1"),
+        joinParts = Builders.JoinPart(
+          groupBy = Builders.GroupBy(
+            metaData = Builders.MetaData("group_by1", team = "team2"),
+            sources = Seq(Builders.Source.events(query = null, table = "table2"))
+          )
+        ) :: Nil
+      )
+    )
+  }

   it should "cached tags are computed tags" in {
     val cache = new TTLCache[Metrics.Context, String](
       { ctx => ctx.toTags.mkString(",") },
       { ctx => ctx },
       ttlMillis = 5 * 24 * 60 * 60 * 1000
     )
-    val context = Metrics.Context(...)
+    val context = createTestContext()
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GCSFormatTest.scala (1)

23-23: Improve test descriptions to follow BDD style.

Simplify verbose test descriptions by removing redundant "method should" phrases:

-  it should "partitions method should return correctly parsed partitions as maps" in {
+  it should "return correctly parsed partitions as maps" in {
-  it should "partitions method should handle empty partitions gracefully" in {
+  it should "handle empty partitions gracefully" in {
-  it should "partitions method should handle date types" in {
+  it should "handle date types" in {

Also applies to: 43-43, 63-63

spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1)

Line range hint 45-89: Consider using ScalaTest assertions.

Replace JUnit assertions with ScalaTest's matchers for consistency:

-    assertEquals(confPath, args.confPath())
-    assertTrue(args.localTableMapping.isEmpty)
+    args.confPath() shouldBe confPath
+    args.localTableMapping shouldBe empty
spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1)

67-67: Improve test description.

Make the test description more descriptive:

-  it should "metadata export" in {
+  it should "correctly export and validate metadata files" in {
spark/src/test/scala/ai/chronon/spark/test/LocalExportTableAbilityTest.scala (1)

50-50: Simplify test descriptions.

Make test descriptions more concise:

-  it should "local table exporter is not used when not in local mode" in {
+  it should "not use exporter in non-local mode" in {
-  it should "local table exporter is not used when not export path is not specified" in {
+  it should "not use exporter without export path" in {
-  it should "local table exporter is used when necessary" in {
+  it should "use exporter with valid configuration" in {

Also applies to: 56-56, 63-63

spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1)

33-33: Consider a more descriptive test name.

The current test name "fetch" is too generic. Consider something like "fetch data from external sources" for better clarity.

spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (1)

38-38: Consider using BDD-style test descriptions.

Convert test names to behavior-driven style, e.g.:

  • "summary" → "compute basic summary statistics"
  • "snapshot summary" → "compute summary statistics for snapshots"
  • "generated summary" → "compute summary statistics for generated data"
  • "generated summary no ts" → "compute summary statistics without timestamps"
  • "generated summary byte" → "compute summary statistics for byte columns"

Also applies to: 55-55, 77-77, 104-104, 134-134

spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (1)

98-98: Consider grouping related test scenarios.

Use ScalaTest's behavior-driven syntax to group related tests:

behavior of "Migration Compare"

it should "compare identical datasets" in { ... }
it should "handle subset of columns" in { ... }
it should "handle different time windows" in { ... }
it should "detect data differences" in { ... }

Also applies to: 107-107, 125-125, 134-134

spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (2)

35-35: Improve test descriptions for clarity.

Current descriptions are implementation-focused. Consider:

  • "testing dynamic classloading" → "load format providers dynamically"
  • "test insertion of partitioned data and adding of columns" → "handle schema evolution when adding columns"
  • "test insertion of partitioned data and removal of columns" → "handle schema evolution when removing columns"
  • "test insertion of partitioned data and modification of columns" → "handle schema evolution when modifying column types"

Also applies to: 39-39, 74-74, 109-109


19-19: Replace JUnit assertions with ScalaTest matchers.

Replace org.junit.Assert._ with ScalaTest's matchers for consistency:

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers
+
+class TableUtilsFormatTest extends AnyFlatSpec with Matchers {

Then update assertions:

-assertEquals(expected, actual)
+actual shouldBe expected

-assertTrue(condition)
+condition shouldBe true
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)

95-95: Consider improving test method names.

The test names could be more descriptive of the behavior being tested.

-  it should "tile codec ir ser round trip" in {
+  it should "correctly serialize and deserialize tile intermediate results" in {

-  it should "tile codec ir ser round trip_with buckets" in {
+  it should "correctly serialize and deserialize tile intermediate results with buckets" in {

Also applies to: 129-129

flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (1)

53-53: Test names could be more descriptive.

Consider using more behavior-focused test names.

-  it should "flink aggregator produces correct results" in {
+  it should "correctly aggregate values across multiple input rows" in {

Also applies to: 95-95

spark/src/test/scala/ai/chronon/spark/test/JoinUtilsTest.scala (2)

38-38: Replace JUnit imports with ScalaTest.

-import org.junit.Assert._

Also applies to: 43-43


Line range hint 49-82: Replace JUnit assertions with ScalaTest matchers.

-assertTrue(df.isFailure)
+df.isFailure shouldBe true

-assertTrue(df.isSuccess)
+df.isSuccess shouldBe true
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

35-37: Replace JUnit Rule with ScalaTest fixture.

-@RunWith(classOf[JUnit4])
-  // ugly hacks to wire up the emulator
-  val _bigtableEmulator: BigtableEmulatorRule = BigtableEmulatorRule.create()
-  @Rule
-  def bigtableEmulator: BigtableEmulatorRule = _bigtableEmulator
+  val bigtableEmulator: BigtableEmulatorRule = BigtableEmulatorRule.create()

Also applies to: 41-44

online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (1)

34-34: Consider using ScalaTest assertions.

While the migration to AnyFlatSpec is correct, consider replacing JUnit assertions with ScalaTest's for consistency.

-import org.junit.Assert.assertTrue
+import org.scalatest.matchers.should.Matchers
-class DataStreamBuilderTest extends AnyFlatSpec {
+class DataStreamBuilderTest extends AnyFlatSpec with Matchers {

Also applies to: 38-38

aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (1)

23-23: Consider using ScalaTest assertions.

While the migration is correct, replacing JUnit assertions with ScalaTest's would improve consistency.

-import org.junit.Assert._
+import org.scalatest.matchers.should.Matchers
-class ApproxPercentilesTest extends AnyFlatSpec {
+class ApproxPercentilesTest extends AnyFlatSpec with Matchers {

Also applies to: 29-29

aggregator/src/test/scala/ai/chronon/aggregator/test/RowAggregatorTest.scala (1)

51-52: Consider breaking down the large test.

The test contains multiple scenarios. Consider splitting into focused test cases for better maintainability:

  • "should correctly handle basic row updates"
  • "should properly process deletions"
  • "should aggregate different data types"
aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala (1)

37-37: Improve test name clarity.

Rename to "should maintain consistency between batch and online aggregation" for better clarity.

api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (1)

34-34: Consider adding a shared test base class.

With multiple test classes now using AnyFlatSpec, consider creating a shared base class with common test utilities and matchers.

Also applies to: 51-51, 35-35, 53-53

spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (1)

Line range hint 56-85: Improve test readability with descriptive assertions.

Use ScalaTest's more expressive matchers instead of raw assertions.

-    assert(result.series.length == 4, "Invalid result length")
+    result.series.length shouldBe 4
spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala (1)

Line range hint 53-114: Consider breaking down large test methods.

Split the test into smaller, focused scenarios for better maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 56a028b and 919730c.

📒 Files selected for processing (81)
  • .github/workflows/test_scala_no_spark.yaml (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxDistinctTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxHistogramTest.scala (8 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (6 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MinHeapTest.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/RowAggregatorTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/DataTypeConversionTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (8 hunks)
  • api/src/test/scala/ai/chronon/api/test/ParametricMacroTest.scala (1 hunks)
  • build.sbt (1 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (8 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (4 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (13 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (4 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GCSFormatTest.scala (3 hunks)
  • docs/source/test_deploy_serve/Test.md (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (3 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (2 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/KeySelectorTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (23 hunks)
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (9 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala (12 hunks)
  • online/src/test/scala/ai/chronon/online/test/JoinCodecTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/LRUCacheTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TaggedFilterSuite.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TagsTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (5 hunks)
  • scripts/codemod/test_replace.py (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (17 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (10 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (17 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (12 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/AvroTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ChainingFetcherTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/DataRangeTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/EditDistanceTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (6 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/GroupByTest.scala (13 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/GroupByUploadTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (18 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/JoinUtilsTest.scala (13 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/KafkaStreamBuilderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LabelJoinTest.scala (11 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalExportTableAbilityTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala (11 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (16 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (3 hunks)
✅ Files skipped from review due to trivial changes (2)
  • online/src/test/scala/ai/chronon/online/test/TaggedFilterSuite.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala
👮 Files not reviewed due to content moderation or server errors (12)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxHistogramTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (3)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-11-12T09:38:33.532Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:82-89
Timestamp: 2024-11-12T09:38:33.532Z
Learning: DynamoDB does not create duplicate tables when attempting to create a table that already exists, so checking for duplicates in tests is redundant.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
🪛 LanguageTool
docs/source/test_deploy_serve/Test.md

[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...Optional parameters: --skew-detection: enable skewed data analysis - include t...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 Ruff (0.8.2)
scripts/codemod/test_replace.py

23-23: Found useless expression. Either assign it to a variable or remove it.

(B018)


23-23: Undefined name zz

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (71)
flink/src/test/scala/ai/chronon/flink/test/SparkExpressionEvalFnTest.scala (2)

9-9: LGTM: Clean framework migration.

Framework migration looks good with proper import and class extension.

Also applies to: 11-11


13-13: LGTM: Test style conversion.

Test method successfully converted to descriptive BDD style.

flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (2)

11-12: LGTM: Correct ScalaTest imports added


14-14: LGTM: Correct migration to AnyFlatSpec

flink/src/test/scala/ai/chronon/flink/test/window/KeySelectorTest.scala (2)

5-5: LGTM: Clean import update


7-7: LGTM: Proper ScalaTest extension

scripts/codemod/test_replace.py (5)

23-23: Remove invalid code.

The line contains an undefined variable zz.

-zz
🧰 Tools
🪛 Ruff (0.8.2)

23-23: Found useless expression. Either assign it to a variable or remove it.

(B018)


23-23: Undefined name zz

(F821)


137-142: Improve test method detection.

Add modifier guards and simplify return type check.

         if (
             is_test_method
             and "def " in line
             and "private" not in line
-            and (("(): Unit" in line) or ("): Unit" not in line))
+            and "): Unit" in line
+            and not any(x in line.lower() for x in ["abstract", "override"])
         ):
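To make the suggested guard concrete, here is a minimal, self-contained Python sketch of the two pieces a codemod like `test_replace.py` needs: detecting a convertible test method and deriving the descriptive `it should "..."` name from a JUnit-style method name. The function names (`is_convertible_test_def`, `junit_name_to_flatspec`) are illustrative, not the actual names in the script.

```python
import re

def is_convertible_test_def(line: str) -> bool:
    # Mirrors the suggested guard: a public, non-abstract, non-overridden
    # `def ...(): Unit` method is a candidate for conversion.
    return (
        "def " in line
        and "private" not in line
        and "): Unit" in line
        and not any(kw in line.lower() for kw in ("abstract", "override"))
    )

def junit_name_to_flatspec(method_name: str) -> str:
    # Turn a JUnit-style name like `testHappyPath` into the
    # descriptive FlatSpec string `happy path`.
    name = re.sub(r"^test", "", method_name)
    # Split camelCase, preserving acronym runs like "LRU".
    words = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return " ".join(w.lower() for w in words)
```

A detected method would then be rewritten as `it should "happy path" in { ... }` in place of `def testHappyPath(): Unit = { ... }`.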

197-207: Add backup before file modification.

Create a backup to prevent data loss.

 def convert(handler, file_path):
     test_name = get_test_class_name(file_path)
     with open(file_path, "r") as file:
         lines = file.readlines()
         converted = handler(lines, test_name)
 
+    # Create backup
+    import shutil
+    backup_path = file_path + ".bak"
+    shutil.copy2(file_path, backup_path)
+
     with open(file_path, "w") as file:
         file.write(converted)

233-233: Exclude Vertx test directories.

As per PR objectives, Vertx tests should be excluded from conversion.

-    test_files = glob.glob("**/*Test.scala", recursive=True)
+    test_files = [f for f in glob.glob("**/*Test.scala", recursive=True)
+                  if "vertx" not in f.lower()]

120-122: Handle additional JUnit imports.

Convert all JUnit imports to their ScalaTest equivalents.

#!/bin/bash
# Search for other JUnit imports that might need conversion
rg "import org.junit" | grep -v "import org.junit.Test"
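One way to handle the remaining JUnit imports is a simple lookup-table rewrite; the sketch below is illustrative (the mapping entries are plausible ScalaTest equivalents, not an exhaustive or verified list for this codebase):

```python
# Illustrative mapping from JUnit imports to ScalaTest equivalents.
JUNIT_TO_SCALATEST = {
    "import org.junit.Test": "import org.scalatest.flatspec.AnyFlatSpec",
    "import org.junit.Before": "import org.scalatest.BeforeAndAfter",
    "import org.junit.Assert._": "import org.scalatest.matchers.should.Matchers._",
}

def rewrite_imports(lines):
    # Replace known JUnit import lines; pass everything else through unchanged.
    out = []
    for line in lines:
        replacement = JUNIT_TO_SCALATEST.get(line.strip())
        out.append(replacement if replacement is not None else line.rstrip("\n"))
    return out
```

Unknown `org.junit.*` imports would still surface via the `rg` search above and need manual review.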
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)

Line range hint 90-622: Parameter rename is now consistently applied

The rename from enableHitter to skewDetection appears to be consistently applied in this file.

spark/src/test/scala/ai/chronon/spark/test/KafkaStreamBuilderTest.scala (1)

31-33: Use 'assertThrows' for exception testing

Consider using assertThrows[RuntimeException] for clearer exception testing.

Apply this diff:

-intercept[RuntimeException] {
+assertThrows[RuntimeException] {
online/src/test/scala/ai/chronon/online/test/LRUCacheTest.scala (1)

5-5: Clean migration to AnyFlatSpec

Test cases properly converted with descriptive names and maintained functionality.

Also applies to: 8-8, 11-35

spark/src/test/scala/ai/chronon/spark/test/EditDistanceTest.scala (1)

21-21: Migrate to ScalaTest assertions

Replace JUnit assertions with ScalaTest matchers.

Also applies to: 23-23, 25-25

api/src/test/scala/ai/chronon/api/test/DataTypeConversionTest.scala (1)

23-23: Migrate to ScalaTest assertions

Replace JUnit assertions with ScalaTest matchers.

Also applies to: 27-27, 29-29

aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (1)

21-21: Migrate to ScalaTest assertions

Replace JUnit assertions with ScalaTest matchers.

Also applies to: 25-25

spark/src/test/scala/ai/chronon/spark/test/ResultValidationAbilityTest.scala (3)

33-34: LGTM! Clean migration to AnyFlatSpec.

The transition from JUnit to ScalaTest's AnyFlatSpec is well-structured.

Also applies to: 36-36


41-44: LGTM! Correct usage of before block.

Setup code properly migrated from JUnit's `@Before`.


Line range hint 53-91: LGTM! Test methods properly migrated.

All test cases successfully converted to AnyFlatSpec's "it should" syntax.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (3)

14-17: LGTM! Clean migration while preserving MockitoSugar.


Line range hint 19-68: LGTM! Test methods correctly migrated to AnyFlatSpec style.


Line range hint 71-108: LGTM! Ignored tests properly migrated.

Correctly uses AnyFlatSpec's ignore syntax.

online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1)

27-27: LGTM! Clean migration to AnyFlatSpec.

Test properly converted from JUnit to ScalaTest style.

Also applies to: 31-31, 33-33

online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (1)

7-10: LGTM! Well-structured migration to AnyFlatSpec.

Statistical test cases properly converted while maintaining comprehensive assertions.

Also applies to: 78-128

online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)

23-23: LGTM: Framework migration changes.

Clean transition to ScalaTest.

Also applies to: 29-29

flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (1)

6-6: LGTM: Clean import changes.

Appropriate ScalaTest imports added.

Also applies to: 11-11

flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (1)

16-17: LGTM: Implemented suggested BeforeAndAfter trait.

Successfully implemented the previous review suggestion.

Also applies to: 22-22, 54-63

spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala (1)

101-101: Parameter rename reflects updated terminology.

Changed from enableHitter to skewDetection for clarity.

spark/src/test/scala/ai/chronon/spark/test/ChainingFetcherTest.scala (1)

35-37: Remove JUnit @RunWith annotation.

-@RunWith(classOf[JUnit4])
spark/src/test/scala/ai/chronon/spark/test/GroupByUploadTest.scala (1)

40-40: LGTM!

The migration to ScalaTest is clean and complete.

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)

359-362: LGTM! Good defensive programming.

Cloning groupBy prevents concurrent modification issues.


396-397: LGTM! Helpful logging.

Added logging improves debuggability.

spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (1)

318-321: LGTM! Better error handling.

Using require provides clearer error messages for precondition failures.

spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (1)

47-47: LGTM! Clean test framework migration.

Successfully migrated to ScalaTest while preserving test logic.

spark/src/test/scala/ai/chronon/spark/test/LabelJoinTest.scala (2)

29-29: Clean migration to ScalaTest.

The import and class extension changes correctly implement the ScalaTest framework.

Also applies to: 32-32


Line range hint 47-253: Well-structured test conversions.

Test methods have been cleanly converted to ScalaTest's behavior-driven style while preserving the original test logic.

build.sbt (1)

196-196: Proper test dependency configuration.

The dependency update correctly includes test scope for the online project.

online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (2)

24-24: Consistent ScalaTest migration.

Import and class extension changes match the project-wide migration pattern.

Also applies to: 169-169


Line range hint 171-545: Clean test method conversion.

Complex test methods successfully migrated to ScalaTest style while preserving all test logic.

spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala (2)

35-35: Consistent framework migration.

Import and class extension changes align with the project-wide migration pattern.

Also applies to: 42-42


Line range hint 49-626: Clean test conversion with preserved logic.

Complex test methods successfully migrated to ScalaTest style while maintaining all assertions and test logic.

spark/src/main/scala/ai/chronon/spark/Driver.scala (1)

434-436: LGTM! Clear variable renames.

Variable names and descriptions now accurately reflect their skew detection purpose.

Also applies to: 440-440, 445-445, 447-447, 451-451, 464-464, 466-466

spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (3)

31-31: LGTM! Clean import changes.

The import changes correctly support the transition to AnyFlatSpec.

Also applies to: 43-43


59-59: LGTM! Clean class signature update.

The class signature correctly extends AnyFlatSpec and includes explicit type declaration.

Also applies to: 62-62


74-74: LGTM! Clean test method updates.

Test methods correctly follow the AnyFlatSpec style.

Also applies to: 113-113, 316-316, 734-734, 814-814, 968-968, 985-985, 1034-1034, 1090-1090, 1139-1139, 1197-1197

.github/workflows/test_scala_no_spark.yaml (2)

51-51: Memory allocation increased for Flink tests.

Memory settings optimized and module access flag added.


62-65: New test step added for Catalyst Util tests.

Test isolation improves reliability.

docs/source/test_deploy_serve/Test.md (1)

50-50: Parameter renamed for clarity.

Renaming `--enable-hitter` to `--skew-detection` better reflects the functionality.

Also applies to: 55-55

spark/src/test/scala/ai/chronon/spark/test/AnalyzerTest.scala (3)

31-33: Migrated to ScalaTest framework.

Modern testing framework with better readability.

Also applies to: 37-39


77-79: Parameter renamed for consistency.

Renaming `enableHitter` to `skewDetection` aligns with the new terminology.

Also applies to: 120-124, 162-165, 290-294, 322-326, 359-363, 378-382


117-119: Enhanced logging for debugging.

Added table content visibility before assertions.

Also applies to: 159-161

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2)

13-13: LGTM: Proper migration to AnyFlatSpec.

Clean transition from AnyFunSuite, maintaining test functionality.

Also applies to: 16-16


33-33: LGTM: Test cases properly converted to BDD style.

Test descriptions are clear and follow the "it should" pattern.

Also applies to: 37-37, 45-45, 52-52, 63-63, 77-77

online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (1)

50-50: LGTM: Tests properly converted to BDD style.

Clear and descriptive test cases using "it should" syntax.

Also applies to: 66-66

spark/src/test/scala/ai/chronon/spark/test/AvroTest.scala (1)

28-28: LGTM: Clean migration to AnyFlatSpec.

Test structure properly converted while preserving complex setup.

Also applies to: 30-30, 37-37

aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (1)

59-59: LGTM: Tests properly converted to BDD style.

Clear test descriptions using "it should" syntax.

Also applies to: 77-77

api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (1)

34-34: LGTM! Excellent test structure.

Well-organized test cases with clear, descriptive names following BDD style.

online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala (3)

20-20: LGTM! Required ScalaTest imports added.


36-36: LGTM! Class correctly extends AnyFlatSpec.


Line range hint 42-193: LGTM! Test methods properly converted to ScalaTest style.

The conversion from JUnit's @Test methods to ScalaTest's it should syntax is correct and maintains the original test logic.

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (2)

47-57: LGTM! Parameter renamed to indicate cloned state.

The renaming from joinConf to joinConfCloned better reflects that this is a cloned instance.


Line range hint 140-161: LGTM! Consistent usage of renamed parameter.

All references updated to use joinConfCloned consistently.

spark/src/main/scala/ai/chronon/spark/Join.scala (1)

75-76: LGTM! Good documentation of cloning rationale.

Clear explanation of why the configuration is cloned.

spark/src/test/scala/ai/chronon/spark/test/GroupByTest.scala (2)

51-55: LGTM! Proper ScalaTest setup.

Correct import and class extension for AnyFlatSpec.


Line range hint 61-621: LGTM! Test methods successfully converted to ScalaTest style.

All test methods properly migrated from JUnit's @Test to ScalaTest's it should syntax while preserving test logic.

spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (2)

27-27: LGTM: Framework migration to ScalaTest.

Clean transition to AnyFlatSpec.

Also applies to: 31-31


Line range hint 87-122: Extract common test logic.

The mapping test duplicates verification logic from the basic test.

online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (1)

38-39: LGTM: Clean migration to ScalaTest.

Proper use of BeforeAndAfter trait.

Also applies to: 51-51

cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (1)

13-14: LGTM: Framework migration complete.

Clean transition to AnyFlatSpec, maintaining existing test patterns.

Also applies to: 34-34

spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala (1)

36-36: LGTM: Framework migration complete.

Clean transition to AnyFlatSpec.

Also applies to: 40-40

@nikhil-zlai nikhil-zlai merged commit 0faaf40 into main Jan 15, 2025
10 checks passed
@nikhil-zlai nikhil-zlai deleted the tests_consolidation_into_flatspec branch January 15, 2025 17:44
@coderabbitai coderabbitai bot mentioned this pull request Jan 21, 2025
4 tasks
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Why?

we have tests written in multiple flavors
- extending junit TestCase class
- using the @Test annotation
- using AnyFunSuite
- using AnyFlatSpec
- using vertx junit runner

bazel silently fails to run the tests when they are not uniform!

## How?

This PR translates almost all of the tests to AnyFlatSpec except for
vertx tests.
I used the script at scripts/codemod/test_replace.py 

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **Testing Framework Update**
- Transitioned from JUnit and AnyFunSuite to ScalaTest's AnyFlatSpec
across multiple test classes.
  - Improved test readability and expressiveness.
  - Maintained existing test logic and functionality.
  - Updated test method syntax from `@Test` to `it should` format.

- **Affected Components**
  - Aggregator test suite
  - API test suite
  - Cloud AWS/GCP test suites
  - Flink test suite
  - Online test suite
  - Spark test suite
- New Python script added for automated conversion of tests to
AnyFlatSpec format.

- **Benefits**
  - More descriptive test case definitions.
  - Enhanced test documentation.
  - Consistent testing approach across the project.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025