Vz/add test case for different partition formats #753
Conversation
Walkthrough

This update introduces partition spec translation capabilities across the codebase. New methods for converting partition strings and ranges between differing partition specs are added, and all relevant Spark join, groupby, and catalog logic is refactored to use these translations. Extensive new and refactored tests validate heterogeneous partition handling.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant SparkJob
    participant TableUtils
    participant PartitionSpec
    participant PartitionRange
    User->>SparkJob: Initiate join/groupby with sources
    SparkJob->>TableUtils: Request partitions for source(s)
    TableUtils->>PartitionRange: .translate(targetSpec)
    PartitionRange->>PartitionSpec: .translate(date, targetSpec)
    PartitionSpec-->>PartitionRange: Translated date string
    PartitionRange-->>TableUtils: New PartitionRange with translated bounds
    TableUtils-->>SparkJob: Partitions in unified spec
    SparkJob->>SparkJob: Execute logic with aligned partitions
```
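To make the translation step concrete, here is a minimal, self-contained sketch of the core idea: re-render a partition value's epoch millis in a target spec's format. `DemoPartitionSpec` and its (column, format, spanMillis) constructor are assumptions for illustration; the real implementation lives in api/src/main/scala/ai/chronon/api/PartitionSpec.scala.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Illustrative stand-in for the real PartitionSpec (constructor shape assumed).
case class DemoPartitionSpec(column: String, format: String, spanMillis: Long) {
  private def sdf: SimpleDateFormat = {
    val f = new SimpleDateFormat(format)
    f.setTimeZone(TimeZone.getTimeZone("UTC"))
    f
  }
  def epochMillis(date: String): Long = sdf.parse(date).getTime
  def at(millis: Long): String        = sdf.format(new Date(millis))
  // Core of this PR: re-render this spec's partition value in a target spec.
  def translate(date: String, targetSpec: DemoPartitionSpec): String =
    targetSpec.at(epochMillis(date))
}

object TranslateDemo extends App {
  val dayMs   = 24L * 60 * 60 * 1000
  val daily   = DemoPartitionSpec("ds", "yyyy-MM-dd", dayMs)
  val compact = DemoPartitionSpec("date_str", "yyyyMMdd", dayMs)
  println(daily.translate("2023-08-15", compact)) // prints 20230815
}
```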
Actionable comments posted: 4
🔭 Outside diff range comments (2)
api/src/main/scala/ai/chronon/api/DataRange.scala (1)
91-96: ⚠️ Potential issue: Validation removal risks NPE / infinite stream

Commenting out the `require(wellDefined, ...)` allows `start` or `end` to be null, after which `Stream.iterate(start)(partitionSpec.after)` will throw or loop forever. Please restore the check (or an equivalent guard) before producing the stream.

```diff
-// TODO: UNDO the comment below
-// require(wellDefined, s"Invalid partition range $this")
+require(wellDefined, s"Invalid partition range $this")
```

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
606-608: ⚠️ Potential issue: Partition-spec mismatch leads to wrong window calculation

`minQuery` is derived with `tableUtils.partitionSpec`, but `queryStart` is already in `source.partitionSpec`, so the date math is incorrect for non-default specs.

```diff
-    val minQuery = tableUtils.partitionSpec.before(queryStart)
-    val windowStart: String = window.map(tableUtils.partitionSpec.minus(minQuery, _)).orNull
+    val minQuery = source.partitionSpec.before(queryStart)
+    val windowStart: String = window.map(source.partitionSpec.minus(minQuery, _)).orNull
```
🧹 Nitpick comments (10)
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1)
92-95: Null safety & early exit for `translate`

If `date` is null, the current call will `sdf.parse(null)` and NPE. Consider guarding against nulls and skipping work when the target spec is identical.

```diff
-  def translate(date: String, targetSpec: PartitionSpec): String = {
-    val millis = epochMillis(date)
-    targetSpec.at(millis)
-  }
+  def translate(date: String, targetSpec: PartitionSpec): String =
+    Option(date)
+      .map(d => if (this eq targetSpec) d else targetSpec.at(epochMillis(d)))
+      .orNull
```

api/src/main/scala/ai/chronon/api/DataRange.scala (1)
156-162: `translate` could be a tiny bit tighter

You already have null handling; as a micro-saving, skip translation when the specs match.

```diff
-    val newStart = Option(start).map(d => partitionSpec.translate(d, otherSpec)).orNull
+    val newStart =
+      if (partitionSpec eq otherSpec) start else Option(start).map(partitionSpec.translate(_, otherSpec)).orNull
```

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
291-304: Range expansion may blow up memory

`rangeToFill.partitions` materialises every partition between start & end before filtering; for long backfills this can be millions of strings. Consider streaming or early-filtering instead:

```scala
val fillable = tableUtils
  .partitions(...)
  .filter(p => p >= rangeToFill.start && p <= rangeToFill.end)
require(fillable.nonEmpty, "…")
```

spark/src/main/scala/ai/chronon/spark/Extensions.scala (3)
20-23: Super-nit: redundant wildcard import

`ScalaJavaConversions._` already brings `SourceOps` & `WindowOps`; the extra explicit import is unnecessary.
270-291: Null-safe parse & perf edge-case in `translatePartitionSpec`

`to_date` returns `null` for unparsable strings; consider guarding with `when(col.isNotNull, …)`. `to_date` also truncates any time component – fine for daily partitions, risky otherwise. Clarify the assumption or handle timestamps. A guard sketch follows.
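A minimal sketch of the suggested guard, assuming the column name and format strings below stand in for whatever `translatePartitionSpec` actually receives:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_format, to_date, when}

// Hypothetical helper: rewrite partition strings from one format to another,
// leaving nulls untouched instead of letting to_date fail silently mid-expression.
def reformatPartition(df: DataFrame,
                      partitionCol: String,
                      existingFormat: String,
                      newFormat: String): DataFrame =
  df.withColumn(
    partitionCol,
    when(col(partitionCol).isNotNull,
         date_format(to_date(col(partitionCol), existingFormat), newFormat))
  )
```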
311-328: Minor: avoid the repeated `Option(...).getOrElse` pattern

Storing `val q = source.query` once reduces allocations and the readability hit.

spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (1)
679-869: Spelling & verbosity in new test

Method name `hetergeneous` → `heterogeneous`; lots of `println` calls slow CI – consider removing them.

spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (3)
240-242: `show()` triggers expensive actions

`dfRearranged.show()` executes a full scan; remove it or guard it with a debug flag.

270-271: Second eager `show()`

Same performance hit for `finalizedDf.show()`. Drop it or wrap it in `if (logger.isDebugEnabled)`.

392-403: Cross-product loop may explode

Nested loops over `inputPartitionColumnNames` × `inputPartitionSpecs` × `inputTables` can duplicate work. Consider zipping specs with columns or passing a map, as in the sketch below.
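A hedged sketch of the "zip" idea, assuming the three inputs are parallel sequences keyed by table (the parameter names mirror the comment above, not the actual `unfilledRanges` signature):

```scala
import ai.chronon.api.PartitionSpec

// Pair each table with its own partition column and spec once, instead of
// iterating the full cross product of the three sequences.
def perTableSpecs(inputTables: Seq[String],
                  inputPartitionColumnNames: Seq[String],
                  inputPartitionSpecs: Seq[PartitionSpec]): Map[String, (String, PartitionSpec)] =
  inputTables.zip(inputPartitionColumnNames.zip(inputPartitionSpecs)).toMap
```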
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- api/src/main/scala/ai/chronon/api/DataRange.scala (2 hunks)
- api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (6 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (4 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (8 hunks)
- spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (3 hunks)
- spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (3 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (4)
- api/src/main/scala/ai/chronon/api/PartitionSpec.scala (2): PartitionSpec (28-96), PartitionSpec (98-100)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (4): Extensions (37-334), partitionRange (118-121), partitionSpec (325-327), effectivePartitionColumn (331-333)
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (4): TableUtils (43-611), TableUtils (613-615), partitions (129-160), unfilledRanges (338-425)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2): JoinUtils (39-541), getRangeToFill (141-175)
⏰ Context from checks skipped due to timeout of 90000ms (31)
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: service_tests
- GitHub Check: flink_tests
- GitHub Check: aggregator_tests
- GitHub Check: api_tests
- GitHub Check: online_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: service_tests
- GitHub Check: streaming_tests
- GitHub Check: service_commons_tests
- GitHub Check: analyzer_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: online_tests
- GitHub Check: join_tests
- GitHub Check: flink_tests
- GitHub Check: fetcher_tests
- GitHub Check: spark_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: fetcher_tests
- GitHub Check: streaming_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: spark_tests
- GitHub Check: join_tests
- GitHub Check: batch_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (11)
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2)
288-288: Updated method call for consistency

Method call updated from `getRangesToFill` to `getRangeToFill` to match the signature change in the implementation.

305-305: Method signature change properly implemented

Updated call from `getRangesToFill` to `getRangeToFill` with the additional parameters required by the new implementation.

spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
267-267: Updated to use renamed method with consistent parameters

Method call changed from `getRangesToFill` to `getRangeToFill` along with proper parameter inclusion, consistent with the API changes.

spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1)

20-24: Improved code readability

Introduced a local variable to store intermediate results instead of method chaining. Makes the code slightly more readable without changing functionality.
spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (2)
92-93: Added partition specification to test data generation

Now explicitly uses TableUtils to provide the partition column and format when generating test data.

105-106: Consistent partition handling for weight table test data

Added an explicit partition column and format for the weight table data generation, maintaining consistency with the first table.

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
142-149
: MissinginputPartitionSpecs
→ inconsistent gap detection
unfilledRanges
elsewhere now receives bothinputPartitionColumnNames
&inputPartitionSpecs
.
Here only column names are sent; if the table’s spec differs from the global one, gaps can be mis-computed.- inputPartitionColumnNames = Seq(joinConfCloned.left.query.effectivePartitionColumn) + inputPartitionColumnNames = Seq(joinConfCloned.left.query.effectivePartitionColumn), + inputPartitionSpecs = Seq(joinConfCloned.left.partitionSpec)spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
698-707
: Nice touch on spec translation.
Translating the scanned DF back to the global spec keeps downstream logic untouched.spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2)
316-324
: 👍 Explicit partition format in test increases coverage.
366-370
: Future-datedstartPartition
looks odd.
Starting a backfill two days in the future may hide bugs; confirm intent.spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
106-107
: Verify translated DF contains global partition colAfter
translatePartitionSpec
, ensure the new column actually exists and duplicates are dropped.
```scala
  def intervalWindow: Window = {
    if (spanMillis == WindowUtils.Day.millis) WindowUtils.Day
    else if (spanMillis == WindowUtils.Hour.millis) WindowUtils.Hour
    else
      throw new UnsupportedOperationException(
        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
  }
```
Incorrect minute calculation in error message

`spanMillis / 60 * 1000` first divides by 60, then multiplies by 1000, giving a bogus number. Use a millisecond-to-minute conversion: `spanMillis / (60 * 1000)`.

```diff
-        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
+        s"Partition intervals must be hour or day – got ${(spanMillis / 60000)} minutes")
```
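For concreteness, with an hourly span the two expressions diverge wildly:

```scala
val spanMillis = 3600000L  // one hour in milliseconds
spanMillis / 60 * 1000     // 60000000: divides by 60, then multiplies by 1000
spanMillis / (60 * 1000)   // 60: the intended minute count
```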
```scala
    val partitionConditions = intersectedRange.whereClauses
```
🛠️ Refactor suggestion: Filtering also uses the wrong spec

`intersectedRange` is in the source spec, but `whereClauses` is rendered with the default `tableUtils` spec, yielding empty scans.

```diff
-    val partitionConditions = intersectedRange.whereClauses
+    val partitionConditions =
+      intersectedRange.translate(tableUtils.partitionSpec).whereClauses
```
```diff
     var df = tableUtils.scanDf(joinConf.left.query,
                                joinConf.left.table,
                                Some((Map(tableUtils.partitionColumn -> null) ++ timeProjection).toMap),
-                               range = Some(range))
+                               range = Some(effectiveLeftRange))
```
Partition-column mismatch in `scanDf`

Using `tableUtils.partitionColumn` here breaks when the left table has a custom partition column. Use `effectiveLeftSpec.column` instead.

```diff
-        Some((Map(tableUtils.partitionColumn -> null) ++ timeProjection).toMap),
+        Some((Map(effectiveLeftSpec.column -> null) ++ timeProjection).toMap),
```
Actionable comments posted: 15
🧹 Nitpick comments (10)
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1)
28-28: Fix typo in test name

There's a typo in "hetergeneous".

```diff
-  it should "test hetergeneous partition columns" in {
+  it should "test heterogeneous partition columns" in {
```

spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1)
78-79: Add newline at end of file

Missing newline at end of file, as flagged by Scala Fmt.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 79-79: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1)
96-96: Add newline at end of file

Missing newline at end of file, as flagged by Scala Fmt.

```diff
 }
+
```
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (2)
138-152: Consider completing the TODO comment

There's a commented test section with a TODO about revisiting in a "logger world".

153-154: Add newline at end of file

Missing newline at end of file, as flagged by Scala Fmt.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 154-154: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (2)
64-65: Remove commented code

Remove or uncomment this debug println statement.

```diff
-    // println("Rupee Source start partition $month")
```

119-122: Fix multiline formatting

The Scala Fmt pipeline reports formatting issues here.

```diff
-    val runner1 = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = tableUtils.partitionSpec.minus(today, new Window(40, TimeUnit.DAYS)), tableUtils = tableUtils)
+    val runner1 = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = tableUtils.partitionSpec.minus(today, new Window(40, TimeUnit.DAYS)),
+      tableUtils = tableUtils)
```

🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 119-122: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (2)
1-152: Test looks robust; add newline at end of file

Test effectively validates joins across datasets with different partition columns and formats.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 152-152: Missing newline at end of file.
136-150: Consider enabling or removing commented test code

This commented test case appears to validate behavior when input/output tables have identical partitions.
Either uncomment and implement this test or remove the commented code if not needed.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1)
1-109: Well-structured base class; add newline at end of file

Base class provides comprehensive utilities for join testing.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 109-109: Missing newline at end of file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (20)
- .gitignore (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (0 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (1 hunks)
💤 Files with no reviewable changes (1)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🧰 Additional context used
🧬 Code Graph Analysis (3)
spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (2)

- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1): ScalaJavaConversions (6-97)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (2): TestRow (28-28), TestRow (30-37)

spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (4)

- api/src/main/scala/ai/chronon/api/Builders.scala (5): Builders (27-371), Source (106-140), Selects (29-39), exprs (34-38), MetaData (261-315)
- spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1): DataFrameGen (39-180)
- api/src/main/scala/ai/chronon/api/DataType.scala (1): StringType (152-152)
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1): computeJoin (229-231)

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (4)

- api/src/main/scala/ai/chronon/api/Builders.scala (4): Builders (27-371), Source (106-140), Selects (29-39), MetaData (261-315)
- spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1): DataFrameGen (39-180)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1): getEventsEventsTemporal (91-108)
- spark/src/main/scala/ai/chronon/spark/Comparison.scala (2): Comparison (31-126), sideBySide (63-120)
🪛 GitHub Actions: Scala Fmt
spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala
[error] 44-44: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
[error] 135-138: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
[error] 81-81: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
[error] 66-66: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
[error] 114-115: Code formatting issue: multiline formatting changes detected.
[error] 121-121: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala
[error] 154-154: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala
[error] 154-154: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala
[error] 152-152: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
[error] 85-85: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
[error] 81-81: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala
[error] 119-122: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala
[error] 79-79: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
[error] 108-108: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
[error] 76-76: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala
[error] 36-38: Code formatting issue: multiline formatting changes detected.
[error] 77-78: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala
[error] 97-97: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
[error] 102-102: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala
[error] 109-109: Missing newline at end of file.
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: cloud_aws_tests
- GitHub Check: service_tests
- GitHub Check: service_tests
- GitHub Check: analyzer_tests
- GitHub Check: service_commons_tests
- GitHub Check: analyzer_tests
- GitHub Check: online_tests
- GitHub Check: streaming_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: streaming_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: online_tests
- GitHub Check: spark_tests
- GitHub Check: spark_tests
- GitHub Check: flink_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: join_tests
- GitHub Check: join_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: aggregator_tests
- GitHub Check: fetcher_tests
- GitHub Check: flink_tests
- GitHub Check: fetcher_tests
- GitHub Check: batch_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (10)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1)
27-101: Good test structure for temporal joins

The test thoroughly validates temporal join between events datasets using both the Chronon API and SQL-based validation.
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1)
29-152: Good test for heterogeneous partition formats

The test correctly validates joins with different partition columns and formats.
spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1)
32-80: Good bloom filter backfill test

Test effectively validates bloom filter threshold behavior.
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 36-38: Code formatting issue: multiline formatting changes detected.
[error] 77-78: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1)
29-106: Good test for joins without aggregation

The test properly validates joins with no-aggregation scenarios.
spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1)
1-78: Test logic is sound and well structured

The test case properly validates join behavior when key mapping overlaps with a grouping field, with clear setup and assertions.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1)
1-96: Test logic looks correct

Test effectively verifies snapshot join functionality between events datasets.
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1)
38-40: Partition format is explicitly set

Noting that this table uses an explicit "yyyyMMdd" format while other tables use the default format.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (3)
28-37: Good use of implicit ordering

Ordering implementation enables consistent comparison in test results.

54-89: Clear data generation method

Method effectively creates test data with configurable cumulative behavior.

91-108: Concise join configuration builder

Method appropriately configures temporal join tests with duplicated events.
```scala
    val joinJob = new ai.chronon.spark.Join(joinConf = joinConf,
                                            endPartition = today,
                                            tableUtils = tableUtils,
                                            selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
```
Fix multiline formatting

Format constructor parameters consistently according to style guidelines.

```diff
-    val joinJob = new ai.chronon.spark.Join(joinConf = joinConf,
-                                            endPartition = today,
-                                            tableUtils = tableUtils,
-                                            selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
+    val joinJob = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = today,
+      tableUtils = tableUtils,
+      selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 135-138: Code formatting issue: multiline formatting changes detected.
```scala
    val skipBloomComputed = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = today, tableUtils = testTableUtils).computeJoin()
    val leftSideCount = testSpark.sql(s"SELECT item, ts, ds from $itemQueriesTable where ds >= '$start'").count()
```
Fix multi-line formatting

The line is too long and triggers a formatting error.

```diff
-    val skipBloomComputed = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = today, tableUtils = testTableUtils).computeJoin()
+    val skipBloomComputed = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = today,
+      tableUtils = testTableUtils
+    ).computeJoin()
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 77-78: Code formatting issue: multiline formatting changes detected.
```scala
    submission.SparkSessionBuilder.build("JoinTest",
                                         local = true,
                                         additionalConfig =
                                           Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100")))
```
Fix multi-line formatting

Adjust the indentation to match Scala formatting standards.

```diff
-  val testSpark: SparkSession =
-    submission.SparkSessionBuilder.build("JoinTest",
-                                         local = true,
-                                         additionalConfig =
-                                           Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100")))
+  val testSpark: SparkSession =
+    submission.SparkSessionBuilder.build(
+      "JoinTest",
+      local = true,
+      additionalConfig = Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100"))
+    )
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 36-38: Code formatting issue: multiline formatting changes detected.
```scala
        setups = Seq(
          "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
        )
```
Duplicate UDF creation

The same UDF `temp_replace_right_c` is created twice.

```diff
         setups = Seq(
           "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
+          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
         )
```
```scala
import ai.chronon.spark.Extensions._
import org.junit.Assert._

class EventsEventsCumulativeTest extends BaseJoinTest {
```
🥳
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/test_scala_2_12_spark.yaml (2)
100-100: Runner label invalid

🧰 Tools
🪛 actionlint (1.7.4)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: Runner label invalid

🧰 Tools
🪛 actionlint (1.7.4)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🧹 Nitpick comments (1)
.github/workflows/test_scala_2_12_spark.yaml (1)
21-150: Consolidate job blocks

Use a matrix or YAML anchors for shared `runs-on`, `container`, and steps to avoid duplication.

🧰 Tools
🪛 actionlint (1.7.4)
23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
- .github/workflows/test_scala_2_12_spark.yaml (3 hunks)
- .github/workflows/test_scala_2_13_spark.yaml (3 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_2_12_spark.yaml
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
.github/workflows/test_scala_2_13_spark.yaml
77-77: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
104-104: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
131-131: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🔇 Additional comments (3)
.github/workflows/test_scala_2_13_spark.yaml (2)
130-131: ❓ Verification inconclusive: Unknown runner label

`ubuntu_32_core_128gb` is not a standard GitHub-hosted label. If this is meant to target a self-hosted runner, confirm you've registered that label; otherwise switch to a valid label (e.g. `ubuntu-latest`) to avoid workflow failures.

• File: .github/workflows/test_scala_2_13_spark.yaml
• Lines: 130–131

🧰 Tools
🪛 actionlint (1.7.4)
131-131: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
103-104: ❓ Verification inconclusive: Confirm self-hosted runner label or use a valid GitHub-hosted runner

The `runs-on: ubuntu_32_core_128gb` label isn't a standard GitHub-hosted runner.
• Ensure a self-hosted runner is registered with that exact label, or
• Switch to a valid GitHub label (e.g., `ubuntu-latest`, `ubuntu-20.04`) to avoid failures.

Location: .github/workflows/test_scala_2_13_spark.yaml, lines 103–104

🧰 Tools
🪛 actionlint (1.7.4)
104-104: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
.github/workflows/test_scala_2_12_spark.yaml (1)
74-74: Runner label invalid

actionlint warns that `ubuntu_32_core_128gb` is unrecognized. Confirm the self-hosted config or switch to a supported runner.

🧰 Tools
🪛 actionlint (1.7.4)
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
Actionable comments posted: 0
🧹 Nitpick comments (1)
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1)
135-149: Commented code needs attention

Consider either implementing or removing the commented test case. The TODO at line 138 suggests revisiting "in a logger world".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
- spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (28)
- GitHub Check: service_tests
- GitHub Check: streaming_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: groupby_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: streaming_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: analyzer_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: groupby_tests
- GitHub Check: spark_tests
- GitHub Check: online_tests
- GitHub Check: analyzer_tests
- GitHub Check: flink_tests
- GitHub Check: flink_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: api_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: api_tests
- GitHub Check: aggregator_tests
- GitHub Check: fetcher_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: batch_tests
🔇 Additional comments (3)
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (3)
32-54: Good use of explicit partition format

Line 39 explicitly sets `partitionFormat = Some("yyyyMMdd")` for weight data, aligning with the PR's goal of handling different partition formats.

55-73: Missing partition format for height data

Height data creation doesn't specify a partition format, unlike weight data. This difference is likely intentional, to test heterogeneous partition formats.

80-89: Good partition specs usage

Lines 80-81 use `tableUtils.partitionSpec.minus()` for date calculations, demonstrating the improved partition specification translation.
Actionable comments posted: 0
♻️ Duplicate comments (1)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1)
75-77: Fix duplicate UDF creation

The UDF `temp_replace_right_c` is created twice.

```diff
     setups = Seq(
       "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
       "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-      "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
     )
```
🧹 Nitpick comments (2)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (2)
64-64: Remove commented debug print

Remove the unused comment.

```diff
-    // println("Rupee Source start partition $month")
```

31-235: Extract test logic into smaller methods

The test method is very long. Consider decomposing it into smaller methods.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (45)
- aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala (1 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala (2 hunks)
- api/python/test/sample/scripts/data-loader.scala (1 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (1 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (3 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternalTest.scala (1 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
- flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4 hunks)
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/metrics/FlexibleExecutionContext.scala (1 hunks)
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
- online/src/test/scala/ai/chronon/online/test/stats/AssignIntervalsTest.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (7 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (7 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (8 hunks)
- spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (1 hunks)
✅ Files skipped from review due to trivial changes (24)
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
- spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
- online/src/main/scala/ai/chronon/online/metrics/FlexibleExecutionContext.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala
- spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala
- api/python/test/sample/scripts/data-loader.scala
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternalTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- online/src/test/scala/ai/chronon/online/test/stats/AssignIntervalsTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala
- flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala
- spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala
- spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
🚧 Files skipped from review as they are similar to previous changes (19)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (27)
- GitHub Check: flink_tests
- GitHub Check: online_tests
- GitHub Check: streaming_tests
- GitHub Check: streaming_tests
- GitHub Check: service_tests
- GitHub Check: join_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: spark_tests
- GitHub Check: online_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: analyzer_tests
- GitHub Check: groupby_tests
- GitHub Check: service_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: spark_tests
- GitHub Check: fetcher_tests
- GitHub Check: service_commons_tests
- GitHub Check: fetcher_tests
- GitHub Check: flink_tests
- GitHub Check: analyzer_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: batch_tests
- GitHub Check: aggregator_tests
🔇 Additional comments (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (1)
193-193: Comment formatting improved

A space added after the comment marker follows standard style conventions.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1)
1-236: LGTM except for minor issues

The test looks comprehensive and covers important edge cases, including join part offset handling.
```diff
@@ -267,6 +267,28 @@ object Extensions {
     logger.info(s"schema: ${df.schema.fieldNames.mkString("Array(", ", ", ")")}")
     df.replaceWithReadableTime(availableColumns, dropOriginal = true).show(truncate = false)
   }
+
+    def translatePartitionSpec(existingSpec: PartitionSpec, newSpec: PartitionSpec): DataFrame = {
```
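A hypothetical call site, going by the extension signature in the diff above; the helper name and the way both specs are obtained are assumptions for illustration, not the project's actual API:

```scala
import ai.chronon.api.PartitionSpec
import ai.chronon.spark.Extensions._
import org.apache.spark.sql.DataFrame

// Hypothetical helper: align a source DataFrame's partition strings with the
// job's default spec before joining. Both specs are supplied by the caller.
def alignPartitions(df: DataFrame,
                    sourceSpec: PartitionSpec,
                    defaultSpec: PartitionSpec): DataFrame =
  df.translatePartitionSpec(sourceSpec, defaultSpec)
```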
🥳
Summary
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
- Updated .gitignore to exclude additional local configuration files.

Style