
Vz/add test case for different partition formats #753

Merged
merged 17 commits on May 8, 2025

Conversation


@nikhil-zlai nikhil-zlai commented May 8, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Added support for translating partition columns and formats across datasets, enabling joins between tables with different partitioning schemes.
    • Introduced new methods for partition spec translation and window handling, improving flexibility in partition management.
  • Bug Fixes

    • Improved consistency and correctness in partition handling during joins and group-by operations, especially for tables with different partition columns or formats.
  • Tests

    • Added comprehensive test coverage for joins involving heterogeneous partition columns, dynamic partition overwrites, and selective join part computation.
    • Removed outdated or redundant join tests and introduced new tests for advanced scenarios such as no aggregation joins, migration, versioning, and skipping backfill under certain conditions.
  • Chores

    • Updated workflow configurations to use more powerful test runners for faster and more reliable CI.
    • Enhanced .gitignore to exclude additional local configuration files.
  • Style

    • Applied minor formatting improvements and comment corrections throughout the codebase for better readability.


coderabbitai bot commented May 8, 2025

Walkthrough

This update introduces partition spec translation capabilities across the codebase. New methods for converting partition strings and ranges between differing partition specs are added, and all relevant Spark join, groupby, and catalog logic is refactored to use these translations. Extensive new and refactored tests validate heterogeneous partition handling.
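To make the translation idea concrete, here is a minimal, self-contained sketch of spec-to-spec translation. It mirrors the shape of the translate methods this PR describes (parse the partition string with the source format, re-render it with the target format), but SimplePartitionSpec and its fields are simplified stand-ins, not the repo's actual PartitionSpec.

import java.text.SimpleDateFormat
import java.util.TimeZone

// Simplified stand-in for PartitionSpec: a partition column plus a date format.
case class SimplePartitionSpec(column: String, format: String) {
  private def sdf: SimpleDateFormat = {
    val f = new SimpleDateFormat(format)
    f.setTimeZone(TimeZone.getTimeZone("UTC"))
    f
  }
  def epochMillis(date: String): Long = sdf.parse(date).getTime
  def at(millis: Long): String        = sdf.format(new java.util.Date(millis))

  // Re-render a partition value of this spec in the target spec's format.
  def translate(date: String, targetSpec: SimplePartitionSpec): String =
    targetSpec.at(epochMillis(date))
}

// "2025-05-08" under (ds, yyyy-MM-dd) becomes "20250508" under (date_str, yyyyMMdd).
val daily  = SimplePartitionSpec("ds", "yyyy-MM-dd")
val packed = SimplePartitionSpec("date_str", "yyyyMMdd")
assert(daily.translate("2025-05-08", packed) == "20250508")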

Changes

File(s) | Change Summary
api/src/main/scala/ai/chronon/api/DataRange.scala, api/src/main/scala/ai/chronon/api/PartitionSpec.scala | Added translate methods for partition spec and range conversion; assertion changed to requirement.
spark/src/main/scala/ai/chronon/spark/Extensions.scala | Added DataFrame partition spec translation; new accessor methods in SourceSparkOps.
spark/src/main/scala/ai/chronon/spark/Analyzer.scala, spark/src/main/scala/ai/chronon/spark/JoinBase.scala, spark/src/main/scala/ai/chronon/spark/JoinUtils.scala, spark/src/main/scala/ai/chronon/spark/GroupBy.scala | Refactored to use partition spec translation and updated range-to-fill logic.
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala | Partition methods now handle explicit partition specs; signatures updated; debug output added.
spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala | Minor refactor for partition parsing.
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala | Comment formatting only.
spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala | Partition format parameterization added to data generators.
spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala | Test updated for explicit partitioning in data generation.
spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala | Entire test suite deleted.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala, spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala | New and refactored join tests for heterogeneous partition columns, cumulative joins, versioning, and more; updates to use new partition translation logic.
Other files (.gitignore, .github/workflows/*, aggregator, cloud_aws, cloud_gcp, online, etc.) | Minor formatting, comment, or runner environment changes only.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SparkJob
    participant TableUtils
    participant PartitionSpec
    participant PartitionRange

    User->>SparkJob: Initiate join/groupby with sources
    SparkJob->>TableUtils: Request partitions for source(s)
    TableUtils->>PartitionRange: .translate(targetSpec)
    PartitionRange->>PartitionSpec: .translate(date, targetSpec)
    PartitionSpec-->>PartitionRange: Translated date string
    PartitionRange-->>TableUtils: New PartitionRange with translated bounds
    TableUtils-->>SparkJob: Partitions in unified spec
    SparkJob->>SparkJob: Execute logic with aligned partitions

Possibly related PRs

  • zipline-ai/chronon#731: Adds a column field to PartitionSpec and updates related code; both PRs focus on partition spec structure and translation.

Suggested reviewers

  • nikhil-zlai

Poem

Partition specs now dance in sync,
Translated strings—no need to think!
Joins and groupbys, now aligned,
Across all formats, intertwined.
With specs in hand, the tests delight—
Heterogeneous columns? Works just right!
🎉



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0744ccc and 8907ddf.

📒 Files selected for processing (3)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: service_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: streaming_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: streaming_tests
  • GitHub Check: service_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: online_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: groupby_tests
  • GitHub Check: flink_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: api_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: join_tests
  • GitHub Check: flink_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: api_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: enforce_triggered_workflows



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🔭 Outside diff range comments (2)
api/src/main/scala/ai/chronon/api/DataRange.scala (1)

91-96: ⚠️ Potential issue

Validation removal risks NPE / infinite stream

Commenting out the require(wellDefined, ...) allows start or end to be null, after which
Stream.iterate(start)(partitionSpec.after) will throw or loop forever.

Please restore the check (or an equivalent guard) before producing the stream.

-//    TODO: UNDO the comment below
-//    require(wellDefined, s"Invalid partition range $this")
+require(wellDefined, s"Invalid partition range $this")
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)

606-608: ⚠️ Potential issue

Partition-spec mismatch leads to wrong window calculation.
minQuery is derived with tableUtils.partitionSpec but queryStart is already in source.partitionSpec → incorrect date math for non-default specs.

-          val minQuery = tableUtils.partitionSpec.before(queryStart)
-          val windowStart: String = window.map(tableUtils.partitionSpec.minus(minQuery, _)).orNull
+          val minQuery       = source.partitionSpec.before(queryStart)
+          val windowStart: String = window.map(source.partitionSpec.minus(minQuery, _)).orNull
🧹 Nitpick comments (10)
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1)

92-95: Null safety & early-exit for translate

If date is null, the current code calls sdf.parse(null) and throws an NPE.
Consider guarding against nulls and skipping the work when the target spec is identical.

-  def translate(date: String, targetSpec: PartitionSpec): String = {
-    val millis = epochMillis(date)
-    targetSpec.at(millis)
-  }
+  def translate(date: String, targetSpec: PartitionSpec): String =
+    Option(date)
+      .map(d => if (this eq targetSpec) d else targetSpec.at(epochMillis(d)))
+      .orNull
api/src/main/scala/ai/chronon/api/DataRange.scala (1)

156-162: translate could be a tiny bit tighter

Null handling is already in place; as a micro-optimization, skip the translation when the specs match.

-    val newStart = Option(start).map(d => partitionSpec.translate(d, otherSpec)).orNull
+    val newStart =
+      if (partitionSpec eq otherSpec) start else Option(start).map(partitionSpec.translate(_, otherSpec)).orNull
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)

291-304: Range expansion may blow up memory

rangeToFill.partitions materialises every partition between start & end before filtering; for long backfills this can be millions of strings.

Consider streaming or early-filtering instead:

val fillable = tableUtils
  .partitions(...)
  .filter(p => p >= rangeToFill.start && p <= rangeToFill.end)
require(fillable.nonEmpty, s"No fillable partitions found in range $rangeToFill")
spark/src/main/scala/ai/chronon/spark/Extensions.scala (3)

20-23: Super-nit: redundant wildcard import.
ScalaJavaConversions._ already brings SourceOps & WindowOps; the extra explicit import is unnecessary.


270-291: Null-safe parse & perf edge-case in translatePartitionSpec.

  1. to_date returns null for unparsable strings; consider guarding with when(col.isNotNull, …).
  2. to_date truncates any time component – fine for daily partitions, risky otherwise. Clarify assumption or handle timestamps.
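A hedged sketch of the guard suggested in point 1, assuming a daily partition column ds being re-rendered from yyyy-MM-dd to yyyyMMdd; the column name and formats are illustrative, not taken from the repo:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_format, to_date, when}

// Rewrite only rows whose partition string actually parses; nulls and
// unparsable values fall through to null instead of being silently mangled.
def guardedTranslate(df: DataFrame): DataFrame =
  df.withColumn(
    "ds",
    when(col("ds").isNotNull && to_date(col("ds"), "yyyy-MM-dd").isNotNull,
         date_format(to_date(col("ds"), "yyyy-MM-dd"), "yyyyMMdd")))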

311-328: Minor: avoid the repeated Option(...).getOrElse pattern.
Storing val q = source.query once cuts allocations and improves readability.

spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (1)

679-869: Spelling & verbosity in new test.
Method name hetergeneous should be heterogeneous; the many println calls slow CI – consider removing them.

spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (3)

240-242: show() triggers expensive actions

dfRearranged.show() executes a full scan; remove or guard with a debug flag.


270-271: Second eager show()

Same performance hit for finalizedDf.show(). Drop it or wrap in if (logger.isDebugEnabled).


392-403: Cross-product loop may explode

Nested loops over inputPartitionColumnNames × inputPartitionSpecs × inputTables can duplicate work.
Consider zipping specs with columns or passing a map.
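A minimal sketch of the zip alternative, assuming the columns and specs arrive aligned one-to-one per input table; the partitions call below is a hypothetical signature, not the repo's actual API:

// Pair each partition column with its spec instead of crossing all three lists.
val columnSpecPairs = inputPartitionColumnNames.zip(inputPartitionSpecs)

val partitionsPerTable = inputTables.map { table =>
  table -> columnSpecPairs.flatMap { case (column, spec) =>
    tableUtils.partitions(table, partitionColumn = column, spec = spec) // hypothetical signature
  }
}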

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3779e99 and 517ff8e.

📒 Files selected for processing (13)
  • api/src/main/scala/ai/chronon/api/DataRange.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (8 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (4)
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (2)
  • PartitionSpec (28-96)
  • PartitionSpec (98-100)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (4)
  • Extensions (37-334)
  • partitionRange (118-121)
  • partitionSpec (325-327)
  • effectivePartitionColumn (331-333)
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (4)
  • TableUtils (43-611)
  • TableUtils (613-615)
  • partitions (129-160)
  • unfilledRanges (338-425)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)
  • JoinUtils (39-541)
  • getRangeToFill (141-175)
⏰ Context from checks skipped due to timeout of 90000ms (31)
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: flink_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: api_tests
  • GitHub Check: online_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: streaming_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: groupby_tests
  • GitHub Check: online_tests
  • GitHub Check: join_tests
  • GitHub Check: flink_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: batch_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: fetcher_tests
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: batch_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (11)
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2)

288-288: Updated method call for consistency

Method call updated from getRangesToFill to getRangeToFill to match the signature change in the implementation.


305-305: Method signature change properly implemented

Updated call from getRangesToFill to getRangeToFill with the additional parameters required by the new implementation.

spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)

267-267: Updated to use renamed method with consistent parameters

Method call changed from getRangesToFill to getRangeToFill along with proper parameter inclusion, consistent with the API changes.

spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1)

20-24: Improved code readability

Introduced a local variable to store intermediate results instead of method chaining. Makes code slightly more readable without changing functionality.

spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (2)

92-93: Added partition specification to test data generation

Now explicitly using TableUtils to provide partition column and format when generating test data.


105-106: Consistent partition handling for weight table test data

Added explicit partition column and format for weight table data generation, maintaining consistency with the first table.

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)

142-149: Missing inputPartitionSpecs → inconsistent gap detection

unfilledRanges elsewhere now receives both inputPartitionColumnNames & inputPartitionSpecs.
Here only column names are sent; if the table’s spec differs from the global one, gaps can be mis-computed.

-         inputPartitionColumnNames = Seq(joinConfCloned.left.query.effectivePartitionColumn)
+         inputPartitionColumnNames = Seq(joinConfCloned.left.query.effectivePartitionColumn),
+         inputPartitionSpecs = Seq(joinConfCloned.left.partitionSpec)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)

698-707: Nice touch on spec translation.
Translating the scanned DF back to the global spec keeps downstream logic untouched.

spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2)

316-324: 👍 Explicit partition format in test increases coverage.


366-370: Future-dated startPartition looks odd.
Starting a backfill two days in the future may hide bugs; confirm intent.

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)

106-107: Verify translated DF contains global partition col

After translatePartitionSpec, ensure the new column actually exists and duplicates are dropped.
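A sketch of that verification, assuming translatePartitionSpec and the spec's column accessor look roughly like the ones referenced in this PR (argument names are assumptions):

// After translation, the frame should expose the global partition column exactly once.
val translated   = df.translatePartitionSpec(sourceSpec, tableUtils.partitionSpec)
val globalColumn = tableUtils.partitionSpec.column
require(translated.columns.count(_ == globalColumn) == 1,
        s"expected exactly one '$globalColumn' column after partition spec translation")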

Comment on lines +84 to +90
def intervalWindow: Window = {
if (spanMillis == WindowUtils.Day.millis) WindowUtils.Day
else if (spanMillis == WindowUtils.Hour.millis) WindowUtils.Hour
else
throw new UnsupportedOperationException(
s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
}

⚠️ Potential issue

Incorrect minute calculation in error message

spanMillis / 60 * 1000 divides by 60 first and then multiplies by 1000 (left-to-right evaluation), producing a bogus number; a one-hour span (3,600,000 ms) yields 60,000,000 instead of the intended 60 minutes.
Use a millisecond-to-minute conversion: spanMillis / (60 * 1000).

-        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
+        s"Partition intervals must be hour or day – got ${(spanMillis / 60000)} minutes")

Comment on lines +671 to 672
val partitionConditions = intersectedRange.whereClauses


🛠️ Refactor suggestion

Filtering also uses the wrong spec.
intersectedRange is in the source spec, but whereClauses is rendered with the default tableUtils spec, yielding empty scans.

-    val partitionConditions = intersectedRange.whereClauses
+    val partitionConditions =
+      intersectedRange.translate(tableUtils.partitionSpec).whereClauses

Comment on lines 86 to 90
var df = tableUtils.scanDf(joinConf.left.query,
                           joinConf.left.table,
                           Some((Map(tableUtils.partitionColumn -> null) ++ timeProjection).toMap),
-                          range = Some(range))
+                          range = Some(effectiveLeftRange))


⚠️ Potential issue

Partition-column mismatch in scanDf

Using tableUtils.partitionColumn here breaks when the left table has a custom partition column.
Use effectiveLeftSpec.column instead.

-                               Some((Map(tableUtils.partitionColumn -> null) ++ timeProjection).toMap),
+                               Some((Map(effectiveLeftSpec.column -> null) ++ timeProjection).toMap),


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (10)
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1)

28-28: Fix typo in test name.

There's a typo in "hetergeneous".

-  it should "test hetergeneous partition columns" in {
+  it should "test heterogeneous partition columns" in {
spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1)

78-79: Add newline at end of file.

Missing newline at end of file as flagged by Scala Fmt.

}
+
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 79-79: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1)

96-96: Add newline at end of file.

Missing newline at end of file as flagged by Scala Fmt.

}
+
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (2)

138-152: Consider completing TODO comment.

There's a commented test section with a TODO about revisiting in a "logger world".


153-154: Add newline at end of file.

Missing newline at end of file as flagged by Scala Fmt.

}
+
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 154-154: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (2)

64-65: Remove commented code.

Remove or uncomment this debug println statement.

-    // println("Rupee Source start partition $month")

119-122: Fix multiline formatting.

The Scala Fmt pipeline reports formatting issues here.

-    val runner1 = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = tableUtils.partitionSpec.minus(today, new Window(40, TimeUnit.DAYS)), tableUtils = tableUtils)
+    val runner1 = new ai.chronon.spark.Join(
+      joinConf = joinConf, 
+      endPartition = tableUtils.partitionSpec.minus(today, new Window(40, TimeUnit.DAYS)), 
+      tableUtils = tableUtils)
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 119-122: Code formatting issue: multiline formatting changes detected.

spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (2)

1-152: Test looks robust. Add newline at end of file.

Test effectively validates joins across datasets with different partition columns and formats.

 }
+
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 152-152: Missing newline at end of file.


136-150: Consider enabling or removing commented test code.

This commented test case appears to validate behavior when input/output tables have identical partitions.

Either uncomment and implement this test or remove the commented code if not needed.

spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1)

1-109: Well-structured base class. Add newline at end of file.

Base class provides comprehensive utilities for join testing.

 }
+
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 109-109: Missing newline at end of file.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 1f43a7a and 0c5d495.

📒 Files selected for processing (20)
  • .gitignore (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (0 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (1 hunks)
💤 Files with no reviewable changes (1)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🧰 Additional context used
🧬 Code Graph Analysis (3)
spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (2)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (6-97)
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (2)
  • TestRow (28-28)
  • TestRow (30-37)
spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (4)
api/src/main/scala/ai/chronon/api/Builders.scala (5)
  • Builders (27-371)
  • Source (106-140)
  • Selects (29-39)
  • exprs (34-38)
  • MetaData (261-315)
spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1)
  • DataFrameGen (39-180)
api/src/main/scala/ai/chronon/api/DataType.scala (1)
  • StringType (152-152)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
  • computeJoin (229-231)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (4)
api/src/main/scala/ai/chronon/api/Builders.scala (4)
  • Builders (27-371)
  • Source (106-140)
  • Selects (29-39)
  • MetaData (261-315)
spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1)
  • DataFrameGen (39-180)
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1)
  • getEventsEventsTemporal (91-108)
spark/src/main/scala/ai/chronon/spark/Comparison.scala (2)
  • Comparison (31-126)
  • sideBySide (63-120)
🪛 GitHub Actions: Scala Fmt
spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala

[error] 44-44: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala

[error] 135-138: Code formatting issue: multiline formatting changes detected.

spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala

[error] 81-81: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala

[error] 66-66: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala

[error] 114-115: Code formatting issue: multiline formatting changes detected.


[error] 121-121: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala

[error] 154-154: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala

[error] 154-154: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala

[error] 152-152: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala

[error] 85-85: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala

[error] 81-81: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala

[error] 119-122: Code formatting issue: multiline formatting changes detected.

spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala

[error] 79-79: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala

[error] 108-108: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala

[error] 76-76: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala

[error] 36-38: Code formatting issue: multiline formatting changes detected.


[error] 77-78: Code formatting issue: multiline formatting changes detected.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala

[error] 97-97: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala

[error] 102-102: Missing newline at end of file.

spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala

[error] 109-109: Missing newline at end of file.

⏰ Context from checks skipped due to timeout of 90000ms (30)
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: service_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: online_tests
  • GitHub Check: streaming_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: streaming_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: online_tests
  • GitHub Check: spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: flink_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: join_tests
  • GitHub Check: join_tests
  • GitHub Check: api_tests
  • GitHub Check: groupby_tests
  • GitHub Check: api_tests
  • GitHub Check: groupby_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: flink_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: batch_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (10)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1)

27-101: Good test structure for temporal joins.

The test thoroughly validates temporal join between events datasets using both the Chronon API and SQL-based validation.

spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1)

29-152: Good test for heterogeneous partition formats.

The test correctly validates joins with different partition columns and formats.

spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1)

32-80: Good bloom filter backfill test.

Test effectively validates bloom filter threshold behavior.

🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 36-38: Code formatting issue: multiline formatting changes detected.


[error] 77-78: Code formatting issue: multiline formatting changes detected.

spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1)

29-106: Good test for joins without aggregation.

The test properly validates joins with no-aggregation scenarios.

spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1)

1-78: Test logic is sound and well structured.

The test case properly validates join behavior when key mapping overlaps with a grouping field, with clear setup and assertions.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1)

1-96: Test logic looks correct.

Test effectively verifies snapshot join functionality between events datasets.

spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1)

38-40: Partition format is explicitly set.

Noting that this table uses explicit "yyyyMMdd" format while other tables use the default format.

spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (3)

28-37: Good use of implicit ordering.

Ordering implementation enables consistent comparison in test results.


54-89: Clear data generation method.

Method effectively creates test data with configurable cumulative behavior.


91-108: Concise join configuration builder.

Method appropriately configures temporal join tests with duplicated events.

Comment on lines 134 to 138
val joinJob = new ai.chronon.spark.Join(joinConf = joinConf,
endPartition = today,
tableUtils = tableUtils,
selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))


⚠️ Potential issue

Fix multiline formatting.

Format constructor parameters consistently according to style guidelines.

-    val joinJob = new ai.chronon.spark.Join(joinConf = joinConf,
-                         endPartition = today,
-                         tableUtils = tableUtils,
-                         selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
+    val joinJob = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = today,
+      tableUtils = tableUtils,
+      selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 135-138: Code formatting issue: multiline formatting changes detected.

Comment on lines 77 to 78
val skipBloomComputed = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = today, tableUtils = testTableUtils).computeJoin()
val leftSideCount = testSpark.sql(s"SELECT item, ts, ds from $itemQueriesTable where ds >= '$start'").count()

⚠️ Potential issue

Fix multi-line formatting.

The line is too long and triggering a formatting error.

-    val skipBloomComputed = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = today, tableUtils = testTableUtils).computeJoin()
+    val skipBloomComputed = new ai.chronon.spark.Join(
+      joinConf = joinConf, 
+      endPartition = today, 
+      tableUtils = testTableUtils
+    ).computeJoin()
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 77-78: Code formatting issue: multiline formatting changes detected.

Comment on lines 35 to 38
submission.SparkSessionBuilder.build("JoinTest",
local = true,
additionalConfig =
Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100")))

⚠️ Potential issue

Fix multi-line formatting.

Adjust the indentation to match Scala formatting standards.

-    val testSpark: SparkSession =
-      submission.SparkSessionBuilder.build("JoinTest",
-                                         local = true,
-                                         additionalConfig =
-                                           Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100")))
+    val testSpark: SparkSession =
+      submission.SparkSessionBuilder.build(
+        "JoinTest",
+        local = true,
+        additionalConfig = Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100"))
+      )
🧰 Tools
🪛 GitHub Actions: Scala Fmt

[error] 36-38: Code formatting issue: multiline formatting changes detected.

Comment on lines +73 to +77
setups = Seq(
"create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
"create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
"create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
)

⚠️ Potential issue

Duplicate UDF creation.

The same UDF temp_replace_right_c is created twice.

          setups = Seq(
            "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
            "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-            "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
          )

import ai.chronon.spark.Extensions._
import org.junit.Assert._

class EventsEventsCumulativeTest extends BaseJoinTest {

🥳


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
.github/workflows/test_scala_2_12_spark.yaml (2)

100-100: Runner label invalid

🧰 Tools
🪛 actionlint (1.7.4)

100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


126-126: Runner label invalid

🧰 Tools
🪛 actionlint (1.7.4)

126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🧹 Nitpick comments (1)
.github/workflows/test_scala_2_12_spark.yaml (1)

21-150: Consolidate job blocks
Use a matrix or YAML anchors for the shared runs-on, container, and steps to avoid duplication; a minimal matrix sketch follows the lint output below.

🧰 Tools
🪛 actionlint (1.7.4)

23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)
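A minimal matrix sketch of that consolidation, under stated assumptions: the job name, suite names, container image, and bazel targets below are placeholders, not the repo's actual configuration.

jobs:
  spark_tests:
    strategy:
      matrix:
        suite: [batch, join, groupby, fetcher]    # placeholder suite names
    runs-on: ubuntu_32_core_128gb                 # must be a registered self-hosted label
    container:
      image: ghcr.io/example/build-image:latest   # placeholder image
    steps:
      - uses: actions/checkout@v4
      - name: Run ${{ matrix.suite }} tests
        run: bazel test //spark:${{ matrix.suite }}_test   # placeholder target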

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0c5d495 and 00d5eab.

📒 Files selected for processing (2)
  • .github/workflows/test_scala_2_12_spark.yaml (3 hunks)
  • .github/workflows/test_scala_2_13_spark.yaml (3 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_2_12_spark.yaml

74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

.github/workflows/test_scala_2_13_spark.yaml

77-77: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


104-104: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


131-131: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🔇 Additional comments (3)
.github/workflows/test_scala_2_13_spark.yaml (2)

130-131:

❓ Verification inconclusive

Unrecognized runner label
ubuntu_32_core_128gb isn’t a standard GitHub-hosted runner. If this is meant to target a self-hosted runner, confirm you’ve registered that label; otherwise switch to a valid label (e.g. ubuntu-latest) to avoid workflow failures.

• File: .github/workflows/test_scala_2_13_spark.yaml
• Lines: 130–131

🧰 Tools
🪛 actionlint (1.7.4)

131-131: label "ubuntu_32_core_128gb" is unknown; see the available-label list above. if it is a custom label for a self-hosted runner, set the list of labels in the actionlint.yaml config file

(runner-label)


103-104:

❓ Verification inconclusive

Confirm self-hosted runner label or use a valid GitHub-hosted runner
The runs-on: ubuntu_32_core_128gb label isn’t a standard GitHub-hosted runner.
• Ensure a self-hosted runner is registered with that exact label, or
• Switch to a valid GitHub label (e.g., ubuntu-latest, ubuntu-20.04) to avoid failures.

Location: .github/workflows/test_scala_2_13_spark.yaml lines 103–104

🧰 Tools
🪛 actionlint (1.7.4)

104-104: label "ubuntu_32_core_128gb" is unknown; see the available-label list above. if it is a custom label for a self-hosted runner, set the list of labels in the actionlint.yaml config file

(runner-label)

.github/workflows/test_scala_2_12_spark.yaml (1)

74-74: Runner label invalid
actionlint warns ubuntu_32_core_128gb is unrecognized. Confirm self-hosted config or switch to a supported runner.

🧰 Tools
🪛 actionlint (1.7.4)

74-74: label "ubuntu_32_core_128gb" is unknown; see the available-label list above. if it is a custom label for a self-hosted runner, set the list of labels in the actionlint.yaml config file

(runner-label)
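
If ubuntu_32_core_128gb is indeed a custom label for a self-hosted runner, the warnings above can be silenced by declaring it in actionlint's config file. A minimal sketch — the file path follows actionlint's standard convention, and only the label name is taken from the warnings above:

# .github/actionlint.yaml
self-hosted-runner:
  # labels used in runs-on: that are not GitHub-hosted
  labels:
    - ubuntu_32_core_128gb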

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1)

135-149: Commented code needs attention.

Consider either implementing or removing the commented test case. The TODO at line 138 suggests revisiting "in a logger world."

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 00d5eab and 147ae2c.

📒 Files selected for processing (1)
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (28)
  • GitHub Check: service_tests
  • GitHub Check: streaming_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: groupby_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: streaming_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: groupby_tests
  • GitHub Check: spark_tests
  • GitHub Check: online_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: flink_tests
  • GitHub Check: flink_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: api_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: api_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
🔇 Additional comments (3)
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (3)

32-54: Good use of explicit partition format.

Line 39 explicitly sets partitionFormat = Some("yyyyMMdd") for weight data, aligning with the PR's goal of handling different partition formats.
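
As a hedged illustration of that call — the import paths, generator signature, and table name below are assumptions based on the review notes, not the merged test code:

import ai.chronon.api
import ai.chronon.spark.test.{Column, DataFrameGen}

// generate 1000 weight rows across 400 daily partitions, written as yyyyMMdd
val weightSchema = List(
  Column("user", api.StringType, 1000),
  Column("weight", api.DoubleType, 500)
)
DataFrameGen
  .entities(spark, weightSchema, 1000, partitions = 400, partitionFormat = Some("yyyyMMdd"))
  .save(s"$namespace.weights")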


55-73: Missing partition format for height data.

Height data creation doesn't specify a partition format, unlike weight data. This difference is likely intentional to test heterogeneous partition formats.


80-89: Good partition specs usage.

Lines 80-81 use tableUtils.partitionSpec.minus() for date calculations, demonstrating the improved partition specification translation.
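
For context, minus shifts a partition string back by a window under the spec's own format. A hedged sketch — the three-argument PartitionSpec constructor and the thrift-style Window are assumptions, not confirmed by this excerpt:

import ai.chronon.api.{PartitionSpec, TimeUnit, Window}

// a daily spec over a "ds" column formatted as yyyy-MM-dd
val spec = PartitionSpec("ds", "yyyy-MM-dd", 24 * 60 * 60 * 1000L)

// roll the end date back 60 days: "2025-05-08" -> "2025-03-09"
val start = spec.minus("2025-05-08", new Window(60, TimeUnit.DAYS))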

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1)

75-77: Fix duplicate UDF creation.

The UDF temp_replace_right_c is created twice.

          setups = Seq(
            "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
            "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-            "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
          )
🧹 Nitpick comments (2)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (2)

64-64: Remove the commented-out debug print.

-    // println("Rupee Source start partition $month")

31-235: Extract test logic into smaller methods.

The test method is very long. Consider decomposing into smaller methods.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 147ae2c and 0744ccc.

📒 Files selected for processing (45)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala (2 hunks)
  • api/python/test/sample/scripts/data-loader.scala (1 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (1 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (3 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternalTest.scala (1 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/FlexibleExecutionContext.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/AssignIntervalsTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (8 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (1 hunks)
✅ Files skipped from review due to trivial changes (24)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • online/src/main/scala/ai/chronon/online/metrics/FlexibleExecutionContext.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala
  • api/python/test/sample/scripts/data-loader.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternalTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • online/src/test/scala/ai/chronon/online/test/stats/AssignIntervalsTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
🚧 Files skipped from review as they are similar to previous changes (19)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (27)
  • GitHub Check: flink_tests
  • GitHub Check: online_tests
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: service_tests
  • GitHub Check: join_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: spark_tests
  • GitHub Check: online_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: service_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: flink_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: aggregator_tests
🔇 Additional comments (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (1)

193-193: Comment formatting improved.

Space added after comment marker follows standard style conventions.

spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1)

1-236: LGTM except for minor issues.

The test is comprehensive and covers important edge cases, including join-part offset handling.

@@ -267,6 +267,28 @@ object Extensions {
logger.info(s"schema: ${df.schema.fieldNames.mkString("Array(", ", ", ")")}")
df.replaceWithReadableTime(availableColumns, dropOriginal = true).show(truncate = false)
}

def translatePartitionSpec(existingSpec: PartitionSpec, newSpec: PartitionSpec): DataFrame = {
Collaborator

🥳
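
For readers skimming the hunk above, here is a minimal sketch of what translating a DataFrame between partition specs can look like — assuming each spec exposes a partition column name and a Java date format; this is an illustration, not the merged implementation:

import ai.chronon.api.PartitionSpec
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_format, to_date}

def translate(df: DataFrame, existing: PartitionSpec, target: PartitionSpec): DataFrame = {
  // re-encode partition strings from the existing format into the target format
  val translated = df.withColumn(
    target.column,
    date_format(to_date(col(existing.column), existing.format), target.format)
  )
  // drop the old partition column only when the two specs name it differently
  if (existing.column == target.column) translated else translated.drop(existing.column)
}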

@nikhil-zlai nikhil-zlai merged commit 487d4d3 into main May 8, 2025
35 checks passed
@nikhil-zlai nikhil-zlai deleted the vz/add_test_case_for_different_partition_formats branch May 8, 2025 21:46
Labels: None yet
Projects: None yet
4 participants