Vz/add test case for different partition formats #753
Conversation
Walkthrough

This update introduces partition spec translation capabilities across the codebase. New methods for converting partition strings and ranges between differing partition specs are added, and all relevant Spark join, groupby, and catalog logic is refactored to use these translations. Extensive new and refactored tests validate heterogeneous partition handling.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant SparkJob
    participant TableUtils
    participant PartitionSpec
    participant PartitionRange
    User->>SparkJob: Initiate join/groupby with sources
    SparkJob->>TableUtils: Request partitions for source(s)
    TableUtils->>PartitionRange: .translate(targetSpec)
    PartitionRange->>PartitionSpec: .translate(date, targetSpec)
    PartitionSpec-->>PartitionRange: Translated date string
    PartitionRange-->>TableUtils: New PartitionRange with translated bounds
    TableUtils-->>SparkJob: Partitions in unified spec
    SparkJob->>SparkJob: Execute logic with aligned partitions
```
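To make the translation step concrete, here is a minimal, self-contained sketch of the core idea: re-render a partition value's epoch millis in a target spec's format. `DemoPartitionSpec` and its (column, format, spanMillis) constructor are assumptions for illustration; the real implementation lives in api/src/main/scala/ai/chronon/api/PartitionSpec.scala.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Illustrative stand-in for the real PartitionSpec (constructor shape assumed).
case class DemoPartitionSpec(column: String, format: String, spanMillis: Long) {
  private def sdf: SimpleDateFormat = {
    val f = new SimpleDateFormat(format)
    f.setTimeZone(TimeZone.getTimeZone("UTC"))
    f
  }
  def epochMillis(date: String): Long = sdf.parse(date).getTime
  def at(millis: Long): String        = sdf.format(new Date(millis))
  // Core of this PR: re-render this spec's partition value in a target spec.
  def translate(date: String, targetSpec: DemoPartitionSpec): String =
    targetSpec.at(epochMillis(date))
}

object TranslateDemo extends App {
  val dayMs   = 24L * 60 * 60 * 1000
  val daily   = DemoPartitionSpec("ds", "yyyy-MM-dd", dayMs)
  val compact = DemoPartitionSpec("date_str", "yyyyMMdd", dayMs)
  println(daily.translate("2023-08-15", compact)) // prints 20230815
}
```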
Actionable comments posted: 4
🔭 Outside diff range comments (2)
api/src/main/scala/ai/chronon/api/DataRange.scala (1)
91-96: ⚠️ Potential issue: Validation removal risks NPE / infinite stream

Commenting out the `require(wellDefined, ...)` allows `start` or `end` to be null, after which `Stream.iterate(start)(partitionSpec.after)` will throw or loop forever. Please restore the check (or an equivalent guard) before producing the stream.

```diff
-// TODO: UNDO the comment below
-// require(wellDefined, s"Invalid partition range $this")
+require(wellDefined, s"Invalid partition range $this")
```

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
606-608: ⚠️ Potential issue: Partition-spec mismatch leads to wrong window calculation

`minQuery` is derived with `tableUtils.partitionSpec`, but `queryStart` is already in `source.partitionSpec`, so the date math is incorrect for non-default specs.

```diff
-    val minQuery = tableUtils.partitionSpec.before(queryStart)
-    val windowStart: String = window.map(tableUtils.partitionSpec.minus(minQuery, _)).orNull
+    val minQuery = source.partitionSpec.before(queryStart)
+    val windowStart: String = window.map(source.partitionSpec.minus(minQuery, _)).orNull
```
🧹 Nitpick comments (10)
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1)
92-95: Null safety & early exit for `translate`

If `date` is null, the current call will `sdf.parse(null)` and NPE. Consider guarding against nulls and skipping work when the target spec is identical.

```diff
-  def translate(date: String, targetSpec: PartitionSpec): String = {
-    val millis = epochMillis(date)
-    targetSpec.at(millis)
-  }
+  def translate(date: String, targetSpec: PartitionSpec): String =
+    Option(date)
+      .map(d => if (this eq targetSpec) d else targetSpec.at(epochMillis(d)))
+      .orNull
```

api/src/main/scala/ai/chronon/api/DataRange.scala (1)
156-162: `translate` could be a tiny bit tighter

You already have null handling; as a micro-saving, skip translation when the specs match.

```diff
-    val newStart = Option(start).map(d => partitionSpec.translate(d, otherSpec)).orNull
+    val newStart =
+      if (partitionSpec eq otherSpec) start else Option(start).map(partitionSpec.translate(_, otherSpec)).orNull
```

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
291-304: Range expansion may blow up memory

`rangeToFill.partitions` materialises every partition between start & end before filtering; for long backfills this can be millions of strings. Consider streaming or early-filtering instead:

```scala
val fillable = tableUtils
  .partitions(...)
  .filter(p => p >= rangeToFill.start && p <= rangeToFill.end)
require(fillable.nonEmpty, "…")
```

spark/src/main/scala/ai/chronon/spark/Extensions.scala (3)
20-23: Super-nit: redundant wildcard import

`ScalaJavaConversions._` already brings `SourceOps` & `WindowOps`; the extra explicit import is unnecessary.
270-291: Null-safe parse & perf edge-case in `translatePartitionSpec`

`to_date` returns `null` for unparsable strings; consider guarding with `when(col.isNotNull, …)`. `to_date` also truncates any time component – fine for daily partitions, risky otherwise. Clarify the assumption or handle timestamps. A guard sketch follows.
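A minimal sketch of the suggested guard, assuming the column name and format strings below stand in for whatever `translatePartitionSpec` actually receives:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_format, to_date, when}

// Hypothetical helper: rewrite partition strings from one format to another,
// leaving nulls untouched instead of letting to_date fail silently mid-expression.
def reformatPartition(df: DataFrame,
                      partitionCol: String,
                      existingFormat: String,
                      newFormat: String): DataFrame =
  df.withColumn(
    partitionCol,
    when(col(partitionCol).isNotNull,
         date_format(to_date(col(partitionCol), existingFormat), newFormat))
  )
```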
311-328: Minor: avoid the repeated `Option(...).getOrElse` pattern

Storing `val q = source.query` once reduces allocations and the readability hit.

spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (1)
679-869: Spelling & verbosity in new test

Method name `hetergeneous` → `heterogeneous`; lots of `println` calls slow CI – consider removing them.

spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (3)
240-242: `show()` triggers expensive actions

`dfRearranged.show()` executes a full scan; remove it or guard it with a debug flag.

270-271: Second eager `show()`

Same performance hit for `finalizedDf.show()`. Drop it or wrap it in `if (logger.isDebugEnabled)`.

392-403: Cross-product loop may explode

Nested loops over `inputPartitionColumnNames` × `inputPartitionSpecs` × `inputTables` can duplicate work. Consider zipping specs with columns or passing a map, as in the sketch below.
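A hedged sketch of the "zip" idea, assuming the three inputs are parallel sequences keyed by table (the parameter names mirror the comment above, not the actual `unfilledRanges` signature):

```scala
import ai.chronon.api.PartitionSpec

// Pair each table with its own partition column and spec once, instead of
// iterating the full cross product of the three sequences.
def perTableSpecs(inputTables: Seq[String],
                  inputPartitionColumnNames: Seq[String],
                  inputPartitionSpecs: Seq[PartitionSpec]): Map[String, (String, PartitionSpec)] =
  inputTables.zip(inputPartitionColumnNames.zip(inputPartitionSpecs)).toMap
```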
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- api/src/main/scala/ai/chronon/api/DataRange.scala (2 hunks)
- api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (6 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (4 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (8 hunks)
- spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (3 hunks)
- spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (3 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (4)
- api/src/main/scala/ai/chronon/api/PartitionSpec.scala (2): PartitionSpec (28-96), PartitionSpec (98-100)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (4): Extensions (37-334), partitionRange (118-121), partitionSpec (325-327), effectivePartitionColumn (331-333)
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (4): TableUtils (43-611), TableUtils (613-615), partitions (129-160), unfilledRanges (338-425)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2): JoinUtils (39-541), getRangeToFill (141-175)
⏰ Context from checks skipped due to timeout of 90000ms (31)
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: service_tests
- GitHub Check: flink_tests
- GitHub Check: aggregator_tests
- GitHub Check: api_tests
- GitHub Check: online_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: service_tests
- GitHub Check: streaming_tests
- GitHub Check: service_commons_tests
- GitHub Check: analyzer_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: online_tests
- GitHub Check: join_tests
- GitHub Check: flink_tests
- GitHub Check: fetcher_tests
- GitHub Check: spark_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: fetcher_tests
- GitHub Check: streaming_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: spark_tests
- GitHub Check: join_tests
- GitHub Check: batch_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (11)
spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2)
288-288: Updated method call for consistency

Method call updated from `getRangesToFill` to `getRangeToFill` to match the signature change in the implementation.

305-305: Method signature change properly implemented

Updated call from `getRangesToFill` to `getRangeToFill` with the additional parameters required by the new implementation.

spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
267-267: Updated to use renamed method with consistent parameters

Method call changed from `getRangesToFill` to `getRangeToFill` along with proper parameter inclusion, consistent with the API changes.

spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1)

20-24: Improved code readability

Introduced a local variable to store intermediate results instead of method chaining. Makes the code slightly more readable without changing functionality.
spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (2)
92-93: Added partition specification to test data generation

Now explicitly uses TableUtils to provide the partition column and format when generating test data.

105-106: Consistent partition handling for weight table test data

Added an explicit partition column and format for the weight table data generation, maintaining consistency with the first table.

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
142-149
: MissinginputPartitionSpecs
→ inconsistent gap detection
unfilledRanges
elsewhere now receives bothinputPartitionColumnNames
&inputPartitionSpecs
.
Here only column names are sent; if the table’s spec differs from the global one, gaps can be mis-computed.- inputPartitionColumnNames = Seq(joinConfCloned.left.query.effectivePartitionColumn) + inputPartitionColumnNames = Seq(joinConfCloned.left.query.effectivePartitionColumn), + inputPartitionSpecs = Seq(joinConfCloned.left.partitionSpec)spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
698-707
: Nice touch on spec translation.
Translating the scanned DF back to the global spec keeps downstream logic untouched.spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (2)
316-324
: 👍 Explicit partition format in test increases coverage.
366-370
: Future-datedstartPartition
looks odd.
Starting a backfill two days in the future may hide bugs; confirm intent.spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
106-107
: Verify translated DF contains global partition colAfter
translatePartitionSpec
, ensure the new column actually exists and duplicates are dropped.
```scala
  def intervalWindow: Window = {
    if (spanMillis == WindowUtils.Day.millis) WindowUtils.Day
    else if (spanMillis == WindowUtils.Hour.millis) WindowUtils.Hour
    else
      throw new UnsupportedOperationException(
        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
  }
```
Incorrect minute calculation in error message

`spanMillis / 60 * 1000` first divides by 60, then multiplies by 1000, giving a bogus number. Use a millisecond-to-minute conversion: `spanMillis / (60 * 1000)`.

```diff
-        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
+        s"Partition intervals must be hour or day – got ${(spanMillis / 60000)} minutes")
```
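For concreteness, with an hourly span the two expressions diverge wildly:

```scala
val spanMillis = 3600000L  // one hour in milliseconds
spanMillis / 60 * 1000     // 60000000: divides by 60, then multiplies by 1000
spanMillis / (60 * 1000)   // 60: the intended minute count
```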
```scala
    val partitionConditions = intersectedRange.whereClauses
```
🛠️ Refactor suggestion: Filtering also uses the wrong spec

`intersectedRange` is in the source spec, but `whereClauses` is rendered with the default `tableUtils` spec, yielding empty scans.

```diff
-    val partitionConditions = intersectedRange.whereClauses
+    val partitionConditions =
+      intersectedRange.translate(tableUtils.partitionSpec).whereClauses
```
```diff
     var df = tableUtils.scanDf(joinConf.left.query,
                                joinConf.left.table,
                                Some((Map(tableUtils.partitionColumn -> null) ++ timeProjection).toMap),
-                               range = Some(range))
+                               range = Some(effectiveLeftRange))
```
Partition-column mismatch in `scanDf`

Using `tableUtils.partitionColumn` here breaks when the left table has a custom partition column. Use `effectiveLeftSpec.column` instead.

```diff
-        Some((Map(tableUtils.partitionColumn -> null) ++ timeProjection).toMap),
+        Some((Map(effectiveLeftSpec.column -> null) ++ timeProjection).toMap),
```
Actionable comments posted: 15
🧹 Nitpick comments (10)
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1)
28-28: Fix typo in test name

There's a typo in "hetergeneous".

```diff
-  it should "test hetergeneous partition columns" in {
+  it should "test heterogeneous partition columns" in {
```

spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1)
78-79: Add newline at end of file

Missing newline at end of file, as flagged by Scala Fmt.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 79-79: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1)
96-96: Add newline at end of file

Missing newline at end of file, as flagged by Scala Fmt.

```diff
 }
+
```
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (2)
138-152: Consider completing the TODO comment

There's a commented test section with a TODO about revisiting in a "logger world".

153-154: Add newline at end of file

Missing newline at end of file, as flagged by Scala Fmt.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 154-154: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (2)
64-65: Remove commented code

Remove or uncomment this debug println statement.

```diff
-    // println("Rupee Source start partition $month")
```

119-122: Fix multiline formatting

The Scala Fmt pipeline reports formatting issues here.

```diff
-    val runner1 = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = tableUtils.partitionSpec.minus(today, new Window(40, TimeUnit.DAYS)), tableUtils = tableUtils)
+    val runner1 = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = tableUtils.partitionSpec.minus(today, new Window(40, TimeUnit.DAYS)),
+      tableUtils = tableUtils)
```

🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 119-122: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (2)
1-152: Test looks robust; add newline at end of file

Test effectively validates joins across datasets with different partition columns and formats.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 152-152: Missing newline at end of file.
136-150: Consider enabling or removing commented test code

This commented test case appears to validate behavior when input/output tables have identical partitions.
Either uncomment and implement this test or remove the commented code if not needed.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1)
1-109: Well-structured base class; add newline at end of file

Base class provides comprehensive utilities for join testing.

```diff
 }
+
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 109-109: Missing newline at end of file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (20)
- .gitignore (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala (0 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (1 hunks)
💤 Files with no reviewable changes (1)
- spark/src/test/scala/ai/chronon/spark/test/join/JoinTest.scala
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🧰 Additional context used
🧬 Code Graph Analysis (3)
spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (2)

- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1): ScalaJavaConversions (6-97)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (2): TestRow (28-28), TestRow (30-37)

spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (4)

- api/src/main/scala/ai/chronon/api/Builders.scala (5): Builders (27-371), Source (106-140), Selects (29-39), exprs (34-38), MetaData (261-315)
- spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1): DataFrameGen (39-180)
- api/src/main/scala/ai/chronon/api/DataType.scala (1): StringType (152-152)
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1): computeJoin (229-231)

spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (4)

- api/src/main/scala/ai/chronon/api/Builders.scala (4): Builders (27-371), Source (106-140), Selects (29-39), MetaData (261-315)
- spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1): DataFrameGen (39-180)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1): getEventsEventsTemporal (91-108)
- spark/src/main/scala/ai/chronon/spark/Comparison.scala (2): Comparison (31-126), sideBySide (63-120)
🪛 GitHub Actions: Scala Fmt
spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala
[error] 44-44: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
[error] 135-138: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
[error] 81-81: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
[error] 66-66: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
[error] 114-115: Code formatting issue: multiline formatting changes detected.
[error] 121-121: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala
[error] 154-154: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala
[error] 154-154: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala
[error] 152-152: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
[error] 85-85: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
[error] 81-81: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala
[error] 119-122: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala
[error] 79-79: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
[error] 108-108: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
[error] 76-76: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala
[error] 36-38: Code formatting issue: multiline formatting changes detected.
[error] 77-78: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala
[error] 97-97: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
[error] 102-102: Missing newline at end of file.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala
[error] 109-109: Missing newline at end of file.
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: cloud_aws_tests
- GitHub Check: service_tests
- GitHub Check: service_tests
- GitHub Check: analyzer_tests
- GitHub Check: service_commons_tests
- GitHub Check: analyzer_tests
- GitHub Check: online_tests
- GitHub Check: streaming_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: streaming_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: online_tests
- GitHub Check: spark_tests
- GitHub Check: spark_tests
- GitHub Check: flink_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: join_tests
- GitHub Check: join_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: aggregator_tests
- GitHub Check: fetcher_tests
- GitHub Check: flink_tests
- GitHub Check: fetcher_tests
- GitHub Check: batch_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (10)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1)
27-101: Good test structure for temporal joins

The test thoroughly validates temporal join between events datasets using both the Chronon API and SQL-based validation.
spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1)
29-152: Good test for heterogeneous partition formats

The test correctly validates joins with different partition columns and formats.
spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1)
32-80: Good bloom filter backfill test

Test effectively validates bloom filter threshold behavior.
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 36-38: Code formatting issue: multiline formatting changes detected.
[error] 77-78: Code formatting issue: multiline formatting changes detected.
spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1)
29-106: Good test for joins without aggregation

The test properly validates joins with no-aggregation scenarios.
spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1)
1-78: Test logic is sound and well structured

The test case properly validates join behavior when key mapping overlaps with a grouping field, with clear setup and assertions.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1)
1-96: Test logic looks correct

Test effectively verifies snapshot join functionality between events datasets.
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1)
38-40: Partition format is explicitly set

Noting that this table uses an explicit "yyyyMMdd" format while other tables use the default format.
spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (3)
28-37: Good use of implicit ordering

Ordering implementation enables consistent comparison in test results.

54-89: Clear data generation method

Method effectively creates test data with configurable cumulative behavior.

91-108: Concise join configuration builder

Method appropriately configures temporal join tests with duplicated events.
```scala
    val joinJob = new ai.chronon.spark.Join(joinConf = joinConf,
                                            endPartition = today,
                                            tableUtils = tableUtils,
                                            selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
```
Fix multiline formatting

Format constructor parameters consistently according to style guidelines.

```diff
-    val joinJob = new ai.chronon.spark.Join(joinConf = joinConf,
-                                            endPartition = today,
-                                            tableUtils = tableUtils,
-                                            selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
+    val joinJob = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = today,
+      tableUtils = tableUtils,
+      selectedJoinParts = Some(List("user1_unit_test_item_views_selected_join_parts_1")))
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 135-138: Code formatting issue: multiline formatting changes detected.
```scala
    val skipBloomComputed = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = today, tableUtils = testTableUtils).computeJoin()
    val leftSideCount = testSpark.sql(s"SELECT item, ts, ds from $itemQueriesTable where ds >= '$start'").count()
```
Fix multi-line formatting

The line is too long and triggers a formatting error.

```diff
-    val skipBloomComputed = new ai.chronon.spark.Join(joinConf = joinConf, endPartition = today, tableUtils = testTableUtils).computeJoin()
+    val skipBloomComputed = new ai.chronon.spark.Join(
+      joinConf = joinConf,
+      endPartition = today,
+      tableUtils = testTableUtils
+    ).computeJoin()
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 77-78: Code formatting issue: multiline formatting changes detected.
```scala
    submission.SparkSessionBuilder.build("JoinTest",
                                         local = true,
                                         additionalConfig =
                                           Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100")))
```
Fix multi-line formatting

Adjust the indentation to match Scala formatting standards.

```diff
-  val testSpark: SparkSession =
-    submission.SparkSessionBuilder.build("JoinTest",
-                                         local = true,
-                                         additionalConfig =
-                                           Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100")))
+  val testSpark: SparkSession =
+    submission.SparkSessionBuilder.build(
+      "JoinTest",
+      local = true,
+      additionalConfig = Some(Map("spark.chronon.backfill.bloomfilter.threshold" -> "100"))
+    )
```
🧰 Tools
🪛 GitHub Actions: Scala Fmt
[error] 36-38: Code formatting issue: multiline formatting changes detected.
```scala
        setups = Seq(
          "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
        )
```
Duplicate UDF creation

The same UDF `temp_replace_right_c` is created twice.

```diff
         setups = Seq(
           "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
+          "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
         )
```
```scala
import ai.chronon.spark.Extensions._
import org.junit.Assert._

class EventsEventsCumulativeTest extends BaseJoinTest {
```
🥳
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/test_scala_2_12_spark.yaml (2)
100-100: Runner label invalid

🧰 Tools
🪛 actionlint (1.7.4)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: Runner label invalid

🧰 Tools
🪛 actionlint (1.7.4)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🧹 Nitpick comments (1)
.github/workflows/test_scala_2_12_spark.yaml (1)
21-150: Consolidate job blocks

Use a matrix or YAML anchors for shared `runs-on`, `container`, and steps to avoid duplication.

🧰 Tools
🪛 actionlint (1.7.4)
23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
- .github/workflows/test_scala_2_12_spark.yaml (3 hunks)
- .github/workflows/test_scala_2_13_spark.yaml (3 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_scala_2_12_spark.yaml
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
.github/workflows/test_scala_2_13_spark.yaml
77-77: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
104-104: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
131-131: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🔇 Additional comments (3)
.github/workflows/test_scala_2_13_spark.yaml (2)
130-131: ❓ Verification inconclusive: Unknown runner label

`ubuntu_32_core_128gb` is not a standard GitHub-hosted label. If this is meant to target a self-hosted runner, confirm you've registered that label; otherwise switch to a valid label (e.g. `ubuntu-latest`) to avoid workflow failures.

• File: .github/workflows/test_scala_2_13_spark.yaml
• Lines: 130–131

🧰 Tools
🪛 actionlint (1.7.4)
131-131: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
103-104: ❓ Verification inconclusive: Confirm self-hosted runner label or use a valid GitHub-hosted runner

The `runs-on: ubuntu_32_core_128gb` label isn't a standard GitHub-hosted runner.
• Ensure a self-hosted runner is registered with that exact label, or
• Switch to a valid GitHub label (e.g., `ubuntu-latest`, `ubuntu-20.04`) to avoid failures.

Location: .github/workflows/test_scala_2_13_spark.yaml, lines 103–104

🧰 Tools
🪛 actionlint (1.7.4)
104-104: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
.github/workflows/test_scala_2_12_spark.yaml (1)
74-74: Runner label invalid

actionlint warns that `ubuntu_32_core_128gb` is unrecognized. Confirm the self-hosted config or switch to a supported runner.

🧰 Tools
🪛 actionlint (1.7.4)
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
Actionable comments posted: 0
🧹 Nitpick comments (1)
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1)
135-149: Commented code needs attention

Consider either implementing or removing the commented test case. The TODO at line 138 suggests revisiting "in a logger world".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
- spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (28)
- GitHub Check: service_tests
- GitHub Check: streaming_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: groupby_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: streaming_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: analyzer_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: groupby_tests
- GitHub Check: spark_tests
- GitHub Check: online_tests
- GitHub Check: analyzer_tests
- GitHub Check: flink_tests
- GitHub Check: flink_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: api_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: api_tests
- GitHub Check: aggregator_tests
- GitHub Check: fetcher_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: batch_tests
🔇 Additional comments (3)
spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (3)
32-54: Good use of explicit partition format

Line 39 explicitly sets `partitionFormat = Some("yyyyMMdd")` for weight data, aligning with the PR's goal of handling different partition formats.

55-73: Missing partition format for height data

Height data creation doesn't specify a partition format, unlike weight data. This difference is likely intentional, to test heterogeneous partition formats.

80-89: Good partition specs usage

Lines 80-81 use `tableUtils.partitionSpec.minus()` for date calculations, demonstrating the improved partition specification translation.
Actionable comments posted: 0
♻️ Duplicate comments (1)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1)
75-77: Fix duplicate UDF creation

The UDF `temp_replace_right_c` is created twice.

```diff
     setups = Seq(
       "create temporary function temp_replace_right_b as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
       "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'",
-      "create temporary function temp_replace_right_c as 'org.apache.hadoop.hive.ql.udf.UDFRegExpReplace'"
     )
```
🧹 Nitpick comments (2)
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (2)
64-64: Remove commented debug print

Remove the unused comment.

```diff
-    // println("Rupee Source start partition $month")
```

31-235: Extract test logic into smaller methods

The test method is very long. Consider decomposing it into smaller methods.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (45)
- aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala (1 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala (2 hunks)
- api/python/test/sample/scripts/data-loader.scala (1 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (1 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (3 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternalTest.scala (1 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
- flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4 hunks)
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/metrics/FlexibleExecutionContext.scala (1 hunks)
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
- online/src/test/scala/ai/chronon/online/test/stats/AssignIntervalsTest.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (7 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (7 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala (8 hunks)
- spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (1 hunks)
✅ Files skipped from review due to trivial changes (24)
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
- spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
- online/src/main/scala/ai/chronon/online/metrics/FlexibleExecutionContext.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala
- spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala
- api/python/test/sample/scripts/data-loader.scala
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternalTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- online/src/test/scala/ai/chronon/online/test/stats/AssignIntervalsTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala
- flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala
- spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala
- spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
🚧 Files skipped from review as they are similar to previous changes (19)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EndPartitionJoinTest.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsSnapshotTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/BaseJoinTest.scala
- spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (27)
- GitHub Check: flink_tests
- GitHub Check: online_tests
- GitHub Check: streaming_tests
- GitHub Check: streaming_tests
- GitHub Check: service_tests
- GitHub Check: join_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: spark_tests
- GitHub Check: online_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: analyzer_tests
- GitHub Check: groupby_tests
- GitHub Check: service_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: spark_tests
- GitHub Check: fetcher_tests
- GitHub Check: service_commons_tests
- GitHub Check: fetcher_tests
- GitHub Check: flink_tests
- GitHub Check: analyzer_tests
- GitHub Check: aggregator_tests
- GitHub Check: batch_tests
- GitHub Check: batch_tests
- GitHub Check: aggregator_tests
🔇 Additional comments (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (1)
193-193: Comment formatting improved

A space added after the comment marker follows standard style conventions.
spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (1)
1-236: LGTM except for minor issues

The test looks comprehensive and covers important edge cases, including join part offset handling.
```diff
@@ -267,6 +267,28 @@ object Extensions {
     logger.info(s"schema: ${df.schema.fieldNames.mkString("Array(", ", ", ")")}")
     df.replaceWithReadableTime(availableColumns, dropOriginal = true).show(truncate = false)
   }
+
+    def translatePartitionSpec(existingSpec: PartitionSpec, newSpec: PartitionSpec): DataFrame = {
```
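A hypothetical call site, going by the extension signature in the diff above; the helper name and the way both specs are obtained are assumptions for illustration, not the project's actual API:

```scala
import ai.chronon.api.PartitionSpec
import ai.chronon.spark.Extensions._
import org.apache.spark.sql.DataFrame

// Hypothetical helper: align a source DataFrame's partition strings with the
// job's default spec before joining. Both specs are supplied by the caller.
def alignPartitions(df: DataFrame,
                    sourceSpec: PartitionSpec,
                    defaultSpec: PartitionSpec): DataFrame =
  df.translatePartitionSpec(sourceSpec, defaultSpec)
```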
🥳
Summary
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
- Updated .gitignore to exclude additional local configuration files.

Style