fix: remove references to custom json, float essential apis to top #492


Merged

@nikhil-zlai merged 14 commits into main from nikhil/kill_custom_json on Mar 13, 2025

Conversation

@nikhil-zlai (Contributor) commented on Mar 10, 2025

Summary

Custom JSON is not supposed to be relied on in Scala code at all; it is purely meant for users to attach THEIR metadata to objects.

Checklist

  • [x] Added Unit Tests
  • [x] Covered by existing CI
  • [ ] Integration tested
  • [ ] Documentation update

Summary by CodeRabbit

  • New Features

    • Introduced new configuration flags that improve control over tiling behavior and streaming error handling.
    • Expanded metadata support to include additional output partition columns, enhancing data staging and query processing.
  • Documentation

    • Updated instructions for enabling tiled reads to reflect the new configuration parameter.
  • Tests

    • Enhanced test coverage to validate the new serving and tiling configurations.

@coderabbitai bot commented on Mar 10, 2025

Walkthrough

The PR removes legacy JSON parsing and team extraction methods from metadata extensions and online endpoints. It replaces obsolete tiling checks with new flag-based methods (e.g., servingFlagValue, tilingFlag, dontThrowOnDecodeFailFlag). Updates occur across API, online, testing, Spark, Flink, Python, Thrift definitions, and documentation. Additionally, the MetaData builder now accepts new parameters for partition columns and execution information.
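
As a concrete read-side illustration, here is a hedged sketch of the new flag helpers; `groupBy` is assumed to be an `api.GroupBy` with the `GroupByOps` implicit in scope, and the names follow the code quoted later in this review:

```scala
import ai.chronon.api.Extensions._ // brings the GroupByOps implicit into scope

// Walks metaData.executionInfo.conf.serving, returning None if any layer is unset
val tilingValue: Option[String] = groupBy.servingFlagValue("tiling")

// Convenience helpers layered on top of servingFlagValue
val tiled: Boolean        = groupBy.tilingFlag            // serving("tiling") == "true"
val strictDecode: Boolean = groupBy.throwOnDecodeFailFlag // serving("decode.throw_on_fail") == "true"
```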

Changes

| File(s) | Change Summary |
| --- | --- |
| api/.../Extensions.scala, api/.../Builders.scala | Removed legacy JSON/team extraction methods; added flag methods to GroupByOps; updated the MetaData constructor (removed dependencies, added additionalOutputPartitionColumns and executionInfo). |
| api/.../test/ExtensionsTest.scala, spark/.../FetcherTest.scala, spark/.../StagingQueryTest.scala | Removed outdated tests; replaced custom JSON tests with serving-flag validations; updated Spark session handling and assertions for partition columns and the tiling flag. |
| api/thrift/api.thrift, api/thrift/common.thrift | Modified the MetaData structure: replaced/renumbered fields and added serving-related fields; introduced new structs for environment and configuration properties with a serving map. |
| online/.../MetadataEndPoint.scala, online/.../Metrics.scala, online/.../GroupByServingInfoParsed.scala | Removed methods for JSON-based team extraction and tiling checks; updated references from owningTeam to the direct team property. |
| online/.../fetcher/{FetchContext.scala, GroupByFetcher.scala, GroupByResponseHandler.scala} | Replaced context-based tiling and error-flag checks with operation-specific flag methods; updated conditions in streaming decode error handling. |
| spark/.../StagingQuery.scala | Simplified partition-column initialization using the new additional output partition columns from metadata. |
| flink/.../FlinkRowAggregators.scala | Added mutation detection logic with isMutation and reversalIndex to support deletion handling. |
| api/py/.../parse_teams.py | Extended merging logic to include the new serving property during team parsing. |
| docs/source/Tiled_Architecture.md | Updated configuration instructions from enable_tiling=true to tiling=true within metaData.executionInfo.conf.serving. |
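
On the write side, the new serving configuration replaces the old customJson toggle. A minimal sketch, assuming the Thrift-generated fluent setters (`setExecutionInfo`, `setConf`, `setServing`) and the `ai.chronon.api` namespace; not verbatim from the PR:

```scala
import scala.collection.JavaConverters._
import ai.chronon.api.{ConfigProperties, ExecutionInfo}

// Opt a GroupBy into tiling via structured metadata instead of user-owned JSON
val servingConf = new ConfigProperties().setServing(Map("tiling" -> "true").asJava)
groupBy.metaData.setExecutionInfo(new ExecutionInfo().setConf(servingConf))

// Replaces the old user-facing toggle that lived in customJson ("enable_tiling")
```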

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant GroupByOps
    participant MetadataEndpoint
    participant ResponseHandler

    Client->>GroupByOps: Request flag values
    GroupByOps-->>Client: Return servingFlagValue, tilingFlag, dontThrowOnDecodeFailFlag
    Client->>MetadataEndpoint: Get team info (direct access)
    MetadataEndpoint-->>Client: Return team value
    Client->>ResponseHandler: Process streaming decode using tilingFlag
    ResponseHandler-->>Client: Handle errors based on dontThrowOnDecodeFailFlag
```

Suggested reviewers

  • tchow-zlai
  • piyush-zlai

Poem

Code refines in flag and tile,
Old paths fade with gentle style.
Tests and docs now sing the tune,
New parameters rise like moon.
Cheers to changes, crisp and bright!
🚀✨



📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between 93f183a and 45b8964.

📒 Files selected for processing (4)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
🔇 Additional comments (8)
api/src/main/scala/ai/chronon/api/Extensions.scala (3)

582-592: New flag-based configuration access method.

Clean implementation for retrieving serving flag values from execution metadata.


594-594: Convenient helper for tiling configuration.

Good use of the new servingFlagValue method to check if tiling is enabled.


596-596: Helper for decoding failure behavior.

Provides a clean way to check the decode failure flag setting.

flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (3)

45-50: Added mutation detection logic.

Properly identifies if we're dealing with entity mutation sources.


52-60: Added reversal column index tracking.

Ensures proper validation when using mutation sources.


100-106: Enhanced update logic to handle deletions.

Now correctly differentiates between updates and deletions based on reversal flag.
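
Pieced together from the three comments above, the mutation-handling shape looks roughly like this; the exact field and constant names (`streamingSource`, `Constants.ReversalColumn`, `inputSchema`) are assumptions, not verbatim code:

```scala
// Detect whether the streaming source emits entity mutations (insert/delete pairs)
val isMutation: Boolean = groupBy.streamingSource.exists(_.isSetEntities)

// Locate the reversal column; required whenever we consume mutations
val reversalIndex: Int = {
  val idx = inputSchema.indexWhere(_.name == Constants.ReversalColumn)
  if (isMutation) require(idx >= 0, s"mutation sources must carry a ${Constants.ReversalColumn} column")
  idx
}

// Per element: a set reversal flag means "retract", otherwise a normal update
val isDelete = isMutation && row.getAs[Boolean](reversalIndex)
if (isDelete) rowAggregator.delete(aggIr, chrononRow)
else rowAggregator.update(aggIr, chrononRow)
```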

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (2)

558-565: Updated metadata configuration to use new flag approach.

Replaces customJson with executionInfo and serving flag configuration.


567-567: Added verification for tiling flag.

Good test assertion to ensure the flag is properly set.



From spark/.../StagingQuery.scala, the simplified partition-column initialization (rendered here as the diff hunk):

```diff
-      .getOrElse(new java.util.ArrayList[String]())
-      .asInstanceOf[java.util.ArrayList[String]]
-      .toScala
+    Seq(tableUtils.partitionColumn) ++ stagingQueryConf.metaData.additionalOutputPartitionColumns.toScala
```
@nikhil-zlai (author): this is the main change that Ben needs.
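
A hedged sketch of what that enables, using the builder parameter named in the walkthrough (argument names assumed, values illustrative):

```scala
import ai.chronon.api.Builders

// Declare extra partition columns on the staging query's metadata
val meta = Builders.MetaData(
  name = "my_team.my_staging_query",
  additionalOutputPartitionColumns = Seq("region")
)

// StagingQuery then writes partitioned by the default column plus the extras,
// as in the snippet above: Seq(tableUtils.partitionColumn) ++ additionalOutputPartitionColumns
```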

Comment on lines +256 to +258 (the thrift MetaData struct):

```thrift
5: optional list<string> additionalOutputPartitionColumns

6: optional map<string, string> tableProperties
```
Reviewer (Contributor): Should we keep tableProperties the same field number as before (5)?

@nikhil-zlai (author): thought outputPartitionCols are more important than table props. Safe to change these for now, actually.

Reviewer (Contributor): I guess we should call out that folks need to recompile their existing configs, right?

@nikhil-zlai (author): good call. We need to release a wheel and cut them over; I was working on it separately (basically I am doing the compile for them).


In api/thrift/api.thrift, the tags field is renumbered (rendered as a diff):

```diff
 // tag_key -> tag_value - tags allow for repository wide querying, deprecations etc
 // this is object level tag - applies to all columns produced by the object - GroupBy, Join, Model etc
-6: optional map<string, string> tags
+20: optional map<string, string> tags
```
Reviewer (Contributor): Field number here too?

@nikhil-zlai (author): the spacing in field numbers allows for adding new fields in the right order. So if we find some other thing later, we can add it in the right place instead of at the end.

@piyush-zlai (Contributor) left a comment: Q on removing some of the fetcher-events-based code (if tests pass we should probably be OK).

```diff
@@ -462,9 +448,9 @@ object Extensions {

   // Check if tiling is enabled for a given GroupBy. Defaults to false if the 'enable_tiling' flag isn't set.
   def isTilingEnabled: Boolean =
```
Reviewer (Contributor): can delete, I think, as this was moved to flagStore.

@nikhil-zlai (author): done


```scala
  }
}

private def mergeRawEventsFromStreaming(queryTimeMs: Long,
```
Reviewer (Contributor): don't we need this for the CDC side?

@nikhil-zlai (author): I don't think so; we can still tile. We can't deal with undeletable aggs in either case.

@nikhil-zlai (author): there is a chaining test that I want to keep on, which currently can only correctly respond to untiled entries in the in-memory KV store. :-/

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1bb7c0c and 4604e61.

📒 Files selected for processing (5)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (1 hunks)
🔇 Additional comments (26)
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (2)

263-271: Added tilingEnabled parameter to match updated API.

The parameter replaces the custom JSON-based tiling check with an explicit flag.


273-279: Updated API call with explicit tiling parameter.

Explicit tilingEnabled parameter replaces the previous JSON-based check, aligning with PR objectives.

spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (5)

65-65: Formatting improvement.

Added blank line enhances readability between code blocks.


72-72: Formatting improvement.

Added blank line separates logical sections of code.


54-65: Removed assertion that enforced tiling restriction.

Based on the AI summary, an assertion preventing tiling for Entity groupBys was removed. This aligns with the PR objective to remove custom JSON dependencies.


163-163: Added tilingEnabled parameter with default false.

New parameter allows explicit tiling control, replacing previous JSON-based checks.


186-186: Using tilingEnabled parameter.

Parameter is properly passed to the putStreaming method, completing the implementation.
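
In miniature, the pattern these changes apply (signatures illustrative, not the real OnlineUtils API): an explicit, default-off parameter replaces string-sniffing a user-owned JSON blob.

```scala
// Self-contained illustration of the opt-in flag pattern
def putStreaming(groupByName: String, tilingEnabled: Boolean = false): Unit =
  println(s"serving $groupByName, tiled = $tilingEnabled")

putStreaming("purchases.v1")                       // default path: untiled
putStreaming("purchases.v1", tilingEnabled = true) // tests opt in explicitly
```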

online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (3)

115-115: Simplified control flow.

Direct call to mergeTiledIrsFromStreaming removes conditional branching based on tiling status.


123-146: Improved null safety using Option.

Wrapping streamingResponses in Option prevents NPEs and enables functional transformation.


158-158: Compatible interface adaptation.

Updated to handle the new Option-based approach with streamingIrs.map(_.iterator).orNull.
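
A sketch of that null-safety shape, with types assumed from the quoted fragment (`streamingResponses` can be null when no streaming data exists; `decodeTileIr` and the merge signature are illustrative):

```scala
// Wrap the possibly-null input once, then transform functionally
val streamingIrs: Option[Seq[TiledIr]] =
  Option(streamingResponses).map(_.flatMap(decodeTileIr))

// Downstream code still accepts an Iterator, or null when absent
mergeTiledIrsFromStreaming(queryTimeMs, batchIr, streamingIrs.map(_.iterator).orNull)
```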

spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (10)

21-21: No issues with new import.


26-26: Imports look fine.


129-133: Mutation detection logic is concise.


134-139: Dynamic time column logic looks good.


140-142: Indexing appears correct.


143-146: Key builder usage is fine.


154-155: Key extraction for grouping is fine.


159-159: Key and timestamp unpacking is efficient.


167-169: isDelete logic is correct, but see aggregator calls below.


183-183: Final tuple creation is fine.

flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (6)

8-8: Import addition is okay.


45-50: isMutation calculation is clear.


52-60: Reversal column requirement is valid.


80-80: Timestamp retrieval is straightforward.


100-100: Accurate isDelete condition.


102-107: Delete/Update calls are correctly ordered.

Comment on lines 171 to 177:

```scala
// TODO: refactor the prod code and use it in test instead
if (isDelete) {
  rowAggregator.update(aggIr, chrononRow)
} else {
  rowAggregator.delete(aggIr, chrononRow)
}
```


⚠️ Potential issue

Aggregator deletion logic is reversed.
If isDelete is true, call delete; otherwise call update:

```diff
-if (isDelete) {
-  rowAggregator.update(aggIr, chrononRow)
-} else {
-  rowAggregator.delete(aggIr, chrononRow)
-}
+if (isDelete) {
+  rowAggregator.delete(aggIr, chrononRow)
+} else {
+  rowAggregator.update(aggIr, chrononRow)
+}
```

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 08e3315 and 4c75619.

📒 Files selected for processing (8)
  • api/py/ai/chronon/cli/compile/parse_teams.py (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (3 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (2 hunks)
  • api/thrift/common.thrift (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
🔇 Additional comments (19)
api/py/ai/chronon/cli/compile/parse_teams.py (2)

126-126: Added serving property support.

Properly extends the merge functionality to include the new serving property.


139-139: Consistent serving property handling.

Matches the pattern used for other properties (backfill, upload, streaming).

api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala (3)

21-23: Updated imports to support new flag-based approach.

Added necessary imports for ConfigProperties and ExecutionInfo.


121-134: Replaced JSON-based method with flag-based approach.

The implementation now correctly uses ExecutionInfo with ConfigProperties instead of customJson.


136-144: Updated assertions to use tilingFlag instead of isTilingEnabled.

Test cases appropriately check flag values directly instead of using the removed method.

spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (5)

28-28: Added TileCodec import.

Required for the updated method signature.


58-65: Removed assertion restriction for tiling with entity groupings.

Simplifies code and removes unnecessary constraint.


88-90: Updated memoryStream type to use TileCodec.

Changed from tuple array to a tuple containing an array of TileUpdate and a TileCodec.


98-101: Updated putRequests mapping to use tileIrBytes.

Replaced tileBytes with tileIrBytes for consistency with new approach.


108-112: Used tileIrBytes in PutRequest.

Ensures consistent usage of the new bytes format.

api/src/main/scala/ai/chronon/api/Builders.scala (2)

273-275: Updated MetaData parameters.

Removed dependencies parameter and added support for additionalOutputPartitionColumns and executionInfo.


301-312: Added executionInfo handling logic.

Properly sets executionInfo with historicalBackfill and handles additionalOutputPartitionColumns.

api/thrift/common.thrift (4)

28-38: Well-documented struct explanation.

Good documentation explaining the environment variables layering and priority logic.


44-44: Added new serving configuration support.

New serving field enables structured configuration for serving rather than relying on custom JSON.


47-57: Good documentation for ConfigProperties.

Clear explanation of configuration properties and their layering hierarchy.


63-63: Consistent implementation of serving field.

Correctly added same configuration pattern to both structs.

api/src/main/scala/ai/chronon/api/Extensions.scala (3)

582-592: Good implementation of serving flag retrieval.

New method properly navigates the configuration hierarchy to retrieve serving flags.


594-594: Clean helper for tiling flag.

Simple helper that replaces previous custom JSON implementation with structured configuration.


596-596: Added helper for decode failure flag.

New clean helper method consistent with the flag-based approach.

Comment on lines 582 to 596 of api/src/main/scala/ai/chronon/api/Extensions.scala:

```scala
def servingFlagValue(flag: String): Option[String] = {
  for (
    execInfo <- Option(groupBy.metaData.executionInfo);
    conf <- Option(execInfo.conf);
    servingConf <- Option(conf.serving);
    value <- Option(servingConf.get(flag))
  ) {
    return Some(value)
  }
  None
}

def tilingFlag: Boolean = servingFlagValue("tiling").exists(_.toLowerCase() == "true")

def throwOnDecodeFailFlag: Boolean = servingFlagValue("decode.throw_on_fail").exists(_.toLowerCase() == "true")
```

💡 Verification agent

🧩 Analysis chain

Verify flag-based approach is used consistently.

Ensure all previous custom JSON lookups are replaced with the new flag-based approach.


🏁 Script executed:

```bash
#!/bin/bash
# Check for any remaining references to customJson in the codebase
rg -i "customJson" --type scala
```

Length of output: 1105


Action required: Replace remaining custom JSON lookups.
The flag-based access in Extensions.scala is correct, but customJson is still used elsewhere. Update the following:

  • api/src/main/scala/ai/chronon/api/Builders.scala: Remove the customJson parameter and its usage.
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala & StagingQueryTest.scala: Refactor tests to use the new flag-based approach instead of supplying custom JSON.
  • api/src/test/scala/ai/chronon/api/test/ExtensionsTest.scala: Adjust test expectations/comments if needed.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (2)

178-180: Remove test-related TODO.

These comments suggest test code in production.

```diff
-        // This is not really testing the flink aggregation function
-        // TODO: refactor the prod code and use it in test instead
```

164-164: Remove commented code.

Dead code should be removed.

```diff
-    // val preAgg: Array[Byte] = tileCodec.makeTileIr(aggIr, isComplete = false)
```
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 4c75619 and e8983b5.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (3 hunks)
🔇 Additional comments (3)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (3)

43-45: Good case class addition.

TileUpdate encapsulates data well.


134-143: Clean mutation detection.

Good extraction of mutation logic.


127-127: Better return type.

Structured return type improves code clarity.
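
Putting the three comments together, a hedged reconstruction of the new shape (field names assumed):

```scala
// One tile write: grouping keys, the tile's timestamp, and the serialized IR bytes
case class TileUpdate(keys: Array[Any], tileTimestamp: Long, tileIrBytes: Array[Byte])

// The builder now returns updates together with the TileCodec needed to decode them,
// rather than a bare Array[(Array[Any], Long, Array[Byte])]:
//   def buildTileUpdates(...): (Array[TileUpdate], TileCodec)
```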

Comment on lines 174 to 184 (diff context as rendered):

```scala
        rowAggregator.update(aggIr, chrononRow)
      }

      val tileCodec = new TileCodec(groupBy, schema)
      val preAgg: Array[Byte] = tileCodec.makeTileIr(aggIr, true)

      (rowsKeys, tileTimestamp, preAgg)
      val isDelete = isMutation && row.getAs[Boolean](reversalIndex)

      // This is not really testing the flink aggregation function
      // TODO: refactor the prod code and use it in test instead
      if (isDelete) {
        rowAggregator.delete(aggIr, chrononRow)
      } else {
        rowAggregator.update(aggIr, chrononRow)
      }
```

⚠️ Potential issue

Redundant update call.

Update call on line 174 always happens before the deletion/update check, duplicating work.

```diff
-        rowAggregator.update(aggIr, chrononRow)
-
-        val isDelete = isMutation && row.getAs[Boolean](reversalIndex)
-
-        // This is not really testing the flink aggregation function
-        // TODO: refactor the prod code and use it in test instead
-        if (isDelete) {
-          rowAggregator.delete(aggIr, chrononRow)
-        } else {
-          rowAggregator.update(aggIr, chrononRow)
-        }
+        val isDelete = isMutation && row.getAs[Boolean](reversalIndex)
+
+        if (isDelete) {
+          rowAggregator.delete(aggIr, chrononRow)
+        } else {
+          rowAggregator.update(aggIr, chrononRow)
+        }
```

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (2)

127-127: Update function documentation to match new return type.

Return type changed from Array[(Array[Any], Long, Array[Byte])] to (Array[TileUpdate], TileCodec).


176-177: Consider addressing TODO comment.

Could refactor the test code to use production code as suggested in the TODO.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e8983b5 and 93f183a.

📒 Files selected for processing (2)
  • docs/source/Tiled_Architecture.md (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (3 hunks)
🔇 Additional comments (6)
docs/source/Tiled_Architecture.md (1)

82-83: Config update check.
The instruction now uses tiling=true on metaData.executionInfo.conf.serving, which aligns with the new config and removes custom JSON references.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (5)

43-45: Good addition of TileUpdate case class.

Clean encapsulation of update data.


134-143: Clean mutation detection logic.

Properly determines time column based on mutation status.


148-151: Good use of key builder.

Leverages FastHashing for efficient key generation.


175-183: Fixed aggregation logic issues.

Correctly fixed the deletion/update logic as noted in previous reviews.


191-191: Clean return value.

Return tuple provides both updates and codec for downstream consumers.

@nikhil-zlai merged commit 20359a2 into main on Mar 13, 2025
20 checks passed
@nikhil-zlai deleted the nikhil/kill_custom_json branch on March 13, 2025 at 23:18
kumar-zlai pushed a commit that referenced this pull request on Apr 25, 2025
kumar-zlai pushed a commit that referenced this pull request on Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request on May 15, 2025
chewy-zlai pushed a commit that referenced this pull request on May 15, 2025
chewy-zlai pushed a commit that referenced this pull request on May 16, 2025