Skip to content

perf: Online + Avro path optimizations #655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 17, 2025

Conversation

nikhil-zlai
Copy link
Contributor

@nikhil-zlai nikhil-zlai commented Apr 16, 2025

Summary

guided by flamegraph from one of our customers

  • pre-compute window.millis
  • eliminate branching in AvroConversions.toChrononRow - by pre-building a row generator func Row.fromCached
  • optimize expandWindowedTileIr by pre-computing the expander indexes.
  • bring-in fast-serde from linkedin which does generic data decoding, by pre-generating decoder classes.
  • optimize FetcherCache.getBatchBytes logic
  • early exit when externalParts are null
  • bigtablekvstore impl optimizations on how we construct the keys

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Added a new method for converting cached data to Chronon row format.
    • Introduced a public field to expose the smallest tail hop duration in milliseconds for group-by serving information.
  • Bug Fixes

    • Improved error handling and null safety in batch response processing and fetcher logic.
    • Enhanced logging with better exception handling and sampling in online fetcher responses.
    • Added exception logging around synchronous KVStore response retrieval.
  • Refactor

    • Optimized windowing and mapping logic for performance and clarity in several aggregation and expansion methods.
    • Modularized derivation and logging logic in the fetcher for better maintainability.
    • Updated method and field names for clarity regarding window resolution and tail hop durations.
    • Refactored Avro serialization/deserialization to use LinkedIn Fast Avro SerDe with cached conversion functions.
    • Improved conversion utilities for Avro to Chronon row representations.
    • Centralized Scala-to-Java byte array conversions in tiling utilities.
  • Chores

    • Added LinkedIn Fast Avro SerDe as a dependency and updated related build and dependency files.
    • Updated utility methods for more consistent data type conversions.
  • Style

    • Improved code comments and documentation for clarity.

Copy link

coderabbitai bot commented Apr 16, 2025

Walkthrough

This update introduces new utility methods for cached and schema-aware data conversion, particularly enhancing Avro-to-Chronon row transformations. It modularizes derivation and logging within the fetcher, improves null safety, and optimizes mapping logic in tile expansion. The smallest tail hop calculation is clarified and propagated across APIs, online, and test code. Dependencies for Avro FastSerde are added, with associated Bazel and Maven configuration updates. Minor refactoring and comment adjustments are made for clarity and maintainability, with no changes to public API signatures except for extended case classes and new methods.

Changes

File(s) Change Summary
api/src/main/scala/ai/chronon/api/Row.scala
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
Added fromCached method for schema-aware, cached row conversion; introduced genericRecordToChrononRowConverter and toChrononRowCached for efficient Avro-to-Chronon row transformations; replaced previous toChrononRow.
online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala Switched to LinkedIn FastSerde for Avro serialization/deserialization; cached row conversion function added; removed redundant encode method; refactored decode methods to use cached conversion.
online/src/main/scala/ai/chronon/online/TileCodec.scala Refactored windowed tile IR expansion by precomputing and caching mapping logic, reducing per-call overhead; added cached row converter.
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala Modularized derivation and logging logic into private methods; improved null safety, early returns, and error handling; minor code reorganization.
online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala Rewrote batch bytes retrieval logic to use explicit iteration and null checks for robustness.
online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala
Updated to use smallestTailHopMillis for tile key resolution and windowing, replacing previous window resolution logic; added field and converter to serving info class.
api/src/main/scala/ai/chronon/api/Extensions.scala Extended WindowMapping case class with millis field to explicitly track window duration.
aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala Renamed and refactored method for smallest tail hop calculation; improved null safety and iteration clarity.
aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala Replaced repeated property access with local variable for window duration; minor comment punctuation fix.
api/src/main/scala/ai/chronon/api/TilingUtils.scala Centralized Scala-to-Java byte array conversion with a new utility method.
online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala Added a TODO comment above a method; no functional changes.
online/src/main/scala/ai/chronon/online/Api.scala Wrapped blocking future await in try-catch to log exceptions.
online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala Simplified batch IR decoding by using converter function directly.
online/src/main/scala/ai/chronon/online/serde/AvroSerde.scala Added cached Avro-to-Chronon row converter; refactored fromBytes to use it.
maven_install.json
tools/build_rules/dependencies/load_dependencies.bzl
tools/build_rules/dependencies/maven_repository.bzl
online/BUILD.bazel
Added LinkedIn Avro FastSerde dependencies and repository; updated Bazel and Maven configs accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant AvroCodec
    participant AvroConversions
    participant Row
    participant ChrononRow

    AvroCodec->>AvroConversions: toChrononRowFunc(schema)
    AvroConversions->>Row: fromCached(dataType, ...)
    Row-->>AvroConversions: conversion function
    AvroConversions-->>AvroCodec: conversion function
    AvroCodec->>AvroConversions: conversionFunc(avroRecord)
    AvroConversions->>ChrononRow: Convert Avro to Chronon row
    ChrononRow-->>AvroCodec: Array[Any]
Loading

Poem

In bytes and rows, the schemas dance,
FastSerde joins the code’s expanse.
Cached conversions, window hops anew,
Mapping logic streamlined through and through.
Fetchers now log with modular grace—
Chronon’s code keeps up the pace!
🚀✨

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (1)

101-101: Same null pointer concern.

🧹 Nitpick comments (1)
api/src/main/scala/ai/chronon/api/Row.scala (1)

126-197: Recursive converter looks flexible.
Consider numeric type edge cases.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a574b7d and 70e5c20.

📒 Files selected for processing (11)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/Row.scala (1 hunks)
  • maven_install.json (10 hunks)
  • online/BUILD.bazel (2 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • tools/build_rules/dependencies/load_dependencies.bzl (1 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
online/src/main/scala/ai/chronon/online/TileCodec.scala (2)
aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (1)
  • init (42-42)
aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (2)
  • init (69-69)
  • clone (154-154)
⏰ Context from checks skipped due to timeout of 90000ms (21)
  • GitHub Check: service_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: hub_tests
  • GitHub Check: hub_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: api_tests
  • GitHub Check: api_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: flink_tests
  • GitHub Check: flink_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: online_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: orchestration_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: aggregator_tests
🔇 Additional comments (24)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)

259-259: Improved design by adding milliseconds to WindowMapping case class

Adding the millis field eliminates the need for repeated computation of window.millis in downstream code.


299-299: Good handling of null window case

Setting millis to -1 when window is null provides a clear sentinel value for code using this field.

aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (4)

112-115: Performance optimization by caching window.millis

Pre-computing window.millis reduces repeated access and calculations in this hot loop.


131-135: Consistent optimization pattern across method

Same pattern applied here - caching window.millis to avoid repeated property access.


148-153: Performance improvement in tail hops processing

Caching window.millis optimizes this performance-sensitive code path.


159-159: Consistent use of cached value

The optimization is consistently applied throughout the code.

tools/build_rules/dependencies/maven_repository.bzl (1)

116-116: Dependency addition looks good.

Fastserde is correctly added for Avro optimizations.

tools/build_rules/dependencies/load_dependencies.bzl (1)

8-8: Repository addition is correct.

New Maven repo is needed for fastserde; no issues.

online/BUILD.bazel (1)

31-31: Build dependency update is correct.

fastserde is properly added to both targets.

Also applies to: 70-70

maven_install.json (1)

1-26992: All good—dependency lockfile updates look correct and expected.

online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (1)

195-201: Potential edge case if latest.millis == batchEndTsMillis.
Might produce a null when equality occurs.

online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (2)

154-163: Neat wrapper around the cached function.
Keeps logic minimal.


165-190: Efficient Avro decoding.
Confirm concurrency safety if invoked from multiple threads.

online/src/main/scala/ai/chronon/online/TileCodec.scala (4)

32-32: Minimal import is fine.


99-100: Case class is concise.


101-124: Good precomputation for mappings.
Improves runtime.


126-132: Cloning logic in expansion step.
Double-check thread safety.

online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (7)

19-19: Looks good.
Appropriate import for StructType.


24-24: No issues.
Necessary import for GenericRecord.


37-37: DatumWriter usage looks correct.
No obvious issues.


38-38: DatumReader usage looks correct.
Schema reference is consistent.


94-94: Potential null pointer.
decode(bytes) can be null; confirm safe usage.


26-27:

✅ Verification successful

Confirm fastserde version stability.
FastGenericDatumReader/Writer improves performance. Verify the library version is pinned to avoid regressions.


🏁 Script executed:

#!/bin/bash
rg "com.linkedin.avroutil1" -A 5

Length of output: 10539


Fastserde version pinned (0.4.25) confirmed.

  • Dependency is explicitly set to 0.4.25 in both maven_install.json and tools/build_rules/dependencies/maven_repository.bzl.

48-49:

✅ Verification successful

Check null handling in toChrononRowFunc.
decode can return null. Ensure safety.


🏁 Script executed:

#!/bin/bash
rg "def toChrononRowFunc" -A 10

Length of output: 1029


Null safety in toChrononRowFunc is verified.
The lambda in AvroConversions.toChrononRowFunc explicitly checks for a null input and returns null accordingly, ensuring that any null value from decode is safely handled. No further action is needed on this code.

Copy link
Contributor

@piyush-zlai piyush-zlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt of carving out the LI avro serde libs in a different change that we can take the time to test out in the various paths in our lambda that exercise the avro ser/de?

// cache these mapping out of hot-path
private case class ExpanderMapping(irPos: Int, bucketPos: Int)

private val expanderMappings: Array[ExpanderMapping] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool!

@nikhil-zlai nikhil-zlai force-pushed the nikhil/optimizations_latency_misc branch from f755f67 to 6cb5471 Compare April 16, 2025 17:22
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 6cb5471 and d53bca0.

📒 Files selected for processing (5)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (12 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (8 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala
🧰 Additional context used
🧬 Code Graph Analysis (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (3)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (8)
  • ScalaJavaConversions (6-97)
  • IteratorOps (51-55)
  • toScala (16-22)
  • toScala (32-38)
  • toScala (41-43)
  • toScala (52-54)
  • toScala (62-68)
  • toScala (80-86)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (34-51)
api/src/main/scala/ai/chronon/api/TilingUtils.scala (2)
  • TilingUtils (9-43)
  • deserializeTileKey (15-19)
⏰ Context from checks skipped due to timeout of 90000ms (21)
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: service_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: online_tests
  • GitHub Check: hub_tests
  • GitHub Check: hub_tests
  • GitHub Check: online_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: flink_tests
  • GitHub Check: api_tests
  • GitHub Check: flink_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: spark_tests
🔇 Additional comments (3)
api/src/main/scala/ai/chronon/api/TilingUtils.scala (3)

5-5: Import added for new utility method.

The java.util import is necessary for ArrayList usage.


21-30: New utility method improves array-to-list conversion.

The imperative while loop approach avoids intermediate collections and preallocates the ArrayList to the correct size, which optimizes performance for a potentially hot code path.


38-38: Simplified keyBytes conversion.

Replaced complex conversion chain with direct utility method call, improving readability and likely performance.

@@ -113,6 +113,7 @@ maven_repository = repository(

# Avro
"org.apache.avro:avro:1.11.3",
"com.linkedin.avroutil1:avro-fastserde:0.4.25",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dannnggg

@nikhil-zlai nikhil-zlai force-pushed the nikhil/optimizations_latency_misc branch from d9f7777 to 789342b Compare April 17, 2025 03:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)

148-152: Null sentinel pattern retained despite previous feedback.

The zipping logic continues to use null when external responses are missing. Consider returning an empty sequence instead.


409-411: 🛠️ Refactor suggestion

Early exit optimization returns null.

While performance is improved by avoiding unnecessary processing, returning null breaks type safety.

-    if (validRequests.isEmpty) { return Future.successful(null) }
+    if (validRequests.isEmpty) { return Future.successful(Seq.empty) }
🧹 Nitpick comments (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (1)

125-125: Changed log level from debug to info.

Consider potential impact on log volume from this higher visibility setting.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 87de65b and 789342b.

📒 Files selected for processing (21)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/Row.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (7 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (1 hunks)
  • maven_install.json (10 hunks)
  • online/BUILD.bazel (2 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (8 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (1 hunks)
  • tools/build_rules/dependencies/load_dependencies.bzl (1 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
🚧 Files skipped from review as they are similar to previous changes (16)
  • tools/build_rules/dependencies/load_dependencies.bzl
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • online/BUILD.bazel
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
  • tools/build_rules/dependencies/maven_repository.bzl
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • maven_install.json
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • api/src/main/scala/ai/chronon/api/Row.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala
⏰ Context from checks skipped due to timeout of 90000ms (36)
  • GitHub Check: service_tests
  • GitHub Check: service_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: hub_tests
  • GitHub Check: hub_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: online_tests
  • GitHub Check: online_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: flink_tests
  • GitHub Check: api_tests
  • GitHub Check: api_tests
  • GitHub Check: flink_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: groupby_tests
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (7)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (4)

179-180: Good modularization of derivation logic.

Extraction of processing logic into applyDerivations improves code maintainability.


191-252: Well-structured derivation logic with proper fallbacks.

The extracted method handles derivation with appropriate fallbacks and error handling. Metrics are correctly incremented for monitoring.


293-317: Clean separation of logging responsibility.

Refactored logResponse to delegate encoding and publishing to a dedicated method.


319-378: Well-extracted logging logic.

The new encodeAndPublishLog method correctly encapsulates encoding, sampling, and publishing functionality.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (3)

176-186: Simplified multi-get with bulk reads batcher.

Good usage of BigTable's batcher API with proper resource management through batcher.close().


291-293: Standardized future conversion pattern.

Consistently uses ApiFutureUtils.toCompletableFuture and FutureConverters.toScala for cleaner async handling.


358-360: Consistent future conversion pattern in multiPut.

Same standardized conversion approach applied to put operations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (2)

166-172: Utility function for ArrayList creation

Helper method to create ArrayList from iterator, but not used in this file.

Consider removing this unused method or using it to reduce code duplication in the primitive array conversions below.


174-244: Optimized Avro to Chronon row conversion

Leverages the new Row.fromCached to create efficient conversion functions with optimizations for:

  1. Special handling of Avro FastSerde's primitive array types
  2. Efficient record field iteration with AbstractIterator
  3. Type-specific optimized converters

Comment on line 190 acknowledges potential performance concerns with case matching.

Consider using the buildArray helper to reduce repetition in array conversion cases.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7053aae and 7c6c5d5.

📒 Files selected for processing (7)
  • api/src/main/scala/ai/chronon/api/Row.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroSerde.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala
🧰 Additional context used
🧬 Code Graph Analysis (1)
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1)
api/src/main/scala/ai/chronon/api/Row.scala (2)
  • Row (72-126)
  • fromCached (129-203)
⏰ Context from checks skipped due to timeout of 90000ms (36)
  • GitHub Check: service_tests
  • GitHub Check: service_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: online_tests
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: hub_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: online_tests
  • GitHub Check: flink_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: hub_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: join_tests
  • GitHub Check: flink_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: join_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: batch_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: batch_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: api_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (7)
online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (1)

300-316: Optimized and simplified Avro-to-Chronon row conversion

Uses the precomputed converter function from the serving info rather than doing conversion on the fly, which aligns with PR performance objectives.

online/src/main/scala/ai/chronon/online/serde/AvroSerde.scala (2)

57-57: Caching converter for better performance

Cached converter eliminates repeated construction of conversion logic for the same schema.


69-69: Simplified row conversion using cached converter

Replaces complex mapping logic with simple function call.

api/src/main/scala/ai/chronon/api/Row.scala (2)

126-126: Added utility identity function

Simple pass-through function for optimization paths.


128-203: Added efficient cached converter generator

Creates reusable conversion functions instead of performing conversions directly. Major performance optimization through:

  1. Pre-computing field conversion functions
  2. Optimizing list traversal when element conversion is identity
  3. Applying null guards efficiently

This addresses a key performance bottleneck identified in the PR objectives.

online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (2)

30-30: Added FastSerde dependency import

Integrates LinkedIn's fast serialization/deserialization library as noted in PR objectives.


155-164: Added high-level converter function

Clean wrapper that handles null case and type conversion.

@nikhil-zlai nikhil-zlai force-pushed the nikhil/optimizations_latency_misc branch from 9d667ad to 98819a7 Compare April 17, 2025 06:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1)

166-172: Helper method extraction opportunity

This utility could replace duplicate code in array type handlers below.

 def buildArray(size: Int, iterator: util.Iterator[Any]): util.ArrayList[Any] = {
   val arr = new util.ArrayList[Any](size)
   while (iterator.hasNext) {
     arr.add(iterator.next())
   }
   arr
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7c6c5d5 and 98819a7.

📒 Files selected for processing (22)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/Row.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (1 hunks)
  • maven_install.json (10 hunks)
  • online/BUILD.bazel (2 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (7 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroSerde.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (1 hunks)
  • tools/build_rules/dependencies/load_dependencies.bzl (1 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • tools/build_rules/dependencies/maven_repository.bzl
🚧 Files skipped from review as they are similar to previous changes (19)
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroSerde.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • tools/build_rules/dependencies/load_dependencies.bzl
  • maven_install.json
  • api/src/main/scala/ai/chronon/api/Row.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • online/BUILD.bazel
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
⏰ Context from checks skipped due to timeout of 90000ms (36)
  • GitHub Check: service_tests
  • GitHub Check: hub_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: streaming_tests
  • GitHub Check: api_tests
  • GitHub Check: hub_tests
  • GitHub Check: groupby_tests
  • GitHub Check: online_tests
  • GitHub Check: online_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: flink_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: join_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: api_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: spark_tests
  • GitHub Check: flink_tests
  • GitHub Check: batch_tests
  • GitHub Check: streaming_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: spark_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: join_tests
  • GitHub Check: orchestration_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: batch_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (3)

30-30: Added FastSerde primitive array support

Importing LinkedIn's fastserde primitives for optimized array handling.


155-164: Optimized converter with caching

Cached conversion function improves performance by avoiding redundant schema processing.


174-244: Optimized cached conversion implementation

Efficient implementation ordering cases by frequency and handling specialized primitive arrays.

Comment on lines +189 to 241
{ // cases are ordered by most frequent use
// TODO: Leverage type info if this case match proves to be expensive
case doubles: fastavro.PrimitiveDoubleArrayList =>
val arr = new util.ArrayList[Any](doubles.size)
val iterator = doubles.iterator()
while (iterator.hasNext) {
arr.add(iterator.next())
}
arr

case longs: fastavro.PrimitiveLongArrayList =>
val arr = new util.ArrayList[Any](longs.size)
val iterator = longs.iterator()
while (iterator.hasNext) {
arr.add(iterator.next())
}
arr

case genericArray: GenericData.Array[Any] =>
val arr = new util.ArrayList[Any](genericArray.size)
val iterator = genericArray.iterator()
while (iterator.hasNext) {
arr.add(iterator.next())
}
arr

case ints: fastavro.PrimitiveIntArrayList =>
val arr = new util.ArrayList[Any](ints.size)
val iterator = ints.iterator()
while (iterator.hasNext) {
arr.add(iterator.next())
}
arr

case floats: fastavro.PrimitiveFloatArrayList =>
val arr = new util.ArrayList[Any](floats.size)
val iterator = floats.iterator()
while (iterator.hasNext) {
arr.add(iterator.next())
}
arr

case bools: fastavro.PrimitiveBooleanArrayList =>
val arr = new util.ArrayList[Any](bools.size)
val iterator = bools.iterator()
while (iterator.hasNext) {
arr.add(iterator.next())
}
arr

case valueOfUnknownType =>
throw new RuntimeException(s"Found unknown list type in avro record: ${valueOfUnknownType.getClass.getName}")
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Repeated array conversion patterns

Each array type uses identical conversion pattern. Use helper method.

 { // cases are ordered by most frequent use
   // TODO: Leverage type info if this case match proves to be expensive
   case doubles: fastavro.PrimitiveDoubleArrayList =>
-    val arr = new util.ArrayList[Any](doubles.size)
-    val iterator = doubles.iterator()
-    while (iterator.hasNext) {
-      arr.add(iterator.next())
-    }
-    arr
+    buildArray(doubles.size, doubles.iterator())

   case longs: fastavro.PrimitiveLongArrayList =>
-    val arr = new util.ArrayList[Any](longs.size)
-    val iterator = longs.iterator()
-    while (iterator.hasNext) {
-      arr.add(iterator.next())
-    }
-    arr
+    buildArray(longs.size, longs.iterator())

   case genericArray: GenericData.Array[Any] =>
-    val arr = new util.ArrayList[Any](genericArray.size)
-    val iterator = genericArray.iterator()
-    while (iterator.hasNext) {
-      arr.add(iterator.next())
-    }
-    arr
+    buildArray(genericArray.size, genericArray.iterator())

   case ints: fastavro.PrimitiveIntArrayList =>
-    val arr = new util.ArrayList[Any](ints.size)
-    val iterator = ints.iterator()
-    while (iterator.hasNext) {
-      arr.add(iterator.next())
-    }
-    arr
+    buildArray(ints.size, ints.iterator())

   case floats: fastavro.PrimitiveFloatArrayList =>
-    val arr = new util.ArrayList[Any](floats.size)
-    val iterator = floats.iterator()
-    while (iterator.hasNext) {
-      arr.add(iterator.next())
-    }
-    arr
+    buildArray(floats.size, floats.iterator())

   case bools: fastavro.PrimitiveBooleanArrayList =>
-    val arr = new util.ArrayList[Any](bools.size)
-    val iterator = bools.iterator()
-    while (iterator.hasNext) {
-      arr.add(iterator.next())
-    }
-    arr
+    buildArray(bools.size, bools.iterator())

Committable suggestion skipped: line range outside the PR's diff.

@nikhil-zlai nikhil-zlai merged commit 0a4115a into main Apr 17, 2025
39 checks passed
@nikhil-zlai nikhil-zlai deleted the nikhil/optimizations_latency_misc branch April 17, 2025 06:50
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary
guided by flamegraph from one of our customers
- pre-compute window.millis
- eliminate branching in AvroConversions.toChrononRow - by pre-building
a row generator func `Row.fromCached`
- optimize `expandWindowedTileIr` by pre-computing the expander indexes.
- bring-in fast-serde from linkedin which does generic data decoding, by
pre-generating decoder classes.
- optimize FetcherCache.getBatchBytes logic
- early exit when externalParts are null
- bigtablekvstore impl optimizations on how we construct the keys

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added a new method for converting cached data to Chronon row format.
- Introduced a public field to expose the smallest tail hop duration in
milliseconds for group-by serving information.

- **Bug Fixes**
- Improved error handling and null safety in batch response processing
and fetcher logic.
- Enhanced logging with better exception handling and sampling in online
fetcher responses.
- Added exception logging around synchronous KVStore response retrieval.

- **Refactor**
- Optimized windowing and mapping logic for performance and clarity in
several aggregation and expansion methods.
- Modularized derivation and logging logic in the fetcher for better
maintainability.
- Updated method and field names for clarity regarding window resolution
and tail hop durations.
- Refactored Avro serialization/deserialization to use LinkedIn Fast
Avro SerDe with cached conversion functions.
- Improved conversion utilities for Avro to Chronon row representations.
- Centralized Scala-to-Java byte array conversions in tiling utilities.

- **Chores**
- Added LinkedIn Fast Avro SerDe as a dependency and updated related
build and dependency files.
  - Updated utility methods for more consistent data type conversions.

- **Style**
  - Improved code comments and documentation for clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
## Summary
guided by flamegraph from one of our customers
- pre-compute window.millis
- eliminate branching in AvroConversions.toChrononRow - by pre-building
a row generator func `Row.fromCached`
- optimize `expandWindowedTileIr` by pre-computing the expander indexes.
- bring-in fast-serde from linkedin which does generic data decoding, by
pre-generating decoder classes.
- optimize FetcherCache.getBatchBytes logic
- early exit when externalParts are null
- bigtablekvstore impl optimizations on how we construct the keys

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added a new method for converting cached data to Chronon row format.
- Introduced a public field to expose the smallest tail hop duration in
milliseconds for group-by serving information.

- **Bug Fixes**
- Improved error handling and null safety in batch response processing
and fetcher logic.
- Enhanced logging with better exception handling and sampling in online
fetcher responses.
- Added exception logging around synchronous KVStore response retrieval.

- **Refactor**
- Optimized windowing and mapping logic for performance and clarity in
several aggregation and expansion methods.
- Modularized derivation and logging logic in the fetcher for better
maintainability.
- Updated method and field names for clarity regarding window resolution
and tail hop durations.
- Refactored Avro serialization/deserialization to use LinkedIn Fast
Avro SerDe with cached conversion functions.
- Improved conversion utilities for Avro to Chronon row representations.
- Centralized Scala-to-Java byte array conversions in tiling utilities.

- **Chores**
- Added LinkedIn Fast Avro SerDe as a dependency and updated related
build and dependency files.
  - Updated utility methods for more consistent data type conversions.

- **Style**
  - Improved code comments and documentation for clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary
guided by flamegraph from one of our customers
- pre-compute window.millis
- eliminate branching in AvroConversions.toChrononRow - by pre-building
a row generator func `Row.fromCached`
- optimize `expandWindowedTileIr` by pre-computing the expander indexes.
- bring-in fast-serde from linkedin which does generic data decoding, by
pre-generating decoder classes.
- optimize FetcherCache.getBatchBytes logic
- early exit when externalParts are null
- bigtablekvstore impl optimizations on how we construct the keys

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added a new method for converting cached data to Chronon row format.
- Introduced a public field to expose the smallest tail hop duration in
milliseconds for group-by serving information.

- **Bug Fixes**
- Improved error handling and null safety in batch response processing
and fetcher logic.
- Enhanced logging with better exception handling and sampling in online
fetcher responses.
- Added exception logging around synchronous KVStore response retrieval.

- **Refactor**
- Optimized windowing and mapping logic for performance and clarity in
several aggregation and expansion methods.
- Modularized derivation and logging logic in the fetcher for better
maintainability.
- Updated method and field names for clarity regarding window resolution
and tail hop durations.
- Refactored Avro serialization/deserialization to use LinkedIn Fast
Avro SerDe with cached conversion functions.
- Improved conversion utilities for Avro to Chronon row representations.
- Centralized Scala-to-Java byte array conversions in tiling utilities.

- **Chores**
- Added LinkedIn Fast Avro SerDe as a dependency and updated related
build and dependency files.
  - Updated utility methods for more consistent data type conversions.

- **Style**
  - Improved code comments and documentation for clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary
guided by flamegraph from one of our customers
- pre-compute window.millis
- eliminate branching in AvroConversions.toChrononRow - by pre-building
a row generator func `Row.fromCached`
- optimize `expandWindowedTileIr` by pre-computing the expander indexes.
- bring-in fast-serde from linkedin which does generic data decoding, by
pre-generating decoder classes.
- optimize FetcherCache.getBatchBytes logic
- early exit when externalParts are null
- bigtablekvstore impl optimizations on how we construct the keys

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added a new method for converting cached data to Chronon row format.
- Introduced a public field to expose the smallest tail hop duration in
milliseconds for group-by serving information.

- **Bug Fixes**
- Improved error handling and null safety in batch response processing
and fetcher logic.
- Enhanced logging with better exception handling and sampling in online
fetcher responses.
- Added exception logging around synchronous KVStore response retrieval.

- **Refactor**
- Optimized windowing and mapping logic for performance and clarity in
several aggregation and expansion methods.
- Modularized derivation and logging logic in the fetcher for better
maintainability.
- Updated method and field names for clarity regarding window resolution
and tail hop durations.
- Refactored Avro serialization/deserialization to use LinkedIn Fast
Avro SerDe with cached conversion functions.
- Improved conversion utilities for Avro to Chronon row representations.
- Centralized Scala-to-Java byte array conversions in tiling utilities.

- **Chores**
- Added LinkedIn Fast Avro SerDe as a dependency and updated related
build and dependency files.
  - Updated utility methods for more consistent data type conversions.

- **Style**
  - Improved code comments and documentation for clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
## Summary
guided by flamegraph from one of our customers
- pre-compute window.millis
- eliminate branching in AvroConversions.toChrononRow - by pre-building
a row generator func `Row.fromCached`
- optimize `expandWindowedTileIr` by pre-computing the expander indexes.
- bring-in fast-serde from linkedin which does generic data decoding, by
pre-generating decoder classes.
- optimize FetcherCache.getBatchBytes logic
- early exit when externalParts are null
- bigtablekvstore impl optimizations on how we construct the keys

## Cheour clientslist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added a new method for converting cached data to Chronon row format.
- Introduced a public field to expose the smallest tail hop duration in
milliseconds for group-by serving information.

- **Bug Fixes**
- Improved error handling and null safety in batch response processing
and fetcher logic.
- Enhanced logging with better exception handling and sampling in online
fetcher responses.
- Added exception logging around synchronous KVStore response retrieval.

- **Refactor**
- Optimized windowing and mapping logic for performance and clarity in
several aggregation and expansion methods.
- Modularized derivation and logging logic in the fetcher for better
maintainability.
- Updated method and field names for clarity regarding window resolution
and tail hop durations.
- Refactored Avro serialization/deserialization to use LinkedIn Fast
Avro SerDe with cached conversion functions.
- Improved conversion utilities for Avro to Chronon row representations.
- Centralized Scala-to-Java byte array conversions in tiling utilities.

- **Chores**
- Added LinkedIn Fast Avro SerDe as a dependency and updated related
build and dependency files.
  - Updated utility methods for more consistent data type conversions.

- **Style**
  - Improved code comments and documentation for clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants