Add list & join schema fetcher APIs #431

piyush-zlai · 2025-02-25T18:03:15Z

Summary

Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join.

As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable.

For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy).

APIs added are:

/v1/joins -> Return the list of online joins
/v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"}

Tested by dropping the docker container and confirming things on the Etsy side:

$ curl http://localhost:9000/v1/joins                                                                                                                                              
{"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...}

And

curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external
{ big payload }

Checklist

Added Unit Tests
Covered by existing CI
Integration tested
Documentation update

Summary by CodeRabbit

New Features
- Introduced new API endpoints that let users list available joins and retrieve detailed join schema information.
- Added enhanced configuration options to support complex join workflows.
- New test cases for validating join listing and schema retrieval functionalities.
- Added new constants for pagination and entity type handling.
Improvements
- Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience.
- Enhanced error handling and response formatting for join-related requests.
- Expanded testing capabilities with additional dependencies and resource inclusion.

coderabbitai · 2025-02-25T18:03:26Z

Walkthrough

This pull request introduces new constants in the API for pagination and entity type identification. It refactors existing cloud, online, and service modules to replace hardcoded values with these constants, and updates tests accordingly. New endpoints and methods for join listing and join schema fetching are added, alongside build and resource configuration updates.

Changes

Files	Change Summary
`api/src/main/scala/ai/chronon/api/Constants.scala`	Added constants: `ContinuationKey`, `ListLimit`, `ListEntityType`.
`cloud_aws/{BUILD.bazel, DynamoDBKVStoreImpl.scala, DynamoDBKVStoreTest.scala}`	Added test dependency `//api:lib`; replaced local constants with API constants; updated tests for pagination.
`cloud_gcp/{BigTableKVStoreImpl.scala, BigTableKVStoreTest.scala}`	Replaced local constants with API constants; restructured list logic with pattern matching; added a pagination test for entity types.
`distribution/publish_docker_images.sh`	Added AWS JAR build command and file existence check.
`online/{BUILD.bazel, JavaFetcher.java, JavaJoinSchemaResponse.java, JoinCodec.scala, Metrics.scala, fetcher/{Fetcher.scala, MetadataStore.scala}, test/resources/joins, test/ListJoinsTest.scala}`	Added join schema fetching and join listing functionality; introduced new Java API methods and classes; updated codec type signatures; included new JSON configurations and tests.
`service/{BUILD.bazel, FetcherVerticle.java, handlers/{FetchRouter.java, JoinListHandler.java, JoinSchemaHandler.java}, test/{JoinListHandlerTest.java, JoinSchemaHandlerTest.java}}`	Updated routing to use `JavaFetcher`; added HTTP handlers for join listing and join schema retrieval; updated build dependency; added corresponding tests.

Possibly related PRs

Add DynamoDB kv store implementation #33: Updates constant usage in DynamoDBKVStoreImpl for pagination.
Backport service changes to zipline fork #70: Replaces local constants with API constants in store implementations and tests.
refactor: split fetcher logic into multiple files #425: Modifies constant usage in DynamoDBKVStoreImpl and tests, aligning with these changes.

Suggested reviewers

nikhil-zlai

Poem

Code flows anew with constants bright,
Refactored paths bring order to the night.
Endpoints now sing in harmonious tune,
Tests and builds rising like a cosmic rune.
Celebrate the change—a code delight!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 26856c2 and 58aa047.

📒 Files selected for processing (1)

service/src/main/java/ai/chronon/service/FetcherVerticle.java (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: non_spark_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: non_spark_tests
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (5)

service/src/main/java/ai/chronon/service/FetcherVerticle.java (5)

4-7: Added imports for new handler classes.

The imports are correctly added for the new functionality.

37-38: Created JavaFetcher instance for handler use.

Good addition of JavaFetcher creation for reuse across handlers.

40-41: Modified route to use fetcher.

Updated fetch route to use JavaFetcher instance instead of api directly.

42-44: Added new endpoint for listing joins.

Implemented the new endpoint for listing online joins as specified in requirements.

45-47: Added join schema endpoint in the idiomatic way.

Implemented the join schema endpoint with path parameter as recommended in previous review.

✨ Finishing Touches

📝 Generate Docstrings (Beta)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java (1)
5-32: Good interoperability implementation.

Provides clean conversion between Java and Scala representations of join schema.

Consider adding null checks in the constructor to prevent potential NullPointerExceptions.
 public JavaJoinSchemaResponse(Fetcher.JoinSchemaResponse scalaResponse){
+    if (scalaResponse == null) {
+        throw new IllegalArgumentException("scalaResponse cannot be null");
+    }
     this.joinName = scalaResponse.joinName();
     this.keySchema = scalaResponse.keySchema();
     this.valueSchema = scalaResponse.valueSchema();
     this.schemaHash = scalaResponse.schemaHash();
 }
service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java (2)

1-51: The implementation looks solid but consider enhancing error handling.

Consider returning different HTTP status codes based on error types (e.g., 404 for not found, 400 for bad request) rather than always using 500. This would make the API more RESTful.

25-29: Add input validation for the entity name.

The path parameter is extracted but not validated. Consider adding validation to handle empty or invalid names before proceeding with fetching.

online/src/test/resources/joins/user_transactions.txn_join_d (2)

33-103: Check groupBy correctness.
Verify logic for aggregation windows.

212-247: Check merchant_data groupBy.
Confirm businesses with "is_big_merchant" are processed correctly.

online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)

201-223: doRetrieveAllListConfs pagination.
Recursive approach is fine. Watch for potential large memory usage with huge accumulations.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4f579b4 and 99a16c9.

📒 Files selected for processing (24)

api/src/main/scala/ai/chronon/api/Constants.scala (1 hunks)
cloud_aws/BUILD.bazel (1 hunks)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (3 hunks)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (2 hunks)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (4 hunks)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (3 hunks)
distribution/publish_docker_images.sh (2 hunks)
online/BUILD.bazel (2 hunks)
online/src/main/java/ai/chronon/online/JavaFetcher.java (3 hunks)
online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java (1 hunks)
online/src/main/scala/ai/chronon/online/JoinCodec.scala (2 hunks)
online/src/main/scala/ai/chronon/online/Metrics.scala (1 hunks)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (3 hunks)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
online/src/test/resources/joins/user_transactions.txn_join_a (1 hunks)
online/src/test/resources/joins/user_transactions.txn_join_d (1 hunks)
online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1 hunks)
service/BUILD.bazel (1 hunks)
service/src/main/java/ai/chronon/service/FetcherVerticle.java (2 hunks)
service/src/main/java/ai/chronon/service/handlers/FetchRouter.java (1 hunks)
service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java (1 hunks)
service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java (1 hunks)
service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (1 hunks)
service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (2)

Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-11-12T09:38:33.532Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)

Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.

online/src/test/resources/joins/user_transactions.txn_join_a (1)

Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/production/joins/risk/user_transactions.txn_join:217-218
Timestamp: 2024-11-12T09:38:33.532Z
Learning: The JSON files in this project are automatically generated and should not be manually modified or refactored.

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: non_spark_tests
GitHub Check: non_spark_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (60)

service/BUILD.bazel (1)

32-32: Added Avro dependency for join schema functionality

Added Apache Avro library to test dependencies to support the new join schema fetching API.

online/src/main/scala/ai/chronon/online/Metrics.scala (1)

31-31: Looks good, new metric for tracking join schema fetch operations.

Consistent with existing metric naming patterns.

cloud_aws/BUILD.bazel (1)

27-27: API lib added to test dependencies.

Good addition that ensures test cases have access to API constants.

online/BUILD.bazel (2)

73-73: Resources added to test library.

Enables access to test resources for running test cases.

85-85: Resources added to test suite.

Consistent with test_lib configuration.

distribution/publish_docker_images.sh (3)

33-33: Added AWS JAR build command.

Ensures AWS implementation is built for Docker image.

37-37: Added AWS JAR path variable.

Consistent with existing GCP JAR variable.

50-53: Added AWS JAR existence check.

Prevents build failures by validating AWS JAR exists.

api/src/main/scala/ai/chronon/api/Constants.scala (1)

89-97: Addition of constants for KV store pagination looks good.

The new constants centralize pagination and entity type parameters for better maintainability.

service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java (1)

13-46: Clean implementation of join list handler.

The handler properly follows Vert.x's non-blocking pattern by wrapping the CompletableFuture.

cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (3)

3-3: Updated imports to use centralized constants.

Good refactoring to use the centralized constants from Constants.scala.

128-131: Consistent use of pagination constants.

First list request now uses the centralized ListLimit constant.

134-138: Second list request updated to use new constants.

The pagination continuation works as expected with the centralized constants.

service/src/main/java/ai/chronon/service/handlers/FetchRouter.java (1)

30-30: LGTM - Method signature refactored to use JavaFetcher directly.

Good change that removes indirect fetcher creation via Api, making dependencies clearer.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2)

3-3: Updated imports use standardized constants.

Replaced hardcoded strings with constants, improving maintainability.

205-250: New pagination test looks good.

Test thoroughly verifies entity type listing functionality with pagination.

online/src/main/java/ai/chronon/online/JavaFetcher.java (4)

19-19: Added import for ScalaJavaConversions utility.

Required for the new methods that convert Scala collections to Java.

29-29: Added import for Scala Try utility.

Needed for error handling in the schema fetching functionality.

175-180: New listJoins method looks good.

Efficiently converts Scala Future to Java CompletableFuture with proper type conversion.

182-185: New fetchJoinSchema method looks good.

Properly handles errors using JTry pattern for consistent error handling.

service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (5)

30-42: Well-structured test class setup.

Good initialization of class fields and mocks.

43-55: Clean test setup method.

The setup configures mocks with appropriate behavior needed for tests.

57-101: Comprehensive test for successful schema retrieval.

Test properly validates schema hash, join name, key/value schemas, and parses Avro schemas to ensure structural validity.

103-128: Good error handling test.

Test correctly checks failure scenario response code and structure.

130-136: Reusable error validation helper.

Helper method effectively validates error response format.

service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (5)

31-42: Well-structured test class setup.

Class correctly initializes mocks and handler.

43-55: Appropriate test setup method.

Setup method properly initializes the test environment.

57-89: Thorough successful list request test.

Test validates HTTP status, content type, and response structure with expected join names.

91-118: Good failure scenario test.

Test properly simulates and validates error handling for failed KV store lookups.

120-126: Reusable error validation helper.

Helper method effectively validates error response structure.

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (3)

26-26: Updated import to include new classes.

Import now includes JoinSchemaResponse for the new functionality.

75-81: Well-documented JoinSchemaResponse case class.

Class provides clear structure for join schema with good documentation.

427-448: Good implementation of fetchJoinSchema method.

Method follows established patterns for:

Metrics tracking

Error handling

Response construction

Appropriately uses joinCodecCache to retrieve schema information.

online/src/main/scala/ai/chronon/online/JoinCodec.scala (2)

37-38: Simplified type references.

Removed redundant serde prefix from AvroCodec types for cleaner code.

92-92: Updated method signature for consistency.

Method parameter types updated to match class field types.

online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1)

1-105: Well-structured test for the MetadataStore's listJoins method!

Complete test coverage with validation of online joins, error handling, and pagination logic.

cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (4)

4-4: Good refactoring to use centralized constants.

Improves maintainability by using imported constants rather than hardcoded values.

169-173: Appropriate use of ListLimit constant.

Consistent with the refactoring approach to use centralized constants.

175-178: Proper implementation of ContinuationKey.

Clean transition from local constant to imported one.

196-196: Consistent use of ContinuationKey in response mapping.

Maintains consistent usage of the imported constant.

online/src/test/resources/joins/user_transactions.txn_join_a (1)

1-248: JSON configuration appears correct.

JSON structure is well-formatted and contains appropriate join configuration with "online": 0 flag.

service/src/main/java/ai/chronon/service/FetcherVerticle.java (5)

4-7: Required imports for new functionality.

Appropriate imports for the new handlers and JavaFetcher.

37-38: Clean instantiation of JavaFetcher.

Creates fetcher instance with appropriate parameters.

40-40: Updated fetch route to use JavaFetcher.

Good refactoring to use the fetcher instance instead of direct API access.

42-44: New API endpoint for listing joins.

Implements the first required API for listing all online joins as specified in the PR objectives.

45-47: New API endpoint for join schema retrieval.

Implements the second required API for retrieving schema details for a specified join.

online/src/test/resources/joins/user_transactions.txn_join_d (4)

1-19: Validate naming consistency in metadata.
Check that all references match actual table names to avoid runtime errors.

20-32: Ensure table references are valid.
Confirm "data.users" is correct and accessible.

105-173: Confirm naming alignment.
"risk.transaction_events.txn_group_by_merchant" must sync with references.

175-210: Review user_data groupBy.
Ensure "account_balance" and other fields meet usage constraints.

online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (5)

24-24: Imports look fine.
Needed for KVStore list operations.

31-31: StandardCharsets import is good.
No issues.

177-181: New method signature looks good.
Clear param name. Good default value usage.

182-200: parseJoins logic.
Handles JSON decoding well. Ensure all exceptions are well-reported.

225-226: Final call.
Keeps code simple. No issues.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5)

3-3: New imports are correct.

215-219: listLimit extraction.
Covers int/string fallback. Good approach.

221-222: maybeListEntityType and maybeStartKey retrieval.
Clean property checks.

232-241: Smart pattern matching.
Handles different start conditions. Nicely done.

264-264: ContinuationKey usage.
Returns last key as next page start. Good for pagination.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java (1)

31-42: Consider differentiating error status codes.

Always returning 500 for errors is generic. Differentiate between server errors and not-found conditions.

-            logger.error("Unable to retrieve join schema for: {}", entityName, joinSchemaResponseTry.getException());
-
-            List<String> errorMessages = Collections.singletonList(joinSchemaResponseTry.getException().getMessage());
-
-            ctx.response()
-                    .setStatusCode(500)
-                    .putHeader("content-type", "application/json")
-                    .end(new JsonObject().put("errors", errorMessages).encode());
+            Throwable exception = joinSchemaResponseTry.getException();
+            logger.error("Unable to retrieve join schema for: {}", entityName, exception);
+
+            int statusCode = (exception instanceof NoSuchElementException) ? 404 : 500;
+            List<String> errorMessages = Collections.singletonList(exception.getMessage());
+
+            ctx.response()
+                    .setStatusCode(statusCode)
+                    .putHeader("content-type", "application/json")
+                    .end(new JsonObject().put("errors", errorMessages).encode());

service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (1)

95-95: Remove unused variable.

This list isn't used in the test.
-        List<String> joins = List.of("my_joins.join_a.v1", "my_joins.join_a.v2", "my_joins.join_b.v1");

service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (1)

130-136: Consider enhancing error validation.

Helper method checks for errors array existence, but could verify more error details.

 private void validateFailureResponse(String jsonResponse, TestContext context) {
     JsonObject actualResponse = new JsonObject(jsonResponse);
     context.assertTrue(actualResponse.containsKey("errors"));

     String failureString = actualResponse.getJsonArray("errors").getString(0);
     context.assertNotNull(failureString);
+    context.assertTrue(failureString.contains("some fake failure"));
 }

online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)

201-226: Pagination implementation looks good.

The recursive implementation for retrieving all joins with pagination support is clean and includes proper metrics tracking.

Consider using immutable data structures instead of mutable ArrayBuffer for better functional style:

-def doRetrieveAllListConfs(acc: mutable.ArrayBuffer[String],
+def doRetrieveAllListConfs(acc: Seq[String],
                           paginationKey: Option[Any] = None): Future[Seq[String]] = {
  // ...
  fetchContext.kvStore.list(listRequest).flatMap { response =>
    val joinSeq: Seq[String] = parseJoins(response)
-   val newAcc = acc ++ joinSeq
+   val newAcc = acc ++ joinSeq // No change needed here as ++ works on immutable Seq too
    // ...
  }
}

-doRetrieveAllListConfs(new mutable.ArrayBuffer[String]())
+doRetrieveAllListConfs(Seq.empty[String])

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 99a16c9 and 26856c2.

📒 Files selected for processing (24)

api/src/main/scala/ai/chronon/api/Constants.scala (1 hunks)
cloud_aws/BUILD.bazel (1 hunks)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (3 hunks)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (2 hunks)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (4 hunks)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (3 hunks)
distribution/publish_docker_images.sh (2 hunks)
online/BUILD.bazel (2 hunks)
online/src/main/java/ai/chronon/online/JavaFetcher.java (3 hunks)
online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java (1 hunks)
online/src/main/scala/ai/chronon/online/JoinCodec.scala (2 hunks)
online/src/main/scala/ai/chronon/online/Metrics.scala (1 hunks)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (3 hunks)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
online/src/test/resources/joins/user_transactions.txn_join_a (1 hunks)
online/src/test/resources/joins/user_transactions.txn_join_d (1 hunks)
online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1 hunks)
service/BUILD.bazel (1 hunks)
service/src/main/java/ai/chronon/service/FetcherVerticle.java (2 hunks)
service/src/main/java/ai/chronon/service/handlers/FetchRouter.java (1 hunks)
service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java (1 hunks)
service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java (1 hunks)
service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (1 hunks)
service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (16)

online/src/main/scala/ai/chronon/online/Metrics.scala
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
distribution/publish_docker_images.sh
cloud_aws/BUILD.bazel
online/BUILD.bazel
service/BUILD.bazel
online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java
service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java
service/src/main/java/ai/chronon/service/handlers/FetchRouter.java
api/src/main/scala/ai/chronon/api/Constants.scala
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
online/src/main/scala/ai/chronon/online/JoinCodec.scala
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
online/src/test/resources/joins/user_transactions.txn_join_a

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: non_spark_tests
GitHub Check: non_spark_tests
GitHub Check: scala_compile_fmt_fix
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (15)

online/src/main/java/ai/chronon/online/JavaFetcher.java (2)

175-180: LGTM: listJoins implementation.

Clean conversion from Scala Future to Java CompletableFuture with proper type handling.

182-185: LGTM: fetchJoinSchema implementation.

Effective Scala-to-Java mapping with proper error propagation.

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)

75-82: LGTM: Well-documented schema response model.

Comprehensive documentation with clear parameter descriptions.

427-448: LGTM: Thorough implementation with proper metrics.

Handles errors gracefully, includes appropriate metrics, and leverages existing cache.

service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (2)

57-89: LGTM: Well-structured success test.

Comprehensive verification of status code, headers and response content.

91-118: LGTM: Thorough error handling test.

Good coverage of the error path with explicit assertions.

service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (2)

57-101: Well-structured test for successful schema fetch.

The test validates response status, content type, and JSON structure correctly. It also verifies Avro schema parsing.

103-128: Error handling test looks good.

Test properly verifies 500 status code and error response structure when fetcher returns failure.

service/src/main/java/ai/chronon/service/FetcherVerticle.java (4)

37-38: JavaFetcher instance created correctly.

Good initialization of the fetcher with appropriate parameters.

40-40: Updated fetch route to use new fetcher.

Properly modified to use fetcher instead of api directly.

42-44: New route for listing joins.

API endpoint aligns well with PR objectives.

45-46: New route for join schema retrieval.

Endpoint correctly uses path parameter for join name.

online/src/test/resources/joins/user_transactions.txn_join_d (1)

1-248: Well-structured join configuration.

The configuration defines a comprehensive join between users and transaction data with appropriate metadata, dependencies, and aggregations.

online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2)

24-24: Updated imports to include List operations.

Correctly added ListRequest and ListResponse types for new functionality.

177-199: Well-implemented joins parsing.

The parseJoins function correctly converts response values to strings, deserializes them, and filters based on online status. Error handling is properly implemented.

nikhil-zlai

very thorough pr! nicely done sir

nikhil-zlai · 2025-02-25T19:34:41Z

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala

-    // if we got a start row key, lets wire it up
-    maybeStartKey.foreach { startKey =>
-      query.range(ByteStringRange.unbounded().startOpen(ByteString.copyFrom(startKey.asInstanceOf[Array[Byte]])))
+    (maybeStartKey, maybeListEntityType) match {


nikhil-zlai · 2025-02-25T19:41:31Z

service/src/main/java/ai/chronon/service/FetcherVerticle.java

+        router.get("/v1/joins").handler(new JoinListHandler(fetcher));
+
+        // Set up route for retrieval of Join schema
+        router.get("/v1/join/schema/:name").handler(new JoinSchemaHandler(fetcher));


Suggested change

router.get("/v1/join/schema/:name").handler(new JoinSchemaHandler(fetcher));

router.get("/v1/join/:name/schema").handler(new JoinSchemaHandler(fetcher));

is the idiomatic way to do it

good call will update

nikhil-zlai · 2025-02-25T19:42:27Z

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala

@@ -416,6 +424,29 @@ class Fetcher(val kvStore: KVStore,
    }
  }

+  def fetchJoinSchema(joinName: String): Try[JoinSchemaResponse] = {


should this be in metadata store?

I guess that means that join codec cache should move into metadatastore as well - which makes sense to me.

I went with this here to be able to leverage the joinCodecCache (and not rebuild the cache in the metadata store or the join codec per call)

## Summary Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the Etsy side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion.

## Summary Add a couple of APIs to help with the our clients ***REMOVED*** integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on our clients). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the our clients side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion.

## Summary Add a couple of APIs to help with the our clients ***REMOVED*** integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity cheour clientss on our clients). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the doour clientser container and confirming things on the our clients side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Cheour clientslist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion.

coderabbitai bot reviewed Feb 25, 2025

View reviewed changes

piyush-zlai added 8 commits February 25, 2025 13:15

Add join list endpoint

c614277

Some tweaks

8b7f1db

Add join schema endpoint

f59e5d6

Rebase updates

b2adb51

Add list joins test

85d345a

Fix join schema code + tests & rename publish script

cdf5399

style: Apply scalafix and scalafmt changes

e55f810

Streamline some pojo stuff

26856c2

piyush-zlai force-pushed the piyush/fetcher_apis branch from 99a16c9 to 26856c2 Compare February 25, 2025 18:18

coderabbitai bot reviewed Feb 25, 2025

View reviewed changes

piyush-zlai requested review from nikhil-zlai and david-zlai February 25, 2025 18:34

nikhil-zlai approved these changes Feb 25, 2025

View reviewed changes

nikhil-zlai reviewed Feb 25, 2025

View reviewed changes

PR feedback

58aa047

piyush-zlai merged commit 544daab into main Feb 25, 2025
7 checks passed

piyush-zlai deleted the piyush/fetcher_apis branch February 25, 2025 20:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add list & join schema fetcher APIs #431

Add list & join schema fetcher APIs #431

piyush-zlai commented Feb 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 25, 2025 •

edited

Loading

Review ran into problems

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot left a comment

nikhil-zlai left a comment

nikhil-zlai Feb 25, 2025

nikhil-zlai Feb 25, 2025

piyush-zlai Feb 25, 2025

nikhil-zlai Feb 25, 2025

nikhil-zlai Feb 25, 2025

piyush-zlai Feb 25, 2025

	router.get("/v1/join/schema/:name").handler(new JoinSchemaHandler(fetcher));
	router.get("/v1/join/:name/schema").handler(new JoinSchemaHandler(fetcher));

Add list & join schema fetcher APIs #431

Add list & join schema fetcher APIs #431

Conversation

piyush-zlai commented Feb 25, 2025 • edited by coderabbitai bot Loading

Summary

Checklist

Summary by CodeRabbit

coderabbitai bot commented Feb 25, 2025 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

Review ran into problems

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

nikhil-zlai left a comment

Choose a reason for hiding this comment

nikhil-zlai Feb 25, 2025

Choose a reason for hiding this comment

nikhil-zlai Feb 25, 2025

Choose a reason for hiding this comment

piyush-zlai Feb 25, 2025

Choose a reason for hiding this comment

nikhil-zlai Feb 25, 2025

Choose a reason for hiding this comment

nikhil-zlai Feb 25, 2025

Choose a reason for hiding this comment

piyush-zlai Feb 25, 2025

Choose a reason for hiding this comment

piyush-zlai commented Feb 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 25, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)