-
Notifications
You must be signed in to change notification settings - Fork 0
Add list & join schema fetcher APIs #431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request introduces new constants in the API for pagination and entity type identification. It refactors existing cloud, online, and service modules to replace hardcoded values with these constants, and updates tests accordingly. New endpoints and methods for join listing and join schema fetching are added, alongside build and resource configuration updates. Changes
Possibly related PRs
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
🔇 Additional comments (5)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java (1)
5-32
: Good interoperability implementation.Provides clean conversion between Java and Scala representations of join schema.
Consider adding null checks in the constructor to prevent potential NullPointerExceptions.
public JavaJoinSchemaResponse(Fetcher.JoinSchemaResponse scalaResponse){ + if (scalaResponse == null) { + throw new IllegalArgumentException("scalaResponse cannot be null"); + } this.joinName = scalaResponse.joinName(); this.keySchema = scalaResponse.keySchema(); this.valueSchema = scalaResponse.valueSchema(); this.schemaHash = scalaResponse.schemaHash(); }service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java (2)
1-51
: The implementation looks solid but consider enhancing error handling.Consider returning different HTTP status codes based on error types (e.g., 404 for not found, 400 for bad request) rather than always using 500. This would make the API more RESTful.
25-29
: Add input validation for the entity name.The path parameter is extracted but not validated. Consider adding validation to handle empty or invalid names before proceeding with fetching.
online/src/test/resources/joins/user_transactions.txn_join_d (2)
33-103
: Check groupBy correctness.
Verify logic for aggregation windows.
212-247
: Check merchant_data groupBy.
Confirm businesses with "is_big_merchant" are processed correctly.online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
201-223
: doRetrieveAllListConfs pagination.
Recursive approach is fine. Watch for potential large memory usage with huge accumulations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (24)
api/src/main/scala/ai/chronon/api/Constants.scala
(1 hunks)cloud_aws/BUILD.bazel
(1 hunks)cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
(3 hunks)cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
(2 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
(4 hunks)cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
(3 hunks)distribution/publish_docker_images.sh
(2 hunks)online/BUILD.bazel
(2 hunks)online/src/main/java/ai/chronon/online/JavaFetcher.java
(3 hunks)online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java
(1 hunks)online/src/main/scala/ai/chronon/online/JoinCodec.scala
(2 hunks)online/src/main/scala/ai/chronon/online/Metrics.scala
(1 hunks)online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
(3 hunks)online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
(2 hunks)online/src/test/resources/joins/user_transactions.txn_join_a
(1 hunks)online/src/test/resources/joins/user_transactions.txn_join_d
(1 hunks)online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
(1 hunks)service/BUILD.bazel
(1 hunks)service/src/main/java/ai/chronon/service/FetcherVerticle.java
(2 hunks)service/src/main/java/ai/chronon/service/handlers/FetchRouter.java
(1 hunks)service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java
(1 hunks)service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java
(1 hunks)service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java
(1 hunks)service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (2)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-11-12T09:38:33.532Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-11-12T09:38:33.532Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
online/src/test/resources/joins/user_transactions.txn_join_a (1)
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/production/joins/risk/user_transactions.txn_join:217-218
Timestamp: 2024-11-12T09:38:33.532Z
Learning: The JSON files in this project are automatically generated and should not be manually modified or refactored.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (60)
service/BUILD.bazel (1)
32-32
: Added Avro dependency for join schema functionalityAdded Apache Avro library to test dependencies to support the new join schema fetching API.
online/src/main/scala/ai/chronon/online/Metrics.scala (1)
31-31
: Looks good, new metric for tracking join schema fetch operations.Consistent with existing metric naming patterns.
cloud_aws/BUILD.bazel (1)
27-27
: API lib added to test dependencies.Good addition that ensures test cases have access to API constants.
online/BUILD.bazel (2)
73-73
: Resources added to test library.Enables access to test resources for running test cases.
85-85
: Resources added to test suite.Consistent with test_lib configuration.
distribution/publish_docker_images.sh (3)
33-33
: Added AWS JAR build command.Ensures AWS implementation is built for Docker image.
37-37
: Added AWS JAR path variable.Consistent with existing GCP JAR variable.
50-53
: Added AWS JAR existence check.Prevents build failures by validating AWS JAR exists.
api/src/main/scala/ai/chronon/api/Constants.scala (1)
89-97
: Addition of constants for KV store pagination looks good.The new constants centralize pagination and entity type parameters for better maintainability.
service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java (1)
13-46
: Clean implementation of join list handler.The handler properly follows Vert.x's non-blocking pattern by wrapping the CompletableFuture.
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (3)
3-3
: Updated imports to use centralized constants.Good refactoring to use the centralized constants from Constants.scala.
128-131
: Consistent use of pagination constants.First list request now uses the centralized ListLimit constant.
134-138
: Second list request updated to use new constants.The pagination continuation works as expected with the centralized constants.
service/src/main/java/ai/chronon/service/handlers/FetchRouter.java (1)
30-30
: LGTM - Method signature refactored to use JavaFetcher directly.Good change that removes indirect fetcher creation via Api, making dependencies clearer.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2)
3-3
: Updated imports use standardized constants.Replaced hardcoded strings with constants, improving maintainability.
205-250
: New pagination test looks good.Test thoroughly verifies entity type listing functionality with pagination.
online/src/main/java/ai/chronon/online/JavaFetcher.java (4)
19-19
: Added import for ScalaJavaConversions utility.Required for the new methods that convert Scala collections to Java.
29-29
: Added import for Scala Try utility.Needed for error handling in the schema fetching functionality.
175-180
: New listJoins method looks good.Efficiently converts Scala Future to Java CompletableFuture with proper type conversion.
182-185
: New fetchJoinSchema method looks good.Properly handles errors using JTry pattern for consistent error handling.
service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (5)
30-42
: Well-structured test class setup.Good initialization of class fields and mocks.
43-55
: Clean test setup method.The setup configures mocks with appropriate behavior needed for tests.
57-101
: Comprehensive test for successful schema retrieval.Test properly validates schema hash, join name, key/value schemas, and parses Avro schemas to ensure structural validity.
103-128
: Good error handling test.Test correctly checks failure scenario response code and structure.
130-136
: Reusable error validation helper.Helper method effectively validates error response format.
service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (5)
31-42
: Well-structured test class setup.Class correctly initializes mocks and handler.
43-55
: Appropriate test setup method.Setup method properly initializes the test environment.
57-89
: Thorough successful list request test.Test validates HTTP status, content type, and response structure with expected join names.
91-118
: Good failure scenario test.Test properly simulates and validates error handling for failed KV store lookups.
120-126
: Reusable error validation helper.Helper method effectively validates error response structure.
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (3)
26-26
: Updated import to include new classes.Import now includes JoinSchemaResponse for the new functionality.
75-81
: Well-documented JoinSchemaResponse case class.Class provides clear structure for join schema with good documentation.
427-448
: Good implementation of fetchJoinSchema method.Method follows established patterns for:
- Metrics tracking
- Error handling
- Response construction
Appropriately uses joinCodecCache to retrieve schema information.
online/src/main/scala/ai/chronon/online/JoinCodec.scala (2)
37-38
: Simplified type references.Removed redundant serde prefix from AvroCodec types for cleaner code.
92-92
: Updated method signature for consistency.Method parameter types updated to match class field types.
online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1)
1-105
: Well-structured test for the MetadataStore's listJoins method!Complete test coverage with validation of online joins, error handling, and pagination logic.
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (4)
4-4
: Good refactoring to use centralized constants.Improves maintainability by using imported constants rather than hardcoded values.
169-173
: Appropriate use of ListLimit constant.Consistent with the refactoring approach to use centralized constants.
175-178
: Proper implementation of ContinuationKey.Clean transition from local constant to imported one.
196-196
: Consistent use of ContinuationKey in response mapping.Maintains consistent usage of the imported constant.
online/src/test/resources/joins/user_transactions.txn_join_a (1)
1-248
: JSON configuration appears correct.JSON structure is well-formatted and contains appropriate join configuration with "online": 0 flag.
service/src/main/java/ai/chronon/service/FetcherVerticle.java (5)
4-7
: Required imports for new functionality.Appropriate imports for the new handlers and JavaFetcher.
37-38
: Clean instantiation of JavaFetcher.Creates fetcher instance with appropriate parameters.
40-40
: Updated fetch route to use JavaFetcher.Good refactoring to use the fetcher instance instead of direct API access.
42-44
: New API endpoint for listing joins.Implements the first required API for listing all online joins as specified in the PR objectives.
45-47
: New API endpoint for join schema retrieval.Implements the second required API for retrieving schema details for a specified join.
online/src/test/resources/joins/user_transactions.txn_join_d (4)
1-19
: Validate naming consistency in metadata.
Check that all references match actual table names to avoid runtime errors.
20-32
: Ensure table references are valid.
Confirm "data.users" is correct and accessible.
105-173
: Confirm naming alignment.
"risk.transaction_events.txn_group_by_merchant" must sync with references.
175-210
: Review user_data groupBy.
Ensure "account_balance" and other fields meet usage constraints.online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (5)
24-24
: Imports look fine.
Needed for KVStore list operations.
31-31
: StandardCharsets import is good.
No issues.
177-181
: New method signature looks good.
Clear param name. Good default value usage.
182-200
: parseJoins logic.
Handles JSON decoding well. Ensure all exceptions are well-reported.
225-226
: Final call.
Keeps code simple. No issues.cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5)
3-3
: New imports are correct.
215-219
: listLimit extraction.
Covers int/string fallback. Good approach.
221-222
: maybeListEntityType and maybeStartKey retrieval.
Clean property checks.
232-241
: Smart pattern matching.
Handles different start conditions. Nicely done.
264-264
: ContinuationKey usage.
Returns last key as next page start. Good for pagination.
99a16c9
to
26856c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java (1)
31-42
: Consider differentiating error status codes.Always returning 500 for errors is generic. Differentiate between server errors and not-found conditions.
- logger.error("Unable to retrieve join schema for: {}", entityName, joinSchemaResponseTry.getException()); - - List<String> errorMessages = Collections.singletonList(joinSchemaResponseTry.getException().getMessage()); - - ctx.response() - .setStatusCode(500) - .putHeader("content-type", "application/json") - .end(new JsonObject().put("errors", errorMessages).encode()); + Throwable exception = joinSchemaResponseTry.getException(); + logger.error("Unable to retrieve join schema for: {}", entityName, exception); + + int statusCode = (exception instanceof NoSuchElementException) ? 404 : 500; + List<String> errorMessages = Collections.singletonList(exception.getMessage()); + + ctx.response() + .setStatusCode(statusCode) + .putHeader("content-type", "application/json") + .end(new JsonObject().put("errors", errorMessages).encode());service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (1)
95-95
: Remove unused variable.This list isn't used in the test.
- List<String> joins = List.of("my_joins.join_a.v1", "my_joins.join_a.v2", "my_joins.join_b.v1");
service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (1)
130-136
: Consider enhancing error validation.Helper method checks for errors array existence, but could verify more error details.
private void validateFailureResponse(String jsonResponse, TestContext context) { JsonObject actualResponse = new JsonObject(jsonResponse); context.assertTrue(actualResponse.containsKey("errors")); String failureString = actualResponse.getJsonArray("errors").getString(0); context.assertNotNull(failureString); + context.assertTrue(failureString.contains("some fake failure")); }
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
201-226
: Pagination implementation looks good.The recursive implementation for retrieving all joins with pagination support is clean and includes proper metrics tracking.
Consider using immutable data structures instead of mutable ArrayBuffer for better functional style:
-def doRetrieveAllListConfs(acc: mutable.ArrayBuffer[String], +def doRetrieveAllListConfs(acc: Seq[String], paginationKey: Option[Any] = None): Future[Seq[String]] = { // ... fetchContext.kvStore.list(listRequest).flatMap { response => val joinSeq: Seq[String] = parseJoins(response) - val newAcc = acc ++ joinSeq + val newAcc = acc ++ joinSeq // No change needed here as ++ works on immutable Seq too // ... } } -doRetrieveAllListConfs(new mutable.ArrayBuffer[String]()) +doRetrieveAllListConfs(Seq.empty[String])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (24)
api/src/main/scala/ai/chronon/api/Constants.scala
(1 hunks)cloud_aws/BUILD.bazel
(1 hunks)cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
(3 hunks)cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
(2 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
(4 hunks)cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
(3 hunks)distribution/publish_docker_images.sh
(2 hunks)online/BUILD.bazel
(2 hunks)online/src/main/java/ai/chronon/online/JavaFetcher.java
(3 hunks)online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java
(1 hunks)online/src/main/scala/ai/chronon/online/JoinCodec.scala
(2 hunks)online/src/main/scala/ai/chronon/online/Metrics.scala
(1 hunks)online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
(3 hunks)online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
(2 hunks)online/src/test/resources/joins/user_transactions.txn_join_a
(1 hunks)online/src/test/resources/joins/user_transactions.txn_join_d
(1 hunks)online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
(1 hunks)service/BUILD.bazel
(1 hunks)service/src/main/java/ai/chronon/service/FetcherVerticle.java
(2 hunks)service/src/main/java/ai/chronon/service/handlers/FetchRouter.java
(1 hunks)service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java
(1 hunks)service/src/main/java/ai/chronon/service/handlers/JoinSchemaHandler.java
(1 hunks)service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java
(1 hunks)service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (16)
- online/src/main/scala/ai/chronon/online/Metrics.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- distribution/publish_docker_images.sh
- cloud_aws/BUILD.bazel
- online/BUILD.bazel
- service/BUILD.bazel
- online/src/main/java/ai/chronon/online/JavaJoinSchemaResponse.java
- service/src/main/java/ai/chronon/service/handlers/JoinListHandler.java
- service/src/main/java/ai/chronon/service/handlers/FetchRouter.java
- api/src/main/scala/ai/chronon/api/Constants.scala
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
- online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
- online/src/main/scala/ai/chronon/online/JoinCodec.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
- online/src/test/resources/joins/user_transactions.txn_join_a
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (15)
online/src/main/java/ai/chronon/online/JavaFetcher.java (2)
175-180
: LGTM: listJoins implementation.Clean conversion from Scala Future to Java CompletableFuture with proper type handling.
182-185
: LGTM: fetchJoinSchema implementation.Effective Scala-to-Java mapping with proper error propagation.
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)
75-82
: LGTM: Well-documented schema response model.Comprehensive documentation with clear parameter descriptions.
427-448
: LGTM: Thorough implementation with proper metrics.Handles errors gracefully, includes appropriate metrics, and leverages existing cache.
service/src/test/java/ai/chronon/service/handlers/JoinListHandlerTest.java (2)
57-89
: LGTM: Well-structured success test.Comprehensive verification of status code, headers and response content.
91-118
: LGTM: Thorough error handling test.Good coverage of the error path with explicit assertions.
service/src/test/java/ai/chronon/service/handlers/JoinSchemaHandlerTest.java (2)
57-101
: Well-structured test for successful schema fetch.The test validates response status, content type, and JSON structure correctly. It also verifies Avro schema parsing.
103-128
: Error handling test looks good.Test properly verifies 500 status code and error response structure when fetcher returns failure.
service/src/main/java/ai/chronon/service/FetcherVerticle.java (4)
37-38
: JavaFetcher instance created correctly.Good initialization of the fetcher with appropriate parameters.
40-40
: Updated fetch route to use new fetcher.Properly modified to use fetcher instead of api directly.
42-44
: New route for listing joins.API endpoint aligns well with PR objectives.
45-46
: New route for join schema retrieval.Endpoint correctly uses path parameter for join name.
online/src/test/resources/joins/user_transactions.txn_join_d (1)
1-248
: Well-structured join configuration.The configuration defines a comprehensive join between users and transaction data with appropriate metadata, dependencies, and aggregations.
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2)
24-24
: Updated imports to include List operations.Correctly added ListRequest and ListResponse types for new functionality.
177-199
: Well-implemented joins parsing.The parseJoins function correctly converts response values to strings, deserializes them, and filters based on online status. Error handling is properly implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very thorough pr! nicely done sir
// if we got a start row key, lets wire it up | ||
maybeStartKey.foreach { startKey => | ||
query.range(ByteStringRange.unbounded().startOpen(ByteString.copyFrom(startKey.asInstanceOf[Array[Byte]]))) | ||
(maybeStartKey, maybeListEntityType) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat!
router.get("/v1/joins").handler(new JoinListHandler(fetcher)); | ||
|
||
// Set up route for retrieval of Join schema | ||
router.get("/v1/join/schema/:name").handler(new JoinSchemaHandler(fetcher)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
router.get("/v1/join/schema/:name").handler(new JoinSchemaHandler(fetcher)); | |
router.get("/v1/join/:name/schema").handler(new JoinSchemaHandler(fetcher)); |
is the idiomatic way to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call will update
@@ -416,6 +424,29 @@ class Fetcher(val kvStore: KVStore, | |||
} | |||
} | |||
|
|||
def fetchJoinSchema(joinName: String): Try[JoinSchemaResponse] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be in metadata store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that means that join codec cache should move into metadatastore as well - which makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with this here to be able to leverage the joinCodecCache (and not rebuild the cache in the metadata store or the join codec per call)
## Summary Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the Etsy side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the Etsy side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the Etsy side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Add a couple of APIs to help with the our clients ***REMOVED*** integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on our clients). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the our clients side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Add a couple of APIs to help with the our clients ***REMOVED*** integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on our clients). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the docker container and confirming things on the our clients side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Checklist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Add a couple of APIs to help with the our clients ***REMOVED*** integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity cheour clientss on our clients). APIs added are: * /v1/joins -> Return the list of online joins * /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"} Tested by dropping the doour clientser container and confirming things on the our clients side: ``` $ curl http://localhost:9000/v1/joins {"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...} ``` And ``` curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external { big payload } ``` ## Cheour clientslist - [X] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information. - Added enhanced configuration options to support complex join workflows. - New test cases for validating join listing and schema retrieval functionalities. - Added new constants for pagination and entity type handling. - **Improvements** - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience. - Enhanced error handling and response formatting for join-related requests. - Expanded testing capabilities with additional dependencies and resource inclusion. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join.
As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable.
For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy).
APIs added are:
Tested by dropping the docker container and confirming things on the Etsy side:
And
Checklist
Summary by CodeRabbit
New Features
Improvements