Add TimestampType support for when converting to Avro from chronon type #684

david-zlai · 2025-04-23T22:31:11Z

Summary

Tested the CDC entity source GBU job from Etsy and it passed. We probably still need to test whether this works on the fetch side.

Checklist

Added Unit Tests
Covered by existing CI
Integration tested
Documentation update

Summary by CodeRabbit

New Features
- Added support for TimestampType when converting Chronon data types to Avro schemas, enabling proper representation of timestamp values.
- Enhanced Avro serialization to correctly handle timestamp fields with millisecond precision.
Bug Fixes
- Added a test to verify accurate timestamp conversion in BigQuery integration under different Java time API configurations.
Documentation
- Clarified mapping behavior of TimestampType with respect to Java 8 time API usage in detailed comments.
Style
- Improved code formatting and indentation in logging statements.
- Removed unnecessary blank lines for cleaner code structure.
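
For reference, Avro represents millisecond timestamps as a `long` annotated with the `timestamp-millis` logical type. The sketch below is illustrative Python, not the Scala conversion code in this PR; the helper names `avro_timestamp_field` and `to_epoch_millis` are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def avro_timestamp_field(name: str) -> dict:
    """Build an Avro field definition for a millisecond-precision timestamp."""
    # Avro stores timestamp-millis as a long: milliseconds since the Unix epoch.
    return {"name": name, "type": {"type": "long", "logicalType": "timestamp-millis"}}


def to_epoch_millis(ts: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = ts - EPOCH
    # Integer arithmetic avoids float rounding on large timestamps.
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000
```

Anything finer than a millisecond is truncated here, which matches the precision the logical type can carry.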

## Summary building join output schema should belong to metadata store - and also reduces the size of fetcher. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced an optimized caching mechanism for data join operations, resulting in improved performance and reliability. - Added new methods to facilitate the creation and management of join codecs. - **Bug Fixes** - Enhanced error handling for join codec operations, ensuring clearer context for failures. - **Documentation** - Improved code readability and clarity through updated comments and method signatures.

## Summary This is a hodge-podge of various improvements to the frontend: - [Collapsible sections arrow updates](https://app.asana.com/0/0/1209437960912812) - [Use Args version in all thrift types](https://app.asana.com/0/0/1209410918917485) - lots of code changes, but typescript should still be happy :) - [prefer relative imports instead of ../](https://app.asana.com/0/0/1209445322143679) - [Details page expand arrow color is hard to see](https://app.asana.com/0/0/1209445322143673) - [Link to entity pages](https://app.asana.com/0/0/1209411287539655) - [Show icons similar to lineage (with color)](https://app.asana.com/0/0/1209411287539654) - [Dialog modals should track to a URL](https://app.asana.com/0/0/1209343349099309) - I'd focus testing on this, making sure that the dialogs on /observability/drift and /overview persist on reload, work with the back button, clear query params on close/back, etc etc. - [dont sort series z-a](https://app.asana.com/0/0/1209445333142030) - [All chart axis/styles should be consistent](https://app.asana.com/0/1209162816962686/1209402326561211) ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Enhanced expandable cells with new properties for improved functionality. - Introduced new functions for processing series data and extracting columns. - Improved URL state management on job tracking, observability, and overview pages, allowing more intuitive navigation. - **Style** - Updated icon rotation in collapsible sections for clearer visual cues. - Refined cursor styling on dropdown items to enhance interactivity. - **Refactor** - Streamlined chart data processing for smoother performance. - Consolidated configuration displays for a cleaner, more responsive user experience. - Updated type definitions across various components for better clarity and consistency.

…`parseJson` support (#426) ## Summary - Resolves the performance regression in later versions (we downgraded to `0.0.2` as a workaround) - Supports single clicking on label to expand/collapse - Adds new `parseJson` prop to parse `metaData.customJson` and `metadata.dependencies` ([Asana](https://app.asana.com/0/home/1208932362205799/1209446053233125)) ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  Co-authored-by: Sean Lynch <[email protected]>

## Summary https://github.com/user-attachments/assets/c0e2ddc7-1ebb-40c1-a541-281153c8f1a9 ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209437960912821 - https://app.asana.com/0/0/1209445122069294  ## Summary by CodeRabbit - **New Features** - Added a dedicated chart legend displaying distinct "Streaming" and "Batch" items for improved clarity. - **Refactor** - Updated the main chart layout to utilize a grid structure for better organization of elements. - Streamlined tooltip rendering to enhance visual consistency. - **Chores** - Updated the `layerchart` dependency to version `^0.99.4` for potential improvements and new features.   --------- Co-authored-by: Sean Lynch <[email protected]>

## Summary

Add support to run the fetcher service in docker. Also add rails to publish to docker hub as a private image - [ziplineai/chronon-fetcher](https://hub.docker.com/repository/docker/ziplineai/chronon-fetcher)

I wasn't able to sort out logback / log4j2 logging as there's a lot of deps messing things up - Vert.x supports JUL configs and that is seemingly working so starting with that for now.

Tested with:

```
docker run -v ~/.config/gcloud/application_default_credentials.json:/gcp/credentials.json \
  -p 9000:9000 \
  -e "GCP_PROJECT_ID=canary-443022" \
  -e "GOOGLE_CLOUD_PROJECT=canary-443022" \
  -e "GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance" \
  -e "STATSD_HOST=127.0.0.1" \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/credentials.json \
  ziplineai/chronon-fetcher
```

And then you can `curl http://localhost:9000/ping`

On Etsy side just need to swap out the project and bt instance id and then can curl the actual join:

```
curl -X POST http://localhost:9000/v1/fetch/join/search.ranking.v1_web_zipline_cdc_and_beacon_external -H 'Content-Type: application/json' -d '[{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"}]'
{"results":[{"status":"Success","entityKeys":{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"},"features":{...
```

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Added an automation script that streamlines the container image build and publication process with improved error handling.
  - Introduced a new container configuration that installs essential dependencies, sets environment variables, and incorporates a health check for enhanced reliability.
  - Implemented a robust logging setup that standardizes console and file outputs with log rotation.
  - Provided a startup script for the service that verifies required settings and applies platform-specific options for seamless execution.

@ken-zlai

## Summary @ken-zlai missed removing the pointer events on the `<Svg>` after [simplifying](84e2292) (they were ignored on the `<Chart>` container). This fixes clicking on a node to open the drilldown dialog. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  Co-authored-by: Sean Lynch <[email protected]>

## Summary Adds the ability to push artifacts to aws in addition to gcp. Also adds ability to specify specific customer ids to push to. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced a new automation script that streamlines the process of building artifacts and deploying them to both AWS and GCP with improved error handling and user confirmation. - **Chores** - Removed a legacy artifact upload script that previously handled only GCP deployments.

## Summary - Supporting StagingQueries for configurable compute engines. To support BigQuery, the simplest way is to just write bigquery sql and run it on bq to create the final table. Let's first make the API change. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Added an option for users to specify the compute engine when processing queries, offering choices such as Spark and BigQuery. - Introduced validation to ensure that queries run only with the designated engine. - **Style** - Streamlined code organization for enhanced readability. - Consolidated and reordered import statements for improved clarity.   --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary fetcher has grown over time into a large file with many large functions that are hard to work with. This refactoring doesn't change any functionality - just placement. Made some of the scala code more idiomatic - if(try.isFailed) - vs try.recoverWith Made Metadata methods more explicit FetcherBase -> JoinPartFetcher + GroupByFetcher + GroupByResponseHandler Added fetch context - to replace 10 constructor params ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced a unified configuration context that enhances data fetching, including improved group-by and join operations with more robust error handling. - Added a new `FetchContext` class to manage fetching operations and execution contexts. - Implemented a new `GroupByFetcher` class for efficient group-by data retrieval. - **Refactor** - Upgraded serialization and deserialization to use a more efficient, compact protocol. - Standardized API definitions and type declarations across modules to improve clarity and maintainability. - Enhanced error handling in various methods to provide more informative messages. - **Chores** - Removed outdated utilities and reorganized dependency imports. - Updated test suites to align with the refactored architecture.

## Summary - Staging query should in theory already work for external tables without additional code changes as long as we do some setup work to pin up a view first. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary The existing aggregations configure the items sketch incorrectly. Split it into two one that works purely with skewed data, and one that tries to best-effort collect most frequent items. ## Checklist - [x] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced new utility functions to streamline expression composition and cleanup. - Enhanced aggregation descriptions for clearer operation choices. - Added new aggregation types for improved data analysis. - **Refactor** - Revamped frequency analysis logic with improved error handling and optimized sizing. - Replaced legacy histogram approaches with a more robust frequent item detection mechanism. - **Tests** - Added tests to validate heavy hitter detection and skewed data scenarios, while removing obsolete histogram tests. - Updated existing tests to reflect changes in aggregation parameters. - **Chores** - Removed deprecated interactive modules for a leaner deployment. - **Configuration** - Adjusted default aggregation parameters for more consistent processing, including changes to the `k` value in multiple configurations.

## Summary - https://zipline-2kh4520.slack.com/archives/C087YSYJ5NZ/p1739480276997719?thread_ts=1739218764.927269&cid=C087YSYJ5NZ - https://app.asana.com/0/1208811056393706/1209448137185580 ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Introduced a dataset generation tool that creates sample data using Apache Spark and Hudi. - Generates random entries representing user and device characteristics, and structures the data with partitioning for efficient storage. - Supports integration with cloud storage and Hive synchronization for streamlined data management.   --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary Updates workflow push-to-canary to use Bazel instead of sbt. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [x] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Chores** - Modernized the build and deployment pipeline to improve efficiency and reliability. - Updated the container configuration to use a refined artifact structure, supporting smoother image creation. - Enhanced caching steps to further streamline the release workflow and optimize build times.

## Summary

Add a couple of APIs to help with the Etsy Patina integration. One is to list out all online joins and the second is to retrieve the join schema details for a given Join. As part of wiring up list support, I tweaked a couple of properties like the list pagination key / list call limit to make things consistent between DynamoDB and BigTable. For the BT implementation we issue a range query under the 'joins/' prefix. Subsequent calls (in case of pagination) continue off this range (verified this via unit tests and also basic sanity checks on Etsy).

APIs added are:

* /v1/joins -> Return the list of online joins
* /v1/join/schema/join-name -> Return a payload consisting of {"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro schema", "schemaHash": "hash"}

Tested by dropping the docker container and confirming things on the Etsy side:

```
$ curl http://localhost:9000/v1/joins
{"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...}
```

And

```
curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external
{ big payload }
```

## Checklist

- [X] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Introduced new API endpoints that let users list available joins and retrieve detailed join schema information.
  - Added enhanced configuration options to support complex join workflows.
  - New test cases for validating join listing and schema retrieval functionalities.
  - Added new constants for pagination and entity type handling.
- **Improvements**
  - Standardized pagination and entity handling across cloud integrations, ensuring a consistent and reliable data listing experience.
  - Enhanced error handling and response formatting for join-related requests.
  - Expanded testing capabilities with additional dependencies and resource inclusion.
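
The prefix range query with pagination described in this summary can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the BigTable or DynamoDB code from the PR; `list_page`, `list_all_joins`, and the continuation-key convention are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def list_page(store: Dict[str, bytes], prefix: str, limit: int,
              continuation_key: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` keys under `prefix`, plus a key to resume from."""
    # Sorted keys emulate the ordered row-key scan of a range query.
    keys = sorted(k for k in store if k.startswith(prefix))
    if continuation_key is not None:
        # Resume strictly after the last key returned by the previous page.
        keys = [k for k in keys if k > continuation_key]
    page = keys[:limit]
    # Only hand back a continuation key when more rows remain.
    next_key = page[-1] if len(keys) > limit else None
    return page, next_key


def list_all_joins(store: Dict[str, bytes], limit: int = 2) -> List[str]:
    """Walk every page under the 'joins/' prefix and strip the prefix."""
    names: List[str] = []
    key: Optional[str] = None
    while True:
        page, key = list_page(store, "joins/", limit, key)
        names.extend(k[len("joins/"):] for k in page)
        if key is None:
            return names
```

Returning the last key of each page as the continuation token keeps every call stateless, so a follow-up request can pick up the range where the previous one stopped.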

@david-zlai

## Summary #398 updated the module path from `"/"` to `"."`, but not all code was migrated to the new convention, causing frontend API calls to fail when retrieving joins. @david-zlai – Can you review the code to ensure it fully aligns with the new convention? @sean-zlai – Can you tear down all Docker images and rebuild on this branch to confirm observability works as expected? ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Streamlined how configuration names are handled in observability views. Names are now displayed as originally provided without extra formatting, ensuring a consistent and straightforward presentation. The fallback label remains “Unknown” when a name is not available.

## Summary - Everywhere else we want to handle partitions that could be non-string types. This is similar to the change in: https://github.com/zipline-ai/chronon/blob/3d2e77da18e8fa81a5471935a7358937ed8f9f13/cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala#L122-L128 ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Enhanced partition date display by introducing configurable date formatting. - Partition dates are now consistently formatted based on user configuration, ensuring reliable and predictable output across the system. - Improved retrieval of partition format for BigQuery operations, allowing for broader usage across different packages.   --------- Co-authored-by: Thomas Chow <[email protected]>

…r drilldown referential entities (sources, joinParts, etc) (#433)

## Summary Enable batch IR caching by default & fix an issue where our Vertx init code tries to connect to BT at startup and takes a second or two on the worker threads (and results in the warning - 'Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 2976 ms, time limit is 2000 ms'). ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Streamlined caching configuration and logic with a consistent default setting for improved behavior. - Enhanced service startup by shifting to asynchronous initialization with better error handling for a more robust launch. - **Tests** - Removed an outdated test case that validated previous caching behavior.

## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Chores** - Upgraded core styling dependencies and configuration to leverage the latest standards. - **Style** - Refined visual elements across the interface for enhanced consistency and responsiveness. - Adjusted focus outlines, shadows, and layout utilities on various interactive components. - Enhanced grid and backdrop effects to improve overall visual clarity. These updates deliver a smoother and more polished experience without changing the underlying functionality.

## Summary

This PR allows the frontend to specify which percentiles it retrieves from the backend. The percentiles can be passed as a query parameter:

```
percentiles=p0,p10,p90
```

If omitted, the default percentiles are used:

```
percentiles=p5,p50,p95
```

### Example Requests *(App must be running)*

#### Default (uses `p5,p50,p95`)

```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000"
```

#### Equivalent Explicit Default

```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000&percentiles=p5,p50,p95"
```

#### Custom Percentiles (`p0,p10,p90`)

```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000&percentiles=p0,p10,p90"
```

### Notes

- Omitting the `percentiles` parameter is the same as explicitly setting `percentiles=p5,p50,p95`.
- You can test using `curl` or Postman.
- We need to let users change these percentiles via checkboxes or another UI control.

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Added support for customizable percentile parameters in summary data requests, with a default setting of "p5, p50, p95".
  - Enhanced the ability to retrieve detailed statistical summaries by allowing users to specify percentile values when querying data.
  - Introduced two new optional dependencies for improved functionality.
- **Bug Fixes**
  - Adjusted method signatures to ensure compatibility with the new percentile parameters in various components.
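
The defaulting behavior of the `percentiles` parameter can be sketched as below. This is an illustrative Python helper, not the backend code from the PR; `parse_percentiles` is a hypothetical name, and the validation of labels (a `p` followed by an integer from 0 to 100) is an assumption.

```python
import re
from typing import List, Optional

DEFAULT_PERCENTILES = "p5,p50,p95"
_LABEL = re.compile(r"p(\d{1,3})")


def parse_percentiles(param: Optional[str] = None) -> List[str]:
    """Split a `percentiles` query value, falling back to the default set."""
    # A missing or empty parameter behaves like percentiles=p5,p50,p95.
    raw = param if param else DEFAULT_PERCENTILES
    labels = [label.strip() for label in raw.split(",")]
    for label in labels:
        match = _LABEL.fullmatch(label)
        # Assumed validation: reject anything outside p0..p100.
        if match is None or int(match.group(1)) > 100:
            raise ValueError(f"invalid percentile label: {label!r}")
    return labels
```

With this shape, the default request and the explicit `percentiles=p5,p50,p95` request resolve to the same list.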

## Summary Just released 1.0 releases for LayerChart, LayerStack, and Svelte UX packages which have Svelte 3-5 and Tailwind 3 compatibility ([announcement](https://bsky.app/profile/techniq.dev/post/3lj3sjabpns2z)) (LayerChart 1.0 is also 98% Tailwind 4 [compatibility](techniq/layerchart#388)). This mostly paves the way to start on `next` branches / `2.0` releases which will leverage Svelte 5 runes/snippets and Tailwind 4 (along with a lot of other planned features). ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  Co-authored-by: Sean Lynch <[email protected]>

## Summary I noticed we were missing the core chronon fetcher logs during feature lookup requests. As we anyway wanted to rip out the JUL & logback, I went ahead and dropped those for a log4j2 properties file. Confirmed that I am seeing the relevant fetcher logs from classes like the SawtoothOnlineAggregator etc when I hit the service with a feature look up request. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Consolidated service deployment paths and streamlined startup configuration. - Improved metrics handling by conditionally enabling reporting based on environment settings. - **Chores** - Optimized resource packaging and removed legacy dependencies. - Upgraded logging configuration to enhance performance and log management.

## Summary Added Team, Online, Production and renamed LogicalNodeTable -> EntityTable ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Enhanced the table view with new columns for "Team," "Online," and "Production" data. - Added visual badges to clearly indicate boolean statuses. - **Refactor** - Replaced the `LogicalNodeTable` component with the updated `EntityTable` component to improve data presentation.

#438)

## Summary

1. Added offset and bound support to staging query macros. `{{ start_date }}` is valid as before; now `{{ start_date(offset=-10, lower_bound='2023-01-01') }}` is also valid.
2. Previously we required users to pass in quotes around the macro separately. This PR removes the need for it: `{{ start_date }}` used to become `2023-01-01`, it now becomes `'2023-01-01'`.
3. Added a unified top level module `api.chronon.types` that contains everything that users need.
4. Added wrappers on source sub types to directly return sources:

```py
ttypes.Source(events=ttypes.EventSource(...))
# now becomes
EventSource(...)
```

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Added new functions for creating event, entity, and join data sources.
  - Introduced enhanced date macro utilities to enable flexible SQL query substitutions.
- **Refactor**
  - Streamlined naming conventions and standardized parameter formatting.
  - Consolidated and simplified import structures for improved consistency.
  - Updated method signatures and calls from `select` to `selects` across various components.
  - Removed reliance on `ttypes` for source definitions and standardized parameter naming conventions.
  - Simplified macro substitution logic in the `StagingQuery` object.
- **Tests**
  - Implemented comprehensive tests for date manipulation features to ensure robust behavior.
  - Updated existing tests to reflect changes in method names and query formatting.
  - Adjusted data generation parameters in tests to increase transaction volumes.
- **Documentation**
  - Updated configuration descriptions to clearly illustrate new date template options and parameter adjustments.
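
As a rough illustration of the `start_date` macro behavior described in this summary, the Python sketch below renders a quoted date with an optional offset and lower bound. It is not the substitution code from the PR: `render_start_date` is a hypothetical helper, and it assumes that `offset` is measured in days and that `lower_bound` clamps the result to be no earlier than the given date.

```python
from datetime import date, timedelta
from typing import Optional


def render_start_date(start: str, offset: int = 0,
                      lower_bound: Optional[str] = None) -> str:
    """Render a start date as a quoted SQL literal."""
    # Assumption: offset shifts the date by whole days (negative moves back).
    shifted = date.fromisoformat(start) + timedelta(days=offset)
    if lower_bound is not None:
        # Assumption: lower_bound acts as a floor on the shifted date.
        shifted = max(shifted, date.fromisoformat(lower_bound))
    # The macro output now carries its own quotes, so queries need not add them.
    return f"'{shifted.isoformat()}'"
```

For example, under these assumptions a start of `2023-01-05` with `offset=-10` and `lower_bound='2023-01-01'` would render as `'2023-01-01'`, because the shifted date falls before the bound.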

## Summary cleaning up top level dir ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Chores** - Refined version control and build settings by updating ignored paths and tool versions. - Removed obsolete internal configurations, tooling, and Docker build files for a cleaner project structure. - **Documentation** - Updated installation guidance links for clearer setup instructions. - Eliminated legacy contributor, governance, and quickstart guides to reduce clutter.

## Summary No turning back now ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Removed legacy internal components from workflow orchestration and task management to streamline operations. - **Documentation** - Updated deployment guidance by removing outdated references. These internal improvements enhance maintainability and performance without altering your current user experience.

## Summary move OSS docsite release scripts ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Chores** - Made behind‑the‑scenes updates to streamline our internal release management processes. There are no visible changes to functionality for end-users in this release.

## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Chores** - Consolidated and streamlined build dependencies for improved integration with AWS services and data processing libraries. - Expanded the set of supported third-party libraries, including new artifacts for enhanced performance and compatibility. - Added new dependencies for Hudi, Jackson, and Zookeeper to enhance functionality. - Introduced additional Hudi artifacts for Scala 2.12 and 2.13 to broaden available functionalities. - **Tests** - Added a new test class to verify reliable write/read operations on Hudi tables using a Spark session. - **Refactor** - Enhanced serialization registration to support a broader range of data types, improving overall processing stability. - Introduced a new variable for shared library dependencies to simplify dependency management.   --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary - Improve code organization - rename `ActionButtons` to `SortButton` (only remaining use case) - move `PageHeader` out of `EntityTable` and into page - remove `Entity.ts` nesting - Update `Learn` button href based on conf/entity type (ex. `Join` goes to applicable [page](https://chronon.ai/authoring_features/Join.html) for both list and individual entity) - Rename `Distributions` to `Summary` tab/route (feedback from meeting, and uses `getColumnSummary()` API) - Sort default metric (PSI) first - Fix resetting zoom by clicking on chart (without drag) ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - **Chart Controls:** Now feature a streamlined interface with a dedicated sort button replacing extra action buttons. - **Drift Metrics:** Display metrics in a sorted order with the default metric prioritized. - **Entity Tables:** Automatically derive header labels for clearer data presentation. - **Page Headers:** Now support an optional "Learn More" link to guide users to additional resources. - **Navigation & Summary Views:** Updated tabs and layouts now emphasize "Summary" data with column summaries. - **Entity Configurations:** Enhanced to include resource links for further learning. - **Bug Fixes** - Improved error handling in data retrieval processes for better user experience. - **Documentation** - Updated documentation to reflect changes in entity configurations and navigation structures.   --------- Co-authored-by: Sean Lynch <[email protected]>

…partitions (#690) ## Summary - Getting a 403 querying for a range of partitions in bigquery native tables: ``` Response too large to return. Consider specifying a destination table in your job configuration ``` - instead, let's just query individual partitions of data as separate dataframes and union them together. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update   ## Summary by CodeRabbit - **Bug Fixes** - Improved handling of BigQuery partitioned tables, ensuring more accurate partition filtering and data retrieval. - **Refactor** - Streamlined the process for reading partitioned data from BigQuery, resulting in a clearer and more consistent approach for users working with partitioned tables.  --------- Co-authored-by: Thomas Chow <[email protected]>
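
The per-partition approach described in this summary can be sketched as follows. The actual change reads each partition as a separate dataframe and unions them; this Python illustration uses plain lists instead, assumes daily partitions named `yyyy-MM-dd`, and takes a hypothetical `read_partition` callable in place of a BigQuery read.

```python
from datetime import date, timedelta
from typing import Callable, List


def partitions_between(start: str, end: str) -> List[str]:
    """Enumerate daily partition values from start to end, inclusive."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    span = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(span + 1)]


def read_range(read_partition: Callable[[str], List[dict]],
               start: str, end: str) -> List[dict]:
    """Read each partition on its own and concatenate (union) the results."""
    rows: List[dict] = []
    for partition in partitions_between(start, end):
        # One small read per partition keeps every response under the size
        # limit that a single range query can exceed.
        rows.extend(read_partition(partition))
    return rows
```

The trade-off is more round trips in exchange for smaller responses, which is what avoids the "Response too large to return" error quoted above.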

## Summary ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Bug Fixes** - Ensured the order of keys in query selections is preserved as provided by the user. - **Style** - Improved formatting and spacing for better readability without affecting functionality. - Enhanced ordering consistency in dependency metadata for stable JSON outputs.

## Summary Disabling analyzer checks ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Disabled the join configuration validation step before starting join jobs. - Updated time range calculation logic for certain join scenarios to improve consistency.  --------- Co-authored-by: ezvz <[email protected]>

## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Added support for handling BigQuery views when loading tables, improving compatibility with a wider range of BigQuery table types. - **Bug Fixes** - Updated internal handling of partition column aliases to ensure accurate retrieval of partition data from BigQuery tables.   --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Improved handling of Google Cloud Storage (GCS) artifact locations by requiring a full artifact prefix URI instead of relying on internal customer ID logic. All GCS interactions now use this provided prefix, allowing for more flexible and centralized configuration.   --------- Co-authored-by: Thomas Chow <[email protected]>

#701) ## Summary - For bigquery views, there won't be an explicit partition column on the table. Let's just use the same implementation to list primary part columns. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Improved partition handling for BigQuery tables, allowing direct retrieval of distinct partition values. - **Bug Fixes** - Added clear error handling for unsupported sub-partition filtering in BigQuery partition queries.   --------- Co-authored-by: Thomas Chow <[email protected]>

## Summary Based on our conversation with the BigTable team, it seems like using the Batcher implementation isn't what they recommend. It's primarily used for flow control and doesn't really help very much to use it. This PR yanks out that code to make the BT implementation easier to read and reason about. ## Checklist - [ ] Added Unit Tests - [X] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **Refactor** - Simplified multi-get operations by removing the bulk read batcher logic and related configuration options. - Consolidated multi-get requests to use a single, consistent approach. - **Tests** - Streamlined test setup by removing parameterized tests and updating mocking strategies to match the new implementation. - Removed unused helper methods and imports for cleaner test code.

## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update  ## Summary by CodeRabbit - **New Features** - Improved handling of Google Cloud Storage bucket selection for file uploads, now automatically using the appropriate warehouse bucket for each customer.   --------- Co-authored-by: Thomas Chow <[email protected]>

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala

…QueryCatalogTest.scala

## Summary

Pull in PRs airbnb/chronon#964 and airbnb/chronon#932. We hit issues related to 964 in some of our tests at Etsy: groupByServingInfo lookups against BT timed out and we ended up caching the failure response. 964 addresses this, and it depends on 932, so we are pulling that in as well.

## Checklist

- [ ] Added Unit Tests
- [X] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Improved error handling and reporting for partial failures in join operations and key-value store lookups.
  - Enhanced cache refresh mechanisms for join configurations and metadata, improving system robustness during failures.
  - Added a configurable option to control strictness on invalid dataset references in the in-memory key-value store.
- **Bug Fixes**
  - Exceptions and partial failures are now more accurately surfaced in fetch responses, ensuring clearer diagnostics for end-users.
  - Updated error key naming for consistency in response maps.
- **Tests**
  - Added a new test to verify correct handling and reporting of partial failures in key-value store operations.

…ssion status (#697)

## Summary

This is needed for the agent to be able to track the status of submitted jobs and report them back to the orchestration service.

## Checklist

- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Added support for specifying a custom cluster name when submitting EMR jobs.
- **Improvements**
  - Scaling factors for auto-scaling now support decimal values, allowing more precise scaling adjustments.
  - Job status methods now return status as a string, making it easier to programmatically track job progress and errors.

## Summary

- In BigQuery, we have views and native tables. For native tables we can list partitions through the information schema. We cannot do the same for views, so we should take two different approaches to partition listing for tables and views. To do this, we'll do a blind test: first check the information schema, and if we can't get a partition column out of that, we'll fall back to a blind `select distinct(...)`.

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved error handling for missing partition columns, providing clearer error messages and a more robust fallback method for retrieving partition values.
- **Refactor**
  - Centralized the handling of missing partition columns for more consistent behavior across the application.

---------

Co-authored-by: Thomas Chow <[email protected]>
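The two-step partition listing described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from the PR, which is written in Scala. The function name `list_partitions` and the three callables (`find_partition_column`, `list_from_metadata`, `run_query`) are hypothetical stand-ins for the metadata lookup and the query runner.

```python
from typing import Callable, List, Optional


def list_partitions(
    table: str,
    partition_column: str,
    find_partition_column: Callable[[str], Optional[str]],
    list_from_metadata: Callable[[str, str], List[str]],
    run_query: Callable[[str], List[str]],
) -> List[str]:
    """Return sorted partition values for a native table or a view.

    All three callables are hypothetical hooks supplied by the caller:
      find_partition_column: metadata lookup, returns None when no
                             partition column is declared (e.g. for a view)
      list_from_metadata:    lists partitions of a native table via metadata
      run_query:             runs SQL and returns the single result column
    """
    # Step 1: ask the metadata whether the table declares a partition column.
    native_column = find_partition_column(table)
    if native_column is not None:
        return sorted(list_from_metadata(table, native_column))

    # Step 2: no declared partition column, so fall back to a blind
    # SELECT DISTINCT on the column the caller expects to hold partitions.
    sql = f"SELECT DISTINCT {partition_column} FROM {table}"
    return sorted(run_query(sql))
```

Passing the lookups in as callables keeps the fallback order easy to test without a live BigQuery connection.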

## Summary

Putting this up again - #684

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Added support for the Avro logical type `timestamp-millis` in schema and value conversions, enabling better handling of timestamp fields.
  - Enhanced BigQuery integration with a new test to verify correct timestamp conversions based on configuration settings.
- **Documentation**
  - Added detailed comments explaining the mapping behavior of timestamp types and relevant configuration flags.
- **Refactor**
  - Improved logging structure for serialized object size calculations for better readability.
  - Minor formatting and consistency improvements in test assertions.
- **Style**
  - Removed unnecessary trailing whitespace for cleaner code.
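For readers unfamiliar with the Avro logical type mentioned above: `timestamp-millis` annotates an Avro `long` that stores milliseconds since the Unix epoch (1970-01-01T00:00:00 UTC). The sketch below only illustrates that encoding. It is not the Chronon conversion code (which is Scala), the helper names are hypothetical, and treating naive datetimes as UTC is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone

# Avro schema fragment for a timestamp field: a long with a logical type.
TIMESTAMP_MILLIS_SCHEMA = {"type": "long", "logicalType": "timestamp-millis"}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(value: datetime) -> int:
    """Encode a datetime as milliseconds since the Unix epoch."""
    if value.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    # Integer division drops sub-millisecond precision.
    return (value - EPOCH) // timedelta(milliseconds=1)


def from_timestamp_millis(millis: int) -> datetime:
    """Decode milliseconds since the Unix epoch into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)
```

A round trip through these two helpers preserves values down to the millisecond; anything finer is truncated on encode.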


nikhil-zlai and others added 30 commits February 21, 2025 15:59

refactor: fetcher sub package + kill old stats in fetcher (#423)

45bae2f

Overhaul and consolidate entity types/config. Use expansion panels fo…

5e58032

…r drilldown referential entities (sources, joinParts, etc) (#433)

tchow-zlai and others added 9 commits April 28, 2025 08:48

Add new test

b93e84e

more changes

59ff26f

fix

0f799d7

Merge branch 'main' into davidhan/support_timestamp_gbu

20eb466

perf: resolve schema only once and cache (#696)

df7673a

Merge branch 'main' into davidhan/support_timestamp_gbu

b8ff824

david-zlai requested a review from tchow-zlai April 29, 2025 08:46

tchow-zlai and others added 5 commits April 29, 2025 10:59

david-zlai commented Apr 30, 2025

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala

david-zlai and others added 6 commits April 30, 2025 12:59

Update cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/Big…

452fa17

…QueryCatalogTest.scala

Merge branch 'main' into davidhan/support_timestamp_gbu

f63c534

Merge branch 'main' into davidhan/support_timestamp_gbu

3f808c8

kumar-zlai closed this May 1, 2025

kumar-zlai force-pushed the main branch from a969d44 to e6f8822 Compare May 1, 2025 05:47

david-zlai mentioned this pull request May 5, 2025

Add timestamp type support #733

Merged

4 tasks

david-zlai deleted the davidhan/support_timestamp_gbu branch May 12, 2025 19:36
