Add TimestampType support for when converting to Avro from chronon type #684

Closed
wants to merge 683 commits into from

david-zlai
Contributor

@david-zlai david-zlai commented Apr 23, 2025

Summary

Tested the CDC entity source GBU job from Etsy and it passed. We probably still need to test whether this works on the fetch side.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Added support for TimestampType when converting Chronon data types to Avro schemas, enabling proper representation of timestamp values.
    • Enhanced Avro serialization to correctly handle timestamp fields with millisecond precision.
  • Bug Fixes

    • Added a test to verify accurate timestamp conversion in BigQuery integration under different Java time API configurations.
  • Documentation

    • Clarified mapping behavior of TimestampType with respect to Java 8 time API usage in detailed comments.
  • Style

    • Improved code formatting and indentation in logging statements.
    • Removed unnecessary blank lines for cleaner code structure.
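
As a rough illustration of what millisecond precision means on the Avro side: the Avro specification defines a `timestamp-millis` logical type that annotates a `long` holding milliseconds since the Unix epoch. The sketch below is not the Chronon conversion code from this PR; `timestamp_field_schema` and `to_epoch_millis` are hypothetical helper names, and treating naive datetimes as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_field_schema(name):
    """Avro field schema for a millisecond-precision timestamp (hypothetical helper)."""
    return {"name": name, "type": {"type": "long", "logicalType": "timestamp-millis"}}


def to_epoch_millis(ts):
    """Convert a datetime to milliseconds since the Unix epoch.

    Assumption for this sketch: naive datetimes are interpreted as UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts - EPOCH
    # Integer arithmetic avoids floating-point rounding on large values.
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000
```

A value produced by `to_epoch_millis` is what a field declared with `timestamp_field_schema` would carry on the wire; sub-millisecond digits are truncated.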

nikhil-zlai and others added 30 commits February 21, 2025 15:59
## Summary
Building the join output schema belongs in the metadata store; moving it
there also reduces the size of the fetcher.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced an optimized caching mechanism for data join operations,
resulting in improved performance and reliability.
- Added new methods to facilitate the creation and management of join
codecs.
  
- **Bug Fixes**
- Enhanced error handling for join codec operations, ensuring clearer
context for failures.
  
- **Documentation**
- Improved code readability and clarity through updated comments and
method signatures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

This is a hodge-podge of various improvements to the frontend:
- [Collapsible sections arrow
updates](https://app.asana.com/0/0/1209437960912812)
- [Use Args version in all thrift
types](https://app.asana.com/0/0/1209410918917485)
    - lots of code changes, but typescript should still be happy :)
- [prefer relative imports instead of
../](https://app.asana.com/0/0/1209445322143679)
- [Details page expand arrow color is hard to
see](https://app.asana.com/0/0/1209445322143673)
- [Link to entity pages](https://app.asana.com/0/0/1209411287539655)
- [Show icons similar to lineage (with
color)](https://app.asana.com/0/0/1209411287539654)
- [Dialog modals should track to a
URL](https://app.asana.com/0/0/1209343349099309)
- I'd focus testing on this, making sure that the dialogs on
/observability/drift and /overview persist on reload, work with the back
button, clear query params on close/back, etc etc.
- [dont sort series z-a](https://app.asana.com/0/0/1209445333142030)
- [All chart axis/styles should be
consistent](https://app.asana.com/0/1209162816962686/1209402326561211)

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced expandable cells with new properties for improved
functionality.
- Introduced new functions for processing series data and extracting
columns.
- Improved URL state management on job tracking, observability, and
overview pages, allowing more intuitive navigation.

- **Style**
- Updated icon rotation in collapsible sections for clearer visual cues.
  - Refined cursor styling on dropdown items to enhance interactivity.

- **Refactor**
  - Streamlined chart data processing for smoother performance.
- Consolidated configuration displays for a cleaner, more responsive
user experience.
- Updated type definitions across various components for better clarity
and consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…`parseJson` support (#426)

## Summary

- Resolves the performance regression in later versions (we downgraded
to `0.0.2` as a workaround)
- Supports single clicking on label to expand/collapse
- Adds new `parseJson` prop to parse `metaData.customJson` and
`metadata.dependencies`
([Asana](https://app.asana.com/0/home/1208932362205799/1209446053233125))


## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

Co-authored-by: Sean Lynch <[email protected]>
## Summary


https://github.com/user-attachments/assets/c0e2ddc7-1ebb-40c1-a541-281153c8f1a9


## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1209437960912821
  - https://app.asana.com/0/0/1209445122069294
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added a dedicated chart legend displaying distinct "Streaming" and
"Batch" items for improved clarity.
  
- **Refactor**
- Updated the main chart layout to utilize a grid structure for better
organization of elements.
  - Streamlined tooltip rendering to enhance visual consistency.

- **Chores**
- Updated the `layerchart` dependency to version `^0.99.4` for potential
improvements and new features.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Sean Lynch <[email protected]>
## Summary
Add support to run the fetcher service in docker. Also add rails to
publish to docker hub as a private image -
[ziplineai/chronon-fetcher](https://hub.docker.com/repository/docker/ziplineai/chronon-fetcher)

I wasn't able to sort out logback / log4j2 logging as there's a lot of
deps messing things up - Vert.x supports JUL configs and that is
seemingly working so starting with that for now.

Tested with:
```
docker run -v ~/.config/gcloud/application_default_credentials.json:/gcp/credentials.json \
 -p 9000:9000 \
 -e "GCP_PROJECT_ID=canary-443022" \
 -e "GOOGLE_CLOUD_PROJECT=canary-443022" \
 -e "GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance" \
 -e "STATSD_HOST=127.0.0.1" \
 -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/credentials.json \
 ziplineai/chronon-fetcher
```

And then you can `curl http://localhost:9000/ping`

On the Etsy side, you just need to swap out the project and BT instance ID,
and then you can curl the actual join:
```
curl -X POST http://localhost:9000/v1/fetch/join/search.ranking.v1_web_zipline_cdc_and_beacon_external -H 'Content-Type: application/json' -d '[{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"}]'
{"results":[{"status":"Success","entityKeys":{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"},"features":{...
```
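
For callers that prefer a script over `curl`, the same request can be assembled in Python. This is a minimal sketch that only mirrors the URL shape and JSON body shown in the curl example above; `build_fetch_request` is a hypothetical helper, not part of the fetcher service, and the request is built without being sent.

```python
import json
import urllib.request


def build_fetch_request(base_url, join_name, keys):
    """Build (but do not send) a POST request for a join fetch.

    keys: list of dicts, one per lookup, mapping entity key names to values.
    """
    url = f"{base_url}/v1/fetch/join/{join_name}"
    body = json.dumps(keys).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_fetch_request(
    "http://localhost:9000",
    "search.ranking.v1_web_zipline_cdc_and_beacon_external",
    [{"listing_id": "632126370", "shop_id": "53908089"}],
)
```

Passing `request` to `urllib.request.urlopen` would issue the call against a running container.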

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added an automation script that streamlines the container image build
and publication process with improved error handling.
- Introduced a new container configuration that installs essential
dependencies, sets environment variables, and incorporates a health
check for enhanced reliability.
- Implemented a robust logging setup that standardizes console and file
outputs with log rotation.
- Provided a startup script for the service that verifies required
settings and applies platform-specific options for seamless execution.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

@ken-zlai missed removing the pointer events on the `<Svg>` after
[simplifying](84e2292)
(they were ignored on the `<Chart>` container). This fixes clicking on a
node to open the drilldown dialog.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

Co-authored-by: Sean Lynch <[email protected]>
## Summary

Adds the ability to push artifacts to AWS in addition to GCP. Also adds the
ability to specify which customer IDs to push to.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new automation script that streamlines the process of
building artifacts and deploying them to both AWS and GCP with improved
error handling and user confirmation.

- **Chores**
- Removed a legacy artifact upload script that previously handled only
GCP deployments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- Supporting StagingQueries for configurable compute engines. To support
BigQuery, the simplest way is to just write bigquery sql and run it on
bq to create the final table. Let's first make the API change.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added an option for users to specify the compute engine when
processing queries, offering choices such as Spark and BigQuery.
- Introduced validation to ensure that queries run only with the
designated engine.

- **Style**
  - Streamlined code organization for enhanced readability.
  - Consolidated and reordered import statements for improved clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary
fetcher has grown over time into a large file with many large functions
that are hard to work with. This refactoring doesn't change any
functionality - just placement.

- Made some of the Scala code more idiomatic (`if (try.isFailed)` vs.
`try.recoverWith`)
- Made Metadata methods more explicit
- `FetcherBase` -> `JoinPartFetcher` + `GroupByFetcher` + `GroupByResponseHandler`
- Added a fetch context to replace 10 constructor params


## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Introduced a unified configuration context that enhances data
fetching, including improved group-by and join operations with more
robust error handling.
- Added a new `FetchContext` class to manage fetching operations and
execution contexts.
- Implemented a new `GroupByFetcher` class for efficient group-by data
retrieval.
- **Refactor**
- Upgraded serialization and deserialization to use a more efficient,
compact protocol.
- Standardized API definitions and type declarations across modules to
improve clarity and maintainability.
- Enhanced error handling in various methods to provide more informative
messages.
- **Chores**
	- Removed outdated utilities and reorganized dependency imports.
	- Updated test suites to align with the refactored architecture.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- Staging query should in theory already work for external tables
without additional code changes as long as we do some setup work to pin
up a view first.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary
The existing aggregations configure the items sketch incorrectly. Split
it into two: one that works purely with skewed data, and one that makes a
best-effort attempt to collect the most frequent items.
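
For readers unfamiliar with frequent-item sketches, the classic Misra-Gries summary below shows the general idea of detecting heavy hitters in bounded memory. It is a conceptual sketch only: it is not the items sketch this change configures, and `misra_gries` and its `k` argument are illustrative names that do not correspond to Chronon's aggregation parameters.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k - 1 counters.

    Guarantee: any item occurring more than len(stream) / k times is
    present in the result. The stored counts are lower bounds, not exact.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement everything and drop exhausted entries.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# A skewed stream: "a" dominates, "b" is common, the rest are one-offs.
skewed = ["a"] * 60 + ["b"] * 30 + [f"x{i}" for i in range(10)]
candidates = misra_gries(skewed, k=3)
```

On skewed data the dominant keys survive the decrements, which is why a sketch tuned for skew can differ from one that tries to rank the most frequent items on flatter distributions.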

## Checklist
- [x] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new utility functions to streamline expression composition
and cleanup.
  - Enhanced aggregation descriptions for clearer operation choices.
  - Added new aggregation types for improved data analysis.

- **Refactor**
- Revamped frequency analysis logic with improved error handling and
optimized sizing.
- Replaced legacy histogram approaches with a more robust frequent item
detection mechanism.

- **Tests**
- Added tests to validate heavy hitter detection and skewed data
scenarios, while removing obsolete histogram tests.
  - Updated existing tests to reflect changes in aggregation parameters.

- **Chores**
  - Removed deprecated interactive modules for a leaner deployment.

- **Configuration**
- Adjusted default aggregation parameters for more consistent
processing, including changes to the `k` value in multiple
configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

-
https://zipline-2kh4520.slack.com/archives/C087YSYJ5NZ/p1739480276997719?thread_ts=1739218764.927269&cid=C087YSYJ5NZ
- https://app.asana.com/0/1208811056393706/1209448137185580

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a dataset generation tool that creates sample data using
Apache Spark and Hudi.
- Generates random entries representing user and device characteristics,
and structures the data with partitioning for efficient storage.
- Supports integration with cloud storage and Hive synchronization for
streamlined data management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary
Updates workflow push-to-canary to use Bazel instead of sbt.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Modernized the build and deployment pipeline to improve efficiency and
reliability.
- Updated the container configuration to use a refined artifact
structure, supporting smoother image creation.
- Enhanced caching steps to further streamline the release workflow and
optimize build times.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Add a couple of APIs to help with the Etsy Patina integration. One is to
list out all online joins and the second is to retrieve the join schema
details for a given Join.

As part of wiring up list support, I tweaked a couple of properties like
the list pagination key / list call limit to make things consistent
between DynamoDB and BigTable.

For the BT implementation we issue a range query under the 'joins/'
prefix. Subsequent calls (in case of pagination) continue off this range
(verified this via unit tests and also basic sanity checks on Etsy).
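
The prefix scan with continuation can be pictured with an in-memory stand-in for the key-value store. This is a sketch under assumptions: a sorted Python list plays the role of the table, `list_by_prefix` is a hypothetical name, and using the last returned key as the continuation token is one plausible design rather than a description of the BigTable implementation.

```python
from bisect import bisect_left, bisect_right


def list_by_prefix(sorted_keys, prefix, limit, continuation_key=None):
    """Return (page, next_continuation_key) for keys that start with prefix.

    sorted_keys: lexicographically sorted row keys (stand-in for the store).
    continuation_key: last key of the previous page, or None for the first page.
    """
    if continuation_key is None:
        start = bisect_left(sorted_keys, prefix)
    else:
        # Resume strictly after the last key that was already returned.
        start = bisect_right(sorted_keys, continuation_key)
    page = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix) or len(page) == limit:
            break
        page.append(key)
    next_index = start + len(page)
    has_more = next_index < len(sorted_keys) and sorted_keys[next_index].startswith(prefix)
    return page, (page[-1] if page and has_more else None)


rows = sorted(["groupbys/g1", "joins/a", "joins/b", "joins/c", "models/m"])
first_page, token = list_by_prefix(rows, "joins/", limit=2)
second_page, end_token = list_by_prefix(rows, "joins/", limit=2, continuation_key=token)
```

A `None` token signals that the range under the prefix is exhausted, so the caller stops paging.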

APIs added are:
* /v1/joins -> Return the list of online joins
* /v1/join/schema/join-name -> Return a payload consisting of
{"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro
schema", "schemaHash": "hash"}

Tested by dropping the docker container and confirming things on the
Etsy side:
```
$ curl http://localhost:9000/v1/joins
{"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...}
```

And
```
curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external
{ big payload }
```

## Checklist
- [X] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new API endpoints that let users list available joins and
retrieve detailed join schema information.
- Added enhanced configuration options to support complex join
workflows.
- New test cases for validating join listing and schema retrieval
functionalities.
  - Added new constants for pagination and entity type handling.

- **Improvements**
- Standardized pagination and entity handling across cloud integrations,
ensuring a consistent and reliable data listing experience.
- Enhanced error handling and response formatting for join-related
requests.
- Expanded testing capabilities with additional dependencies and
resource inclusion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
#398 updated the module path from `"/"` to `"."`, but not all code was
migrated to the new convention, causing frontend API calls to fail when
retrieving joins.

@david-zlai – Can you review the code to ensure it fully aligns with the
new convention?
@sean-zlai – Can you tear down all Docker images and rebuild on this
branch to confirm observability works as expected?

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Streamlined how configuration names are handled in observability
views. Names are now displayed as originally provided without extra
formatting, ensuring a consistent and straightforward presentation. The
fallback label remains “Unknown” when a name is not available.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- Everywhere else we want to handle partitions that could be non-string
types. This is similar to the change in:
https://github.com/zipline-ai/chronon/blob/3d2e77da18e8fa81a5471935a7358937ed8f9f13/cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala#L122-L128
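
The general idea of tolerating non-string partition values can be sketched as follows. This is only an illustration in Python, not the Scala code linked above; `format_partition` and its default format string are assumptions made for the example.

```python
from datetime import date, datetime


def format_partition(value, fmt="%Y-%m-%d"):
    """Render a partition value as a string, whatever its underlying type.

    Date-like values are formatted with fmt; anything else falls back to str().
    """
    if isinstance(value, date):  # datetime is a subclass of date
        return value.strftime(fmt)
    return str(value)
```

With this shape, a `DATE` or `TIMESTAMP` partition column and a plain string column produce the same textual partition value as long as the configured format matches.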

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced partition date display by introducing configurable date
formatting.
- Partition dates are now consistently formatted based on user
configuration, ensuring reliable and predictable output across the
system.
- Improved retrieval of partition format for BigQuery operations,
allowing for broader usage across different packages.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
…r drilldown referential entities (sources, joinParts, etc) (#433)
## Summary
Enable batch IR caching by default & fix an issue where our Vertx init
code tries to connect to BT at startup and takes a second or two on the
worker threads (and results in the warning - 'Thread
Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 2976 ms,
time limit is 2000 ms').

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Streamlined caching configuration and logic with a consistent default
setting for improved behavior.
- Enhanced service startup by shifting to asynchronous initialization
with better error handling for a more robust launch.

- **Tests**
- Removed an outdated test case that validated previous caching
behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Upgraded core styling dependencies and configuration to leverage the
latest standards.
- **Style**
- Refined visual elements across the interface for enhanced consistency
and responsiveness.
- Adjusted focus outlines, shadows, and layout utilities on various
interactive components.
- Enhanced grid and backdrop effects to improve overall visual clarity.

These updates deliver a smoother and more polished experience without
changing the underlying functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
This PR allows the frontend to specify which percentiles it retrieves
from the backend. The percentiles can be passed as a query parameter:

```
percentiles=p0,p10,p90
```
If omitted, the default percentiles are used:  
```
percentiles=p5,p50,p95
```

### Example Requests *(App must be running)*  

#### Default (uses `p5,p50,p95`)  
```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000"
```

#### Equivalent Explicit Default  
```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000&percentiles=p5,p50,p95"
```

#### Custom Percentiles (`p0,p10,p90`)  
```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000&percentiles=p0,p10,p90"
```

### Notes  
- Omitting the `percentiles` parameter is the same as explicitly setting
`percentiles=p5,p50,p95`.
- You can test using `curl` or Postman.  
- We need to let users change these percentiles via checkboxes or
another UI control.
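
The default-versus-explicit behavior described above can be sketched in a few lines. This is an assumption-laden illustration, not the backend handler: `parse_percentiles` and `DEFAULT_PERCENTILES` are hypothetical names, and only the parameter name and default values come from this PR description.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_PERCENTILES = ["p5", "p50", "p95"]


def parse_percentiles(url):
    """Return the requested percentiles, falling back to the default set."""
    query = parse_qs(urlparse(url).query)
    raw = query.get("percentiles", [""])[0]
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return names or list(DEFAULT_PERCENTILES)


base = "http://localhost:5173/api/v1/join/j/column/c/summary?startTs=1&endTs=2"
default_result = parse_percentiles(base)
custom_result = parse_percentiles(base + "&percentiles=p0,p10,p90")
```

Here an empty `percentiles=` value is treated the same as omitting the parameter, which is a choice made for this sketch.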

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for customizable percentile parameters in summary data
requests, with a default setting of "p5, p50, p95".
- Enhanced the ability to retrieve detailed statistical summaries by
allowing users to specify percentile values when querying data.
  - Introduced two new optional dependencies for improved functionality.

- **Bug Fixes**
- Adjusted method signatures to ensure compatibility with the new
percentile parameters in various components.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Just released 1.0 versions of the LayerChart, LayerStack, and Svelte UX
packages, which are compatible with Svelte 3-5 and Tailwind 3
([announcement](https://bsky.app/profile/techniq.dev/post/3lj3sjabpns2z)).
LayerChart 1.0 is also 98% [compatible](techniq/layerchart#388) with
Tailwind 4.

This mostly paves the way to start on `next` branches / `2.0` releases
which will leverage Svelte 5 runes/snippets and Tailwind 4 (along with a
lot of other planned features).

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

Co-authored-by: Sean Lynch <[email protected]>
## Summary
I noticed we were missing the core Chronon fetcher logs during feature
lookup requests. Since we wanted to rip out JUL and logback anyway, I
went ahead and dropped those in favor of a log4j2 properties file.

Confirmed that I am seeing the relevant fetcher logs from classes like
`SawtoothOnlineAggregator` when I hit the service with a feature
lookup request.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Consolidated service deployment paths and streamlined startup
configuration.
- Improved metrics handling by conditionally enabling reporting based on
environment settings.

- **Chores**
  - Optimized resource packaging and removed legacy dependencies.
- Upgraded logging configuration to enhance performance and log
management.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Added Team, Online, and Production columns and renamed `LogicalNodeTable` to
`EntityTable`.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced the table view with new columns for "Team," "Online," and
"Production" data.
	- Added visual badges to clearly indicate boolean statuses.
  
- **Refactor**
- Replaced the `LogicalNodeTable` component with the updated
`EntityTable` component to improve data presentation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
#438)

## Summary

1. Added offset and bound support to staging query macros. `{{ start_date }}`
is valid as before; now `{{ start_date(offset=-10,
lower_bound='2023-01-01') }}` is also valid.

2. Previously we required users to put quotes around the macro themselves.
This PR removes that need:
`{{ start_date }}` used to become `2023-01-01`; it now becomes
`'2023-01-01'`.

3. Added a unified top-level module `api.chronon.types` that contains
everything that users need.

4. Added wrappers on source subtypes to directly return sources:
```py
ttypes.Source(events=ttypes.EventSource(...))

# now becomes
EventSource(...)
```
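
To make the macro behavior concrete, here is a minimal sketch of how such a substitution could work. It is not the `StagingQuery` implementation: `render_date_macros` is a hypothetical function, the offset is assumed to be in days, and `lower_bound` is assumed to clamp the shifted date from below. Only the macro syntax and the quoted output come from the description above.

```python
import re
from datetime import date, timedelta

# Matches {{ name }} and {{ name(arg=value, ...) }}.
_MACRO = re.compile(r"\{\{\s*(\w+)\s*(?:\((.*?)\))?\s*\}\}")


def render_date_macros(sql, dates):
    """Replace date macros in sql; dates maps macro name -> 'YYYY-MM-DD'."""

    def substitute(match):
        name, raw_args = match.group(1), match.group(2) or ""
        if name not in dates:
            return match.group(0)  # leave unknown macros untouched
        args = {}
        for part in filter(None, (p.strip() for p in raw_args.split(","))):
            key, _, value = part.partition("=")
            args[key.strip()] = value.strip().strip("'\"")
        # Assumption: offset is a number of days.
        value = date.fromisoformat(dates[name]) + timedelta(days=int(args.get("offset", 0)))
        if "lower_bound" in args:
            value = max(value, date.fromisoformat(args["lower_bound"]))
        # The rendered value carries its own quotes, so the SQL does not need them.
        return f"'{value.isoformat()}'"

    return _MACRO.sub(substitute, sql)


plain = render_date_macros("ds >= {{ start_date }}", {"start_date": "2023-01-01"})
clamped = render_date_macros(
    "ds >= {{ start_date(offset=-10, lower_bound='2023-01-01') }}",
    {"start_date": "2023-01-05"},
)
```

In the second call the shifted date (2022-12-26) falls below the bound, so the bound itself is rendered.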

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added new functions for creating event, entity, and join data sources.
- Introduced enhanced date macro utilities to enable flexible SQL query
substitutions.

- **Refactor**
- Streamlined naming conventions and standardized parameter formatting.
- Consolidated and simplified import structures for improved
consistency.
- Updated method signatures and calls from `select` to `selects` across
various components.
- Removed reliance on `ttypes` for source definitions and standardized
parameter naming conventions.
  - Simplified macro substitution logic in the `StagingQuery` object.

- **Tests**
- Implemented comprehensive tests for date manipulation features to
ensure robust behavior.
- Updated existing tests to reflect changes in method names and query
formatting.
- Adjusted data generation parameters in tests to increase transaction
volumes.

- **Documentation**
- Updated configuration descriptions to clearly illustrate new date
template options and parameter adjustments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Cleaning up the top-level directory.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Refined version control and build settings by updating ignored paths
and tool versions.
- Removed obsolete internal configurations, tooling, and Docker build
files for a cleaner project structure.
- **Documentation**
  - Updated installation guidance links for clearer setup instructions.
- Eliminated legacy contributor, governance, and quickstart guides to
reduce clutter.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
No turning back now

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Removed legacy internal components from workflow orchestration and
task management to streamline operations.
- **Documentation**
  - Updated deployment guidance by removing outdated references.

These internal improvements enhance maintainability and performance
without altering your current user experience.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
move OSS docsite release scripts

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Made behind‑the‑scenes updates to streamline our internal release
management processes.

There are no visible changes to functionality for end-users in this
release.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Consolidated and streamlined build dependencies for improved
integration with AWS services and data processing libraries.
- Expanded the set of supported third-party libraries, including new
artifacts for enhanced performance and compatibility.
- Added new dependencies for Hudi, Jackson, and Zookeeper to enhance
functionality.
- Introduced additional Hudi artifacts for Scala 2.12 and 2.13 to
broaden available functionalities.

- **Tests**
- Added a new test class to verify reliable write/read operations on
Hudi tables using a Spark session.

- **Refactor**
- Enhanced serialization registration to support a broader range of data
types, improving overall processing stability.
- Introduced a new variable for shared library dependencies to simplify
dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- Improve code organization
  - rename `ActionButtons` to `SortButton` (only remaining use case)
  - move `PageHeader` out of `EntityTable` and into page
  - remove `Entity.ts` nesting
- Update `Learn` button href based on conf/entity type (ex. `Join` goes
to applicable [page](https://chronon.ai/authoring_features/Join.html)
for both list and individual entity)
- Rename `Distributions` to `Summary` tab/route (feedback from meeting,
and uses `getColumnSummary()` API)
- Sort default metric (PSI) first
- Fix resetting zoom by clicking on chart (without drag)

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- **Chart Controls:** Now feature a streamlined interface with a
dedicated sort button replacing extra action buttons.
- **Drift Metrics:** Display metrics in a sorted order with the default
metric prioritized.
- **Entity Tables:** Automatically derive header labels for clearer data
presentation.
- **Page Headers:** Now support an optional "Learn More" link to guide
users to additional resources.
- **Navigation & Summary Views:** Updated tabs and layouts now emphasize
"Summary" data with column summaries.
- **Entity Configurations:** Enhanced to include resource links for
further learning.

- **Bug Fixes**
- Improved error handling in data retrieval processes for better user
experience.

- **Documentation**
- Updated documentation to reflect changes in entity configurations and
navigation structures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Sean Lynch <[email protected]>
tchow-zlai and others added 9 commits April 28, 2025 08:48
…partitions (#690)

## Summary

- We get a 403 when querying a range of partitions in BigQuery native
tables:
```
Response too large to return. Consider specifying a destination table in your job configuration
```
- Instead, let's just query individual partitions of data as separate
dataframes and union them together.
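
As a rough illustration of the per-partition approach, here is a minimal sketch in plain Python rather than the Spark/BigQuery code this change touches. The names `partition_range`, `read_partitions_and_union`, and `fake_read` are hypothetical, and daily `yyyy-MM-dd` partitions are an assumption.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

Row = Dict[str, object]


def partition_range(start: date, end: date) -> List[str]:
    """Enumerate daily partition values from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def read_partitions_and_union(
    partitions: List[str], read_one: Callable[[str], List[Row]]
) -> List[Row]:
    """Issue one small read per partition and concatenate the results."""
    rows: List[Row] = []
    for partition in partitions:
        rows.extend(read_one(partition))
    return rows


# Stand-in for a per-partition BigQuery read (hypothetical data).
FAKE_TABLE = {
    "2025-04-01": [{"id": 1, "ds": "2025-04-01"}],
    "2025-04-02": [{"id": 2, "ds": "2025-04-02"}, {"id": 3, "ds": "2025-04-02"}],
}


def fake_read(partition: str) -> List[Row]:
    # A partition with no data simply contributes no rows.
    return FAKE_TABLE.get(partition, [])


result = read_partitions_and_union(
    partition_range(date(2025, 4, 1), date(2025, 4, 3)), fake_read
)
```

Each read stays small, so no single response has to carry the whole range. The trade-off is one query per partition instead of one query overall.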

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved handling of BigQuery partitioned tables, ensuring more
accurate partition filtering and data retrieval.

- **Refactor**
- Streamlined the process for reading partitioned data from BigQuery,
resulting in a clearer and more consistent approach for users working
with partitioned tables.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Ensured the order of keys in query selections is preserved as provided
by the user.

- **Style**
- Improved formatting and spacing for better readability without
affecting functionality.
- Enhanced ordering consistency in dependency metadata for stable JSON
outputs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Disabling analyzer checks

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Disabled the join configuration validation step before starting join
jobs.
- Updated time range calculation logic for certain join scenarios to
improve consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
@david-zlai david-zlai requested a review from tchow-zlai April 29, 2025 08:46
tchow-zlai and others added 5 commits April 29, 2025 10:59
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for handling BigQuery views when loading tables,
improving compatibility with a wider range of BigQuery table types.
- **Bug Fixes**
- Updated internal handling of partition column aliases to ensure
accurate retrieval of partition data from BigQuery tables.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Improved handling of Google Cloud Storage (GCS) artifact locations by
requiring a full artifact prefix URI instead of relying on internal
customer ID logic. All GCS interactions now use this provided prefix,
allowing for more flexible and centralized configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
#701)

## Summary

- For BigQuery views, there won't be an explicit partition column on the
table. Let's just use the same implementation to list the primary
partition columns.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Improved partition handling for BigQuery tables, allowing direct
retrieval of distinct partition values.
- **Bug Fixes**
- Added clear error handling for unsupported sub-partition filtering in
BigQuery partition queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary
Based on our conversation with the BigTable team, using the Batcher
implementation isn't what they recommend. It's primarily used for flow
control and doesn't help us much. This PR yanks out that code to make
the BT implementation easier to read and reason about.

## Checklist
- [ ] Added Unit Tests
- [X] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Simplified multi-get operations by removing the bulk read batcher
logic and related configuration options.
- Consolidated multi-get requests to use a single, consistent approach.

- **Tests**
- Streamlined test setup by removing parameterized tests and updating
mocking strategies to match the new implementation.
- Removed unused helper methods and imports for cleaner test code.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Improved handling of Google Cloud Storage bucket selection for file
uploads, now automatically using the appropriate warehouse bucket for
each customer.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
david-zlai and others added 6 commits April 30, 2025 12:59
## Summary
Pull in PRs airbnb/chronon#964 and airbnb/chronon#932. We hit issues
related to 964 in some of our tests at Etsy: groupByServingInfo lookups
against BT timed out and we ended up caching the failure response. 964
addresses this, and it depends on 932, so pulling that in as well.
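
To illustrate the failure mode, here is a minimal Python sketch of a cache that stores only successful lookups, so a timed-out call is retried on the next request. It is not the implementation from airbnb/chronon#964; `SuccessOnlyCache` and `flaky_loader` are hypothetical names used only for this example.

```python
from typing import Callable, Dict, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class SuccessOnlyCache(Generic[K, V]):
    """Caches loader results, but never caches a failed load."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader
        self._values: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key in self._values:
            return self._values[key]
        # If the loader raises, the exception propagates and nothing is stored,
        # so the next call for this key tries the loader again.
        value = self._loader(key)
        self._values[key] = value
        return value


calls = {"n": 0}


def flaky_loader(name: str) -> str:
    """Hypothetical lookup that times out on the first call only."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("lookup timed out")
    return f"serving-info-for-{name}"


cache = SuccessOnlyCache(flaky_loader)
```

With this shape, the first `cache.get(...)` surfaces the timeout to the caller, the second call reaches the loader again, and only the successful value is kept.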

## Checklist
- [ ] Added Unit Tests
- [X] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Improved error handling and reporting for partial failures in join
operations and key-value store lookups.
- Enhanced cache refresh mechanisms for join configurations and
metadata, improving system robustness during failures.
- Added a configurable option to control strictness on invalid dataset
references in the in-memory key-value store.

- **Bug Fixes**
- Exceptions and partial failures are now more accurately surfaced in
fetch responses, ensuring clearer diagnostics for end-users.
- Updated error key naming for consistency in response maps.

- **Tests**
- Added a new test to verify correct handling and reporting of partial
failures in key-value store operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ssion status (#697)

## Summary
This is needed for the agent to be able to track the status of submitted
jobs and report them back to the orchestration service.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for specifying a custom cluster name when submitting EMR
jobs.

- **Improvements**
- Scaling factors for auto-scaling now support decimal values, allowing
more precise scaling adjustments.
- Job status methods now return status as a string, making it easier to
programmatically track job progress and errors.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- BigQuery has both native tables and views. For native tables we can
list partitions through the information schema, but we cannot do the
same for views, so partition listing needs two different approaches. To
choose between them, we'll do a blind test: first check the information
schema, and if we can't get a partition column out of that, fall back
to a blind `select
distinct(...)`.
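
The control flow of that fallback can be sketched in plain Python as follows. The two callbacks stand in for the information schema lookup and the `select distinct` scan; their names, the table names, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


def list_partitions(
    table: str,
    from_information_schema: Callable[[str], Optional[List[str]]],
    select_distinct: Callable[[str], List[str]],
) -> List[str]:
    """Try the metadata lookup first, then fall back to scanning the data."""
    partitions = from_information_schema(table)
    if partitions is not None:
        return sorted(partitions)
    # No partition column in the metadata (for example, a view): scan instead.
    return sorted(set(select_distinct(table)))


# Hypothetical stand-ins for the two BigQuery code paths.
NATIVE_PARTITIONS: Dict[str, List[str]] = {
    "proj.ds.events": ["2025-04-02", "2025-04-01"],
}
VIEW_VALUES: Dict[str, List[str]] = {
    "proj.ds.events_view": ["2025-04-01", "2025-04-01", "2025-04-03"],
}


def fake_information_schema(table: str) -> Optional[List[str]]:
    # Returns None when the table has no partition metadata.
    return NATIVE_PARTITIONS.get(table)


def fake_select_distinct(table: str) -> List[str]:
    return VIEW_VALUES.get(table, [])
```

The caller does not need to know up front whether it is dealing with a table or a view. The cost is that the fallback path scans data rather than metadata.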

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved error handling for missing partition columns, providing
clearer error messages and a more robust fallback method for retrieving
partition values.
  
- **Refactor**
- Centralized the handling of missing partition columns for more
consistent behavior across the application.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Thomas Chow <[email protected]>
@kumar-zlai kumar-zlai closed this May 1, 2025
@david-zlai david-zlai mentioned this pull request May 5, 2025
4 tasks
david-zlai added a commit that referenced this pull request May 5, 2025
## Summary

Putting this up again - #684

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for the Avro logical type `timestamp-millis` in schema
and value conversions, enabling better handling of timestamp fields.
- Enhanced BigQuery integration with a new test to verify correct
timestamp conversions based on configuration settings.

- **Documentation**
- Added detailed comments explaining the mapping behavior of timestamp
types and relevant configuration flags.

- **Refactor**
- Improved logging structure for serialized object size calculations for
better readability.
  - Minor formatting and consistency improvements in test assertions.

- **Style**
  - Removed unnecessary trailing whitespace for cleaner code.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
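
For context on the `timestamp-millis` logical type mentioned above: in the Avro specification it annotates a `long` that counts milliseconds since the Unix epoch. The Python sketch below shows that schema shape and a round-trip conversion. It is not the Chronon conversion code; the record name `Event`, the field name `created_at`, and the choice to treat naive datetimes as UTC are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Per the Avro spec, timestamp-millis annotates a long.
TIMESTAMP_MILLIS_SCHEMA = {"type": "long", "logicalType": "timestamp-millis"}

# Hypothetical record with a single timestamp field.
RECORD_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [{"name": "created_at", "type": TIMESTAMP_MILLIS_SCHEMA}],
}


def to_epoch_millis(ts: datetime) -> int:
    """Convert a datetime to milliseconds since the epoch (sub-millisecond part is floored)."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    delta = ts - EPOCH
    # Integer arithmetic avoids floating-point rounding on large values.
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)
```

Anything finer than a millisecond is lost on the way in, so a round trip only returns the original value when it already has millisecond precision.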
@david-zlai david-zlai deleted the davidhan/support_timestamp_gbu branch May 12, 2025 19:36
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary

Putting this up again - #684

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for the Avro logical type `timestamp-millis` in schema
and value conversions, enabling better handling of timestamp fields.
- Enhanced BigQuery integration with a new test to verify correct
timestamp conversions based on configuration settings.

- **Documentation**
- Added detailed comments explaining the mapping behavior of timestamp
types and relevant configuration flags.

- **Refactor**
- Improved logging structure for serialized object size calculations for
better readability.
  - Minor formatting and consistency improvements in test assertions.

- **Style**
  - Removed unnecessary trailing whitespace for cleaner code.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
## Summary

Putting this up again - #684

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added support for the Avro logical type `timestamp-millis` in schema
and value conversions, enabling better handling of timestamp fields.
- Enhanced BigQuery integration with a new test to verify correct
timestamp conversions based on configuration settings.

- **Documentation**
- Added detailed comments explaining the mapping behavior of timestamp
types and relevant configuration flags.

- **Refactor**
- Improved logging structure for serialized object size calculations for
better readability.
  - Minor formatting and consistency improvements in test assertions.

- **Style**
  - Removed unnecessary trailing whitespace for cleaner code.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
9 participants