
feat: stagingquery on GCP #225


Closed
wants to merge 9 commits into from

Conversation

tchow-zlai
Collaborator

@tchow-zlai tchow-zlai commented Jan 16, 2025

Summary

  • This is a shorter-term, somewhat hacky workaround to support staging queries for BigQuery.
  • Enables support without any API changes by inferring the data source from the project ID.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Enhanced SQL query execution with improved data source handling.
    • Added support for flexible BigQuery query processing.
    • Introduced a new BigQueryStagingQuery class for staging queries.
  • Bug Fixes

    • Improved error handling for SQL query parsing and execution.
    • Added better warning logging for potential query redefinition scenarios.


coderabbitai bot commented Jan 16, 2025

Walkthrough

The pull request modifies the sql method in TableUtils to enhance query handling for different data sources, particularly focusing on BigQuery. The new implementation parses SQL queries to identify unresolved relations, checks their read format, and dynamically adjusts session configurations when BigQuery-related sources are detected. This approach provides more flexible query execution and improved error handling for diverse data source scenarios.
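
A condensed sketch of that flow (assuming the DataPointer helper from this PR; the viewsEnabled/materializationDataset read options shown in the diffs below are omitted for brevity):

val logicalPlan = sparkSession.sessionState.sqlParser.parsePlan(query)

// Pair each unresolved relation with its read format, resolved via DataPointer.
val formats = logicalPlan.collect {
  case ur @ UnresolvedRelation(multipartId, _, _) =>
    (multipartId, DataPointer.from(ur.tableName, sparkSession).readFormat)
}

// Route through the spark-bigquery connector when any source reads as BigQuery;
// otherwise fall back to plain Spark SQL.
val df =
  if (formats.exists { case (_, fmt) => fmt.exists(_.equalsIgnoreCase("bigquery")) })
    sparkSession.read.format("bigquery").load(query)
  else sparkSession.sql(query)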

Changes

File: spark/src/main/scala/ai/chronon/spark/TableUtils.scala
  - Replaced direct SQL execution with logical plan parsing
  - Added detection of UnresolvedRelation instances
  - Implemented dynamic configuration for BigQuery formats
  - Enhanced error handling for query execution

File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala
  - Introduced BigQueryStagingQuery class
  - Overrode computeStagingQuery method with optional parameters

Suggested reviewers

  • nikhil-zlai
  • piyush-zlai
  • varant-zlai
  • david-zlai

Poem

🔍 In the realm of Spark and data's dance,
A query transforms with newfound stance.
BigQuery whispers, relations unfold,
Configurations brave and bold!
Code evolves, flexibility reigns supreme! 🚀

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.



@tchow-zlai tchow-zlai changed the base branch from tchow/temp-table-dataset to main on January 16, 2025 09:16
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 6ca603a to f4fc0ee on January 16, 2025 09:16
@tchow-zlai tchow-zlai changed the title from "create views on datapointers" to "feat: stagingquery on GCP" on January 16, 2025
@tchow-zlai tchow-zlai marked this pull request as draft on January 16, 2025 09:17
@tchow-zlai tchow-zlai marked this pull request as ready for review on January 16, 2025 10:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between d8d1e08 and 34e85e7.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)

val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {
  val firstDataset = formats
    .collectFirst {
      case (mpi, _) => mpi(1)

⚠️ Potential issue

Ensure safe access to mpi(1)

Accessing mpi(1) may cause an IndexOutOfBoundsException if mpi has fewer than two elements. Verify that mpi always has at least two elements before accessing it.
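
One defensive option, as a minimal sketch (Seq.lift returns None instead of throwing on a short identifier):

// Guarded access: lift(1) yields None when mpi has fewer than two parts.
val maybeDataset: Option[String] = mpi.lift(1)

// Or guard in the pattern itself:
formats.collectFirst { case (mpi, _) if mpi.size >= 2 => mpi(1) }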


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 34e85e7 and 318246c.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

40-44: LGTM!

The new imports support SQL parsing functionality.


411-440: ⚠️ Potential issue

Fix potential array index out of bounds.

The access to mpi(1) at line 428 could throw an IndexOutOfBoundsException if mpi has fewer than 2 elements.

-            case (mpi, _) if mpi.size == 3 => mpi(1)
+            case (mpi, _) if mpi.size >= 3 => mpi(1)

Verify BigQuery table name resolution.

The code assumes table names will resolve to valid BigQuery datasets. Add validation for table name format.
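
For illustration, a hedged sketch of such a validation, assuming fully-qualified names take the project.dataset.table shape:

// Hypothetical check for a project.dataset.table identifier; project IDs
// may contain hyphens, while datasets and tables are word characters.
private val BigQueryTableName = """^[\w-]+\.\w+\.\w+$""".r

def isValidBigQueryTable(name: String): Boolean =
  BigQueryTableName.pattern.matcher(name).matches()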


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

411-438: ⚠️ Potential issue

Enhanced query execution with format-specific handling.

The new implementation:

  1. Parses query to LogicalPlan
  2. Identifies table formats
  3. Applies BigQuery-specific configuration when needed

However, there's a potential IndexOutOfBoundsException when accessing mpi(1).

Add a bounds check:

- case (mpi, _) if mpi.size == 3 => mpi(1)
+ case (mpi, _) if mpi.size >= 3 => mpi(1)
🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

424-424: Address TODO comment.

Consider using a configuration value for global materializationDataset.

Would you like me to implement a configuration-based solution for the global materializationDataset?
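
One possible shape for that, sketched with a hypothetical conf key:

// Hypothetical: prefer a globally configured materialization dataset,
// falling back to the dataset inferred from the query's BigQuery tables.
val materializationDataset: String = sparkSession.conf
  .getOption("spark.chronon.bigquery.materializationDataset")
  .orElse(formats.collectFirst { case (mpi, _) if mpi.size >= 2 => mpi(1) })
  .getOrElse(throw new IllegalArgumentException("No materializationDataset configured or inferable"))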

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 318246c and 17641f0.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1)

40-42: Configuration handling simplified.

Direct configuration setting is cleaner than the previous save/restore pattern.

spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

40-42: New imports for query parsing.

Added necessary imports for logical plan analysis.


425-437: Verify BigQuery format handling across codebase.

Let's ensure consistent BigQuery configuration handling.

✅ Verification successful

BigQuery configuration is correctly implemented

The different materializationDataset sources are intentional:

  • Ad-hoc queries use format-provided dataset
  • Partition queries use table's dataset
  • Write operations use configured output dataset
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all BigQuery format usage
rg -l 'format\s*=\s*"bigquery"' 

# Find viewsEnabled configuration
rg 'viewsEnabled'

# Find materializationDataset usage
rg 'materializationDataset'

Length of output: 1041


Script:

#!/bin/bash
# Check BigQueryFormat implementation
cat cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala

# Check GcpFormatProvider implementation
cat cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala

Length of output: 7471

david-zlai
david-zlai previously approved these changes Jan 16, 2025
Comment on lines 412 to 438
val logicalPlan: LogicalPlan = sparkSession.sessionState.sqlParser.parsePlan(query)

// Load tables through DataPointer resolution.
val formats = logicalPlan
  .collect {
    case ur @ UnresolvedRelation(mpi, _, _) => {
      DataPointer.from(ur.tableName, sparkSession).readFormat match {
        case Some(fmtName) => (mpi, fmtName)
      }
    }
  }

// todo(tchow): Use a global materializationDataset value.
val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {
  val firstDataset = formats
    .collectFirst {
      case (mpi, _) if mpi.size == 3 => mpi(1)
    }
    .getOrElse(throw new IllegalArgumentException("Could not find candidate dataset for materializationDataset"))
  val df = sparkSession.read
    .option("viewsEnabled", true.toString)
    .option("materializationDataset", firstDataset)
    .format("bigquery")
    .load(query)
  df
} else sparkSession.sql(query)
df.coalesce(partitionCount)
Contributor


break this out into a function checkAndRunBigQuery
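
For reference, a minimal sketch of that extraction (the signature is an assumption; the body mirrors the BigQuery branch above):

private def checkAndRunBigQuery(query: String, materializationDataset: String): DataFrame =
  sparkSession.read
    .option("viewsEnabled", true.toString)
    .option("materializationDataset", materializationDataset)
    .format("bigquery")
    .load(query)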

Collaborator Author


ack will do

Contributor

@nikhil-zlai nikhil-zlai left a comment


actually this will trigger a BigQuery run on even groupBy selects, which is probably not the desired behavior.

we should move this into staging query - or an opportunisticallyRunOnBigQuery func.

@tchow-zlai
Collaborator Author

actually this will trigger a BigQuery run on even groupBy selects, which is probably not the desired behavior.

@nikhil-zlai can you show me where that is? AFAICT this will be used to run setups in GroupBy and Join, plus some other spark sql stuff which will mostly stay in spark based on this logic.

@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 17641f0 to 22a3d70 on January 17, 2025 21:31

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

422-435: Consider extracting BigQuery handling to a separate method.

The BigQuery-specific logic would be more maintainable as a separate method.

+  private def handleBigQueryExecution(query: String, firstDataset: String): DataFrame = {
+    sparkSession.read
+      .option("viewsEnabled", true.toString)
+      .option("materializationDataset", firstDataset)
+      .format("bigquery")
+      .load(query)
+  }
+
   def sql(query: String): DataFrame = {
     // ... existing code ...
     val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {
       val firstDataset = formats
         .collectFirst {
           case (mpi, _) if mpi.size >= 2 => mpi(1)
         }
         .getOrElse(throw new IllegalArgumentException("Could not find candidate dataset for materializationDataset"))
-      val df = sparkSession.read
-        .option("viewsEnabled", true.toString)
-        .option("materializationDataset", firstDataset)
-        .format("bigquery")
-        .load(query)
-      df
+      handleBigQueryExecution(query, firstDataset)
     } else sparkSession.sql(query)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 17641f0 and 22a3d70.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: other_spark_tests
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

40-43: LGTM: Required imports for SQL parsing.


424-428: ⚠️ Potential issue

Add bounds check for mpi array access.

Accessing mpi(1) could throw IndexOutOfBoundsException if mpi has fewer than 2 elements.

-            case (mpi, _) if mpi.size == 3 => mpi(1)
+            case (mpi, _) if mpi.size >= 2 => mpi(1)

Likely invalid or redundant comment.

Comment on lines 415 to 406
case ur @ UnresolvedRelation(mpi, _, _) => {
  DataPointer.from(ur.tableName, sparkSession).readFormat match {
    case Some(fmtName) => (mpi, fmtName)
  }
}

⚠️ Potential issue

Handle the missing case in format extraction.

The format extraction could throw a MatchError if DataPointer.from(ur.tableName, sparkSession).readFormat returns None.

-            DataPointer.from(ur.tableName, sparkSession).readFormat match {
-              case Some(fmtName) => (mpi, fmtName)
-            }
+            DataPointer.from(ur.tableName, sparkSession).readFormat match {
+              case Some(fmtName) => Some((mpi, fmtName))
+              case None => None
+            }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 422 to 410
// todo(tchow): Use a global materializationDataset value.
val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {

💡 Codebase verification

Consider implementing materialization dataset configuration now

Current implementation uses brittle position-based access (mpi(1)) and is tightly coupled with format-specific logic. A global configuration would improve maintainability.

🔗 Analysis chain

Consider using configuration for materialization dataset.

The TODO comment suggests using a global value. Consider implementing this now to avoid technical debt.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there's already a configuration for materialization dataset
rg -l "materializationDataset.*=.*" --type scala

Length of output: 50


Script:

#!/bin/bash
# Search for the TODO comment and surrounding context
rg -B 5 -A 5 "todo.*Use a global materializationDataset" --type scala

# Search for BigQuery-related configuration patterns
rg -l "bigquery.*config" --type scala

# Search for similar configuration patterns
ast-grep --pattern 'val $_ = config.getString($_)'

Length of output: 1148


Script:

#!/bin/bash
# Search for configuration-related files
fd -e scala config

# Search for configuration patterns in scala files
rg -l "extends.*Config" --type scala

Length of output: 132


Script:

#!/bin/bash
# Check content of Config.scala
cat orchestration/src/main/scala/ai/chronon/orchestration/utils/Config.scala

Length of output: 1796

Comment on lines 25 to 26
// See: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/434#issuecomment-886156191
// and: https://cloud.google.com/bigquery/docs/information-schema-intro#limitations
Contributor


do you want to kill these lines too?

Collaborator Author


these are still valid, we're just setting them in a different way.

@tchow-zlai tchow-zlai changed the base branch from main to tchow/write-option on January 23, 2025 02:53
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 22a3d70 to a3bdc42 on January 23, 2025 02:53
Base automatically changed from tchow/write-option to main on January 23, 2025 18:03
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from a3bdc42 to c3dd039 on January 23, 2025 18:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

433-438: Consider using DataFrameReader options consistently.

The implementation correctly supports staging queries by inferring from project-ID.

Consider moving viewsEnabled and materializationDataset to a configuration object for better maintainability.
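
A hedged sketch of such a configuration object (names here are hypothetical):

// Hypothetical holder for the BigQuery read options used in this PR.
case class BigQueryReadConfig(materializationDataset: String, viewsEnabled: Boolean = true) {
  def options: Map[String, String] =
    Map("viewsEnabled" -> viewsEnabled.toString,
        "materializationDataset" -> materializationDataset)
}

// Usage: sparkSession.read.options(cfg.options).format("bigquery").load(query)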

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 22a3d70 and c3dd039.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (3)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (3)

413-415: LGTM!

Clean implementation of logical plan parsing.


428-432: ⚠️ Potential issue

Add bounds check for mpi access.

Direct access to mpi(1) could throw IndexOutOfBoundsException.

-          .collectFirst {
-            case (mpi, _) if mpi.size == 3 => mpi(1)
-          }
+          .collectFirst {
+            case (mpi, _) if mpi.size >= 2 => mpi(1)
+          }

Likely invalid or redundant comment.


416-424: ⚠️ Potential issue

Add error handling for format extraction.

A missing case for None in the pattern match could cause a MatchError.

-            DataPointer.from(ur.tableName, sparkSession).readFormat match {
-              case Some(fmtName) => (mpi, fmtName)
-            }
+            DataPointer.from(ur.tableName, sparkSession).readFormat match {
+              case Some(fmtName) => Some((mpi, fmtName))
+              case None => None
+            }

Likely invalid or redundant comment.

@tchow-zlai tchow-zlai changed the base branch from main to tchow/again on January 23, 2025 20:10
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch 2 times, most recently from 8c32c96 to a14078c on January 23, 2025 20:18
Base automatically changed from tchow/again to main on January 23, 2025 20:27
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from a14078c to 0e35c87 on January 23, 2025 20:27

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

426-427: Consider implementing global configuration now.

Move materialization dataset configuration to a global setting to improve maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c3dd039 and 0e35c87.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: mutation_spark_tests
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (4)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (4)

413-415: LGTM!

Clean and standard approach to SQL parsing.


433-438: LGTM!

Clean BigQuery read configuration.


426-431: ⚠️ Potential issue

Add bounds check for array access.

Accessing mpi(1) could throw IndexOutOfBoundsException.

Apply this diff:

-          .collectFirst {
-            case (mpi, _) if mpi.size == 3 => mpi(1)
-          }
+          .collectFirst {
+            case (mpi, _) if mpi.size >= 2 => mpi(1)
+          }

Likely invalid or redundant comment.


416-424: ⚠️ Potential issue

Add error handling for format resolution.

The pattern matching is incomplete and could fail if readFormat returns None.

Apply this diff:

-          case ur @ UnresolvedRelation(mpi, _, _) => {
-            DataPointer.from(ur.tableName, sparkSession).readFormat match {
-              case Some(fmtName) => (mpi, fmtName)
-            }
-          }
+          case ur @ UnresolvedRelation(mpi, _, _) =>
+            DataPointer.from(ur.tableName, sparkSession).readFormat match {
+              case Some(fmtName) => Some((mpi, fmtName))
+              case None => None
+            }

Likely invalid or redundant comment.

@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 0e35c87 to 04c50dc on February 6, 2025 22:06
@tchow-zlai tchow-zlai changed the base branch from main to tchow/sq-bq on February 21, 2025 01:48
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 04c50dc to 6b14f76 on February 21, 2025 01:48
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch 2 times, most recently from 22fb501 to 7dab021 on February 24, 2025 23:44
Base automatically changed from tchow/sq-bq to main on February 25, 2025 00:37
tchow-zlai and others added 9 commits February 25, 2025 08:44
Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 7dab021 to 98dc781 on February 25, 2025 16:44

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala (1)

6-11: Class currently just proxies to parent implementation.

This new class extends StagingQuery but currently only calls the parent implementation without adding BigQuery-specific functionality.

Consider adding documentation explaining the purpose of this specialized class and its expected future behavior for BigQuery staging queries.
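
For instance, a doc comment along these lines would capture the intent (wording is only a suggestion):

/** Staging query implementation for BigQuery-backed sources.
  *
  * Currently delegates to the parent StagingQuery's computeStagingQuery and
  * exists as an extension point for BigQuery-specific staging behavior, e.g.
  * running the query natively in BigQuery rather than through Spark SQL.
  */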

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 04c50dc and 98dc781.

📒 Files selected for processing (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala (1)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.

@tchow-zlai tchow-zlai closed this Mar 14, 2025
@tchow-zlai tchow-zlai deleted the tchow/staging-query-gcp branch March 14, 2025 07:01