
feat: stagingquery on GCP #225


Closed
wants to merge 9 commits into from

Conversation

tchow-zlai
Collaborator

@tchow-zlai tchow-zlai commented Jan 16, 2025

Summary

  • This is a shorter-term, somewhat hacky workaround to support staging queries for BigQuery.
  • Enables support without any API changes by inferring the data source from the project ID.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Enhanced SQL query execution with improved data source handling.
    • Added support for flexible BigQuery query processing.
    • Introduced a new BigQueryStagingQuery class for staging queries.
  • Bug Fixes

    • Improved error handling for SQL query parsing and execution.
    • Added better warning logging for potential query redefinition scenarios.


coderabbitai bot commented Jan 16, 2025

Walkthrough

The pull request modifies the sql method in TableUtils to enhance query handling for different data sources, particularly focusing on BigQuery. The new implementation parses SQL queries to identify unresolved relations, checks their read format, and dynamically adjusts session configurations when BigQuery-related sources are detected. This approach provides more flexible query execution and improved error handling for diverse data source scenarios.
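
A condensed sketch of that flow (assuming the DataPointer helper from this PR; the viewsEnabled/materializationDataset read options shown in the diffs below are omitted for brevity):

val logicalPlan = sparkSession.sessionState.sqlParser.parsePlan(query)

// Pair each unresolved relation with its read format, resolved via DataPointer.
val formats = logicalPlan.collect {
  case ur @ UnresolvedRelation(multipartId, _, _) =>
    (multipartId, DataPointer.from(ur.tableName, sparkSession).readFormat)
}

// Route through the spark-bigquery connector when any source reads as BigQuery;
// otherwise fall back to plain Spark SQL.
val df =
  if (formats.exists { case (_, fmt) => fmt.exists(_.equalsIgnoreCase("bigquery")) })
    sparkSession.read.format("bigquery").load(query)
  else sparkSession.sql(query)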

Changes

File: spark/src/main/scala/ai/chronon/spark/TableUtils.scala
  - Replaced direct SQL execution with logical plan parsing
  - Added detection of UnresolvedRelation instances
  - Implemented dynamic configuration for BigQuery formats
  - Enhanced error handling for query execution

File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala
  - Introduced BigQueryStagingQuery class
  - Overrode computeStagingQuery method with optional parameters

Suggested reviewers

  • nikhil-zlai
  • piyush-zlai
  • varant-zlai
  • david-zlai

Poem

🔍 In the realm of Spark and data's dance,
A query transforms with newfound stance.
BigQuery whispers, relations unfold,
Configurations brave and bold!
Code evolves, flexibility reigns supreme! 🚀

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.



@tchow-zlai tchow-zlai changed the base branch from tchow/temp-table-dataset to main on January 16, 2025 09:16
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 6ca603a to f4fc0ee on January 16, 2025 09:16
@tchow-zlai tchow-zlai changed the title from "create views on datapointers" to "feat: stagingquery on GCP" on January 16, 2025
@tchow-zlai tchow-zlai marked this pull request as draft on January 16, 2025 09:17
@tchow-zlai tchow-zlai marked this pull request as ready for review on January 16, 2025 10:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between d8d1e08 and 34e85e7.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)

val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {
  val firstDataset = formats
    .collectFirst {
      case (mpi, _) => mpi(1)

⚠️ Potential issue

Ensure safe access to mpi(1)

Accessing mpi(1) may cause an IndexOutOfBoundsException if mpi has fewer than two elements. Verify that mpi always has at least two elements before accessing it.
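
One defensive option, as a minimal sketch (Seq.lift returns None instead of throwing on a short identifier):

// Guarded access: lift(1) yields None when mpi has fewer than two parts.
val maybeDataset: Option[String] = mpi.lift(1)

// Or guard in the pattern itself:
formats.collectFirst { case (mpi, _) if mpi.size >= 2 => mpi(1) }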


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 34e85e7 and 318246c.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

40-44: LGTM!

The new imports support SQL parsing functionality.


411-440: ⚠️ Potential issue

Fix potential array index out of bounds.

The access to mpi(1) at line 428 could throw an IndexOutOfBoundsException if mpi has fewer than 2 elements.

-            case (mpi, _) if mpi.size == 3 => mpi(1)
+            case (mpi, _) if mpi.size >= 3 => mpi(1)

Verify BigQuery table name resolution.

The code assumes table names will resolve to valid BigQuery datasets. Add validation for table name format.
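
For illustration, a hedged sketch of such a validation, assuming fully-qualified names take the project.dataset.table shape:

// Hypothetical check for a project.dataset.table identifier; project IDs
// may contain hyphens, while datasets and tables are word characters.
private val BigQueryTableName = """^[\w-]+\.\w+\.\w+$""".r

def isValidBigQueryTable(name: String): Boolean =
  BigQueryTableName.pattern.matcher(name).matches()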


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

411-438: ⚠️ Potential issue

Enhanced query execution with format-specific handling.

The new implementation:

  1. Parses query to LogicalPlan
  2. Identifies table formats
  3. Applies BigQuery-specific configuration when needed

However, there's a potential IndexOutOfBoundsException when accessing mpi(1).

Add a bounds check:

- case (mpi, _) if mpi.size == 3 => mpi(1)
+ case (mpi, _) if mpi.size >= 3 => mpi(1)
🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

424-424: Address TODO comment.

Consider using a configuration value for global materializationDataset.

Would you like me to implement a configuration-based solution for the global materializationDataset?
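
One possible shape for that, sketched with a hypothetical conf key:

// Hypothetical: prefer a globally configured materialization dataset,
// falling back to the dataset inferred from the query's BigQuery tables.
val materializationDataset: String = sparkSession.conf
  .getOption("spark.chronon.bigquery.materializationDataset")
  .orElse(formats.collectFirst { case (mpi, _) if mpi.size >= 2 => mpi(1) })
  .getOrElse(throw new IllegalArgumentException("No materializationDataset configured or inferable"))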

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 318246c and 17641f0.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1)

40-42: Configuration handling simplified.

Direct configuration setting is cleaner than the previous save/restore pattern.

spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

40-42: New imports for query parsing.

Added necessary imports for logical plan analysis.


425-437: Verify BigQuery format handling across codebase.

Let's ensure consistent BigQuery configuration handling.

✅ Verification successful

BigQuery configuration is correctly implemented

The different materializationDataset sources are intentional:

  • Ad-hoc queries use format-provided dataset
  • Partition queries use table's dataset
  • Write operations use configured output dataset
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all BigQuery format usage
rg -l 'format\s*=\s*"bigquery"' 

# Find viewsEnabled configuration
rg 'viewsEnabled'

# Find materializationDataset usage
rg 'materializationDataset'

Length of output: 1041


Script:

#!/bin/bash
# Check BigQueryFormat implementation
cat cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala

# Check GcpFormatProvider implementation
cat cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala

Length of output: 7471

david-zlai
david-zlai previously approved these changes Jan 16, 2025
Comment on lines 412 to 438
val logicalPlan: LogicalPlan = sparkSession.sessionState.sqlParser.parsePlan(query)

// Load tables through DataPointer resolution.
val formats = logicalPlan
  .collect {
    case ur @ UnresolvedRelation(mpi, _, _) => {
      DataPointer.from(ur.tableName, sparkSession).readFormat match {
        case Some(fmtName) => (mpi, fmtName)
      }
    }
  }

// todo(tchow): Use a global materializationDataset value.
val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {
  val firstDataset = formats
    .collectFirst {
      case (mpi, _) if mpi.size == 3 => mpi(1)
    }
    .getOrElse(throw new IllegalArgumentException("Could not find candidate dataset for materializationDataset"))
  val df = sparkSession.read
    .option("viewsEnabled", true.toString)
    .option("materializationDataset", firstDataset)
    .format("bigquery")
    .load(query)
  df
} else sparkSession.sql(query)
df.coalesce(partitionCount)
Contributor


break this out into a function checkAndRunBigQuery
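
For reference, a minimal sketch of that extraction (the signature is an assumption; the body mirrors the BigQuery branch above):

private def checkAndRunBigQuery(query: String, materializationDataset: String): DataFrame =
  sparkSession.read
    .option("viewsEnabled", true.toString)
    .option("materializationDataset", materializationDataset)
    .format("bigquery")
    .load(query)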

Collaborator Author


ack will do

Contributor

@nikhil-zlai nikhil-zlai left a comment


actually this will trigger a BigQuery run on even groupBy selects, which is probably not the desired behavior.

we should move this into staging query - or an opportunisticallyRunOnBigQuery func.

@tchow-zlai
Collaborator Author

actually this will trigger a BigQuery run on even groupBy selects, which is probably not the desired behavior.

@nikhil-zlai can you show me where that is? AFAICT this will be used to run setups in GroupBy and Join, plus some other spark sql stuff which will mostly stay in spark based on this logic.

@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 17641f0 to 22a3d70 on January 17, 2025 21:31

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

422-435: Consider extracting BigQuery handling to a separate method.

The BigQuery-specific logic would be more maintainable as a separate method.

+  private def handleBigQueryExecution(query: String, firstDataset: String): DataFrame = {
+    sparkSession.read
+      .option("viewsEnabled", true.toString)
+      .option("materializationDataset", firstDataset)
+      .format("bigquery")
+      .load(query)
+  }
+
   def sql(query: String): DataFrame = {
     // ... existing code ...
     val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {
       val firstDataset = formats
         .collectFirst {
           case (mpi, _) if mpi.size >= 2 => mpi(1)
         }
         .getOrElse(throw new IllegalArgumentException("Could not find candidate dataset for materializationDataset"))
-      val df = sparkSession.read
-        .option("viewsEnabled", true.toString)
-        .option("materializationDataset", firstDataset)
-        .format("bigquery")
-        .load(query)
-      df
+      handleBigQueryExecution(query, firstDataset)
     } else sparkSession.sql(query)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 17641f0 and 22a3d70.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: other_spark_tests
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)

40-43: LGTM: Required imports for SQL parsing.


424-428: ⚠️ Potential issue

Add bounds check for mpi array access.

Accessing mpi(1) could throw IndexOutOfBoundsException if mpi has fewer than 2 elements.

-            case (mpi, _) if mpi.size == 3 => mpi(1)
+            case (mpi, _) if mpi.size >= 2 => mpi(1)

Likely invalid or redundant comment.

Comment on lines 415 to 406
case ur @ UnresolvedRelation(mpi, _, _) => {
  DataPointer.from(ur.tableName, sparkSession).readFormat match {
    case Some(fmtName) => (mpi, fmtName)
  }
}

⚠️ Potential issue

Handle the missing case in format extraction.

The format extraction could throw a MatchError if DataPointer.from(ur.tableName, sparkSession).readFormat returns None.

-            DataPointer.from(ur.tableName, sparkSession).readFormat match {
-              case Some(fmtName) => (mpi, fmtName)
-            }
+            DataPointer.from(ur.tableName, sparkSession).readFormat match {
+              case Some(fmtName) => Some((mpi, fmtName))
+              case None => None
+            }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 422 to 410
// todo(tchow): Use a global materializationDataset value.
val df = if (formats.map { case (_, fmtName) => fmtName.toUpperCase }.contains("BIGQUERY")) {

💡 Codebase verification

Consider implementing materialization dataset configuration now

Current implementation uses brittle position-based access (mpi(1)) and is tightly coupled with format-specific logic. A global configuration would improve maintainability.

🔗 Analysis chain

Consider using configuration for materialization dataset.

The TODO comment suggests using a global value. Consider implementing this now to avoid technical debt.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there's already a configuration for materialization dataset
rg -l "materializationDataset.*=.*" --type scala

Length of output: 50


Script:

#!/bin/bash
# Search for the TODO comment and surrounding context
rg -B 5 -A 5 "todo.*Use a global materializationDataset" --type scala

# Search for BigQuery-related configuration patterns
rg -l "bigquery.*config" --type scala

# Search for similar configuration patterns
ast-grep --pattern 'val $_ = config.getString($_)'

Length of output: 1148


Script:

#!/bin/bash
# Search for configuration-related files
fd -e scala config

# Search for configuration patterns in scala files
rg -l "extends.*Config" --type scala

Length of output: 132


Script:

#!/bin/bash
# Check content of Config.scala
cat orchestration/src/main/scala/ai/chronon/orchestration/utils/Config.scala

Length of output: 1796

Comment on lines 25 to 26
// See: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/434#issuecomment-886156191
// and: https://cloud.google.com/bigquery/docs/information-schema-intro#limitations
Contributor


do you want to kill these lines too?

Collaborator Author


these are still valid, we're just setting them in a different way.

@tchow-zlai tchow-zlai changed the base branch from main to tchow/write-option on January 23, 2025 02:53
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 22a3d70 to a3bdc42 on January 23, 2025 02:53
Base automatically changed from tchow/write-option to main on January 23, 2025 18:03
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from a3bdc42 to c3dd039 on January 23, 2025 18:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

433-438: Consider using DataFrameReader options consistently.

The implementation correctly supports staging queries by inferring from project-ID.

Consider moving viewsEnabled and materializationDataset to a configuration object for better maintainability.
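
A hedged sketch of such a configuration object (names here are hypothetical):

// Hypothetical holder for the BigQuery read options used in this PR.
case class BigQueryReadConfig(materializationDataset: String, viewsEnabled: Boolean = true) {
  def options: Map[String, String] =
    Map("viewsEnabled" -> viewsEnabled.toString,
        "materializationDataset" -> materializationDataset)
}

// Usage: sparkSession.read.options(cfg.options).format("bigquery").load(query)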

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 22a3d70 and c3dd039.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (3)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (3)

413-415: LGTM!

Clean implementation of logical plan parsing.


428-432: ⚠️ Potential issue

Add bounds check for mpi access.

Direct access to mpi(1) could throw IndexOutOfBoundsException.

-          .collectFirst {
-            case (mpi, _) if mpi.size == 3 => mpi(1)
-          }
+          .collectFirst {
+            case (mpi, _) if mpi.size >= 2 => mpi(1)
+          }

Likely invalid or redundant comment.


416-424: ⚠️ Potential issue

Add error handling for format extraction.

A missing case for None in the pattern match could cause a MatchError.

-            DataPointer.from(ur.tableName, sparkSession).readFormat match {
-              case Some(fmtName) => (mpi, fmtName)
-            }
+            DataPointer.from(ur.tableName, sparkSession).readFormat match {
+              case Some(fmtName) => Some((mpi, fmtName))
+              case None => None
+            }

Likely invalid or redundant comment.

@tchow-zlai tchow-zlai changed the base branch from main to tchow/again on January 23, 2025 20:10
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch 2 times, most recently from 8c32c96 to a14078c on January 23, 2025 20:18
Base automatically changed from tchow/again to main on January 23, 2025 20:27
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from a14078c to 0e35c87 on January 23, 2025 20:27

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

426-427: Consider implementing global configuration now.

Move materialization dataset configuration to a global setting to improve maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c3dd039 and 0e35c87.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: mutation_spark_tests
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (4)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (4)

413-415: LGTM!

Clean and standard approach to SQL parsing.


433-438: LGTM!

Clean BigQuery read configuration.


426-431: ⚠️ Potential issue

Add bounds check for array access.

Accessing mpi(1) could throw IndexOutOfBoundsException.

Apply this diff:

-          .collectFirst {
-            case (mpi, _) if mpi.size == 3 => mpi(1)
-          }
+          .collectFirst {
+            case (mpi, _) if mpi.size >= 2 => mpi(1)
+          }

Likely invalid or redundant comment.


416-424: ⚠️ Potential issue

Add error handling for format resolution.

The pattern matching is incomplete and could fail if readFormat returns None.

Apply this diff:

-          case ur @ UnresolvedRelation(mpi, _, _) => {
-            DataPointer.from(ur.tableName, sparkSession).readFormat match {
-              case Some(fmtName) => (mpi, fmtName)
-            }
-          }
+          case ur @ UnresolvedRelation(mpi, _, _) =>
+            DataPointer.from(ur.tableName, sparkSession).readFormat match {
+              case Some(fmtName) => Some((mpi, fmtName))
+              case None => None
+            }

Likely invalid or redundant comment.

@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 0e35c87 to 04c50dc on February 6, 2025 22:06
@tchow-zlai tchow-zlai changed the base branch from main to tchow/sq-bq on February 21, 2025 01:48
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 04c50dc to 6b14f76 on February 21, 2025 01:48
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch 2 times, most recently from 22fb501 to 7dab021 on February 24, 2025 23:44
Base automatically changed from tchow/sq-bq to main on February 25, 2025 00:37
tchow-zlai and others added 9 commits February 25, 2025 08:44
Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai force-pushed the tchow/staging-query-gcp branch from 7dab021 to 98dc781 on February 25, 2025 16:44

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala (1)

6-11: Class currently just proxies to parent implementation.

This new class extends StagingQuery but currently only calls the parent implementation without adding BigQuery-specific functionality.

Consider adding documentation explaining the purpose of this specialized class and its expected future behavior for BigQuery staging queries.
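
For instance, a doc comment along these lines would capture the intent (wording is only a suggestion):

/** Staging query implementation for BigQuery-backed sources.
  *
  * Currently delegates to the parent StagingQuery's computeStagingQuery and
  * exists as an extension point for BigQuery-specific staging behavior, e.g.
  * running the query natively in BigQuery rather than through Spark SQL.
  */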

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 04c50dc and 98dc781.

📒 Files selected for processing (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryStagingQuery.scala (1)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.

@tchow-zlai tchow-zlai closed this Mar 14, 2025
@tchow-zlai tchow-zlai deleted the tchow/staging-query-gcp branch March 14, 2025 07:01