
feat: (Preview) Support automatic load of timedelta from BQ tables. #1429

Merged
sycai merged 20 commits into main from sycai_timedelta_autoload on Mar 4, 2025

Conversation

sycai
Contributor

@sycai sycai commented Feb 25, 2025

This works only with read_gbq_table() under the hood, because read_gbq_query() erases column descriptions.

@product-auto-label product-auto-label bot added the labels size: m (Pull request size is medium.) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.) Feb 25, 2025
@sycai sycai marked this pull request as ready for review February 26, 2025 20:56
@sycai sycai requested review from a team as code owners February 26, 2025 20:56
@sycai sycai requested a review from GarrettWu February 26, 2025 20:56
@sycai sycai requested review from tswast, TrevorBergeron and chelsea-lin and removed request for GarrettWu February 26, 2025 20:57
@tswast tswast added the owlbot:run label (Add this label to trigger the Owlbot post processor.) Feb 26, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run label Feb 26, 2025
@sycai sycai requested a review from tswast February 27, 2025 17:52
@sycai
Contributor Author

sycai commented Feb 27, 2025

Closing the PR for now to think about annotations for RECORD and ARRAY. I will prepare another PR once everything falls into place.

@sycai sycai closed this Feb 27, 2025
@sycai sycai reopened this Mar 3, 2025
@sycai sycai requested a review from TrevorBergeron March 3, 2025 23:24
return query_job

def _attach_timedelta_metadata(
Contributor

Why not put this logic in convert_to_schema_field instead? It is the natural inverse of convert_schema_field.

Contributor Author

Good call! That makes the code cleaner.

Contributor Author

Though interestingly, in export_gbq we rely on the BigQuery engine to figure out the schema based on the provided SQL. I need to update the schema again afterwards to include the tag.
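
A minimal sketch of that second pass, assuming the tag is appended to the end of each timedelta column's description once the engine has inferred the schema. `Field` is a hypothetical stand-in for a BigQuery schema field, `tag_timedelta_columns` is an illustrative helper rather than the function in this PR, and the tag value is inferred from the '#microseconds' mentioned in review:

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional

# Assumed value, inferred from the review discussion; the real constant may differ.
TIMEDELTA_DESCRIPTION_TAG = "#microseconds"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a BigQuery schema field."""

    name: str
    field_type: str
    description: Optional[str] = None


def tag_timedelta_columns(
    schema: Iterable[Field], timedelta_columns: Iterable[str]
) -> List[Field]:
    """Return a copy of the schema with the tag appended to the named INTEGER columns."""
    names = set(timedelta_columns)
    tagged = []
    for field in schema:
        if field.name in names and field.field_type == "INTEGER":
            existing = field.description or ""
            # Skip fields that already carry the tag so the pass is idempotent.
            if not existing.endswith(TIMEDELTA_DESCRIPTION_TAG):
                new_description = f"{existing} {TIMEDELTA_DESCRIPTION_TAG}".strip()
                field = replace(field, description=new_description)
        tagged.append(field)
    return tagged
```

With the real client, the updated schema would presumably be written back by fetching the table, assigning `table.schema`, and calling `client.update_table(table, ["schema"])`.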

elif (
    field.field_type == "INTEGER"
    and field.description is not None
    and TIMEDELTA_DESCRIPTION_TAG in field.description
Contributor

nit: I do wonder whether, once in a while, we might catch a user description that accidentally contains '#microseconds' somewhere. Maybe restrict the match to the beginning, or use some odd character sequence to prevent accidental triggering.

Contributor Author

SG. Changed the logic to check whether the description ends with the tag.
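
For illustration, the stricter check could look roughly like the sketch below. `Field` and `is_timedelta_field` are hypothetical names, not the code in this PR, and the tag value is assumed from the '#microseconds' mentioned in the comment above:

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value, inferred from the review comment; the real constant may differ.
TIMEDELTA_DESCRIPTION_TAG = "#microseconds"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a BigQuery schema field."""

    name: str
    field_type: str
    description: Optional[str] = None


def is_timedelta_field(field: Field) -> bool:
    """Return True only for INTEGER fields whose description ends with the tag."""
    return (
        field.field_type == "INTEGER"
        and field.description is not None
        and field.description.endswith(TIMEDELTA_DESCRIPTION_TAG)
    )
```

A description that merely mentions the tag mid-sentence no longer triggers the conversion; only a trailing tag does.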

@sycai sycai requested a review from TrevorBergeron March 4, 2025 00:09
@sycai sycai enabled auto-merge (squash) March 4, 2025 01:21
@sycai sycai merged commit b2917bb into main Mar 4, 2025
22 of 23 checks passed
@sycai sycai deleted the sycai_timedelta_autoload branch March 4, 2025 06:16
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
4 participants