feat!: Enable reading JSON data with dbjson extension dtype #1139

Merged
tswast merged 6 commits into main from main_chelsealin_readdbjsontype on Jan 23, 2025

Conversation

chelsea-lin
Contributor

@chelsea-lin chelsea-lin commented Nov 7, 2024

feat!: Enable reading JSON data with dbjson extension dtype (#1139)

This change updates how we handle JSON data types read from BigQuery.

Previously, BigQuery JSON types were treated as generic large strings within our system. To improve accuracy and functionality, we now map them to a dedicated JSON data type (db_dtypes.JSONType or db_dtypes.JSONArrowType for pyarrow).

While this provides a more appropriate representation of JSON data, it's important to note that this feature is still in preview and may evolve.

Release-As: 1.34.0
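
As a rough illustration of why a dedicated JSON type can be a more appropriate representation than a generic string, the sketch below uses only the Python standard library. It is not bigframes or db_dtypes code; the variable names and the `canonical` helper are made up for this example. It shows that two textually different strings can encode the same JSON document, which a string dtype cannot express.

```python
import json

# Two spellings of the same JSON document, as they might arrive as plain strings.
raw_a = '{"a": 1, "b": [1, 2]}'
raw_b = '{"b":[1,2],"a":1}'

# Treated as generic strings, the two values look different.
strings_equal = raw_a == raw_b

# Treated as JSON, they are the same document.
json_equal = json.loads(raw_a) == json.loads(raw_b)


def canonical(text: str) -> str:
    """Hypothetical helper: re-serialize with sorted keys and no extra whitespace."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


print(strings_equal)     # False
print(json_equal)        # True
print(canonical(raw_a))  # {"a":1,"b":[1,2]}
```

Code that previously compared JSON columns against string literals may need a similar adjustment once the values are no longer plain strings.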

  • Fixes internal issue 377764399
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 377764399 🦕

@chelsea-lin chelsea-lin requested a review from tswast November 7, 2024 21:24
@product-auto-label product-auto-label bot added the "size: s" and "api: bigquery" labels Nov 7, 2024
@chelsea-lin chelsea-lin changed the title feat: Enable reading/writing JSON data with dbjson extension dtype feat: Enable reading JSON data with dbjson extension dtype Nov 7, 2024
@tswast tswast changed the title feat: Enable reading JSON data with dbjson extension dtype feat!: Enable reading JSON data with dbjson extension dtype Nov 8, 2024
Collaborator

@tswast tswast left a comment

Very cool!

Let's make sure we mark this as a "breaking change" in our release notes (https://github.com/googleapis/release-please/blob/main/README.md#how-should-i-write-my-commits)

Since it's a breaking change for a preview feature, we shouldn't bump to 2.0 though. Let's use the Release-As footer in the commit message to make sure we do a 1.x release. https://github.com/googleapis/release-please/blob/main/README.md#how-do-i-change-the-version-number
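
The two conventions mentioned above (a `!` after the commit type to flag a breaking change, and a `Release-As:` footer to request a specific version) can be checked with a small script. The following is a minimal sketch, not release-please's own parser; `CommitInfo`, `parse_commit`, and the regular expressions are hypothetical names written for this example.

```python
import re
from typing import NamedTuple, Optional


class CommitInfo(NamedTuple):
    kind: str                  # commit type, e.g. "feat" or "fix"
    breaking: bool             # True if the commit is flagged as breaking
    release_as: Optional[str]  # version requested by a Release-As footer, if any


# Conventional Commits header: "<type>[(scope)][!]: <description>"
HEADER_RE = re.compile(r"^(?P<kind>\w+)(?:\([^)]*\))?(?P<bang>!)?: .+")
# Footer asking for a specific version, e.g. "Release-As: 1.34.0".
RELEASE_AS_RE = re.compile(r"^Release-As:\s*(?P<version>\S+)\s*$", re.MULTILINE)


def parse_commit(message: str) -> CommitInfo:
    """Extract the type, breaking flag, and Release-As version from a message."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a conventional commit header: {header!r}")
    footer = RELEASE_AS_RE.search(message)
    breaking = match.group("bang") is not None or "BREAKING CHANGE:" in message
    return CommitInfo(
        kind=match.group("kind"),
        breaking=breaking,
        release_as=footer.group("version") if footer else None,
    )


message = """feat!: Enable reading JSON data with dbjson extension dtype (#1139)

This change updates how we handle JSON data types read from BigQuery.

Release-As: 1.34.0
"""
info = parse_commit(message)
print(info)  # CommitInfo(kind='feat', breaking=True, release_as='1.34.0')
```

Here the `!` marks the change as breaking for the release notes, while the footer keeps the release on the 1.x line.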

@chelsea-lin chelsea-lin force-pushed the main_chelsealin_readdbjsontype branch from 7c81975 to 48ca926 Compare November 13, 2024 19:43
@product-auto-label product-auto-label bot added the "size: m" label and removed the "size: s" label Nov 13, 2024
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_readdbjsontype branch from 48ca926 to 2707038 Compare January 22, 2025 18:22
@product-auto-label product-auto-label bot added the "size: l" label and removed the "size: m" label Jan 22, 2025
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_readdbjsontype branch from 2707038 to 6e1aacc Compare January 22, 2025 19:39
@chelsea-lin chelsea-lin marked this pull request as ready for review January 22, 2025 19:40
@chelsea-lin chelsea-lin requested review from a team as code owners January 22, 2025 19:40
@chelsea-lin chelsea-lin requested a review from jialuoo January 22, 2025 19:40
@chelsea-lin chelsea-lin requested a review from tswast January 23, 2025 18:18
Comment on lines -206 to -211
# b/381148539
def test_json_in_struct():
    df = bpd.read_gbq(
        "SELECT STRUCT(JSON '{\\\"a\\\": 1}' AS data, 1 AS number) as struct_col"
    )
    assert df["struct_col"].struct.field("data")[0] == '{"a":1}'
Collaborator

Should we keep / update this test, instead? I'd like to make sure we avoid regressions since I believe this was added to make sure we can work with some AI/ML/ObjectRef features.

Contributor Author

Yes, I moved this test to test_dataframe_io.py. I also added similar tests for both struct and array columns.

@tswast tswast merged commit f672262 into main Jan 23, 2025
22 checks passed
@tswast tswast deleted the main_chelsealin_readdbjsontype branch January 23, 2025 22:33
shuoweil pushed a commit that referenced this pull request Jan 24, 2025
This change updates how we handle JSON data types read from BigQuery.

Previously, BigQuery JSON types were treated as generic large strings within our system. To improve accuracy and functionality, we now map them to a dedicated JSON data type (db_dtypes.JSONType or db_dtypes.JSONArrowType for pyarrow).

While this provides a more appropriate representation of JSON data, it's important to note that this feature is still in preview and may evolve.

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
Release-As: 1.34.0
Labels: "api: bigquery", "size: l"

3 participants