feat: support bytes type in remote_function #761

Merged
merged 11 commits into from
Jun 7, 2024

Conversation

tswast
Collaborator

@tswast tswast commented Jun 6, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

🦕

@tswast tswast requested review from a team as code owners June 6, 2024 20:48
@tswast tswast requested a review from chelsea-lin June 6, 2024 20:48
@product-auto-label product-auto-label bot added the size: l (Pull request size is large.) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.) labels Jun 6, 2024
@@ -65,6 +102,7 @@ def get_pd_series(row):
"Int64": int,
"Float64": float,
"string": str,
"binary[pyarrow]": base64.b64decode,
Collaborator Author

This dictionary worries me. It seems pretty brittle to rely on the string representations of pandas dtypes. We should revisit this in b/345222844
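
To illustrate the concern, here is a minimal sketch (not the code in this PR) of converters looked up by dtype name. `VALUE_CONVERTERS` and `convert_value` are hypothetical names, and the sketch assumes bytes values arrive as base64 text, which is what the `base64.b64decode` entry in the diff suggests. Any dtype whose string form differs from a key, even slightly, falls through the lookup:

```python
import base64

# Hypothetical mapping, mirroring the shape of the dictionary in the diff.
VALUE_CONVERTERS = {
    "Int64": int,
    "Float64": float,
    "string": str,
    "binary[pyarrow]": base64.b64decode,
}


def convert_value(dtype_name: str, value):
    """Convert a decoded value using the converter registered for dtype_name."""
    try:
        converter = VALUE_CONVERTERS[dtype_name]
    except KeyError:
        # An exact string match is required, which is what makes this brittle.
        raise TypeError(f"unsupported dtype name: {dtype_name!r}") from None
    return converter(value)


# Bytes are assumed to travel as base64-encoded ASCII text.
encoded = base64.b64encode(b"\x00\xffhello").decode("ascii")
decoded = convert_value("binary[pyarrow]", encoded)
```

Keying on dtype objects or on a small enum of supported types would avoid the exact-string dependency, though that is a design choice for the follow-up bug rather than this PR.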

Contributor

@chelsea-lin chelsea-lin left a comment

LGTM overall with some minor comments.

@@ -484,17 +485,27 @@ def add_one(x):


@pytest.mark.flaky(retries=2, delay=120)
def test_series_map(session_with_bq_connection, scalars_dfs):
Contributor

Do we still need the old test_series_map test?

Collaborator Author

I think test_series_map_bytes should be sufficient. We have other tests that check integer input and output with remote_function.

@@ -3313,11 +3313,12 @@ def apply(self, func, *, axis=0, args: typing.Tuple = (), **kwargs):
# Early check whether the dataframe dtypes are currently supported
# in the remote function
# NOTE: Keep in sync with the value converters used in the gcf code
# generated in generate_cloud_function_main_code in remote_function.py
# generated in in remote_function_template.py
Contributor

Should this read "generated in generate_cloud_function_main_code in remote_function_template.py"? Or should we just remove the extra "in" here?

)
return entry_point

def create_cloud_function(
self,
def_,
cf_name,
*,
input_types: Tuple[str],
Contributor

Do input_types and output_types need default values, given that they are defined after the "*"?

Collaborator Author

No, the * means they are keyword-only. Without a default value they are required.
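
A minimal sketch of that behavior, using a hypothetical `describe_function` helper rather than the real `create_cloud_function` signature:

```python
from typing import Tuple


def describe_function(
    name: str,
    *,
    input_types: Tuple[str, ...],
    output_type: str,
) -> str:
    """Hypothetical helper: parameters after the bare * are keyword-only.

    Neither keyword-only parameter has a default, so both are required.
    """
    return f"{name}({', '.join(input_types)}) -> {output_type}"


# Works: both keyword-only arguments are passed by name.
signature = describe_function("add_one", input_types=("int",), output_type="int")


def call_raises_type_error(*args, **kwargs) -> bool:
    """Return True if calling describe_function with these arguments raises TypeError."""
    try:
        describe_function(*args, **kwargs)
    except TypeError:
        return True
    return False


# Omitting a required keyword-only argument is rejected at call time.
missing_rejected = call_raises_type_error("add_one", input_types=("int",))
# Passing the values positionally is also rejected.
positional_rejected = call_raises_type_error("add_one", ("int",), "int")
```

Python accepts required keyword-only parameters without defaults, and they may even follow parameters that do have defaults.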

def_,
directory,
*,
input_types: Tuple[str],
Contributor

Similar question here: do these need default values?

@tswast tswast enabled auto-merge (squash) June 6, 2024 23:47
Collaborator Author

tswast commented Jun 7, 2024

The test_df_apply_axis_1_unsupported_dtype tests are failing because binary is an ArrowDtype, so the isinstance-based validation now considers every ArrowDtype to be valid.

I will update the validation to also check the underlying Arrow type when the dtype is an ArrowDtype.
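
A rough sketch of the problem and the intended fix, using hypothetical stand-in classes so it runs without pandas or pyarrow. In pandas the wrapped type is available as `pyarrow_dtype` on `pd.ArrowDtype`. All names below are illustrative, not the ones used in this PR:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInArrowDtype:
    """Hypothetical stand-in for an Arrow-backed dtype wrapping an Arrow type name."""

    arrow_type: str


@dataclass(frozen=True)
class StandInInt64Dtype:
    """Hypothetical stand-in for a non-Arrow nullable integer dtype."""


SUPPORTED_DTYPE_CLASSES = (StandInInt64Dtype, StandInArrowDtype)
SUPPORTED_ARROW_TYPES = frozenset({"binary"})


def is_supported_by_class_only(dtype) -> bool:
    """isinstance-only check: accepts every Arrow-backed dtype once one is supported."""
    return isinstance(dtype, SUPPORTED_DTYPE_CLASSES)


def is_supported(dtype) -> bool:
    """For Arrow-backed dtypes, also check the wrapped Arrow type."""
    if isinstance(dtype, StandInArrowDtype):
        return dtype.arrow_type in SUPPORTED_ARROW_TYPES
    return isinstance(dtype, SUPPORTED_DTYPE_CLASSES)


unsupported = StandInArrowDtype("list<int64>")
loose_result = is_supported_by_class_only(unsupported)  # wrongly accepted
strict_result = is_supported(unsupported)  # rejected, as the tests expect
```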

@tswast tswast merged commit 4915424 into main Jun 7, 2024
23 checks passed
@tswast tswast deleted the tswast-remote_function-bytes branch June 7, 2024 14:01