-
Notifications
You must be signed in to change notification settings - Fork 48
feat: support bytes type in remote_function
#761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -65,6 +102,7 @@ def get_pd_series(row): | |||
"Int64": int, | |||
"Float64": float, | |||
"string": str, | |||
"binary[pyarrow]": base64.b64decode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dictionary worries me. It seems pretty brittle to rely on the string representations of pandas dtypes. We should revisit this in b/345222844
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall with some minor comments.
@@ -484,17 +485,27 @@ def add_one(x): | |||
|
|||
|
|||
@pytest.mark.flaky(retries=2, delay=120) | |||
def test_series_map(session_with_bq_connection, scalars_dfs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need the old test_series_map
test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think test_series_map_bytes
should be sufficent. We have other tests that check integer input and output with remote_function.
bigframes/dataframe.py
Outdated
@@ -3313,11 +3313,12 @@ def apply(self, func, *, axis=0, args: typing.Tuple = (), **kwargs): | |||
# Early check whether the dataframe dtypes are currently supported | |||
# in the remote function | |||
# NOTE: Keep in sync with the value converters used in the gcf code | |||
# generated in generate_cloud_function_main_code in remote_function.py | |||
# generated in in remote_function_template.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in generate_cloud_function_main_code
? or just remove extra in
here?
) | ||
return entry_point | ||
|
||
def create_cloud_function( | ||
self, | ||
def_, | ||
cf_name, | ||
*, | ||
input_types: Tuple[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do input_types
and output_types
need default values, when they are defined after "*"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the *
means they are keyword-only. Without a default value they are required.
def_, | ||
directory, | ||
*, | ||
input_types: Tuple[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar questions: whether they need default values or not.
I will update the validation to check the arrow dtype as well in the case of arrow values. |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
🦕