-
Notifications
You must be signed in to change notification settings - Fork 48
fix: fix pandas.cut errors with empty bins #1499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bigframes/core/reshape/tile.py
Outdated
window_spec=window_specs.unbound(), | ||
) | ||
op = agg_ops.CutOp(bins, right=right, labels=labels) | ||
if isinstance(bins, typing.Iterable) and len(as_index) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we directly return the [pd.NA] * len(x) result from the elif branch len(list(bins)) == 0
?
In that case we don't need the additional if-else branch at the bottom of the function, and the logic looks more straightforward. Plus, we don't need to alter the behavior of CutOp.output_type() in aggregations.py too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the elif
branch can turn to the [pd.NA] * len(x)
result, such as
- 1st elif for
pd.IntervalIndex.from_tuples([])
. - 2nd elif for
[]
- 4st elif for
[1]
Because of that,CutOp.output_type()
might return different result for each case above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is that the "if" check on line 88 has quite some overlap with line 49 + line 56. It seems we are doing the same check repeatedly.
Anyway, this shouldn't affect the overall functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just refactor these if
branches for more readable. With this refactor, we also catches another edge cases bug. Thanks!
bigframes/core/reshape/tile.py
Outdated
window_spec=window_specs.unbound(), | ||
) | ||
op = agg_ops.CutOp(bins, right=right, labels=labels) | ||
if isinstance(bins, typing.Iterable) and len(as_index) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is that the "if" check on line 88 has quite some overlap with line 49 + line 56. It seems we are doing the same check repeatedly.
Anyway, this shouldn't affect the overall functionality.
dc098b3
to
d67531a
Compare
Fixes internal issue 403638910 🦕