-
Notifications
You must be signed in to change notification settings - Fork 48
feat: add subset
parameter to DataFrame.dropna
to select which columns to consider
#981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…lumns to consider
Looks like some real test failures. I'll look at these later today. |
Looks like there was a bug with |
@@ -1662,6 +1664,15 @@ def dropna( | |||
<BLANKLINE> | |||
[3 rows x 3 columns] | |||
|
|||
Define in which columns to look for missing values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From doctest.
@@ -1,5 +1,5 @@
- name toy born
-1 Batman Batmobile 1940-04-25
-2 Catwoman Bullwhip NaT
+ name toy born
+1 Batman Batmobile 1940-04-25
+2 Catwoman Bullwhip <NA>
[2 rows x 3 columns]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed! The diff got a little big because I pulled in #987. It should be easier to review again once that is merged.
With this change I can once again run ``` pytest --doctest-modules third_party/bigframes_vendored/pandas/core/frame.py ``` Note: having multiple `version.py` files should be fine. release-please will update all such files it finds.
…-subset" This reverts commit 0f18294.
bigframes/dataframe.py
Outdated
if subset is None: | ||
subset_ids = None | ||
elif not utils.is_list_like(subset): | ||
subset_ids = [self._block.label_to_col_id[subset]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of this seems to be a list of tuple, and don't work. May need a loop like in else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to cover the single column string case. A loop would give one character per iteration. I'll take a look at the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the defination def label_to_col_id(self) -> typing.Mapping[Label, typing.Sequence[str]]
, I think self._block.label_to_col_id[subset] will be a sequence/tuple? So I think a loop like for id in self._block.label_to_col_id[label]
should be included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted offline. self._block.label_to_col_id[subset]
is a tuple. I've corrected.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes internal issue 366248570
🦕