docs: code samples for Series.where and Series.mask #217

Merged 3 commits on Nov 17, 2023
114 changes: 114 additions & 0 deletions third_party/bigframes_vendored/pandas/core/series.py
@@ -1696,6 +1696,49 @@ def kurt(self):
def where(self, cond, other):
"""Replace values where the condition is False.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([10, 11, 12, 13, 14])
>>> s
0 10
1 11
2 12
3 13
4 14
dtype: Int64

You can filter the values in the Series based on a condition. Values that
match the condition are kept; values that do not are replaced. The default
replacement value is ``NA``.

>>> s.where(s % 2 == 0)
0 10
1 <NA>
2 12
3 <NA>
4 14
dtype: Int64

You can specify a custom replacement value for non-matching values.

>>> s.where(s % 2 == 0, -1)
0 10
1 -1
2 12
3 -1
4 14
dtype: Int64
>>> s.where(s % 2 == 0, 100*s)
0 10
1 1100
2 12
3 1300
4 14
dtype: Int64

Args:
cond (bool Series/DataFrame, array-like, or callable):
Where cond is True, keep the original value. Where False, replace
@@ -1720,6 +1763,77 @@ def where(self, cond, other):
def mask(self, cond, other):
"""Replace values where the condition is True.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([10, 11, 12, 13, 14])
>>> s
0 10
1 11
2 12
3 13
4 14
dtype: Int64

You can mask the values in the Series based on a condition. Values that
match the condition are replaced; the default replacement value is ``NA``.

>>> s.mask(s % 2 == 0)
0 <NA>
1 11
2 <NA>
3 13
4 <NA>
dtype: Int64

You can specify a custom mask value.

>>> s.mask(s % 2 == 0, -1)
0 -1
1 11
2 -1
3 13
4 -1
dtype: Int64
>>> s.mask(s % 2 == 0, 100*s)
0 1000
1 11
2 1200
3 13
4 1400
dtype: Int64

You can also use a remote function to evaluate the mask condition. This
is useful in situations such as the following, where the mask condition
depends on complicated business logic that cannot be expressed as a
Series.

>>> @bpd.remote_function([str], bool, reuse=False)
... def should_mask(name):
... hash = 0
... for char_ in name:
... hash += ord(char_)
... return hash % 2 == 0

>>> s = bpd.Series(["Alice", "Bob", "Caroline"])
>>> s
0 Alice
1 Bob
2 Caroline
dtype: string
>>> s.mask(should_mask)
0 <NA>
1 Bob
2 Caroline
dtype: string
>>> s.mask(should_mask, "REDACTED")
0 REDACTED
1 Bob
2 Caroline
dtype: string

Args:
cond (bool Series/DataFrame, array-like, or callable):
Where cond is False, keep the original value. Where True, replace
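The two docstrings above describe mirror-image operations: `where` keeps values where the condition is True, and `mask` keeps them where it is False. The following is a minimal plain-Python sketch of those semantics, intended only as a reading aid. It is not the bigframes implementation; the list-based `where` and `mask` helpers, `_resolve`, `_pick`, and the use of `None` as a stand-in for `<NA>` are all assumptions made for illustration. A callable condition is applied element by element here, mirroring the remote-function example in the `mask` docstring.

```python
from typing import Any, Callable, List, Sequence, Union

NA = None  # assumption: None stands in for the <NA> missing-value marker

Cond = Union[Sequence[bool], Callable[[Any], bool]]


def _resolve(values: Sequence[Any], cond: Cond) -> List[bool]:
    """Turn cond into one boolean per value; a callable is applied per element."""
    if callable(cond):
        return [bool(cond(v)) for v in values]
    flags = list(cond)
    if len(flags) != len(values):
        raise ValueError("cond must have one entry per value")
    return flags


def _pick(other: Any, i: int) -> Any:
    """Replacement for position i: element i of a list/tuple, else the scalar."""
    if isinstance(other, (list, tuple)):
        return other[i]
    return other


def where(values: Sequence[Any], cond: Cond, other: Any = NA) -> List[Any]:
    """Keep values where cond is True; replace the rest with other."""
    flags = _resolve(values, cond)
    return [v if keep else _pick(other, i)
            for i, (v, keep) in enumerate(zip(values, flags))]


def mask(values: Sequence[Any], cond: Cond, other: Any = NA) -> List[Any]:
    """Replace values where cond is True; equivalent to where() on the negation."""
    flags = _resolve(values, cond)
    return where(values, [not f for f in flags], other)


def should_mask(name: str) -> bool:
    """Same rule as the docstring example: even sum of code points is masked."""
    return sum(ord(ch) for ch in name) % 2 == 0


if __name__ == "__main__":
    s = [10, 11, 12, 13, 14]
    even = [v % 2 == 0 for v in s]
    print(where(s, even))                       # [10, None, 12, None, 14]
    print(where(s, even, -1))                   # [10, -1, 12, -1, 14]
    print(mask(s, even, [100 * v for v in s]))  # [1000, 11, 1200, 13, 1400]
    print(mask(["Alice", "Bob", "Caroline"], should_mask, "REDACTED"))
    # ['REDACTED', 'Bob', 'Caroline']
```

Defining `mask` as `where` on the negated condition makes the relationship between the two docstrings explicit: every example in one can be rewritten as an example in the other by inverting `cond`.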