docs(python): Replace pandas where with mask in Migrating -> Coming from Pandas #21085


Merged 1 commit on Feb 4, 2025
16 changes: 8 additions & 8 deletions docs/source/user-guide/migration/pandas.md
@@ -72,20 +72,20 @@
However, the best way to select data in Polars is to use the expression API. For example, if you
want to select a column in pandas, you can do one of the following:

```diff
-df['a']
-df.loc[:,'a']
+df["a"]
+df.loc[:,"a"]
```

but in Polars you would use the `.select` method:

```diff
-df.select('a')
+df.select("a")
```

If you want to select rows based on the values then in Polars you use the `.filter` method:

```diff
-df.filter(pl.col('a') < 10)
+df.filter(pl.col("a") < 10)
```

As noted in the section on expressions below, Polars can run operations in `.select` and `filter` in
parallel.

@@ -104,16 +104,16 @@
The csv file has numerous columns but we just want to do a group by on one of the id columns (`id1`)
and then sum by a value column (`v1`). In pandas this would be:

```diff
-df = pd.read_csv(csv_file, usecols=['id1','v1'])
-grouped_df = df.loc[:,['id1','v1']].groupby('id1').sum('v1')
+df = pd.read_csv(csv_file, usecols=["id1","v1"])
+grouped_df = df.loc[:,["id1","v1"]].groupby("id1").sum("v1")
```

In Polars you can build this query in lazy mode with query optimization and evaluate it by replacing
the eager pandas function `read_csv` with the implicitly lazy Polars function `scan_csv`:

```diff
 df = pl.scan_csv(csv_file)
-grouped_df = df.group_by('id1').agg(pl.col('v1').sum()).collect()
+grouped_df = df.group_by("id1").agg(pl.col("v1").sum()).collect()
```

Polars optimizes this query by identifying that only the `id1` and `v1` columns are relevant, and so
only reads those columns from the csv.

@@ -167,7 +167,7 @@
If the value in `c` equals 2, we want to replace the value in `a` with the value in `b`.
In pandas this would be:

```diff
-df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))
+df.assign(a=lambda df_: df_["a"].mask(df_["c"] == 2, df_["b"]))
```

while in Polars this would be:
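As a side check on the pandas change above: `where` keeps values where the condition is True and substitutes elsewhere, while `mask` substitutes where the condition is True, so `where(c != 2, b)` and `mask(c == 2, b)` produce the same result. A runnable sketch on assumed toy data:

```python
# Demonstrates that the old `where` spelling and the new `mask`
# spelling from this PR agree. Data is illustrative.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [2, 0, 2]})

via_where = df.assign(a=lambda df_: df_["a"].where(df_["c"] != 2, df_["b"]))
via_mask = df.assign(a=lambda df_: df_["a"].mask(df_["c"] == 2, df_["b"]))

print(via_where["a"].to_list())  # [10, 2, 30]
print(via_mask["a"].to_list())   # [10, 2, 30]
```

The `mask` form reads more directly here: the condition names the rows being replaced, which matches how the surrounding prose describes the operation.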