Commit 801e5ce

Add polars with_columns
Support for the Polars with_columns API, for both eager and lazy execution.
1 parent e04df19

File tree

15 files changed: +1925 additions, −2 deletions


docs/reference/decorators/with_columns.rst

Lines changed: 10 additions & 0 deletions

@@ -13,6 +13,16 @@ We have a ``with_columns`` option to run operations on columns of a Pandas dataframe
    :special-members: __init__


+Polars
+--------------
+
+We have a ``with_columns`` decorator to run operations on columns of a Polars dataframe or lazyframe and append the results as new columns.
+
+**Reference Documentation**
+
+.. autoclass:: hamilton.plugins.h_polars.with_columns
+   :special-members: __init__
+
 PySpark
 --------------
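For orientation, here is a minimal sketch of how the new decorator might be applied, assuming it mirrors the h_pandas signature shown later in this commit (columns_to_pass / select); the module, function, and column names are illustrative assumptions, not taken from the docs:

import polars as pl

from hamilton.plugins.h_polars import with_columns

import my_functions  # illustrative module defining spend_per_signup, spend_zero_mean, ...


@with_columns(
    my_functions,
    columns_to_pass=["spend", "signups"],  # columns extracted from the input frame
    select=["spend_per_signup", "spend_zero_mean"],  # end nodes appended as new columns
)
def final_df(initial_df: pl.DataFrame) -> pl.DataFrame:
    """The input dataframe with the selected columns appended."""
    return initial_df

As with the pandas variant, `select` names the subdag sink nodes to append; the decorated function itself stays a pass-through.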

Binary image added (116 KB) — diff not rendered.

examples/polars/with_columns/README

Lines changed: 7 additions & 0 deletions

# Using with_columns with Polars

We show how to use the familiar `with_columns` API from `polars`. Both `pl.DataFrame` and `pl.LazyFrame` are supported.

To see the example, look at the notebook.

![image info](./dag.png)
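A hypothetical end-to-end run with the Hamilton driver; the notebook is the authoritative version — the `pipeline` module and node names here are assumptions:

import polars as pl

from hamilton import driver

import pipeline  # assumed module containing a with_columns-decorated `final_df`

dr = driver.Builder().with_modules(pipeline).with_config({"case": "millions"}).build()
df = pl.DataFrame({"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]})
result = dr.execute(["final_df"], inputs={"initial_df": df})
print(result)  # shape of the result depends on the configured result builder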
Lines changed: 51 additions & 0 deletions

import polars as pl

from hamilton.function_modifiers import config

"""
Notes:
1. This file is used for all the [ray|dask|spark]/hello_world examples.
2. It therefore showcases how you can write something once and not only scale it, but port it
   to different frameworks with ease!
"""


@config.when(case="millions")
def avg_3wk_spend__millions(spend: pl.Series) -> pl.Series:
    """Rolling 3 week average spend."""
    return (
        spend.to_frame("spend").select(pl.col("spend").rolling_mean(window_size=3) / 1e6)
    ).to_series(0)


@config.when(case="thousands")
def avg_3wk_spend__thousands(spend: pl.Series) -> pl.Series:
    """Rolling 3 week average spend."""
    return (
        spend.to_frame("spend").select(pl.col("spend").rolling_mean(window_size=3) / 1e3)
    ).to_series(0)


def spend_per_signup(spend: pl.Series, signups: pl.Series) -> pl.Series:
    """The cost per signup in relation to spend."""
    return spend / signups


def spend_mean(spend: pl.Series) -> float:
    """Shows a function creating a scalar. In this case it computes the mean of the entire column."""
    return spend.mean()


def spend_zero_mean(spend: pl.Series, spend_mean: float) -> pl.Series:
    """Shows a function that takes a scalar. In this case it is used to zero-mean spend."""
    return spend - spend_mean


def spend_std_dev(spend: pl.Series) -> float:
    """Function that computes the standard deviation of the spend column."""
    return spend.std()


def spend_zero_mean_unit_variance(spend_zero_mean: pl.Series, spend_std_dev: float) -> pl.Series:
    """Function showing one way to make spend have zero mean and unit variance."""
    return spend_zero_mean / spend_std_dev
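To make the eager semantics concrete, a small illustrative check of two of these functions on toy data (assuming the functions above are in scope; the values are made up):

import polars as pl

# Illustrative only: exercise two of the eager transforms directly.
spend = pl.Series("spend", [10.0, 20.0, 30.0, 40.0])
signups = pl.Series("signups", [1, 2, 4, 5])

print(spend_per_signup(spend, signups))           # element-wise spend / signups
print(spend_zero_mean(spend, spend_mean(spend)))  # spend centered around zero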
Lines changed: 47 additions & 0 deletions

import polars as pl

from hamilton.function_modifiers import config

"""
Notes:
1. This file is used for all the [ray|dask|spark]/hello_world examples.
2. It therefore showcases how you can write something once and not only scale it, but port it
   to different frameworks with ease!
"""


@config.when(case="millions")
def avg_3wk_spend__millions(spend: pl.Expr) -> pl.Expr:
    """Rolling 3 week average spend."""
    return spend.rolling_mean(window_size=3) / 1e6


@config.when(case="thousands")
def avg_3wk_spend__thousands(spend: pl.Expr) -> pl.Expr:
    """Rolling 3 week average spend."""
    return spend.rolling_mean(window_size=3) / 1e3


def spend_per_signup(spend: pl.Expr, signups: pl.Expr) -> pl.Expr:
    """The cost per signup in relation to spend."""
    return spend / signups


def spend_mean(spend: pl.Expr) -> float:
    """Shows a function creating a scalar. In this case it computes the mean of the entire column."""
    return spend.mean()


def spend_zero_mean(spend: pl.Expr, spend_mean: float) -> pl.Expr:
    """Shows a function that takes a scalar. In this case it is used to zero-mean spend."""
    return spend - spend_mean


def spend_std_dev(spend: pl.Expr) -> float:
    """Function that computes the standard deviation of the spend column."""
    return spend.std()


def spend_zero_mean_unit_variance(spend_zero_mean: pl.Expr, spend_std_dev: float) -> pl.Expr:
    """Function showing one way to make spend have zero mean and unit variance."""
    return spend_zero_mean / spend_std_dev
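Because these variants return pl.Expr, they only build up a query plan. A minimal sketch — plain polars, no Hamilton, toy data assumed — of how such expressions execute lazily:

import polars as pl

# Illustrative only: the expression builders above slot into a LazyFrame plan.
lf = pl.LazyFrame({"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]})
out = lf.with_columns(
    spend_per_signup(pl.col("spend"), pl.col("signups")).alias("spend_per_signup")
).collect()  # nothing executes until collect()
print(out)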

examples/polars/with_columns/notebook.ipynb

Lines changed: 1219 additions & 0 deletions (large diff not rendered by default)

hamilton/plugins/h_pandas.py

Lines changed: 3 additions & 0 deletions

@@ -144,6 +144,8 @@ def __init__(
         :param pass_dataframe_as: The name of the dataframe that we're modifying, as known to the subdag.
             If you pass this in, you are responsible for extracting columns out. If not provided, you have
             to pass columns_to_pass in, and we will extract the columns out for you.
+        :param select: The end nodes that represent columns to be appended to the original dataframe
+            via with_columns. Existing columns will be overridden.
         :param namespace: The namespace of the nodes, so they don't clash with the global namespace
             and so this can be reused. If it's left out, there will be no namespace (in which case you'll want
             to be careful about repeating it/reusing the nodes in other parts of the DAG.)
@@ -153,6 +155,7 @@ def __init__(

         self.subdag_functions = subdag.collect_functions(load_from)

+        # TODO: select=None should append all nodes, like h_spark does
         if select is None:
             raise ValueError("Please specify at least one column to append or update.")
         else:
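Per the new guard, `select` is required for now (the TODO flags spark-style append-all as future work). A hedged sketch of the failure mode — the `my_functions` module is a stand-in, not from this commit:

from hamilton.plugins import h_pandas

import my_functions  # stand-in module of column-level transform functions

try:
    # Omitting `select` trips the ValueError added in this hunk.
    h_pandas.with_columns(my_functions, columns_to_pass=["spend", "signups"])
except ValueError as err:
    print(err)  # Please specify at least one column to append or update.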
