@@ -1,213 +1,7 @@
 # pandas
 
-[https://pandas.pydata.org/](https://pandas.pydata.org/)
-
-
-
-::: {.callout-warning}
-## The Pandas backend is slated for removal in Ibis 10.0
-We recommend using one of our other backends.
-
-Many workloads work well on the DuckDB and Polars backends, for example.
-:::
-
-
-## Install
-
-Install Ibis and dependencies for the pandas backend:
-
-::: {.panel-tabset}
-
-## `pip`
-
-Install with the `pandas` extra:
-
-```{.bash}
-pip install 'ibis-framework[pandas]'
-```
-
-And connect:
-
-```{.python}
-import ibis
-
-con = ibis.pandas.connect() # <1>
-```
-
-1. Adjust connection parameters as needed.
-
-## `conda`
-
-Install for pandas:
-
-```{.bash}
-conda install -c conda-forge ibis-pandas
-```
-
-And connect:
-
-```{.python}
-import ibis
-
-con = ibis.pandas.connect() # <1>
-```
-
-1. Adjust connection parameters as needed.
-
-## `mamba`
-
-Install for pandas:
-
-```{.bash}
-mamba install -c conda-forge ibis-pandas
-```
-
-And connect:
-
-```{.python}
-import ibis
-
-con = ibis.pandas.connect() # <1>
-```
-
-1. Adjust connection parameters as needed.
+::: {.callout-note}
+## The pandas backend was removed in Ibis version 10.0
 
+See [our blog post](../posts/farewell-pandas/index.qmd) on the topic for more information.
 :::
-
-
-
-## User Defined functions (UDF)
-
-Ibis supports defining three kinds of user-defined functions for operations on
-expressions targeting the pandas backend: **element-wise**, **reduction**, and
-**analytic**.
-
-### Elementwise Functions
-
-An **element-wise** function is a function that takes N rows as input and
-produces N rows of output. `log`, `exp`, and `floor` are examples of
-element-wise functions.
-
-Here's how to define an element-wise function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
-def add_one(x):
-    return x + 1.0
-```
-
-### Reduction Functions
-
-A **reduction** is a function that takes N rows as input and produces 1 row
-as output. `sum`, `mean` and `count` are examples of reductions. In
-the context of a `GROUP BY`, reductions produce 1 row of output _per
-group_.
-
-Here's how to define a reduction function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.reduction(input_type=[dt.double], output_type=dt.double)
-def double_mean(series):
-    return 2 * series.mean()
-```
-
-### Analytic Functions
-
-An **analytic** function is like an **element-wise** function in that it takes
-N rows as input and produces N rows of output. The key difference is that
-analytic functions can be applied _per group_ using window functions. Z-score
-is an example of an analytic function.
-
-Here's how to define an analytic function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.analytic(input_type=[dt.double], output_type=dt.double)
-def zscore(series):
-    return (series - series.mean()) / series.std()
-```
-
-### Details of pandas UDFs
-
-- Element-wise provide support
-  for applying your UDF to any combination of scalar values and columns.
-- Reductions provide support for
-  whole column aggregations, grouped aggregations, and application of your
-  function over a window.
-- Analytic functions work in both grouped and non-grouped
-  settings
-- The objects you receive as input arguments are either `pandas.Series` or
-  Python/NumPy scalars.
-
-::: {.callout-warning}
-## Keyword arguments must be given a default
-
-Any keyword arguments must be given a default value or the function **will
-not work**.
-:::
-
-A common Python convention is to set the default value to `None` and
-handle setting it to something not `None` in the body of the function.
-
-Using `add_one` from above as an example, the following call will receive a
-`pandas.Series` for the `x` argument:
-
-```python
-import ibis
-import pandas as pd
-df = pd.DataFrame({'a': [1, 2, 3]})
-con = ibis.pandas.connect({'df': df})
-t = con.table('df')
-expr = add_one(t.a)
-expr
-```
-
-And this will receive the `int` 1:
-
-```python
-expr = add_one(1)
-expr
-```
-
-Since the pandas backend passes around `**kwargs` you can accept `**kwargs`
-in your function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.elementwise([dt.int64], dt.double)
-def add_two(x, **kwargs): # do stuff with kwargs
-    return x + 2.0
-```
-
-Or you can leave them out as we did in the example above. You can also
-optionally accept specific keyword arguments.
-
-For example:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.elementwise([dt.int64], dt.double)
-def add_two_with_none(x, y=None):
-    if y is None:
-        y = 2.0
-    return x + y
-```
-
-```{python}
-#| echo: false
-BACKEND = "Pandas"
-```
-
-{{< include ./_templates/api.qmd >}}