Commit 93f8e10

refactor(pandas): remove the pandas backend
BREAKING CHANGE: The `pandas` backend is removed. Note that **pandas DataFrames are STILL VALID INPUTS AND OUTPUTS** and will remain so for the foreseeable future. Please use one of the other local backends like DuckDB, Polars, or DataFusion to perform operations directly on pandas DataFrames.
1 parent dcdeaea commit 93f8e10

58 files changed, with 264 additions and 7,503 deletions.

docs/backends/_utils.py

Lines changed: 1 addition & 4 deletions
@@ -17,10 +17,7 @@ def get_renderer(level: int) -> MdRenderer:
 
 @cache
 def get_backend(backend: str):
-    if backend == "pandas":
-        return get_object(f"ibis.backends.{backend}", "BasePandasBackend")
-    else:
-        return get_object(f"ibis.backends.{backend}", "Backend")
+    return get_object(f"ibis.backends.{backend}", "Backend")
 
 
 def get_callable(obj, name):
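
The hunk above drops the pandas special case, so every backend is now looked up under the same class name. The sketch below shows that uniform lookup in isolation. Everything in it is an assumption made for illustration: `get_object` is a hypothetical stand-in built on `importlib` (the real helper's behavior is not shown in this diff), and `demo_backends` is a fake package registered in `sys.modules` so the example runs without Ibis installed.

```python
import sys
import types
from functools import cache
from importlib import import_module

# Hypothetical package name for this demo; the real code targets "ibis.backends".
PACKAGE = "demo_backends"


def get_object(module_path: str, name: str):
    """Hypothetical stand-in: import a module and return one attribute from it."""
    return getattr(import_module(module_path), name)


@cache
def get_backend(backend: str):
    # Every backend module is assumed to expose a class named "Backend",
    # so no per-backend special case is needed.
    return get_object(f"{PACKAGE}.{backend}", "Backend")


# Register two fake backend modules so the lookup has something to find.
sys.modules[PACKAGE] = types.ModuleType(PACKAGE)
for name in ("duckdb", "polars"):
    module = types.ModuleType(f"{PACKAGE}.{name}")
    module.Backend = type("Backend", (), {"name": name})
    sys.modules[module.__name__] = module

print(get_backend("duckdb").name)
print(get_backend("polars").name)
```

Because of `@cache`, repeated calls with the same backend name return the same class object without importing again.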

docs/backends/pandas.qmd

Lines changed: 3 additions & 209 deletions
@@ -1,213 +1,7 @@
 # pandas
 
-[https://pandas.pydata.org/](https://pandas.pydata.org/)
-
-![](https://img.shields.io/badge/memtables-native-green?style=flat-square) ![](https://img.shields.io/badge/inputs-CSV | Parquet-blue?style=flat-square) ![](https://img.shields.io/badge/outputs-CSV | pandas | Parquet | PyArrow-orange?style=flat-square)
-
-::: {.callout-warning}
-## The Pandas backend is slated for removal in Ibis 10.0
-We recommend using one of our other backends.
-
-Many workloads work well on the DuckDB and Polars backends, for example.
-:::
-
-
-## Install
-
-Install Ibis and dependencies for the pandas backend:
-
-::: {.panel-tabset}
-
-## `pip`
-
-Install with the `pandas` extra:
-
-```{.bash}
-pip install 'ibis-framework[pandas]'
-```
-
-And connect:
-
-```{.python}
-import ibis
-
-con = ibis.pandas.connect() # <1>
-```
-
-1. Adjust connection parameters as needed.
-
-## `conda`
-
-Install for pandas:
-
-```{.bash}
-conda install -c conda-forge ibis-pandas
-```
-
-And connect:
-
-```{.python}
-import ibis
-
-con = ibis.pandas.connect() # <1>
-```
-
-1. Adjust connection parameters as needed.
-
-## `mamba`
-
-Install for pandas:
-
-```{.bash}
-mamba install -c conda-forge ibis-pandas
-```
-
-And connect:
-
-```{.python}
-import ibis
-
-con = ibis.pandas.connect() # <1>
-```
-
-1. Adjust connection parameters as needed.
+::: {.callout-note}
+## The pandas backend was removed in Ibis version 10.0
 
+See [our blog post](../posts/farewell-pandas/index.qmd) on the topic for more information.
 :::
-
-
-
-## User Defined functions (UDF)
-
-Ibis supports defining three kinds of user-defined functions for operations on
-expressions targeting the pandas backend: **element-wise**, **reduction**, and
-**analytic**.
-
-### Elementwise Functions
-
-An **element-wise** function is a function that takes N rows as input and
-produces N rows of output. `log`, `exp`, and `floor` are examples of
-element-wise functions.
-
-Here's how to define an element-wise function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
-def add_one(x):
-    return x + 1.0
-```
-
-### Reduction Functions
-
-A **reduction** is a function that takes N rows as input and produces 1 row
-as output. `sum`, `mean` and `count` are examples of reductions. In
-the context of a `GROUP BY`, reductions produce 1 row of output _per
-group_.
-
-Here's how to define a reduction function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.reduction(input_type=[dt.double], output_type=dt.double)
-def double_mean(series):
-    return 2 * series.mean()
-```
-
-### Analytic Functions
-
-An **analytic** function is like an **element-wise** function in that it takes
-N rows as input and produces N rows of output. The key difference is that
-analytic functions can be applied _per group_ using window functions. Z-score
-is an example of an analytic function.
-
-Here's how to define an analytic function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.analytic(input_type=[dt.double], output_type=dt.double)
-def zscore(series):
-    return (series - series.mean()) / series.std()
-```
-
-### Details of pandas UDFs
-
-- Element-wise provide support
-  for applying your UDF to any combination of scalar values and columns.
-- Reductions provide support for
-  whole column aggregations, grouped aggregations, and application of your
-  function over a window.
-- Analytic functions work in both grouped and non-grouped
-  settings
-- The objects you receive as input arguments are either `pandas.Series` or
-  Python/NumPy scalars.
-
-::: {.callout-warning}
-## Keyword arguments must be given a default
-
-Any keyword arguments must be given a default value or the function **will
-not work**.
-:::
-
-A common Python convention is to set the default value to `None` and
-handle setting it to something not `None` in the body of the function.
-
-Using `add_one` from above as an example, the following call will receive a
-`pandas.Series` for the `x` argument:
-
-```python
-import ibis
-import pandas as pd
-df = pd.DataFrame({'a': [1, 2, 3]})
-con = ibis.pandas.connect({'df': df})
-t = con.table('df')
-expr = add_one(t.a)
-expr
-```
-
-And this will receive the `int` 1:
-
-```python
-expr = add_one(1)
-expr
-```
-
-Since the pandas backend passes around `**kwargs` you can accept `**kwargs`
-in your function:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.elementwise([dt.int64], dt.double)
-def add_two(x, **kwargs): # do stuff with kwargs
-    return x + 2.0
-```
-
-Or you can leave them out as we did in the example above. You can also
-optionally accept specific keyword arguments.
-
-For example:
-
-```python
-import ibis.expr.datatypes as dt
-from ibis.backends.pandas.udf import udf
-
-@udf.elementwise([dt.int64], dt.double)
-def add_two_with_none(x, y=None):
-    if y is None:
-        y = 2.0
-    return x + y
-```
-
-```{python}
-#| echo: false
-BACKEND = "Pandas"
-```
-
-{{< include ./_templates/api.qmd >}}

docs/backends_sankey.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ def to_greyish(hex_code, grey_value=128):
         "SQLite",
         "Trino",
     ],
-    list(category_colors.keys())[2]: ["Dask", "pandas", "Polars"],
+    list(category_colors.keys())[2]: ["Polars"],
 }
 
 nodes, links = [], []

ibis/backends/conftest.py

Lines changed: 2 additions & 20 deletions
@@ -4,15 +4,13 @@
 import importlib
 import importlib.metadata
 import itertools
-import operator
 from functools import cache
 from pathlib import Path
 from typing import TYPE_CHECKING, Any
 
 import _pytest
 import pytest
 from packaging.requirements import Requirement
-from packaging.version import parse as vparse
 
 import ibis
 from ibis import util
@@ -30,22 +28,6 @@
     from ibis.backends.tests.base import BackendTest
 
 
-def compare_versions(module_name, given_version, op):
-    try:
-        current_version = importlib.metadata.version(module_name)
-        return op(vparse(current_version), vparse(given_version))
-    except importlib.metadata.PackageNotFoundError:
-        return False
-
-
-def is_newer_than(module_name, given_version):
-    return compare_versions(module_name, given_version, operator.gt)
-
-
-def is_older_than(module_name, given_version):
-    return compare_versions(module_name, given_version, operator.lt)
-
-
 TEST_TABLES = {
     "functional_alltypes": ibis.schema(
         {
@@ -486,7 +468,7 @@ def _setup_backend(request, data_dir, tmp_path_factory, worker_id):
 
 
 @pytest.fixture(
-    params=_get_backends_to_test(discard=("pandas",)),
+    params=_get_backends_to_test(),
     scope="session",
 )
 def ddl_backend(request, data_dir, tmp_path_factory, worker_id):
@@ -501,7 +483,7 @@ def ddl_con(ddl_backend):
 
 
 @pytest.fixture(
-    params=_get_backends_to_test(keep=("pandas", "pyspark")),
+    params=_get_backends_to_test(keep=("pyspark",)),
     scope="session",
 )
 def udf_backend(request, data_dir, tmp_path_factory, worker_id):
