Commit 5f18747

docs(blog): ibis, duckdb geo and lonboard for overture maps (#10215)
1 parent d8638b6 commit 5f18747

5 files changed: +233 -0 lines changed
docs/_freeze/posts/ibis-overturemaps/index/execute-results/html.json

Lines changed: 16 additions & 0 deletions
Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
---
title: "From query to plot: Exploring GeoParquet Overture Maps with Ibis, DuckDB, and Lonboard"
author: Naty Clementi and Kyle Barron
date: 2024-09-25
categories:
  - blog
  - duckdb
  - overturemaps
  - lonboard
  - geospatial
execute:
  freeze: false
---

With the release of `DuckDB 1.1.1`, we now have support for reading GeoParquet
files! With this exciting update, we can query rich datasets from Overture Maps
using Python via Ibis with the performance of `DuckDB`.

But the good news doesn't stop there: since `Ibis 9.2`, `lonboard` can plot data
directly from an `Ibis` table, adding more simplicity and speed to your
geospatial analysis.

Let's dive into how these tools come together.

## Installation

First make sure you have `duckdb>=1.1.1`, then install Ibis with the dependencies
needed to work with geospatial data using DuckDB.

```bash
$ pip install 'duckdb>=1.1.1'
$ pip install 'ibis-framework[duckdb,geospatial]' lonboard
```

## Motivation

Overture Maps is an open-source initiative that provides high-quality,
interoperable map data by integrating contributions from leading companies and
open data sources to support a wide range of applications.

Overture Maps offers a variety of datasets to query. For example, there is plenty
of information about power infrastructure.

Let's create some plots of the U.S. power infrastructure. We'll look into power
plants and power lines for the lower 48 states (excluding Hawaii and Alaska for
simplicity of the bounding box).

## Download data

First we import Ibis and its [deferred expression object](https://ibis-project.org/reference/expression-generic.html#ibis.expr.api.deferred) `_`,
and we use our default backend, DuckDB:
```python
import ibis
from ibis import _

con = ibis.get_backend()  # default duckdb backend
```

With Ibis and DuckDB we can be more specific about the data we want thanks to
filter pushdown. For example, if we want to select only a few columns and
look only at the power infrastructure, we can do this as follows.


```python
# look into type infrastructure
url = (
    "s3://overturemaps-us-west-2/release/2024-07-22.0/theme=base/type=infrastructure/*"
)
t = con.read_parquet(url, table_name="infra-usa")

# filter for USA bounding box, subtype="power", and select only a few columns
expr = t.filter(
    _.bbox.xmin > -125.0,
    _.bbox.ymin > 24.8,
    _.bbox.xmax < -65.8,
    _.bbox.ymax < 49.2,
    _.subtype == "power",
).select(["names", "geometry", "bbox", "class", "sources", "source_tags"])
```
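
To make the four `bbox` comparisons concrete: they keep a feature only when its own bounding box lies entirely inside the lower-48 box, so a feature that straddles the edge is dropped. Here is a minimal pure-Python sketch of that predicate. The `BBox` class, the `inside_lower_48` helper, and the sample boxes are hypothetical names and values for illustration, not part of Ibis or Overture Maps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees (hypothetical helper type)."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float


# Same limits as the Ibis filter above.
LOWER_48 = BBox(xmin=-125.0, ymin=24.8, xmax=-65.8, ymax=49.2)


def inside_lower_48(b: BBox) -> bool:
    """True when b lies strictly inside LOWER_48 (mirrors the four comparisons)."""
    return (
        b.xmin > LOWER_48.xmin
        and b.ymin > LOWER_48.ymin
        and b.xmax < LOWER_48.xmax
        and b.ymax < LOWER_48.ymax
    )


# Illustrative boxes, not real Overture features.
near_lancaster = BBox(-118.2, 34.6, -118.1, 34.7)
straddles_west_edge = BBox(-125.5, 40.0, -124.5, 41.0)
in_hawaii = BBox(-158.0, 21.3, -157.9, 21.4)

print(inside_lower_48(near_lancaster))       # True
print(inside_lower_48(straddles_west_edge))  # False
print(inside_lower_48(in_hawaii))            # False
```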

::: {.callout-note}
If you inspect `expr`, you can see that the filters and projections get pushed down,
meaning you only download the data that you asked for.
:::

```python
con.to_parquet(expr, "power-infra-usa.geoparquet")
```

Now that we have the data, let's explore it in Ibis interactive mode and make some
beautiful maps.

## Data exploration

To explore the data interactively, we turn on interactive mode:
```python
ibis.options.interactive = True
```

```python
usa_power_infra = con.read_parquet("power-infra-usa.geoparquet")
usa_power_infra
```

Let's quickly rename the `class` column, since `class` is a reserved word in Python and causes
conflicts when using the deferred operator:

```python
usa_power_infra = usa_power_infra.rename(infra_class="class")
```
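
The conflict comes from Python itself rather than from Ibis: `class` is a keyword, so attribute-style access such as `_.class` cannot even be parsed. A small standard-library check illustrates this (the `parses` helper is a hypothetical name, and no Ibis is needed because the snippets are only compiled, never evaluated):

```python
import keyword

print(keyword.iskeyword("class"))        # True
print(keyword.iskeyword("infra_class"))  # False


def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(parses("_.class == 'plant'"))        # False
print(parses("_.infra_class == 'plant'"))  # True
```

Bracket-style access on the table, such as `usa_power_infra["class"]`, also sidesteps the keyword, but renaming once keeps the later filters readable.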

We take a look at the different classes of infrastructure under the `power` subtype:

```python
usa_power_infra.infra_class.value_counts().order_by(
    ibis.desc("infra_class_count")
).preview(max_rows=15)
```
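
Conceptually, `value_counts()` followed by a descending sort is a group-and-count, and `infra_class_count` in the code above is the name of the resulting count column. A pure-Python sketch of the same idea, using made-up labels rather than Overture data:

```python
from collections import Counter

# Illustrative labels only; the real column has many more rows and classes.
sample_classes = [
    "power_line",
    "plant",
    "minor_line",
    "power_line",
    "minor_line",
    "power_line",
]

counts = Counter(sample_classes)

# most_common() returns (value, count) pairs sorted by count, descending.
for infra_class, infra_class_count in counts.most_common():
    print(infra_class, infra_class_count)
```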

Looks like we have `plant`, `power_line`, and `minor_line`, among others.

```python
plants = usa_power_infra.filter(_.infra_class == "plant")
power_lines = usa_power_infra.filter(_.infra_class == "power_line")
minor_lines = usa_power_infra.filter(_.infra_class == "minor_line")
```


## Plotting with Lonboard

Lonboard is a Python plotting library optimized for efficient visualizations
of large geospatial data. It integrates well with Ibis and DuckDB, making
interactive plotting scalable.

::: {.callout-note}
You can try this on your machine. To keep the blog's file size down, we show
screenshots of the visualizations.
:::

```python
import lonboard
from lonboard.basemap import CartoBasemap  # to choose color of basemap
```

Let's visualize the power plants:

```python
lonboard.viz(
    plants,
    scatterplot_kwargs={"get_fill_color": "red"},
    polygon_kwargs={"get_fill_color": "red"},
    map_kwargs={
        "basemap_style": CartoBasemap.Positron,
        "view_state": {"longitude": -100, "latitude": 36, "zoom": 3},
    },
)
```

![Power plants in the USA](usa-power-plants.png)

If you are visualizing this on your machine, you can zoom in and see some of the
geometry where the plants are located. As an example, we can plot a small
area of California:

```python
plants_CA = plants.filter(
    _.bbox.xmin.between(-118.6, -117.9), _.bbox.ymin.between(34.5, 35.3)
).select(_.names.primary, _.geometry)
```

```python
lonboard.viz(
    plants_CA,
    scatterplot_kwargs={"get_fill_color": "red"},
    polygon_kwargs={"get_fill_color": "red"},
    map_kwargs={
        "basemap_style": CartoBasemap.Positron,
    },
)
```

![Power plants near Lancaster, CA](ca-power-plants.png)

We can also visualize the `power_lines` and the `minor_lines` together:

```python
lonboard.viz([minor_lines, power_lines])
```

![Minor and Power lines of USA](usa-power-and-minor-lines.png)

And that's how you can visualize ~7 million coordinates from the comfort of
your laptop.

```python
>>> power_lines.geometry.n_points().sum()
5329836
>>> minor_lines.geometry.n_points().sum()
1430042
```
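
The "~7 million" figure appears to be the sum of the two counts shown above, which a quick check confirms (the variable names are just for illustration):

```python
# Counts copied from the n_points().sum() results above.
power_line_points = 5_329_836
minor_line_points = 1_430_042

total_points = power_line_points + minor_line_points
print(f"{total_points:,}")            # 6,759,878
print(round(total_points / 1e6, 1))   # 6.8 (million)
```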

With Ibis and DuckDB, working with geospatial data has never been easier or faster.
We saw how to query a dataset from Overture Maps with the simplicity of Python and
the performance of DuckDB. Last but not least, we saw how simply and quickly Lonboard
got us from query to plot. Together, these libraries make exploring and handling
geospatial data a breeze.


## Resources
- [Ibis Docs](https://ibis-project.org/)
- [Lonboard Docs](https://developmentseed.org/lonboard/latest/)
- [DuckDB spatial extension](https://duckdb.org/docs/extensions/spatial.html)
- [DuckDB spatial functions docs](https://github.com/duckdb/duckdb_spatial/blob/main/docs/functions.md)

Chat with us on Zulip:

- [Ibis Zulip Chat](https://ibis-project.zulipchat.com/)