Skip to content

Commit def8031

Browse files
ncclementigforsyth
authored andcommitted
docs(blog-post): add blogpost ibis duckdb geospatial
ci(deps): remove lonboard from min versions ci jobs ci(deps): remove lonboard from pyspark ci job
1 parent 18bdb7e commit def8031

File tree

5 files changed

+841
-8
lines changed

5 files changed

+841
-8
lines changed

.github/workflows/ibis-backends.yml

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -485,6 +485,10 @@ jobs:
485485
- name: install poetry
486486
run: python -m pip install --upgrade pip 'poetry==1.7.1'
487487

488+
- name: remove lonboard
489+
# it requires a version of pandas that min versions are not compatible with
490+
run: poetry remove lonboard
491+
488492
- name: install minimum versions
489493
run: poetry add --lock --optional ${{ join(matrix.backend.deps, ' ') }}
490494

@@ -552,6 +556,10 @@ jobs:
552556
- name: install poetry
553557
run: python -m pip install --upgrade pip 'poetry==1.7.1'
554558

559+
- name: remove lonboard
560+
# it requires a version of pandas that pyspark is not compatible with
561+
run: poetry remove lonboard
562+
555563
- name: install maximum versions of pandas and numpy
556564
run: poetry add --lock 'pandas@<2' 'numpy<1.24'
557565

Lines changed: 246 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,246 @@
1+
---
2+
title: "Ibis + DuckDB geospatial: a match made on Earth"
3+
author: Naty Clementi
4+
date: 2023-12-07
5+
categories:
6+
- blog
7+
- duckdb
8+
- geospatial
9+
---
10+
11+
Ibis now has support for [DuckDB geospatial functions](https://gist.github.com/ncclementi/fbc5564af709e2d7f8882821e3a8649f)!
12+
13+
This blogpost showcases some examples of the geospatial API for the DuckDB backend. The material is inspired by
14+
the ["DuckDB: Spatial Relationships"](https://geog-414.gishub.org/book/duckdb/07_spatial_relationships.html) lesson from
15+
[Dr. Qiusheng Wu](https://geog-414.gishub.org/book/preface/instructor.html)'s course "Spatial Data Management" from the
16+
Department of Geography & Sustainability at the University of Tennessee, Knoxville.
17+
18+
::: {.callout-note}
19+
You can check Dr. Qiusheng Wu's full Spatial Data Management course material on its
20+
[website](https://geog-414.gishub.org/index.html), and the classes are also on
21+
[YouTube](https://www.youtube.com/watch?v=A4TOAdsXsEs&list=PLAxJ4-o7ZoPe9SkgnophygyLjTDBzIEbi).
22+
:::
23+
24+
## Data
25+
26+
We are going to be working with data from New York City. The database contains multiple tables with information about
27+
subway stations, streets, neighborhood, census data and, homicides. The datasets in the database are in NAD83 / UTM zone
28+
18N projection, EPSG:26918.
29+
30+
```{python}
31+
from pathlib import Path
32+
from zipfile import ZipFile
33+
from urllib.request import urlretrieve
34+
35+
# Download and unzip
36+
url = "https://open.gishub.org/data/duckdb/nyc_data.db.zip"
37+
zip_path = Path("nyc_data.db.zip")
38+
db_path = Path("nyc_data.db")
39+
40+
if not zip_path.exists():
41+
urlretrieve(url, zip_path)
42+
43+
if not db_path.exists():
44+
with ZipFile(zip_path) as zip_file:
45+
zip_file.extract("nyc_data.db")
46+
```
47+
48+
## Let's get started
49+
50+
The beauty of spatial databases is that they allow us to both store *and* compute over geometries.
51+
52+
```{python}
53+
import ibis
54+
from ibis import _
55+
56+
ibis.options.interactive = True
57+
58+
con = ibis.duckdb.connect("nyc_data.db")
59+
con.list_tables()
60+
```
61+
62+
We have multiple tables with information about New York City. Following Dr. Wu's class, we'll take a look at some
63+
spatial relations.
64+
65+
We can start by taking a peak at the `nyc_subway_stations` table.
66+
67+
```{python}
68+
subway_stations = con.table("nyc_subway_stations")
69+
subway_stations
70+
```
71+
72+
Notice that the last column has a `geometry` type, and in this case it contains points that represent the location of
73+
each subway station. Let's grab the entry for the Broad St subway station.
74+
75+
```{python}
76+
broad_station = subway_stations.filter(subway_stations.NAME == "Broad St")
77+
broad_station
78+
```
79+
80+
### `geo_equals` (`ST_Equals`)
81+
82+
In DuckDB `ST_Equals` returns `True` if two geometries are topologically equal. This means that they have the same
83+
dimension and identical coordinate values, although the order of the vertices may be different.
84+
85+
The following is a bit redundant but we can check if our `"Broad St"` point matches only one point in our data using
86+
`geo_equals`
87+
88+
```{python}
89+
subway_stations.filter(subway_stations.geom.geo_equals(broad_station.geom))
90+
```
91+
92+
We can also write this query without using `broad_station` as a variable, and with the help of the deferred expressions
93+
API, also known as [the underscore API](../../how-to/analytics/chain_expressions.qmd).
94+
95+
```{python}
96+
subway_stations.filter(_.geom.geo_equals(_.filter(_.NAME == "Broad St").geom))
97+
```
98+
99+
### `intersect` (ST_Intersect)
100+
101+
Let's locate the neighborhood of the "Broad Street" subway station using the
102+
geospatial `intersect` function. The `intersect` function returns `True` if two geometries have any points in common.
103+
104+
```{python}
105+
boroughs = con.table("nyc_neighborhoods")
106+
boroughs
107+
```
108+
109+
```{python}
110+
boroughs.filter(boroughs.geom.intersects(broad_station.select(broad_station.geom).to_array()))
111+
```
112+
113+
### `d_within` (ST_DWithin)
114+
115+
We can also find the streets near (say, within 10 meters) the Broad St subway station using the `d_within`
116+
function. The `d_within` function returns True if the geometries are within a given distance.
117+
118+
```{python}
119+
streets = con.table("nyc_streets")
120+
streets
121+
```
122+
123+
Using the deferred API, we can check which streets are within `d=10` meters of distance.
124+
125+
```{python}
126+
sts_near_broad = streets.filter(_.geom.d_within(broad_station.select(_.geom).to_array(), 10))
127+
sts_near_broad
128+
```
129+
130+
::: {.callout-note}
131+
In the previous query, `streets` and `broad_station` are different tables. We use [`to_array()`](../../reference/expression-tables.qmd#ibis.expr.types.relations.Table.to_array) to generate a
132+
scalar subquery from a table with a single column (whose shape is scalar).
133+
:::
134+
135+
To visualize the findings, we will convert the tables to Geopandas DataFrames.
136+
137+
```{python}
138+
broad_station_gdf = broad_station.to_pandas()
139+
broad_station_gdf.crs = "EPSG:26918"
140+
141+
sts_near_broad_gdf = sts_near_broad.to_pandas()
142+
sts_near_broad_gdf.crs = "EPSG:26918"
143+
144+
streets_gdf = streets.to_pandas()
145+
streets_gdf.crs = "EPSG:26918"
146+
```
147+
148+
```{python}
149+
import leafmap.deckgl as leafmap # <1>
150+
```
151+
152+
1. `leafmap.deckgl` allows us to visualize multiple layers
153+
154+
```{python}
155+
m = leafmap.Map()
156+
157+
m.add_vector(broad_station_gdf, get_fill_color="blue")
158+
m.add_vector(sts_near_broad_gdf, get_color="red", opacity=0.5)
159+
m.add_vector(streets_gdf, get_color="grey", zoom_to_layer=False, opacity=0.3)
160+
m
161+
```
162+
163+
You can zoom in and out, and hover over the map to check on the streets names.
164+
165+
### `buffer` (ST_Buffer)
166+
167+
Next, we'll take a look at the homicides table and showcase some
168+
additional functionality related to polygon handling.
169+
170+
```{python}
171+
homicides = con.table("nyc_homicides")
172+
homicides
173+
```
174+
175+
Let's use the `buffer` method to find homicides near our `"Broad St"` station point.
176+
177+
The `buffer` method computes a polygon or multipolygon that represents all points whose distance from a geometry is less
178+
than or equal to a given distance.
179+
180+
```{python}
181+
broad_station.geom.buffer(200)
182+
```
183+
184+
We can check the area using the `area` (`ST_Area`) function, and see that is $~ \pi r^{2}=125664$
185+
186+
```{python}
187+
broad_station.geom.buffer(200).area()
188+
```
189+
190+
To find if there were any homicides in that area, we can find where the polygon resulting from adding the
191+
200 meters buffer to our "Broad St" station point intersects with the geometry column in our homicides table.
192+
193+
```{python}
194+
h_near_broad = homicides.filter(_.geom.intersects(broad_station.select(_.geom.buffer(200)).to_array()))
195+
h_near_broad
196+
```
197+
198+
It looks like there was one homicide within 200 meters from the "Broad St" station, but from this
199+
data we can't tell the street near which it happened. However, we can check if the homicide point is within a small
200+
distance of a street.
201+
202+
```{python}
203+
h_street = streets.filter(_.geom.d_within(h_near_broad.select(_.geom).to_array(), 2))
204+
h_street
205+
```
206+
207+
Let's plot this:
208+
209+
```{python}
210+
broad_station_zone = broad_station.mutate(geom=broad_station.geom.buffer(200))
211+
broad_station_zone = broad_station_zone.to_pandas()
212+
broad_station_zone.crs = "EPSG:26918"
213+
214+
h_near_broad_gdf = h_near_broad.to_pandas()
215+
h_near_broad_gdf.crs = "EPSG:26918"
216+
217+
h_street_gdf = h_street.to_pandas()
218+
h_street_gdf.crs = "EPSG:26918"
219+
220+
221+
mh = leafmap.Map()
222+
mh.add_vector(broad_station_gdf, get_fill_color="orange")
223+
mh.add_vector(broad_station_zone, get_fill_color="orange", opacity=0.1)
224+
mh.add_vector(h_near_broad_gdf, get_fill_color="red", opacity=0.5)
225+
mh.add_vector(h_street_gdf, get_color="blue", opacity=0.3)
226+
mh.add_vector(streets_gdf, get_color="grey", zoom_to_layer=False, opacity=0.2)
227+
228+
mh
229+
```
230+
231+
232+
## Functions supported and next steps
233+
234+
At the moment in Ibis we have support for around thirty geospatial functions for the DuckDB and we will add some more
235+
(see list [here](https://gist.github.com/ncclementi/fbc5564af709e2d7f8882821e3a8649f)).
236+
237+
We also support reading multiple geospatial formats via [`read_geo()`](../../backends/duckdb.qmd#ibis.backends.duckdb.Backend.read_geo).
238+
239+
Here are some resources to learn more about Ibis:
240+
241+
- [Ibis Docs](https://ibis-project.org/)
242+
- [Ibis GitHub](https://github.com/ibis-project/ibis)
243+
244+
Chat with us on Zulip:
245+
246+
- [Ibis Zulip Chat](https://ibis-project.zulipchat.com/)

0 commit comments

Comments
 (0)