|
| 1 | +--- |
| 2 | +title: "Ibis + DuckDB geospatial: a match made on Earth" |
| 3 | +author: Naty Clementi |
| 4 | +date: 2023-12-07 |
| 5 | +categories: |
| 6 | + - blog |
| 7 | + - duckdb |
| 8 | + - geospatial |
| 9 | +--- |
| 10 | + |
| 11 | +Ibis now has support for [DuckDB geospatial functions](https://gist.github.com/ncclementi/fbc5564af709e2d7f8882821e3a8649f)! |
| 12 | + |
| 13 | +This blogpost showcases some examples of the geospatial API for the DuckDB backend. The material is inspired by |
| 14 | +the ["DuckDB: Spatial Relationships"](https://geog-414.gishub.org/book/duckdb/07_spatial_relationships.html) lesson from |
| 15 | +[Dr. Qiusheng Wu](https://geog-414.gishub.org/book/preface/instructor.html)'s course "Spatial Data Management" from the |
| 16 | +Department of Geography & Sustainability at the University of Tennessee, Knoxville. |
| 17 | + |
| 18 | +::: {.callout-note} |
| 19 | +You can check Dr. Qiusheng Wu's full Spatial Data Management course material on its |
| 20 | +[website](https://geog-414.gishub.org/index.html), and the classes are also on |
| 21 | +[YouTube](https://www.youtube.com/watch?v=A4TOAdsXsEs&list=PLAxJ4-o7ZoPe9SkgnophygyLjTDBzIEbi). |
| 22 | +::: |
| 23 | + |
| 24 | +## Data |
| 25 | + |
| 26 | +We are going to be working with data from New York City. The database contains multiple tables with information about |
| 27 | +subway stations, streets, neighborhood, census data and, homicides. The datasets in the database are in NAD83 / UTM zone |
| 28 | +18N projection, EPSG:26918. |
| 29 | + |
| 30 | +```{python} |
| 31 | +from pathlib import Path |
| 32 | +from zipfile import ZipFile |
| 33 | +from urllib.request import urlretrieve |
| 34 | +
|
| 35 | +# Download and unzip |
| 36 | +url = "https://open.gishub.org/data/duckdb/nyc_data.db.zip" |
| 37 | +zip_path = Path("nyc_data.db.zip") |
| 38 | +db_path = Path("nyc_data.db") |
| 39 | +
|
| 40 | +if not zip_path.exists(): |
| 41 | + urlretrieve(url, zip_path) |
| 42 | +
|
| 43 | +if not db_path.exists(): |
| 44 | + with ZipFile(zip_path) as zip_file: |
| 45 | + zip_file.extract("nyc_data.db") |
| 46 | +``` |
| 47 | + |
| 48 | +## Let's get started |
| 49 | + |
| 50 | +The beauty of spatial databases is that they allow us to both store *and* compute over geometries. |
| 51 | + |
| 52 | +```{python} |
| 53 | +import ibis |
| 54 | +from ibis import _ |
| 55 | +
|
| 56 | +ibis.options.interactive = True |
| 57 | +
|
| 58 | +con = ibis.duckdb.connect("nyc_data.db") |
| 59 | +con.list_tables() |
| 60 | +``` |
| 61 | + |
| 62 | +We have multiple tables with information about New York City. Following Dr. Wu's class, we'll take a look at some |
| 63 | +spatial relations. |
| 64 | + |
| 65 | +We can start by taking a peak at the `nyc_subway_stations` table. |
| 66 | + |
| 67 | +```{python} |
| 68 | +subway_stations = con.table("nyc_subway_stations") |
| 69 | +subway_stations |
| 70 | +``` |
| 71 | + |
| 72 | +Notice that the last column has a `geometry` type, and in this case it contains points that represent the location of |
| 73 | +each subway station. Let's grab the entry for the Broad St subway station. |
| 74 | + |
| 75 | +```{python} |
| 76 | +broad_station = subway_stations.filter(subway_stations.NAME == "Broad St") |
| 77 | +broad_station |
| 78 | +``` |
| 79 | + |
| 80 | +### `geo_equals` (`ST_Equals`) |
| 81 | + |
| 82 | +In DuckDB `ST_Equals` returns `True` if two geometries are topologically equal. This means that they have the same |
| 83 | +dimension and identical coordinate values, although the order of the vertices may be different. |
| 84 | + |
| 85 | +The following is a bit redundant but we can check if our `"Broad St"` point matches only one point in our data using |
| 86 | +`geo_equals` |
| 87 | + |
| 88 | +```{python} |
| 89 | +subway_stations.filter(subway_stations.geom.geo_equals(broad_station.geom)) |
| 90 | +``` |
| 91 | + |
| 92 | +We can also write this query without using `broad_station` as a variable, and with the help of the deferred expressions |
| 93 | +API, also known as [the underscore API](../../how-to/analytics/chain_expressions.qmd). |
| 94 | + |
| 95 | +```{python} |
| 96 | +subway_stations.filter(_.geom.geo_equals(_.filter(_.NAME == "Broad St").geom)) |
| 97 | +``` |
| 98 | + |
| 99 | +### `intersect` (ST_Intersect) |
| 100 | + |
| 101 | +Let's locate the neighborhood of the "Broad Street" subway station using the |
| 102 | +geospatial `intersect` function. The `intersect` function returns `True` if two geometries have any points in common. |
| 103 | + |
| 104 | +```{python} |
| 105 | +boroughs = con.table("nyc_neighborhoods") |
| 106 | +boroughs |
| 107 | +``` |
| 108 | + |
| 109 | +```{python} |
| 110 | +boroughs.filter(boroughs.geom.intersects(broad_station.select(broad_station.geom).to_array())) |
| 111 | +``` |
| 112 | + |
| 113 | +### `d_within` (ST_DWithin) |
| 114 | + |
| 115 | +We can also find the streets near (say, within 10 meters) the Broad St subway station using the `d_within` |
| 116 | +function. The `d_within` function returns True if the geometries are within a given distance. |
| 117 | + |
| 118 | +```{python} |
| 119 | +streets = con.table("nyc_streets") |
| 120 | +streets |
| 121 | +``` |
| 122 | + |
| 123 | +Using the deferred API, we can check which streets are within `d=10` meters of distance. |
| 124 | + |
| 125 | +```{python} |
| 126 | +sts_near_broad = streets.filter(_.geom.d_within(broad_station.select(_.geom).to_array(), 10)) |
| 127 | +sts_near_broad |
| 128 | +``` |
| 129 | + |
| 130 | +::: {.callout-note} |
| 131 | +In the previous query, `streets` and `broad_station` are different tables. We use [`to_array()`](../../reference/expression-tables.qmd#ibis.expr.types.relations.Table.to_array) to generate a |
| 132 | +scalar subquery from a table with a single column (whose shape is scalar). |
| 133 | +::: |
| 134 | + |
| 135 | +To visualize the findings, we will convert the tables to Geopandas DataFrames. |
| 136 | + |
| 137 | +```{python} |
| 138 | +broad_station_gdf = broad_station.to_pandas() |
| 139 | +broad_station_gdf.crs = "EPSG:26918" |
| 140 | +
|
| 141 | +sts_near_broad_gdf = sts_near_broad.to_pandas() |
| 142 | +sts_near_broad_gdf.crs = "EPSG:26918" |
| 143 | +
|
| 144 | +streets_gdf = streets.to_pandas() |
| 145 | +streets_gdf.crs = "EPSG:26918" |
| 146 | +``` |
| 147 | + |
| 148 | +```{python} |
| 149 | +import leafmap.deckgl as leafmap # <1> |
| 150 | +``` |
| 151 | + |
| 152 | +1. `leafmap.deckgl` allows us to visualize multiple layers |
| 153 | + |
| 154 | +```{python} |
| 155 | +m = leafmap.Map() |
| 156 | +
|
| 157 | +m.add_vector(broad_station_gdf, get_fill_color="blue") |
| 158 | +m.add_vector(sts_near_broad_gdf, get_color="red", opacity=0.5) |
| 159 | +m.add_vector(streets_gdf, get_color="grey", zoom_to_layer=False, opacity=0.3) |
| 160 | +m |
| 161 | +``` |
| 162 | + |
| 163 | +You can zoom in and out, and hover over the map to check on the streets names. |
| 164 | + |
| 165 | +### `buffer` (ST_Buffer) |
| 166 | + |
| 167 | +Next, we'll take a look at the homicides table and showcase some |
| 168 | +additional functionality related to polygon handling. |
| 169 | + |
| 170 | +```{python} |
| 171 | +homicides = con.table("nyc_homicides") |
| 172 | +homicides |
| 173 | +``` |
| 174 | + |
| 175 | +Let's use the `buffer` method to find homicides near our `"Broad St"` station point. |
| 176 | + |
| 177 | +The `buffer` method computes a polygon or multipolygon that represents all points whose distance from a geometry is less |
| 178 | +than or equal to a given distance. |
| 179 | + |
| 180 | +```{python} |
| 181 | +broad_station.geom.buffer(200) |
| 182 | +``` |
| 183 | + |
| 184 | +We can check the area using the `area` (`ST_Area`) function, and see that is $~ \pi r^{2}=125664$ |
| 185 | + |
| 186 | +```{python} |
| 187 | +broad_station.geom.buffer(200).area() |
| 188 | +``` |
| 189 | + |
| 190 | +To find if there were any homicides in that area, we can find where the polygon resulting from adding the |
| 191 | +200 meters buffer to our "Broad St" station point intersects with the geometry column in our homicides table. |
| 192 | + |
| 193 | +```{python} |
| 194 | +h_near_broad = homicides.filter(_.geom.intersects(broad_station.select(_.geom.buffer(200)).to_array())) |
| 195 | +h_near_broad |
| 196 | +``` |
| 197 | + |
| 198 | +It looks like there was one homicide within 200 meters from the "Broad St" station, but from this |
| 199 | +data we can't tell the street near which it happened. However, we can check if the homicide point is within a small |
| 200 | +distance of a street. |
| 201 | + |
| 202 | +```{python} |
| 203 | +h_street = streets.filter(_.geom.d_within(h_near_broad.select(_.geom).to_array(), 2)) |
| 204 | +h_street |
| 205 | +``` |
| 206 | + |
| 207 | +Let's plot this: |
| 208 | + |
| 209 | +```{python} |
| 210 | +broad_station_zone = broad_station.mutate(geom=broad_station.geom.buffer(200)) |
| 211 | +broad_station_zone = broad_station_zone.to_pandas() |
| 212 | +broad_station_zone.crs = "EPSG:26918" |
| 213 | +
|
| 214 | +h_near_broad_gdf = h_near_broad.to_pandas() |
| 215 | +h_near_broad_gdf.crs = "EPSG:26918" |
| 216 | +
|
| 217 | +h_street_gdf = h_street.to_pandas() |
| 218 | +h_street_gdf.crs = "EPSG:26918" |
| 219 | +
|
| 220 | +
|
| 221 | +mh = leafmap.Map() |
| 222 | +mh.add_vector(broad_station_gdf, get_fill_color="orange") |
| 223 | +mh.add_vector(broad_station_zone, get_fill_color="orange", opacity=0.1) |
| 224 | +mh.add_vector(h_near_broad_gdf, get_fill_color="red", opacity=0.5) |
| 225 | +mh.add_vector(h_street_gdf, get_color="blue", opacity=0.3) |
| 226 | +mh.add_vector(streets_gdf, get_color="grey", zoom_to_layer=False, opacity=0.2) |
| 227 | +
|
| 228 | +mh |
| 229 | +``` |
| 230 | + |
| 231 | + |
| 232 | +## Functions supported and next steps |
| 233 | + |
| 234 | +At the moment in Ibis we have support for around thirty geospatial functions for the DuckDB and we will add some more |
| 235 | +(see list [here](https://gist.github.com/ncclementi/fbc5564af709e2d7f8882821e3a8649f)). |
| 236 | + |
| 237 | +We also support reading multiple geospatial formats via [`read_geo()`](../../backends/duckdb.qmd#ibis.backends.duckdb.Backend.read_geo). |
| 238 | + |
| 239 | +Here are some resources to learn more about Ibis: |
| 240 | + |
| 241 | +- [Ibis Docs](https://ibis-project.org/) |
| 242 | +- [Ibis GitHub](https://github.com/ibis-project/ibis) |
| 243 | + |
| 244 | +Chat with us on Zulip: |
| 245 | + |
| 246 | +- [Ibis Zulip Chat](https://ibis-project.zulipchat.com/) |
0 commit comments