| 1 | +--- |
| 2 | +title: "From query to plot: Exploring GeoParquet Overture Maps with Ibis, DuckDB, and Lonboard" |
| 3 | +author: Naty Clementi and Kyle Barron |
| 4 | +date: 2024-09-25 |
| 5 | +categories: |
| 6 | + - blog |
| 7 | + - duckdb |
| 8 | + - overturemaps |
| 9 | + - lonboard |
| 10 | + - geospatial |
| 11 | +execute: |
| 12 | + freeze: false |
| 13 | +--- |
| 14 | + |
| 15 | +With the release of `DuckDB 1.1.1`, we now have support for reading GeoParquet |
| 16 | +files! With this exciting update, we can query rich datasets from Overture Maps |
| 17 | +using Python via Ibis with the performance of `DuckDB`. |
| 18 | + |
| 19 | +The good news doesn't stop there: since `Ibis 9.2`, `lonboard` can plot data |
| 20 | +directly from an `Ibis` table, adding more simplicity and speed to your |
| 21 | +geospatial analysis. |
| 22 | + |
| 23 | +Let’s dive into how these tools come together. |
| 24 | + |
| 25 | +## Installation |
| 26 | + |
| 27 | +First make sure you have `duckdb>=1.1.1`, then install Ibis with the dependencies |
| 28 | +needed to work with geospatial data using DuckDB. |
| 29 | + |
| 30 | +```bash |
| 31 | +$ pip install 'duckdb>=1.1.1' |
| 32 | +$ pip install 'ibis-framework[duckdb,geospatial]' lonboard |
| 33 | +``` |
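If you are unsure which DuckDB version ended up in your environment, a small check like the one below can help. This is only a sketch: `parse_release` and `meets_minimum` are hypothetical helpers written for this post, and the parsing is deliberately simplified (pre-release suffixes such as `.dev0` are ignored rather than ordered properly).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '1.1.1' into (1, 1, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.1.1") -> bool:
    """Compare release tuples; the 1.1.1 default comes from the text above."""
    return parse_release(installed) >= parse_release(minimum)


try:
    installed = version("duckdb")
    status = "OK" if meets_minimum(installed) else "is too old for GeoParquet support"
    print("duckdb", installed, status)
except PackageNotFoundError:
    print("duckdb is not installed")
```

Comparing tuples of integers rather than raw strings avoids the classic pitfall where `"1.10.0"` sorts before `"1.2.0"`.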
| 34 | + |
| 35 | +## Motivation |
| 36 | + |
| 37 | +Overture Maps is an open-source initiative that provides high-quality, |
| 38 | +interoperable map data by integrating contributions from leading companies and |
| 39 | +open data sources to support a wide range of applications. |
| 40 | + |
| 41 | +Overture Maps offers a variety of datasets to query. For example, there is plenty |
| 42 | +of information about power infrastructure. |
| 43 | + |
| 44 | +Let's create some plots of the U.S. power infrastructure. We'll look into power |
| 45 | +plants and power lines for the lower 48 states (excluding Hawaii and Alaska for |
| 46 | +simplicity of the bounding box). |
| 47 | + |
| 48 | +## Download data |
| 49 | + |
| 50 | +First we import Ibis and its [deferred expression object](https://ibis-project.org/reference/expression-generic.html#ibis.expr.api.deferred) `_`, |
| 51 | +and connect to our default backend, DuckDB: |
| 52 | +```python |
| 53 | +import ibis |
| 54 | +from ibis import _ |
| 55 | + |
| 56 | +con = ibis.get_backend() # default duckdb backend |
| 57 | +``` |
| 58 | + |
| 59 | +With Ibis and DuckDB we can be specific about the data we want, thanks to |
| 60 | +filter pushdown. For example, if we want to select only a few columns and |
| 61 | +look only at the power infrastructure, we can do this as follows: |
| 62 | + |
| 63 | + |
| 64 | +```python |
| 65 | +# look into type infrastructure |
| 66 | +url = ( |
| 67 | + "s3://overturemaps-us-west-2/release/2024-07-22.0/theme=base/type=infrastructure/*" |
| 68 | +) |
| 69 | +t = con.read_parquet(url, table_name="infra-usa") |
| 70 | + |
| 71 | +# filter for USA bounding box, subtype="power", and select only a few columns |
| 72 | +expr = t.filter( |
| 73 | + _.bbox.xmin > -125.0, |
| 74 | + _.bbox.ymin > 24.8, |
| 75 | + _.bbox.xmax < -65.8, |
| 76 | + _.bbox.ymax < 49.2, |
| 77 | + _.subtype == "power", |
| 78 | +).select(["names", "geometry", "bbox", "class", "sources", "source_tags"]) |
| 79 | +``` |
| 80 | + |
| 81 | +::: {.callout-note} |
| 82 | +If you inspect `expr`, you can see that the filters and projections get pushed down, |
| 83 | +meaning you only download the data that you asked for. |
| 84 | +::: |
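To make the bounding-box filter above concrete, here is a minimal pure-Python sketch of the same predicate applied to one feature at a time. The `BBox` class, the `within_lower_48` helper, and the sample coordinates are hypothetical and exist only for illustration; the real filtering happens inside DuckDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in longitude/latitude degrees (hypothetical helper)."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float


# Same limits as the Ibis filter above (lower 48 states).
LOWER_48 = BBox(xmin=-125.0, ymin=24.8, xmax=-65.8, ymax=49.2)


def within_lower_48(b: BBox) -> bool:
    """Mirror the four strict comparisons used in the Ibis filter."""
    return (
        b.xmin > LOWER_48.xmin
        and b.ymin > LOWER_48.ymin
        and b.xmax < LOWER_48.xmax
        and b.ymax < LOWER_48.ymax
    )


# Made-up sample boxes, not real Overture features.
inside = BBox(-118.3, 34.0, -118.2, 34.1)      # small box near Los Angeles
straddling = BBox(-125.5, 40.0, -124.5, 41.0)  # crosses the western edge
honolulu = BBox(-157.9, 21.2, -157.8, 21.3)    # Hawaii, outside the box

for name, box in [("inside", inside), ("straddling", straddling), ("honolulu", honolulu)]:
    print(name, within_lower_48(box))
```

Because every comparison tests the feature's own bbox against the query limits, a feature is kept only when its whole bounding box lies inside the query box; features that straddle an edge are dropped.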
| 85 | + |
| 86 | +```python |
| 87 | +con.to_parquet(expr, "power-infra-usa.geoparquet") |
| 88 | +``` |
| 89 | + |
| 90 | +Now that we have the data, let's explore it in Ibis interactive mode and make some |
| 91 | +beautiful maps. |
| 92 | + |
| 93 | +## Data exploration |
| 94 | + |
| 95 | +To explore the data interactively we turn on the interactive mode: |
| 96 | +```python |
| 97 | +ibis.options.interactive = True |
| 98 | +``` |
| 99 | + |
| 100 | +```python |
| 101 | +usa_power_infra = con.read_parquet("power-infra-usa.geoparquet") |
| 102 | +usa_power_infra |
| 103 | +``` |
| 104 | + |
| 105 | +Let's quickly rename the `class` column, since `class` is a reserved word in |
| 106 | +Python and causes conflicts when using the deferred operator: |
| 107 | + |
| 108 | +```python |
| 109 | +usa_power_infra = usa_power_infra.rename(infra_class="class") |
| 110 | +``` |
| 111 | + |
| 112 | +We take a look at the different classes of infrastructure under the subtype power: |
| 113 | + |
| 114 | +```python |
| 115 | +usa_power_infra.infra_class.value_counts().order_by( |
| 116 | + ibis.desc("infra_class_count") |
| 117 | +).preview(max_rows=15) |
| 118 | +``` |
| 119 | + |
| 120 | +Looks like we have `plant`, `power_line` and `minor_line` among others. |
| 121 | + |
| 122 | +```python |
| 123 | +plants = usa_power_infra.filter(_.infra_class == "plant") |
| 124 | +power_lines = usa_power_infra.filter(_.infra_class == "power_line") |
| 125 | +minor_lines = usa_power_infra.filter(_.infra_class == "minor_line") |
| 126 | +``` |
| 127 | + |
| 128 | + |
| 129 | +## Plotting with Lonboard |
| 130 | + |
| 131 | +Lonboard is a Python plotting library optimized for efficient visualizations |
| 132 | +of large geospatial data. It integrates well with Ibis and DuckDB, making |
| 133 | +interactive plotting scalable. |
| 134 | + |
| 135 | +::: {.callout-note} |
| 136 | +You can try this on your machine. To keep the blog's file size small, we show |
| 137 | +screenshots of the visualizations. |
| 138 | +::: |
| 139 | + |
| 140 | +```python |
| 141 | +import lonboard |
| 142 | +from lonboard.basemap import CartoBasemap # to choose color of basemap |
| 143 | +``` |
| 144 | + |
| 145 | +Let's visualize the power plants: |
| 146 | + |
| 147 | +```python |
| 148 | +lonboard.viz( |
| 149 | + plants, |
| 150 | + scatterplot_kwargs={"get_fill_color": "red"}, |
| 151 | + polygon_kwargs={"get_fill_color": "red"}, |
| 152 | + map_kwargs={ |
| 153 | + "basemap_style": CartoBasemap.Positron, |
| 154 | + "view_state": {"longitude": -100, "latitude": 36, "zoom": 3}, |
| 155 | + }, |
| 156 | +) |
| 157 | +``` |
| 158 | + |
| 159 | + |
| 160 | + |
| 161 | +If you are visualizing this on your machine, you can zoom in and see some of the |
| 162 | +geometry where the plants are located. As an example, we can plot a small |
| 163 | +area of California: |
| 164 | + |
| 165 | +```python |
| 166 | +plants_CA = plants.filter( |
| 167 | + _.bbox.xmin.between(-118.6, -117.9), _.bbox.ymin.between(34.5, 35.3) |
| 168 | +).select(_.names.primary, _.geometry) |
| 169 | +``` |
| 170 | + |
| 171 | +```python |
| 172 | +lonboard.viz( |
| 173 | + plants_CA, |
| 174 | + scatterplot_kwargs={"get_fill_color": "red"}, |
| 175 | + polygon_kwargs={"get_fill_color": "red"}, |
| 176 | + map_kwargs={ |
| 177 | + "basemap_style": CartoBasemap.Positron, |
| 178 | + }, |
| 179 | +) |
| 180 | +``` |
| 181 | + |
| 182 | + |
| 183 | + |
| 184 | +We can also visualize `power_lines` and `minor_lines` together: |
| 185 | + |
| 186 | +```python |
| 187 | +lonboard.viz([minor_lines, power_lines]) |
| 188 | +``` |
| 189 | + |
| 190 | + |
| 191 | + |
| 192 | +And that's how you can visualize ~7 million coordinates from the comfort of |
| 193 | +your laptop. |
| 194 | + |
| 195 | +```python |
| 196 | +>>> power_lines.geometry.n_points().sum() |
| 197 | +5329836 |
| 198 | +>>> minor_lines.geometry.n_points().sum() |
| 199 | +1430042 |
| 200 | +``` |
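As a quick sanity check on the "~7 million" figure, the two counts reported above can be added in plain Python. The variable names are only for illustration; the numbers are copied from the output above.

```python
# Point counts copied from the n_points() output above.
power_line_points = 5_329_836
minor_line_points = 1_430_042

total_points = power_line_points + minor_line_points
print(f"{total_points:,} points (~{total_points / 1e6:.1f} million)")
```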
| 201 | + |
| 202 | +With Ibis and DuckDB, working with geospatial data has never been easier or faster. |
| 203 | +We saw how to query a dataset from Overture Maps with the simplicity of Python and |
| 204 | +the performance of DuckDB. Last but not least, we saw how simply and quickly Lonboard |
| 205 | +got us from query to plot. Together, these libraries make exploring and handling |
| 206 | +geospatial data a breeze. |
| 207 | + |
| 208 | + |
| 209 | +## Resources |
| 210 | +- [Ibis Docs](https://ibis-project.org/) |
| 211 | +- [Lonboard Docs](https://developmentseed.org/lonboard/latest/) |
| 212 | +- [DuckDB spatial extension](https://duckdb.org/docs/extensions/spatial.html) |
| 213 | +- [DuckDB spatial functions docs](https://github.com/duckdb/duckdb_spatial/blob/main/docs/functions.md) |
| 214 | + |
| 215 | +Chat with us on Zulip: |
| 216 | + |
| 217 | +- [Ibis Zulip Chat](https://ibis-project.zulipchat.com/) |