Commit 474afce

Merge pull request #575 from databrickslabs/pre-0.4.3-release
Pre 0.4.3 release
2 parents 3a99cac + 851e69d commit 474afce

162 files changed: +5538 −1917 lines changed


.github/actions/python_build/action.yml

Lines changed: 5 additions & 0 deletions
@@ -16,6 +16,11 @@ runs:
         pip install build wheel pyspark==${{ matrix.spark }} numpy==${{ matrix.numpy }}
         pip install --no-build-isolation --no-cache-dir --force-reinstall gdal==${{ matrix.gdal }}
         pip install .
+    - name: Give Python interpreter write access to checkpointing / raster write location
+      shell: bash
+      run: |
+        sudo mkdir -p /mnt/mosaic_tmp
+        sudo chmod -R 777 /mnt/mosaic_tmp
     - name: Test and build python package
       shell: bash
       run: |

.github/workflows/build_main.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ on:
       - "**"
 jobs:
   build:
-    runs-on: ubuntu-22.04
+    runs-on: larger
     env:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
     strategy:

.github/workflows/build_python.yml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ on:
 
 jobs:
   build:
-    runs-on: ubuntu-22.04
+    runs-on: larger
     env:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
     strategy:

.github/workflows/build_r.yml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ on:
 
 jobs:
   build:
-    runs-on: ubuntu-22.04
+    runs-on: larger
     env:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
     strategy:

.github/workflows/build_scala.yml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ on:
 
 jobs:
   build:
-    runs-on: ubuntu-22.04
+    runs-on: larger
     env:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
     strategy:

CHANGELOG.md

Lines changed: 51 additions & 5 deletions
@@ -1,11 +1,57 @@
 ## v0.4.3 [DBR 13.3 LTS]
+
+This is the final mainline release of Mosaic. Future development will be focused on the planned spatial-utils library, which will be a successor to Mosaic and will include new features and improvements. The first release of spatial-utils is expected in the coming months.
+
+We will continue to maintain Mosaic for the foreseeable future, including bug fixes and security updates. However, we recommend that users start transitioning to spatial-utils as soon as possible to take advantage of the new features and improvements that will be available in that library.
+
+This release includes a number of enhancements and fixes, detailed below.
+
+### Raster checkpointing is enabled by default
+Fuse-based checkpointing for raster operations is now enabled by default and managed through:
+- spark configs `spark.databricks.labs.mosaic.raster.use.checkpoint` and `spark.databricks.labs.mosaic.raster.checkpoint`.
+- python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
+- scala: `MosaicGDAL.enableGDALWithCheckpoint(spark, path)`.
+
+This feature is designed to improve performance and reduce memory usage for raster operations by writing intermediate data to a fuse directory. This is particularly useful for large rasters or when working with many rasters in a single operation.
+
+We plan further enhancements to this feature (including automatic cleanup of checkpoint locations) as part of the first release of spatial-utils.
+
+### Enhancements and fixes to the raster processing APIs
+- Added `RST_Write`, a function that writes each raster 'tile' in a DataFrame to a specified location (e.g. a fuse directory) using the appropriate GDAL driver and the tile's data / path. This is useful for formalizing the path when writing a Lakehouse table and allows removal of interim checkpointed data.
+- Python bindings added for `RST_Avg`, `RST_Max`, `RST_Median`, `RST_Min`, and `RST_PixelCount`.
+- `RST_PixelCount` now supports optional 'countNoData' and 'countMask' parameters (both default to false); set them to true to also count pixels whose value matches the tile's configured NoData value or whose mask value is 0.0.
+- `RST_Clip` now exposes the GDAL Warp option `CUTLINE_ALL_TOUCHED`, which determines whether a pixel is included when the clipping geometry crosses its centre point (false) or any part of the pixel (true). The default remains true, but this is now configurable.
+- Within clipping operations such as `RST_Clip`, we now correctly set the CRS on the generated Shapefile Feature Layer used for clipping, so the CRS of the input geometry is respected when clipping rasters.
+- Added two new functions for getting and upcasting the datatype of a raster band: `RST_Type` and `RST_UpdateType`. Use these to ensure that a raster's datatype is appropriate for the operations being performed, e.g. upcasting integer-typed input rasters before raster algebra such as NDVI calculations, where the result needs to be a float.
+- The logic underpinning `RST_MemSize` (and related operations) now falls back to an estimate based on the raster dimensions and the data type of each band when the raster is held in memory.
+- `RST_To_Overlapping_Tiles` is renamed `RST_ToOverlappingTiles`. The original expression remains but is marked as deprecated.
+- `RST_WorldToRasterCoordY` now returns the correct `y` value (it previously returned `x`).
+- Docs added for expression `RST_SetSRID`.
+- Docs updated for `RST_FromContent` to capture the optional 'driver' parameter.
+
+### Dependency management
+Updates to and pinning of Python language and dependency versions:
 - Pyspark requirement removed from python setup.cfg as it is supplied by DBR
 - Python version limited to "<3.11,>=3.10" for DBR
-- iPython dependency limited to "<8.11,>=7.4.2" for both DBR and keplergl-jupyter
-- Expanded support for fuse-based checkpointing (persisted raster storage), managed through:
-  - spark config 'spark.databricks.labs.mosaic.raster.use.checkpoint' in addition to 'spark.databricks.labs.mosaic.raster.checkpoint'.
-  - python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
-  - scala: `MosaicGDAL.enableGDALWithCheckpoint(spark, path)`.
+- iPython dependency limited to "<8.11,>=7.4.2" for both DBR and keplergl-jupyter
+- numpy now limited to "<2.0,>=1.21.5" to match the DBR minimum
+
+### Surface mesh APIs
+A set of experimental APIs for creating and working with surface meshes (i.e. triangulated irregular networks) has been added to Mosaic. Users can now generate a conforming Delaunay triangulation over point data (optionally including 'break' lines as hard constraints), interpolate elevation over a regular grid, and rasterize the results to produce terrain models.
+- `ST_Triangulate` performs a conforming Delaunay triangulation using a set of mass points and break lines.
+- `ST_InterpolateElevation` computes the interpolated elevations of a grid of points.
+- `RST_DTMFromGeoms` burns the interpolated elevations into a raster.
+
+### British National Grid
+Two fixes have been made to the British National Grid indexing system:
+- Corrected a typo in the grid letter array used to perform lookups.
+- Updated the logic used for identifying quadrants when these are specified in a grid reference.
+
+### Documentation
+A few updates to our documentation and examples library:
+- An example walkthrough has been added for arbitrary GDAL Warp and Transform operations using a pyspark UDF (see the section "API Documentation / Rasterio + GDAL UDFs").
+- The Python "Quickstart Notebook" has been updated to use the `MosaicAnalyzer` class (added after `MosaicFrame` was deprecated).
+
 
 ## v0.4.2 [DBR 13.3 LTS]
 - Geopandas now fixed to "<0.14.4,>=0.14" due to conflict with minimum numpy version in geopandas 0.14.4.
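
A minimal Python sketch of the checkpointing setup described in the CHANGELOG entry above, using the Spark configs and the `mos.enable_gdal(spark, with_checkpoint_path=path)` entry point it names. The checkpoint path below is a placeholder, and the notebook-style `spark` / `dbutils` handles are assumed to be in scope.

# Sketch only: enable fuse-based raster checkpointing (checkpoint path is a placeholder).
import mosaic as mos

mos.enable_mosaic(spark, dbutils)

# Option A: drive checkpointing through the Spark configs named in the changelog.
spark.conf.set("spark.databricks.labs.mosaic.raster.use.checkpoint", "true")
spark.conf.set("spark.databricks.labs.mosaic.raster.checkpoint", "/dbfs/tmp/mosaic_checkpoint")

# Option B: pass the location directly when enabling GDAL.
mos.enable_gdal(spark, with_checkpoint_path="/dbfs/tmp/mosaic_checkpoint")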
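
To make the new raster options above concrete, here is a small, non-authoritative PySpark sketch. The DataFrame `tiles_df`, its `tile` column, and the `aoi_wkt` geometry are hypothetical, and the exact form of the optional `RST_Clip` and `RST_PixelCount` arguments is an assumption based on the changelog wording rather than the published API reference.

# Illustrative sketch only; names and optional-argument forms are assumptions (see note above).
import mosaic as mos
from pyspark.sql import functions as F

# Clip each tile to an area of interest. Assumed: a third argument toggles the
# CUTLINE_ALL_TOUCHED warp option (true = include pixels touched anywhere, the default).
clipped = tiles_df.select(
    mos.rst_clip("tile", F.lit(aoi_wkt), F.lit(False)).alias("tile")
)

# Count pixels per tile. Assumed: the optional flags ask for NoData and masked
# pixels to be included in the count (both default to false per the changelog).
counts = clipped.select(
    mos.rst_pixelcount("tile", F.lit(True), F.lit(True)).alias("pixel_count")
)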

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ The python bindings can be tested using [unittest](https://docs.python.org/3/lib
 - Move to the `python/` directory and install the project and its dependencies:
   `pip install . && pip install pyspark==<project_spark_version>`
   (where 'project_spark_version' corresponds to the version of Spark
-  used for the target Databricks Runtime, e.g. `3.2.1`.
+  used for the target Databricks Runtime, e.g. `3.4.1` for DBR 13.3 LTS.
 - Run the tests using `unittest`: `python -m unittest`
 
 The project wheel file can be built with [build](https://pypa-build.readthedocs.io/en/stable/).

R/.gitignore

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
 **/.Rhistory
 **/*.tar.gz
+**/*.Rproj
 /sparklyr-mosaic/metastore_db/
+/sparklyr-mosaic/mosaic_checkpoint/
+/sparklyr-mosaic/mosaic_tmp/
+/sparkr-mosaic/metastore_db/
+/sparkr-mosaic/mosaic_checkpoint/
+/sparkr-mosaic/mosaic_tmp/

R/sparkR-mosaic/SparkR.Rproj

Lines changed: 0 additions & 13 deletions
This file was deleted.

R/sparkR-mosaic/sparkrMosaic/DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Description: This package extends SparkR to bring the Databricks Mosaic for geos
 License: Databricks
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
 Collate:
     'enableGDAL.R'
     'enableMosaic.R'
