TST: Replace test datasets with pyogrio-generated files where possible #441

brendan-ward · 2024-07-04T01:53:41Z

Per #433, we were including test data files from third parties and may not have fully complied with their licenses, even though we were usually using only a tiny extract of the source files.

See the updates to tests/fixtures/README.md first for updated guidance on managing test files, and steps used to create those still present as files.

Where we were using GeoJSON files, I migrated those directly into conftest.py so they are generated using code, which should make the licensing of those files more clear (i.e., part of Pyogrio source code so our MIT license is clear). I don't have a strong opinion about whether these should be files or in code, but putting them in code seemed reasonable; let me know if you disagree.
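As a rough illustration of what "generated using code" can look like, here is a minimal sketch that builds a tiny GeoJSON file from a Python dict. The function name `write_point_geojson`, the file name, and the feature contents are hypothetical and are not taken from the actual conftest.py.

```python
import json
from pathlib import Path


def write_point_geojson(directory):
    """Write a tiny two-feature GeoJSON file and return its path.

    Hypothetical helper for illustration; names and values are assumptions.
    """
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"id": 1, "name": "a"},
                "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
            },
            {
                "type": "Feature",
                "properties": {"id": 2, "name": "b"},
                "geometry": {"type": "Point", "coordinates": [1.0, 1.0]},
            },
        ],
    }
    path = Path(directory) / "test_points.geojson"  # hypothetical file name
    path.write_text(json.dumps(collection), encoding="utf-8")
    return path
```

In a conftest.py, a helper like this would typically be wrapped in a pytest fixture that receives `tmp_path` and returns the written path, so each test gets a freshly generated file.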

This also made it easier to create an invalid GeoJSON file for testing polygons with insufficient coordinates (replaces data file with problematic license).
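For context, a polygon ring in GeoJSON needs at least four positions, with the first and last identical (RFC 7946). The sketch below writes a deliberately invalid polygon with too few coordinates; the ring values and the names `INVALID_RING`, `ring_is_valid`, and `write_invalid_polygon_geojson` are assumptions for illustration, not the contents of the actual fixture.

```python
import json
from pathlib import Path

# Deliberately too short: a linear ring needs >= 4 positions (RFC 7946).
# These coordinates are illustrative, not the ones used in the real fixture.
INVALID_RING = [[0, 0], [0, 0]]


def ring_is_valid(ring):
    """Return True if the ring has at least 4 positions and is closed."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def write_invalid_polygon_geojson(directory):
    """Write a GeoJSON file whose only polygon has an undersized ring."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [INVALID_RING]},
            }
        ],
    }
    path = Path(directory) / "invalid_polygon.geojson"  # hypothetical name
    path.write_text(json.dumps(collection), encoding="utf-8")
    return path
```

Because the file is still syntactically valid JSON, the problem only surfaces when a reader tries to build the polygon geometry, which is the behavior such a test is meant to exercise.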

Where we were using 3rd party FGDB datasets, I created new datasets to mimic how we were using those. I used QGIS to hand-digitize (super simple) LineString ZM, Polygon ZM, Curve, CurvePolygon, and MultiSurface datasets, since we were using one of the FGDB datasets to verify that they were correctly downgraded to supported geometry types.

brendan-ward · 2024-07-04T02:14:34Z

pyogrio/tests/test_geopandas_io.py

@@ -196,61 +194,82 @@ def test_read_no_geometry_no_columns_no_fids(naturalearth_lowres, use_arrow):
        )


-def test_read_force_2d(test_fgdb_vsi, use_arrow):
-    with pytest.warns(


Verifying the warning was incidental to this test, and it is already covered by test_core.py::test_list_layers.

brendan-ward · 2024-07-04T02:15:32Z

pyogrio/tests/test_geopandas_io.py


-@pytest.mark.filterwarnings("ignore: Measured")
-@pytest.mark.filterwarnings("ignore: More than one layer found in")


We actually should verify that the multiple layers warning is raised.
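To illustrate the difference between ignoring a warning and asserting that it is raised, here is a small standard-library sketch. `open_multilayer` is a hypothetical stand-in for a reader that warns about multiple layers; the warning class used here is an assumption, and only the message prefix comes from the filter shown in the diff above.

```python
import warnings


def open_multilayer(path):
    """Hypothetical reader that warns when a dataset has several layers."""
    warnings.warn(
        f"More than one layer found in '{path}'", UserWarning, stacklevel=2
    )
    return "first_layer"


def call_and_collect_warnings(func, *args):
    """Call func and return its result plus the warning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        result = func(*args)
    return result, [str(w.message) for w in caught]


result, messages = call_and_collect_warnings(open_multilayer, "example.gdb")

# Instead of filtering the warning out, assert that it was actually raised
assert any("More than one layer found in" in m for m in messages)
```

In a pytest suite the same check is usually written with the `pytest.warns(..., match="More than one layer found in")` context manager rather than a `filterwarnings` marker.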

theroggy

Looks good! The GeoJSON files in code look fine to me as they are all quite simple!

martinfleis · 2024-08-21T07:34:28Z

pyogrio/tests/conftest.py

-def test_fgdb_vsi():
-    return f"/vsizip/{_data_dir}/test_fgdb.gdb.zip"


Does it matter that we don't have a direct replacement for a zipped FGDB?

brendan-ward · 2024-08-29

I don't think it matters. We have other tests that use the /vsizip/ interface for working with a zipped shapefile, which should be a reasonable proxy for zip files containing other formats.
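For readers unfamiliar with the /vsizip/ interface, this is a minimal sketch of how such paths are composed, following the same f-string pattern as the removed fixture. The helper name `vsizip_path` and the example file names are hypothetical.

```python
def vsizip_path(zip_path, inner_path=None):
    """Build a GDAL /vsizip/ path for a zip archive.

    If inner_path is given, the result points at that member of the archive.
    Hypothetical helper for illustration only.
    """
    base = f"/vsizip/{zip_path}"
    return f"{base}/{inner_path}" if inner_path else base


# An absolute archive path yields a double slash after the prefix
zipped_shapefile = vsizip_path("/data/fixtures/example.shp.zip")

# Pointing at a specific file inside the archive
inner_layer = vsizip_path("/data/fixtures/example.zip", "example.shp")
```

The format inside the archive does not change how the path is built, which is why a zipped shapefile test can stand in for other zipped formats.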

jorisvandenbossche

This looks great!

One remark: I am not sure we still have coverage for reading an FGDB file, now that it has been removed as a fixture file. Probably not that important, but I see we have a test_write_openfilegdb that only asserts the file exists on disk; maybe we can also read the resulting file to test the full roundtrip.

> Where we were using GeoJSON files, I migrated those directly into conftest.py so they are generated using code, which should make the licensing of those files more clear (i.e., part of Pyogrio source code so our MIT license is clear). I don't have a strong opinion about whether these should be files or in code, but putting them in code seemed reasonable; let me know if you disagree.

Those files were generated manually, so their licensing was OK anyway I think. But also no strong opinion. IIRC I added several of them, and while developing / testing, I think it is a bit easier to have them as files (I typically created them with gdal/pyogrio or manually edited them afterwards) and sometimes tested/compared directly with ogrinfo. But at the end moving the text of the file into the conftest.py is of course trivial, so it's perfectly fine going that way.

brendan-ward · 2024-08-29T22:13:25Z

Added tests to verify the roundtrip using the OpenFileGDB driver, and a test of int64 dtype handling for GDAL >= 3.9 via a dataset creation option.

brendan-ward added 3 commits July 3, 2024 15:14

Replace test file for reading file with invalid poly ring

869b42a

Replace other test datasets where possible and update tests

13e2037

Further cleanup

59bbe41

brendan-ward commented Jul 4, 2024

brendan-ward marked this pull request as ready for review July 4, 2024 02:19

Merge branch 'main' into cleanup_test_data_fixtures

cea7d58

brendan-ward requested review from jorisvandenbossche and theroggy and removed request for jorisvandenbossche July 15, 2024 19:07

brendan-ward mentioned this pull request Jul 15, 2024

Data files license #433

Open

theroggy approved these changes Jul 17, 2024

brendan-ward added this to the 0.10.0 milestone Jul 25, 2024

brendan-ward requested review from jorisvandenbossche and martinfleis August 20, 2024 19:20

martinfleis reviewed Aug 21, 2024

jorisvandenbossche approved these changes Aug 21, 2024

brendan-ward added 4 commits August 29, 2024 11:48

Merge branch 'main' into cleanup_test_data_fixtures

a343f16

Add roundtrip tests for OpenFileGDB and add int64 test

d6394ce

No warning raised for int64 on GDAL <3.9.0

8c93975

Add missing annotation

945abfd

martinfleis approved these changes Aug 30, 2024

brendan-ward merged commit 412a441 into main Aug 30, 2024
20 checks passed

brendan-ward deleted the cleanup_test_data_fixtures branch August 30, 2024 15:53
