Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Replace test datasets with pyogrio-generated files where possible #441

Merged
merged 8 commits into from
Aug 30, 2024

Conversation

brendan-ward
Copy link
Member

Per #433, we were including test data files from other parties and may not have been in great compliance with their licenses, even though we were usually using a tiny extract of the source files.

See the updates to tests/fixtures/README.md first for updated guidance on managing test files, and steps used to create those still present as files.

Where we were using GeoJSON files, I migrated those directly into conftest.py so they are generated using code, which should make the licensing of those files more clear (i.e., part of Pyogrio source code so our MIT license is clear). I don't have a strong opinion about whether these should be files or in code, but putting them in code seemed reasonable; let me know if you disagree.

This also made it easier to create an invalid GeoJSON file for testing polygons with insufficient coordinates (replaces data file with problematic license).

Where we were using 3rd party FGDB datasets, I created new datasets to mimic how we were using those. I used QGIS to hand-digitize (super simple) LineString ZM, Polygon ZM, Curve, CurvePolygon, and MultiSurface datasets, since we were using one of the FGDB datasets to verify that they were correctly downgraded to supported geometry types.

@@ -196,61 +194,82 @@ def test_read_no_geometry_no_columns_no_fids(naturalearth_lowres, use_arrow):
)


def test_read_force_2d(test_fgdb_vsi, use_arrow):
with pytest.warns(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verifying the warning was incidental to this test, and already covered by test_core.py::test_list_layers.


@pytest.mark.filterwarnings("ignore: Measured")
@pytest.mark.filterwarnings("ignore: More than one layer found in")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually should verify that the multiple layers warning is raised.

@brendan-ward brendan-ward marked this pull request as ready for review July 4, 2024 02:19
@brendan-ward brendan-ward requested review from jorisvandenbossche and theroggy and removed request for jorisvandenbossche July 15, 2024 19:07
Copy link
Member

@theroggy theroggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The geojson files in code look fine to me as they are all quite simple!

@brendan-ward brendan-ward added this to the 0.10.0 milestone Jul 25, 2024
Comment on lines -126 to -130
def test_fgdb_vsi():
return f"/vsizip/{_data_dir}/test_fgdb.gdb.zip"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter that we don't have a direct replacement for a zipped FGB?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters. We have other tests that use the /vsizip/ interface for working with a zipped shapefile, which should be a reasonable proxy for zip files containing other formats.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

One remark: I am not sure we still have coverage for reading an FGDB file? (now that it is removed as fixture file) Probably not that important, but I see we have a test_write_openfilegdb that only asserts the file exists on disk, maybe we can just read the resulting file as well to test the full roundtrip.

Where we were using GeoJSON files, I migrated those directly into conftest.py so they are generated using code, which should make the licensing of those files more clear (i.e., part of Pyogrio source code so our MIT license is clear). I don't have a strong opinion about whether these should be files or in code, but putting them in code seemed reasonable; let me know if you disagree.

Those files were generated manually, so their licensing was OK anyway I think. But also no strong opinion. IIRC I added several of them, and while developing / testing, I think it is a bit easier to have them as files (I typically created them with gdal/pyogrio or manually edited them afterwards) and sometimes tested/compared directly with ogrinfo. But at the end moving the text of the file into the conftest.py is of course trivial, so it's perfectly fine going that way.

@brendan-ward
Copy link
Member Author

Added tests to verify roundtrip using OpenFileGDB driver, and test of int64 dtype handling for GDAL >= 3.9 via dataset creation option.

@brendan-ward brendan-ward merged commit 412a441 into main Aug 30, 2024
20 checks passed
@brendan-ward brendan-ward deleted the cleanup_test_data_fixtures branch August 30, 2024 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants