GeoJSON to/from custom struct using serde #199
Conversation
I like this a lot! The only possible feature I can think of at the moment is the ability to have a concrete geo type (e.g. a polygon) instead of the Geometry enum, which would save some conversions if you knew what the type was ahead of time (I could see that being tricky to implement). I would have definitely liked something like this when I was doing geospatial work last summer.
That's a great idea. I think I can add that pretty easily, actually. Thanks for looking!
(Force-pushed from 522fd3f to 3b9a986.)
I've pushed up another commit that does just this.
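For illustration, a minimal sketch of what deserializing into a concrete geo type might look like, assuming a helper along the lines of `deserialize_geometry` and the usual geojson-to-geo-types conversions; the struct and field names here are made up:

```rust
use geojson::de::deserialize_geometry;
use serde::Deserialize;

#[derive(Deserialize)]
struct Country {
    // A concrete geo type instead of the Geometry enum; deserialization
    // would fail if a feature's geometry isn't actually a polygon.
    #[serde(deserialize_with = "deserialize_geometry")]
    geometry: geo_types::Polygon<f64>,
    name: String,
}
```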
I like the idea a lot. Do you have in mind to add serialization capabilities as well? I was wondering what a roundtrip would look like with regard to foreign members. Could they be preserved with the current approach to potentially achieve a lossless roundtrip?
Thanks for looking @b4l!
Not with the current approach. An example of the current behavior to highlight this:
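A hedged reconstruction of the kind of example meant here, with made-up input and struct names, and assuming a one-shot helper like `deserialize_feature_collection_str_to_vec`:

```rust
use geojson::de::{deserialize_feature_collection_str_to_vec, deserialize_geometry};
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct MyFeature {
    #[serde(deserialize_with = "deserialize_geometry")]
    geometry: geo_types::Point<f64>,
    name: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "custom_member" is a foreign member on the feature.
    let input = r#"{
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "custom_member": "this gets dropped",
            "geometry": { "type": "Point", "coordinates": [1.0, 2.0] },
            "properties": { "name": "example" }
        }]
    }"#;

    let features: Vec<MyFeature> = deserialize_feature_collection_str_to_vec(input)?;
    // MyFeature has fields only for the geometry and the properties, so there
    // is nowhere to hold "custom_member"; serializing `features` back out
    // cannot reproduce it, i.e. the round-trip is lossy.
    println!("{:?}", features);
    Ok(())
}
```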
The approach in this PR flattens your struct's fields into the feature's `properties` (plus the `geometry`), so there is no place for foreign members to go.
I can see why this behavior might not work for some use cases. The use case driving this PR is to have processing code that is agnostic (or requires only minimal changes) between formats, e.g. I'd like to have code that can read its input from either GeoJSON or CSV (and some day maybe other formats as well) while only changing a couple of lines of code. Having multiple sources of input requires a sort of "lowest common denominator" of what can be accounted for, and foreign members fall outside of that. If you did want to keep all the structural information, like foreign members, you can still do that the same as always: deserialize to a `geojson::Feature`. It's also possible that, if we threw enough proc_macro at the problem, we could come up with something more powerful, but I don't plan to work on that just yet.
Oh sorry, I forgot to answer this in my original reply. If this seems like a good overall direction, I'm happy to add serialization.
Thanks for the explanations @michaelkirk! The two potential use cases I have in mind are enforcing a certain schema with a concrete type/struct, and support for https://github.com/opengeospatial/ogc-feat-geo-json. This PR gives us the ability to enforce a concrete geometry type and a certain properties struct. I don't really know how important my use cases are; I just wanted to mention them to raise awareness.
(Force-pushed from 3b9a986 to 34aca01.)
This feature would be brilliant!
200: Flexible field ordering for FeatureIterator r=frewsxcv a=michaelkirk

- [x] I agree to follow the project's [code of conduct](https://github.com/georust/geo/blob/master/CODE_OF_CONDUCT.md).
- [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.

---

I encountered this while working on #199, but I thought it made sense to extract the fix and address it separately.

Previously, `FeatureIterator` could read feature collections written like this:

`{ "type": "FeatureCollection", "features": [ ... ] }`

But would error when encountering a feature collection written like this:

`{"features": [ ... ], "type": "FeatureCollection" }`

Now we handle both orderings.

Co-authored-by: Michael Kirk <[email protected]>
This is indeed a fantastic idea: really flexible and ergonomic conversion to/from geojson! Thanks @michaelkirk
(Force-pushed from 228a61b to 489be8a, then from 489be8a to daddbd4.)
Thanks for the feedback everybody - I think this is ready for review!
I've gone and done this, and made some other substantial changes: error handling and docs. It's a big diff, but a lot of it is tests and docs. I've updated the description with some caveats. Please review them and let me know what you think. Especially interesting to me would be whether you're willing to integrate this into an actual use case; I'm curious to see what kind of rough edges surface. I'm happy to help with the integration if you want to ping me. Also, I'd super appreciate it if anyone has time to build and review the new docs.
Unfortunately, it's probably not much of a performance improvement at this point. In fact, it might be worse in some ways. For one, there's this really unfortunate round-trip where, in order to "move" a struct's fields into the output's "properties" field, we round-trip each of your structs through a `JsonObject`. We do this round trip with only one feature at a time, not the entire feature collection at once, so its memory impact should be limited, but still, it's pretty gross. With the time I had, I wasn't able to come up with a better solution. But the bigger problem might be that (currently anyway) serialization requires the whole feature collection to be written in one shot.
The conversion, as it exists today, converts the whole feature collection into a much larger representation before writing. Typically the user is already storing the representation in memory, and the conversion takes a lot (~10x) more memory. This PR nicely side-steps this issue, right?
This is great, and the docs in `ser` and `de` are particularly helpful and clear, which leads me to my only question: should we call out this functionality early on in the "main" docs?
use std::io::Read;

pub struct FeatureReader<R> {
Otherwise their descriptions sound too similar. I think for people who want it, ultimately it'd be best if `FeatureIterator` were hidden opaquely behind `FeatureReader::features()`. This is similar to `CSVReader::string_records()`, which deserializes to the library's lowest-common-denominator built-in type (`geojson::Feature` in our case). And then we have `FeatureReader::deserialize` which, just like `CSVReader::deserialize`, allows the user to deserialize directly to their custom struct.
I've done this in 493463e. And then I realized that the short doc string was basically indistinguishable from the other's. The thought is that we wouldn't delete `FeatureIterator`, but eventually make it private once the deprecation cycle has run its course.

So to recap, the suggested usage is now `FeatureReader::features()` when you want the built-in `geojson::Feature` type, and then `FeatureReader::deserialize()` when you want to go directly to your own struct. This is somewhat inspired by the approach taken by the `csv` crate. WDYT?
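To make the parallel concrete, a sketch of the two suggested access patterns; the exact signatures are my assumption from this thread, mirroring the `csv` crate's `records()` versus `deserialize()` split:

```rust
use geojson::FeatureReader;
use serde::Deserialize;
use std::fs::File;
use std::io::BufReader;

#[derive(Deserialize)]
struct Country {
    name: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Lowest-common-denominator access: iterate geojson::Feature values.
    let reader = FeatureReader::from_reader(BufReader::new(File::open("countries.geojson")?));
    for feature in reader.features() {
        let feature = feature?;
        println!("{:?}", feature.properties.as_ref().and_then(|props| props.get("name")));
    }

    // Typed access: deserialize each feature straight into a custom struct.
    let reader = FeatureReader::from_reader(BufReader::new(File::open("countries.geojson")?));
    for country in reader.deserialize::<Country>()? {
        let country = country?;
        println!("{}", country.name);
    }
    Ok(())
}
```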
Done in 413b959
Yep, this seems clear and sensible to me.
lgtm!
src/ser.rs (Outdated)
let mut json_object: JsonObject = {
    // PERF: this is an extra round-trip just to juggle some fields around.
    // How can we skip this?
    let bytes = serde_json::to_vec(self.feature)
serde_json provides a `to_value` method that we can use (and unwrap to the map enum variant). That would avoid the round-trip.
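A sketch of what that suggestion might look like in isolation, assuming `JsonObject` is the crate's usual `serde_json::Map<String, Value>` alias (the function name here is hypothetical):

```rust
use serde::Serialize;
use serde_json::{Map, Value};

// Serialize the user's struct straight to a serde_json::Value and take the
// Object variant, instead of writing bytes and re-parsing them.
fn to_json_object<T: Serialize>(feature: &T) -> Result<Map<String, Value>, serde_json::Error> {
    match serde_json::to_value(feature)? {
        Value::Object(map) => Ok(map),
        // Any non-object variant would be an error for a Feature-like struct.
        other => Err(serde::ser::Error::custom(format!(
            "expected JSON object, got {:?}",
            other
        ))),
    }
}
```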
I had no idea - this is just what I needed. Thanks @rmanoka!
With this 35% improvement, we're now a bit faster (22%) than the legacy bench.
Running benches/serialize.rs (target/release/deps/serialize-0873325e9a7982ee)

Serializing the old way:

    serialize geojson::FeatureCollection struct (countries.geojson)
        time:   [3.1352 ms 3.1433 ms 3.1520 ms]
        change: [-0.3321% -0.0199% +0.3402%] (p = 0.91 > 0.05)
        No change in performance detected.
    Found 4 outliers among 100 measurements (4.00%)
      4 (4.00%) high mild

Serializing the new way:

    serialize custom struct (countries.geojson)
        time:   [2.4598 ms 2.4647 ms 2.4697 ms]
        change: [-35.490% -35.313% -35.150%] (p = 0.00 < 0.05)
        Performance has improved.
    Found 1 outliers among 100 measurements (1.00%)
      1 (1.00%) high mild
I pushed up some examples/* as documentation and to facilitate memory profiling. I've gotten a couple of approvals and have only pushed up some non-controversial (I think) followups, so I'll plan to merge in a couple of days unless I hear otherwise. Thanks for the reviews! Some followup commentary:

In the memory profile, the left hump is the deserialization and the right hump is the serialization. Note that there is still a big hump for deserialization, likely due to my other PERF note: https://github.com/georust/geojson/pull/199/files#diff-a9463680bdf3fa7278b52b437bfbe9072e20023a015621ed23bcb589f6ccd4b5R155 It's not any worse than before, so I'm inclined to merge as-is. One follow-up solution might be to do something like we do in …
(Force-pushed from af0b08c to b936683.)
bors r=urschrei,rmanoka
Build succeeded: |
205: Add FeatureWriter to stream writes r=urschrei a=michaelkirk

- [x] I agree to follow the project's [code of conduct](https://github.com/georust/geo/blob/master/CODE_OF_CONDUCT.md).
- [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.

---

This allows the user to incrementally serialize a FeatureCollection, rather than requiring them to do it one-shot. This is a pretty logical counterpart to the FeatureReader introduced in #199.

Co-authored-by: Michael Kirk <[email protected]>
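A hedged sketch of what streaming writes with the `FeatureWriter` from #205 might look like; the method names here are assumptions, not confirmed by this thread:

```rust
use geojson::FeatureWriter;
use serde::Serialize;
use std::fs::File;
use std::io::BufWriter;

// Write features one at a time instead of materializing a whole
// FeatureCollection in memory first.
fn stream_out<S: Serialize>(records: &[S]) -> Result<(), Box<dyn std::error::Error>> {
    let out = BufWriter::new(File::create("output.geojson")?);
    let mut writer = FeatureWriter::from_writer(out);
    for record in records {
        // `serialize` is assumed to accept any struct set up like the custom
        // structs in this PR (geometry field handled via serialize_geometry).
        writer.serialize(record)?;
    }
    Ok(())
}
```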
- [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.

---

This is an attempt to improve the ergonomics of parsing GeoJson using serde (FIXES #184).

~~This PR is a draft because there are a lot of error paths related to invalid parsing that I'd like to add tests for, but I first wanted to check in on the overall direction of the API. What do people think?~~ I think this is ready for review!

Examples
Given some geojson like this:
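For concreteness, assume a hypothetical input along these lines (feature count and field names made up), used by the sketches that follow:

```rust
// A hypothetical feature collection used by the examples below.
fn example_geojson() -> &'static str {
    r#"{
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": { "type": "Point", "coordinates": [11.1, 22.2] },
                "properties": { "name": "Downtown", "population": 123 }
            },
            {
                "type": "Feature",
                "geometry": { "type": "Point", "coordinates": [33.3, 44.4] },
                "properties": { "name": "Uptown", "population": 456 }
            }
        ]
    }"#
}
```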
Before
Deserialization
You used to parse it like this:
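Roughly like this; a sketch of the pre-existing API rather than the exact snippet, using the hypothetical `name`/point fields from the input above:

```rust
use geojson::{GeoJson, Value};

// The pre-existing approach: parse into the generic geojson types, then pick
// the pieces you need back out by hand.
fn parse_old_way(geojson_str: &str) -> Result<Vec<(String, (f64, f64))>, Box<dyn std::error::Error>> {
    let mut out = Vec::new();
    if let GeoJson::FeatureCollection(collection) = geojson_str.parse::<GeoJson>()? {
        for feature in collection.features {
            let name = feature
                .properties
                .as_ref()
                .and_then(|props| props.get("name"))
                .and_then(|v| v.as_str())
                .unwrap_or_default()
                .to_string();
            if let Some(Value::Point(coords)) = feature.geometry.map(|g| g.value) {
                out.push((name, (coords[0], coords[1])));
            }
        }
    }
    Ok(out)
}
```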
Serialization
Then, to write it back to geojson, you'd have to either do all your processing strictly with the geojson types, or somehow convert your entities to, and back from, one of the GeoJson entities.
Something like:
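A sketch under the same assumptions: rebuild `geojson::Feature`s by hand from your own records, then serialize the whole collection at once:

```rust
use geojson::{Feature, FeatureCollection, Geometry, Value};
use serde_json::{json, Map};

// The pre-existing approach to writing: reconstruct geojson types from your
// own data before serializing.
fn write_old_way(records: &[(String, (f64, f64))]) -> Result<String, Box<dyn std::error::Error>> {
    let features = records
        .iter()
        .map(|(name, (x, y))| {
            let mut properties = Map::new();
            properties.insert("name".to_string(), json!(name));
            Feature {
                bbox: None,
                geometry: Some(Geometry::new(Value::Point(vec![*x, *y]))),
                id: None,
                properties: Some(properties),
                foreign_members: None,
            }
        })
        .collect();
    let collection = FeatureCollection {
        bbox: None,
        features,
        foreign_members: None,
    };
    Ok(serde_json::to_string(&collection)?)
}
```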
After
But now you also have the option of parsing it into your own declarative struct using serde like this:
Declaration
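A sketch of the kind of declaration this PR enables; the helper names (`serialize_geometry`, `deserialize_geometry`) are assumed, and the hypothetical `Neighborhood` struct matches the sample input above:

```rust
use geojson::de::deserialize_geometry;
use geojson::ser::serialize_geometry;
use serde::{Deserialize, Serialize};

// Ordinary fields map to/from the feature's `properties`; the geometry field
// round-trips through the helpers added in this PR.
#[derive(Serialize, Deserialize)]
struct Neighborhood {
    #[serde(
        serialize_with = "serialize_geometry",
        deserialize_with = "deserialize_geometry"
    )]
    geometry: geo_types::Point<f64>,
    name: String,
    population: u64,
}
```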
Deserialization
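A sketch, assuming a one-shot helper such as `deserialize_feature_collection_str_to_vec` (the `FeatureReader` discussed above would be the streaming equivalent):

```rust
use geojson::de::deserialize_feature_collection_str_to_vec;

// `Neighborhood` as declared above. Deserialize the whole collection straight
// into a Vec of custom structs.
fn read_new_way(geojson_str: &str) -> Result<Vec<Neighborhood>, Box<dyn std::error::Error>> {
    let neighborhoods: Vec<Neighborhood> = deserialize_feature_collection_str_to_vec(geojson_str)?;
    Ok(neighborhoods)
}
```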
Serialization
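A sketch, assuming a helper along the lines of `to_feature_collection_string`:

```rust
use geojson::ser::to_feature_collection_string;

// `Neighborhood` as declared above. Serialize a slice of structs back into a
// GeoJSON FeatureCollection string.
fn write_new_way(neighborhoods: &[Neighborhood]) -> Result<String, Box<dyn std::error::Error>> {
    Ok(to_feature_collection_string(neighborhoods)?)
}
```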
Caveats
Performance
Performance currently isn't great. There are a couple of things in the code which seem ridiculous, marked with `PERF:`, that I don't have an immediate solution for. This is my first time really diving into the internals of serde and it's kind of a lot! My hope is that performance improvements would be possible with little or no change to the API.

Some specific numbers (from some admittedly crude benchmarks):
Old deserialization:
New deserialization:
Old serialization:
New serialization:
So the new "ergonomic" serialization/deserialization takes about 1.2x as long as the old way. Though it's actually probably a bit better than that, because with the new form you have all your data ready to use, whereas with the old way you still need to go through this dance before you can start your analysis:
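One plausible version of that dance, using `quick_collection` to get from the generic geojson types to geo-types (with the properties still needing to be pulled out separately); a sketch, not the exact snippet:

```rust
use geo_types::GeometryCollection;
use geojson::{quick_collection, GeoJson};

// With the old way, parsing only gets you generic geojson types; you still
// need a second conversion pass before you can start any geo analysis.
fn extract_geometries(geojson_str: &str) -> Result<GeometryCollection<f64>, Box<dyn std::error::Error>> {
    let geojson: GeoJson = geojson_str.parse()?;
    Ok(quick_collection(&geojson)?)
}
```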
Anyway, I think this kind of speed difference is well worth the improved usability for my use cases.
Foreign Members
This doesn't support anything besides `geometry` and `properties` - e.g. foreign members are dropped. I'm hopeful that this is useful even with that limitation, but if not, maybe we can think of a way to accommodate most people's needs.