-
Notifications
You must be signed in to change notification settings - Fork 10
Rewrite using JSON3(read, Type) #63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Would have been nice to keep the same files so we can see the changes ;) |
I tried, but failed. Essentially the following happened:
|
Ok, so this requires the unreleased GeoInterface for getting the extent (using The remaining error is from Aqua, which complains about Unbound type parameters, in all the constructors for Geometry, like: GeoJSON.Polygon(bbox::Union{Nothing, Vector{Float32}}, coordinates::Union{Nothing, Array{Array{Tuple{Vararg{Float32, D}}, 1}, 1}}) where D @ GeoJSON ~/code/GeoJSON.jl/src/geojson_types.jl:46 I'm not sure what the unbound problem is, so help would be appreciated. I've removed the ComputedFieldTypes (so bbox is not D*2 NTuple, just a Vector), as it added a Type Parameter that we don't control, for little to no gain. I've changed the coordinate number to Float32, as it should be enough for WGS84 precision. The remaining thing is to improve the shadowing of fields vs keys in the properties of Feature(Collections), as we also need to add I also still need to write a changelog with all the breaking changes in this release.
|
The unbound problem is with the Or we need to add explicit inner constructors where the Something like: struct Polygon{D} <: AbstractGeometry{D}
bbox::Union{Nothing,Vector{Float32}}
coordinates::Union{Nothing,Vector{Vector{NTuple{D,Float32}}}}
Polygon{D}(bbox, coordinates) = new{D}(bbox, coordinates)
Polygon{D}(; bbox=nothing, coordinates=nothing) = Polygon{D}(bbox, coordinates)
end I think there is no way around providing |
Well you can do Point(; bbox=nothing, coordinates::NTuple{D,T}) where {D,T<:Real} = Point{D}(bbox, coordinates) But then you have unbound T. Otherwise you have to hardcode the Type to Float32, or change the signature from {D}, to {D,T} for everything. What do you think? |
Don't we know |
This is mainly for end users, to easily construct a thing with kw arguments. And D can be inferred when |
Ok, I've now set {D,T} as the type parameters. Sadly, I still get the unbound errors. Another thing, precompile doesn't seem to do much at all, even with SnoopPrecompile. Reading in the 24MB always takes 2.5s at the first time, afterwards it drops to 0.04s. Maybe we need to place a bit of JSON3 reads just in the module itself to warm things up a bit. Or make a PR over at JSON3 to precompile much more types? Anyway, I'd appreciate a deeper review, as this is close to complete. |
Ok well I didn't recommend adding
But personally I would force everyone to use D, or like GeometryBasics.jl, add I'll try review tomorrow or wednesday |
After some experiments with precompile/snoopprecompile, I just added a lot of precompile statements from --trace-compile. This is the only reliable method that got initial times down for parsing the 24MB featurecollection from GeoMakie. Also fixed the unbound parameters test by defaulting to 2d for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly great, some comments on types and minor improvments.
Seems that write here no longer accepts other geometries? It needs to write anything not just GeoJSON objects
This fixes it: write(io, obj) = JSON3.write(io, _lower(obj))
write(obj) = JSON3.write(_lower(obj)) |
I've been using this branch for day to day work, but hitting some bugs. For example his file was created and loads properly with |
Yep, that's it. Should be |
Sweet easy fix |
Can you describe the other bugs? Otherwise, especially if you can use this for your day to day work, let's merge this to main, revive #59 and make a breaking release. |
One was it's really slow using mixed geometry type features from overpass turbo. I'll have to put an MWE together. Something else didn't work but it might have been |
Ok MWE of slow loads. I chose a workload that would take a second or so so we can profile it. My use case needs the whole island, which takes a lot longer. The file is here: https://drive.google.com/file/d/1B-3ziH0oWivLye6V0wKk40Nxr3PTsAb0/view?usp=sharing using GeoJSON
using ProfileView
@time GeoJSON.read(read("slow.geojson")) Gives: julia> @time GeoJSON.read(read("slow.geojson"))
1.430720 seconds (1.13 M allocations: 54.966 MiB)
FeatureCollection with 5789 Features And the profile: @profview GeoJSON.read(read("slow.geojson")) Mostly type instability in StructTypes.jl It may be better if the properties are a |
How slow is "isn't that slow" ? Are you on julia 1.9? That's a very different profile it would be good to know whats happening here. |
I think the other half of my profile was that "invalid" method. So you just have that part, without the StructTypes instabilities. Interesting. |
Properties made into Dict again 1.9 julia> @time GeoJSON.read(data)
1.686686 seconds (3.22 M allocations: 157.501 MiB, 4.47% gc time, 64.26% compilation time)
FeatureCollection with 5789 Features
julia> @time GeoJSON.read(data)
0.600645 seconds (983.32 k allocations: 39.599 MiB, 1.28% gc time)
FeatureCollection with 5789 Features 1.8 julia> @time GeoJSON.read(data)
2.148807 seconds (3.77 M allocations: 174.611 MiB, 3.27% gc time, 69.70% compilation time)
FeatureCollection with 5789 Features
julia> @time GeoJSON.read(data)
0.646021 seconds (936.02 k allocations: 38.156 MiB, 5.79% gc time)
FeatureCollection with 5789 Features this PR (still with NamedTuple) 1.9 julia> @time GeoJSON.read(data)
7.736202 seconds (10.49 M allocations: 623.268 MiB, 1.06% gc time, 87.94% compilation time)
FeatureCollection with 5789 Features
julia> @time GeoJSON.read(data)
0.806888 seconds (1.15 M allocations: 46.902 MiB, 1.44% gc time)
FeatureCollection with 5789 Features 1.8 julia> @time GeoJSON.read(data)
8.658965 seconds (14.97 M allocations: 787.488 MiB, 1.99% gc time, 88.66% compilation time)
FeatureCollection with 5789 Features
julia> @time GeoJSON.read(data)
0.821887 seconds (1.10 M allocations: 44.540 MiB, 0.69% gc time)
FeatureCollection with 5789 Features So it seems that while the NamedTuple allocates slightly more, it is mostly much harder on type inference, the first call is much much slower. These timings are done on an M1 macbook. |
Also, I don't understand why it would hit @noinline invalid(error, buf, pos, T) = throw(ArgumentError("""
invalid JSON at byte position $pos while parsing type $T: $error
$(String(buf[max(1, pos-25):min(end, pos+25)]))
""")) |
Nice, so But that Is it getting caught by a try/catch block somewhere and ignored? |
I guess its this try - catch: https://github.com/quinnj/JSON3.jl/blob/main/src/structs.jl#L82-L86 We could remove the try catch and just run the first |
Yes and it's caused by the id of the Feature, having an unholy Union. Changing from julia> @time GeoJSON.read(data)
1.174992 seconds (3.05 M allocations: 148.982 MiB, 2.30% gc time, 91.93% compilation time)
FeatureCollection with 5789 Features
julia> @time GeoJSON.read(data)
0.086503 seconds (890.70 k allocations: 35.536 MiB)
FeatureCollection with 5789 Features Spec however says:
|
I would burn the spec for that speedup, especially if it just means ids are always strings. But can we hack around it some how by defining a custom Where we check |
Or setting it to Any? Ugly, but just as fast. |
Oh yeah let's do that. But does it actually parse the numbers? Or just make them all strings anyway. |
julia> d = """{
"type": "Feature",
"id": 1,
"bbox": [-180.0, -90.0, 180.0, 90.0],
"geometry": {"type": "MultiLineString", "coordinates": [[[3.75, 9.25], [-130.95, 1.52]], [[23.15, -34.25], [-1.35, -4.65], [3.45, 77.95]]]},
"properties": {"title": "Dict 1"}
}""";
julia> f = GeoJSON.read(d)
Feature with 2D MultiLineString geometry and 2 properties: (:geometry, :title)
julia> f.id
1
julia> typeof(f.id)
Int64 String also works. I did note that accessing the properties is a bit type unstable, due to us overloading getproperty, but it still seems pretty fast to me. Only some culling of the precompile is needed. While the large examples (yours, the original Makie one), now take ~1s to parse for the first time and ~0.05s for subsequent calls, precompiling now takes ~15s. |
I'm fine with shifting that time to precompile for that performance gain. SnoopCompile But we should just get this merged, I've already switched to it for practical work, the precompile is less than it was taking to load single json files. |
Feel free to merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm.
Im wondering how many of those precompile statements actually work in practice, just running the functions tends to be better if there is any type instability. But we can fix that later.
Will be a severe breaking release as almost everything changed, but should also offer great performance.
Needs convert/construction logic from Tables/GeoInterfaceother PRs.Our previous code was much more flexible in allowing anything to be parsed as GeoJSON, but we will miss out on that, but gain performance. Setting Feature(Collection) to mutable does allow some flexibility, such as not writing the bbox or other attributes if they're nothing. It does double the compilation time, but is almost on par once compiled. I can imagine we add other nullable attributes such as id and crs to be slightly backwards compatible.
Added a dep on ComputedFieldTypes to calculate the type of the bounding box, but it might be overkill.