-
Notifications
You must be signed in to change notification settings - Fork 10
Propertynames of feature collection falsely taken from the first element #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It seems the spec doesn't say either way. So mixed names are probably acceptable. Your problem comes from our maybe overly simplistic implementation of the Tables.jl interface. We can instead use a custom function to get column names and leave |
Indeed that is a valid GeoJSON file. Currently the undocumented assumption for the Tables interface is that the GeoJSON file must be tabular, i.e. the propertynames of each Feature must be equal. So the simple answer is currently that the Tables interface doesn't apply since this file is not tabular. But it's good to have this issue to discuss this. Note that propertynames does work per feature. So it's still possible to work with this file, just not through the Tables interface. julia> propertynames(b)
(:geometry, :bar, :foo)
julia> propertynames.(b)
2-element Vector{Tuple{Symbol, Symbol, Vararg{Symbol}}}:
(:geometry, :bar, :foo)
(:geometry, :foo) Some tests related to this are here: https://github.com/JuliaGeo/GeoJSON.jl/blob/v0.6.0/test/runtests.jl#L228-L248 The current behavior, selecting the first row, was simple to implement and fast. And my assumption was that non tabular GeoJSON files are rare (e.g. GDAL doesn't export them like that, it includes |
We can at least fix this error, we should check the feature property exists before calling We could also make a keyword argument to specify the column names explicitly in the case they are not those of the first feature, or to force looping and detecting them. |
I'm somewhat hesitant about that
If e.g.
If indeed benchmarking shows it to be too expensive to do by default, we should probably do this. |
But if we have an argument to force looping, in that case we will also need to make a change somewhat as I suggest so that the missing keys in individual features show as missing instead of throwing an error (for a mixed property table to load at all). To get the best of both we could put the detected |
This is an implementation of the complete looping function that you can use in the meanwhile: function properlynames(fc::GeoJSON.FeatureCollection)
f, features = Iterators.peel(fc)
set = Set(propertynames(f))
for f in features
for name in keys(GeoJSON.properties(f))
push!(set, name)
end
end
return set
end For a huge GeoJSON with 16k rows and 300 cols, this takes 4 seconds. For comparison, If we could for instance cache the propertynames in the
How about having |
That's possible as well, although the problem is table rows usually have all the same properties as the table they come from. So it will break the Tables.jl rows interface. |
I do not know if all GeoJSON features should share the same properties. But if not, the following is a bit of a nuisance.
With GeoJSON 0.6.0
now
propertynames(b)
gives(:geometry, :bar, :foo)
.This is same what I get when tab expanding
There is
b.foo
but there is no
b.bar
The text was updated successfully, but these errors were encountered: