Improve performance and add additional benchmarks for the Java decoder #220
@mactrem great to see this coming together! Huge change! I won't have a chance to fully review before the weekend but will look forward to taking a closer look next week.
Had a chance to quickly do a first pass review. Requesting some reworking of the code being benchmarked to be more comparable between MLT and MVT decoding (see inline comments).
Also:
- Would be great to pull in org.springmeyer.vector-tile-java via a package rather than vendoring. I think Gradle supports pulling directly from GitHub, right? If not, I can package it for Maven. Let me know your preference.
- I would really like to see, before merging, the benchmark code and the optimizations separated into different PRs. That way the benchmark could land first, and then we could watch the benchmark results change in a follow-up PR that optimizes the Java code. Does that sound good as the eventual path forward once you have time to work on this again?
```java
public FeatureTable[] decodeMltZ4() {
    var mlTile = encodedMltTiles.get(4);
    var mltMetadata = tileMetadata.get(4);
    return MltDecoder.decodeMlTileVectorized(mlTile, mltMetadata);
}
```
@mactrem if I understand correctly, decodeMlTileVectorized returns a FeatureTable[] but does not actually return features or decode geometries until iterator.next() is iterated through on every FeatureTable. If just getting the FeatureTable is what you intend to benchmark (rather than feature + geometry decoding), then the closest equivalent for MVT would be just doing:

```java
VectorTile vectorTile = new VectorTile(pbf, pbf.length);
```

And nothing else. But I see that below, the org.springmeyer.VectorTile library is being used to iterate through and decode all features.
The FeatureTable is the in-memory MLT format which I'm proposing as part of the spec. So the spec is divided into a storage format and an in-memory format, as is the case, for example, with Parquet and Arrow. This MLT in-memory format can be processed directly, like the MVT in-memory representation, but more efficiently: you can directly filter the data, process the geometries, or in some cases copy the geometries straight into GL buffers. So the geometries are fully decoded and usable by a (next-generation) map renderer. By not calling loadGeometry, the most important and dominant part of the decoding is left out (such as transforming the command/parameter integers into vertices, plus delta, zigzag, and varint decoding). Even if the data is decoded lazily, in my experience (nearly) all geometries get materialized at some point before rendering (at least with the data/styles I'm using), so you pay the cost anyway. Or am I missing something in relation to the lazy decoding, or do you have a different experience?
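To illustrate why loadGeometry dominates MVT decoding cost, here is a minimal, self-contained sketch of the three decoding steps named above (varint, zigzag, and delta decoding). This is not code from either decoder; it is a simplified illustration of the standard techniques, and the class and method names are my own.

```java
// Illustrative sketch of the per-vertex decoding work MVT geometry
// decoding performs; not the actual MLT or MVT decoder code.
public class GeometryDecodeSketch {

    // Varint: read one unsigned LEB128-encoded int; pos[0] is the
    // read cursor, advanced as bytes are consumed.
    static int readVarint(byte[] buf, int[] pos) {
        int value = 0, shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }

    // ZigZag: map the unsigned varint value back to a signed int
    // (0, 1, 2, 3 -> 0, -1, 1, -2).
    static int zigZagDecode(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Delta: each coordinate is stored relative to the previous one,
    // so decoding is a running prefix sum.
    static int[] deltaDecode(int[] deltas) {
        int[] out = new int[deltas.length];
        int acc = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            out[i] = acc;
        }
        return out;
    }
}
```

Skipping loadGeometry in a benchmark means none of these per-vertex operations run, which is the crux of the comparability concern.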
But of course I also understand the problem that we cannot currently use this column-oriented in-memory format directly in MapLibre, as that would mean too much refactoring effort, so we have to convert to another, row-oriented in-memory representation. What do you think about having two benchmarks: one which shows the potential of the format by decoding into the proposed in-memory format, which is representative of next-generation map renderers (or of renderers using different in-memory representations), and one for the in-memory representation currently used by MapLibre GL JS?
In general, because of my research background, I'm particularly interested in the "next generation" use case, so I'm also totally fine if we don't merge these benchmarks. On the other hand, I also think that only showing the second benchmark, where we have this additional transformation into a different in-memory representation, doesn't fully show the potential of the format. What do you think about it?
Some additional comments on the proposed in-memory format. The in-memory format allows map rendering libraries, on the CPU as well as on the GPU, random (constant-time) access to every attribute in the tile. The step of transforming to a random-access representation is also currently quite expensive and has to be optimized. So the MLT in-memory format has basically the same capabilities as the Arrow or Velox formats, but tailored to the map rendering use case. The GeoArrow format, for example, which is basically a spatial extension to Arrow, can also be directly processed and rendered by Deck.GL; see https://github.com/geoarrow/deck.gl-layers
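As a rough intuition for the constant-time access claim, here is a toy columnar layout: one primitive array per attribute, indexed by feature position. This is purely hypothetical and does not reflect the actual FeatureTable API; all names here are invented for illustration.

```java
// Toy columnar layout, hypothetical; not the real MLT FeatureTable.
// Each attribute lives in its own primitive array, so reading the
// attribute of feature i is a single array index (O(1)), whereas an
// undecoded MVT protobuf must be scanned feature by feature.
public class ColumnSketch {
    final long[] ids;        // one column: feature ids
    final int[] classCodes;  // another column: a dictionary-coded attribute

    ColumnSketch(long[] ids, int[] classCodes) {
        this.ids = ids;
        this.classCodes = classCodes;
    }

    long idOf(int i) { return ids[i]; }
    int classOf(int i) { return classCodes[i]; }
}
```

A columnar layout like this is also what makes "copy straight into GL buffers" plausible: a whole column is already one contiguous primitive array.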
```java
        var feature = layer.feature(i);
        var geometry = feature.loadGeometry();
    }
}
```
Two thoughts on this:
- Parity with MLT decoding in Java: see my comment above; I think this is doing significantly more decoding work than the MLT benchmark, which currently appears to call MltDecoder.decodeMlTileVectorized without actually iterating the features/geometries.
- Parity with JS: this is roughly equivalent to what I'm benchmarking against in JS. See here:
maplibre-tile-spec/js/bench/utils.ts
Lines 68 to 88 in 96852ba
```ts
const decode = async (impl, earcut: boolean) => {
    const tile = await impl();
    const layerNames = Object.keys(tile.layers).sort((a, b) => a.localeCompare(b));
    let featureCount = 0;
    let triangleCount = 0;
    for (const layerName of layerNames) {
        const layer = tile.layers[layerName];
        for (let i = 0; i < layer.length; i++) {
            const feature = layer.feature(i);
            const geometry = feature.loadGeometry();
            if (geometry.length > 0) {
                featureCount++;
            }
            if (earcut && feature.type === GeometryType.Polygon) {
                const triangles = await tessellate(geometry);
                triangleCount += triangles.length;
            }
        }
    }
    return { featureCount, triangleCount };
}
```
A. I'm collecting the length of the top-level geometry array as a cheap way to ensure that dead code elimination does not strike. I gather that JMH uses black-hole techniques to keep the JVM from eliminating this code, so that should not be needed? But maybe the Java benchmarks should collect a count of something and return it, just to be safe?
In JS I've also profiled to ensure that all these functions are being called as I would expect.
B. I'm sorting the layers first to work around #190. I don't have evidence that parsing layers in different orders will result in different performance, but I could see it resulting in different memory allocations, so I think making sure the order is the same is a good idea to rule out any potential variability from that. Could you add sorting before the test?
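The Java side of this request could look like the following sketch, which mirrors the JS benchmark's sorted iteration order. The method and class names are my own; plain String.compareTo stands in for JS localeCompare, which is a reasonable match for ASCII layer names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: produce a deterministic, sorted layer order before iterating,
// mirroring Object.keys(tile.layers).sort(...) in the JS benchmark.
// Names are illustrative, not from the actual benchmark code.
public class LayerOrder {
    static List<String> sortedLayerNames(Map<String, ?> layers) {
        List<String> names = new ArrayList<>(layers.keySet());
        // String.compareTo approximates localeCompare for ASCII names.
        names.sort(String::compareTo);
        return names;
    }
}
```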
But that's also the point of the proposed MLT storage and in-memory formats, and why they are so tightly coupled and can have better decoding performance: you have to do less work during decoding, and you can also work on compressed data.
```java
return vectorTile.layers;
```
I propose iterating all features in both the MVT and MLT benchmarks and returning a count of some kind.
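A minimal sketch of that proposal, assuming the benchmark can be changed to return an accumulated count instead of the raw layers (names and data shapes here are invented for illustration):

```java
import java.util.List;

// Sketch: iterate everything and return a count so the JIT cannot treat
// the loop as dead code. In a JMH harness, a value returned from a
// @Benchmark method is consumed by a Blackhole automatically, which is
// what makes "return a count" sufficient there.
public class CountingBench {
    static int countNonEmpty(List<int[]> geometries) {
        int featureCount = 0;
        for (int[] geometry : geometries) {
            if (geometry.length > 0) {
                featureCount++;
            }
        }
        return featureCount;
    }
}
```

Applying the same pattern to both the MVT and MLT benchmarks would also force both to pay the full feature/geometry iteration cost, addressing the parity concern above.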
Yes, it is currently still a work in progress, so my plan was to access it from Maven. Do you think you could publish it to Maven? It could also be interesting for other developers, as it shows a clear performance advantage over the existing MVT libraries.
Okay, sounds good. I'll publish to Maven and ping you back here when it is available.
@springmeyer @nyurik I will now continue working on the integration of MLT in MapLibre GL JS. Can we merge this PR, so that I can build on top of it?
Decoding benchmarks, based on an OpenMapTiles-schema tileset, of the MLT decoder against the Java port of the vector-tile-js library created by @springmeyer. The Java port shows clear performance gains compared to the previously used Java libraries. The benchmarks now show MLT decoding to be between 2x (smaller tiles) and 6.5x (larger tiles) faster than MVT. MLT was transcoded into the proposed in-memory format, while MVT was decoded into the in-memory representation used by MapLibre. This is currently still without using SIMD instructions in the MLT decoder.