
Improve performance and add additional benchmarks for the Java decoder #220

Open · wants to merge 1 commit into main

Conversation

@mactrem (Collaborator) commented Jun 28, 2024

Decoding benchmarks, based on an OpenMapTiles schema tileset, of the MLT decoder against the Java port of the vector-tile-js library created by @springmeyer. The Java port shows clear performance gains compared to the previously used Java libraries. The benchmarks now show MLT decoding between 2x (smaller tiles) and 6.5x (larger tiles) faster than MVT decoding. MLT was transcoded into the proposed in-memory format, while MVT was decoded into the in-memory representation used by MapLibre. The MLT decoder currently still does not use SIMD instructions.

| Benchmark | Mode | Cnt | Score | Error | Units | Ratio MVT/MLT |
|---|---|---|---|---|---|---|
| Decode Compressed MVT Z2 | avgt | 5 | 7.871 | ±0.586 | ms/op | 6.62 |
| Decode Uncompressed MVT Z2 | avgt | 5 | 4.304 | ±0.157 | ms/op | 3.65 |
| Decode Uncompressed MLT Z2 | avgt | 5 | 1.187 | ±0.097 | ms/op | |
| Decode Compressed MVT Z3 | avgt | 5 | 5.530 | ±0.315 | ms/op | 7.04 |
| Decode Uncompressed MVT Z3 | avgt | 5 | 3.050 | ±0.263 | ms/op | 3.89 |
| Decode Uncompressed MLT Z3 | avgt | 5 | 0.785 | ±0.059 | ms/op | |
| Decode Compressed MVT Z4 | avgt | 5 | 17.724 | ±1.578 | ms/op | 11.20 |
| Decode Uncompressed MVT Z4 | avgt | 5 | 10.352 | ±0.469 | ms/op | 6.54 |
| Decode Uncompressed MLT Z4 | avgt | 5 | 1.583 | ±0.149 | ms/op | |
| Decode Compressed MVT Z5 | avgt | 5 | 15.161 | ±8.143 | ms/op | 9.84 |
| Decode Uncompressed MVT Z5 | avgt | 5 | 7.558 | ±1.774 | ms/op | 4.90 |
| Decode Uncompressed MLT Z5 | avgt | 5 | 1.541 | ±0.042 | ms/op | |
| Decode Compressed MVT Z6 | avgt | 5 | 4.238 | ±0.230 | ms/op | 5.07 |
| Decode Uncompressed MVT Z6 | avgt | 5 | 2.620 | ±0.201 | ms/op | 3.14 |
| Decode Uncompressed MLT Z6 | avgt | 5 | 0.835 | ±0.626 | ms/op | |
| Decode Compressed MVT Z7 | avgt | 5 | 8.825 | ±0.991 | ms/op | 6.00 |
| Decode Uncompressed MVT Z7 | avgt | 5 | 5.224 | ±0.274 | ms/op | 3.55 |
| Decode Uncompressed MLT Z7 | avgt | 5 | 1.470 | ±0.755 | ms/op | |
| Decode Compressed MVT Z8 | avgt | 5 | 4.738 | ±0.381 | ms/op | 6.33 |
| Decode Uncompressed MVT Z8 | avgt | 5 | 3.023 | ±0.400 | ms/op | 4.04 |
| Decode Uncompressed MLT Z8 | avgt | 5 | 0.748 | ±0.136 | ms/op | |
| Decode Compressed MVT Z9 | avgt | 5 | 6.589 | ±0.469 | ms/op | 4.79 |
| Decode Uncompressed MVT Z9 | avgt | 5 | 3.948 | ±0.173 | ms/op | 2.87 |
| Decode Uncompressed MLT Z9 | avgt | 5 | 1.377 | ±0.218 | ms/op | |
| Decode Compressed MVT Z10 | avgt | 5 | 2.452 | ±0.281 | ms/op | 3.50 |
| Decode Uncompressed MVT Z10 | avgt | 5 | 1.399 | ±0.231 | ms/op | 2.00 |
| Decode Uncompressed MLT Z10 | avgt | 5 | 0.701 | ±0.123 | ms/op | |
| Decode Compressed MVT Z11 | avgt | 5 | 1.817 | ±0.524 | ms/op | 5.08 |
| Decode Uncompressed MVT Z11 | avgt | 5 | 0.929 | ±0.060 | ms/op | 2.59 |
| Decode Uncompressed MLT Z11 | avgt | 5 | 0.358 | ±0.026 | ms/op | |
| Decode Compressed MVT Z12 | avgt | 5 | 2.833 | ±0.877 | ms/op | 4.51 |
| Decode Uncompressed MVT Z12 | avgt | 5 | 1.498 | ±0.148 | ms/op | 2.39 |
| Decode Uncompressed MLT Z12 | avgt | 5 | 0.628 | ±0.109 | ms/op | |
| Decode Compressed MVT Z13 | avgt | 5 | 1.446 | ±0.159 | ms/op | 3.92 |
| Decode Uncompressed MVT Z13 | avgt | 5 | 0.869 | ±0.264 | ms/op | 2.36 |
| Decode Uncompressed MLT Z13 | avgt | 5 | 0.369 | ±0.052 | ms/op | |
| Decode Compressed MVT Z14 | avgt | 5 | 11.727 | ±1.741 | ms/op | 5.45 |
| Decode Uncompressed MVT Z14 | avgt | 5 | 6.254 | ±1.322 | ms/op | 2.90 |
| Decode Uncompressed MLT Z14 | avgt | 5 | 2.153 | ±0.185 | ms/op | |

@springmeyer (Collaborator) commented Jun 28, 2024

@mactrem great to see this coming together! Huge change! I won't have a chance to fully review before the weekend but will look forward to taking a closer look next week. One thing for now, however: I recall asking you about the output format of jmh before. I do apologize, I seem to have forgotten, as I'm having trouble understanding the table above. What is faster? What is smaller? What is "score"? EDIT: never mind, found docs at https://medium.com/@AlexanderObregon/introduction-to-java-microbenchmarking-with-jmh-java-microbenchmark-harness-55af74b2fd38.

Maybe we should work to consolidate on a single reporting format between Java and JS benchmarks: I went with ops/sec which is documented at https://github.com/maplibre/maplibre-tile-spec/blob/main/js/bench/readme.md#benchmark-design, but I'm flexible. I also went with console output in JS but it would be easy to create an option to dump a table. Let's figure out how to be consistent across implementations. EDIT: looks like it will be easy to match formats, will just need a table view for the JS side of things...
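For what it's worth, converting between the two reporting units is mechanical. A throwaway sketch (not part of either benchmark suite): JMH's `avgt` mode reports average milliseconds per operation, while the JS benchmarks report operations per second.

```java
public class UnitConversion {
    // JMH "avgt" reports average milliseconds per operation;
    // the JS benchmarks report operations per second.
    static double opsPerSec(double msPerOp) {
        return 1000.0 / msPerOp;
    }

    public static void main(String[] args) {
        // E.g. the Z2 MLT score of 1.187 ms/op from the table above
        System.out.printf("%.0f ops/sec%n", opsPerSec(1.187)); // ~842 ops/sec
    }
}
```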

@springmeyer left a comment

Had a chance to quickly do a first pass review. Requesting some reworking of the code being benchmarked to be more comparable between MLT and MVT decoding (see inline comments).

Also:

  • Would be great to pull in org.springmeyer.vector-tile-java via a package rather than vendoring: I think gradle supports pulling direct from GitHub right? If not, I can package it to maven. Let me know your preference
  • I would really like to see, before merging, separation of the benchmark code and optimizations in different PRs. So we could have the benchmark land first and then see the benchmark results change in a followup PR to optimize the Java code. Does that sound good as the eventual path forward once you have time to work on this again?

```java
public FeatureTable[] decodeMltZ4() {
    var mlTile = encodedMltTiles.get(4);
    var mltMetadata = tileMetadata.get(4);
    return MltDecoder.decodeMlTileVectorized(mlTile, mltMetadata);
}
```
@springmeyer (Collaborator):

@mactrem if I understand correctly, decodeMlTileVectorized returns a FeatureTable[] but does not actually return features or decode geometries until iterator.next() is called on every FeatureTable. If just getting the FeatureTable is what you intend to benchmark (rather than feature + geometry decoding), then the closest equivalent for MVT would be just doing:

```java
VectorTile vectorTile = new VectorTile(pbf, pbf.length);
```

And nothing else. But I see below that the org.springmeyer.VectorTile library is being used to iterate through and decode all features.
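The lazy-decoding distinction above can be shown with a toy sketch (hypothetical types, not the actual vector-tile-java API): constructing the tile object is cheap, and the expensive per-feature decoding only runs once loadGeometry() is actually called.

```java
import java.util.function.Supplier;

public class LazySketch {
    static int decodeCalls = 0;

    // Toy stand-in for a lazily decoded feature: the expensive work
    // (varint/zigzag/delta decoding of commands into vertices) is
    // deferred behind the supplier until loadGeometry() is invoked.
    static final class LazyFeature {
        private final Supplier<int[]> decoder;
        LazyFeature(Supplier<int[]> decoder) { this.decoder = decoder; }
        int[] loadGeometry() { return decoder.get(); }
    }

    public static void main(String[] args) {
        LazyFeature f = new LazyFeature(() -> { decodeCalls++; return new int[]{0, 0, 5, 5}; });
        System.out.println(decodeCalls); // 0: constructing the feature decoded nothing
        f.loadGeometry();
        System.out.println(decodeCalls); // 1: decoding happened on demand
    }
}
```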

@mactrem (Author, Collaborator) commented Jun 29, 2024

The FeatureTable is the in-memory MLT format which I'm proposing as part of the spec. So basically the spec is divided into a storage and an in-memory format, as is the case for example with Parquet and Arrow. This MLT in-memory format can be processed directly, like the MVT in-memory representation, but in a more efficient way. For example, you can directly filter the data, process the geometries, or in some cases copy the geometries directly into GL buffers. So the geometries are fully decoded and usable by (next-generation) map renderers. By not calling loadGeometry, the most important and dominant part of the decoding is left out (such as transforming the command/parameter integers into vertices, as well as delta, zigzag, and varint decoding). Even if the data are lazily decoded, in my experience (nearly) all geometries get materialized at some point before rendering (at least with the data/styles I'm using), so you pay the cost anyway. Or am I missing something in relation to the lazy decoding, or do you have a different experience?

But of course I also understand the problem that we cannot currently use this column-oriented in-memory format directly in MapLibre, as this would mean too much refactoring effort, and we therefore have to convert to another row-oriented in-memory representation. What do you think about having two benchmarks: one which shows the potential of the format by decoding into the proposed in-memory format, which is representative for next-generation map renderers (or for map renderers using different in-memory representations), and one for the in-memory representation currently used by MapLibre GL JS?
In general, because of my research background I'm particularly interested in the "next generation" use case, so I'm also totally fine if we don't merge these benchmarks. On the other hand, I also think that only showing the second benchmark, where we have this additional transformation into a different in-memory representation, doesn't fully show the potential of the format. What do you think about it?

@mactrem (Author, Collaborator) commented Jun 29, 2024

Some additional comments on the proposed in-memory format. The in-memory format allows map rendering libraries, on the CPU as well as on the GPU, random (constant-time) access to every attribute in the tile. The step of transforming to a random-access representation is also currently quite expensive and has to be optimized. So the MLT in-memory format has basically the same capabilities as the Arrow or Velox format, but tailored to the map rendering use case. The GeoArrow format, for example, which is basically a spatial extension of Arrow, can also be directly processed and rendered by Deck.GL; see https://github.com/geoarrow/deck.gl-layers
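As a toy illustration of that constant-time access property (hypothetical stand-in types, not the actual FeatureTable API): in a columnar layout, each attribute lives in its own primitive array, so reading one attribute of one feature is a single array index, and a whole column can be filtered or copied to a buffer in one pass.

```java
public class ColumnarSketch {
    // One primitive array per attribute (struct-of-arrays), as in a
    // columnar in-memory format such as Arrow.
    static final class FeatureColumns {
        final long[] ids;
        final int[] adminLevel; // example attribute column

        FeatureColumns(long[] ids, int[] adminLevel) {
            this.ids = ids;
            this.adminLevel = adminLevel;
        }

        // O(1) random access to one attribute of one feature,
        // with no per-feature object allocation or decoding.
        int adminLevelOf(int featureIndex) {
            return adminLevel[featureIndex];
        }
    }

    public static void main(String[] args) {
        var cols = new FeatureColumns(new long[]{10, 11, 12}, new int[]{2, 4, 6});
        System.out.println(cols.adminLevelOf(1)); // prints 4
    }
}
```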

```java
    var feature = layer.feature(i);
    var geometry = feature.loadGeometry();
  }
}
```
@springmeyer (Collaborator):

Two thoughts on this:

1. Parity with MLT decoding in Java

See my comment above: I think this is doing significantly more decoding work than the MLT benchmark, which currently appears to be calling MltDecoder.decodeMlTileVectorized without actually iterating the features/geometries.

2. Parity with JS

This is roughly equivalent to what I'm benchmarking against in JS:

```ts
const decode = async (impl, earcut: boolean) => {
  const tile = await impl();
  const layerNames = Object.keys(tile.layers).sort((a, b) => a.localeCompare(b));
  let featureCount = 0;
  let triangleCount = 0;
  for (const layerName of layerNames) {
    const layer = tile.layers[layerName];
    for (let i = 0; i < layer.length; i++) {
      const feature = layer.feature(i);
      const geometry = feature.loadGeometry();
      if (geometry.length > 0) {
        featureCount++;
      }
      if (earcut && feature.type === GeometryType.Polygon) {
        const triangles = await tessellate(geometry);
        triangleCount += triangles.length;
      }
    }
  }
  return { featureCount, triangleCount };
}
```

The differences are:

A. I'm collecting the length of the top-level geometry array as a cheap way to ensure that dead-code elimination does not strike. I gather that jmh uses blackhole techniques to avoid the JVM eliminating this code, so that should not be needed? But maybe the Java benchmarks should collect a count of something and return it, just to be safe?
In JS I've also profiled to ensure that all these functions are being called as I would expect.

B. I'm sorting the layers first to work around #190. I don't have evidence that parsing layers in different orders results in different performance, but I could see it resulting in different memory allocations, so I think making sure the order is the same is a good idea to rule out any potential variability due to that. Could you add sorting before the test?
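Both suggestions together could look roughly like this in Java (toy Layer/Feature stand-ins, not the actual benchmark classes): iterate layers in a fixed, sorted order, and return a feature count so the JIT cannot dead-code-eliminate the decoding work.

```java
import java.util.*;

public class BenchSketch {
    // Hypothetical stand-ins for the vector-tile types used in the benchmark.
    record Feature(int[] geometry) {}
    record Layer(List<Feature> features) {}

    // Iterate layers in a deterministic (sorted) order and return a count
    // so the result is observably used and cannot be eliminated.
    static int decodeAll(Map<String, Layer> layers) {
        int featureCount = 0;
        List<String> names = new ArrayList<>(layers.keySet());
        Collections.sort(names); // stable order, analogous to the JS localeCompare sort
        for (String name : names) {
            for (Feature f : layers.get(name).features()) {
                if (f.geometry().length > 0) {
                    featureCount++;
                }
            }
        }
        return featureCount; // a JMH benchmark would return this (or sink it into a Blackhole)
    }

    public static void main(String[] args) {
        Map<String, Layer> tile = Map.of(
            "water", new Layer(List.of(new Feature(new int[]{0, 0, 10, 10}))),
            "roads", new Layer(List.of(new Feature(new int[]{1, 2}), new Feature(new int[]{})))
        );
        System.out.println(decodeAll(tile)); // prints 2: empty geometries are not counted
    }
}
```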

@mactrem (Author, Collaborator) commented Jun 29, 2024

But that's also the point of the proposed MLT storage and in-memory formats, and why they are so tightly coupled and can achieve better decoding performance: you have to do less work during decoding, and you can also work on compressed data.

```java
  }
}

return vectorTile.layers;
```
@springmeyer (Collaborator):

I propose iterating all features in both the MVT and MLT benchmarks and returning a count of some kind.

@mactrem (Author, Collaborator) commented Jun 29, 2024

> Would be great to pull in org.springmeyer.vector-tile-java via a package rather than vendoring: I think gradle supports pulling direct from GitHub right? If not, I can package it to maven. Let me know your preference

Yes, it is currently still a work in progress, so my plan was to access it from Maven. Do you think you could publish it to Maven? It could also be interesting for other developers, as it shows a clear performance advantage over the existing MVT libraries.

@springmeyer (Collaborator) commented:

Okay sounds good. I'll publish to maven and ping you back here when it is available.

@mactrem (Author, Collaborator) commented Sep 25, 2024

@springmeyer @nyurik I will now continue working on the integration of MLT in MapLibre GL JS. Can we merge this PR, so that I can build on top of it?

@mactrem mactrem force-pushed the feature/add_vectorization branch from 47a4bb2 to 4382ae4 Compare October 18, 2024 10:52
@mactrem mactrem changed the title [WIP] Improve performance and add additional benchmarks for the Java decoder Improve performance and add additional benchmarks for the Java decoder Oct 18, 2024
@mactrem mactrem force-pushed the feature/add_vectorization branch 2 times, most recently from e146b62 to 5c847a0 Compare October 18, 2024 11:39
fix

fix

refactor

save state

save state
@mactrem mactrem force-pushed the feature/add_vectorization branch from 5c847a0 to c5b9f00 Compare October 18, 2024 12:03
louwers pushed a commit that referenced this pull request Feb 7, 2025
* Add index

* Fix link

* Fix date

---------

Co-authored-by: duje <[email protected]>