
Improve performance and add additional benchmarks for the Java decoder #220

Open · wants to merge 1 commit into main

Conversation

@mactrem (Collaborator) commented Jun 28, 2024

Decoding benchmarks, based on an OpenMapTiles schema tileset, of the MLT decoder against the Java port of the vector-tile-js library created by @springmeyer. The Java port shows clear performance gains compared to the previously used Java libraries. The benchmarks now show MLT decoding between 2x (smaller tiles) and 6.5x (larger tiles) faster than MVT decoding. MLT was transcoded into the proposed in-memory format, while MVT was decoded into the in-memory representation used by MapLibre. The MLT decoder currently still does not use SIMD instructions.

| Benchmark | Mode | Cnt | Score | Error | Units | Ratio MVT/MLT |
|---|---|---|---|---|---|---|
| Decode Compressed MVT Z2 | avgt | 5 | 7.871 | ±0.586 | ms/op | 6.62 |
| Decode Uncompressed MVT Z2 | avgt | 5 | 4.304 | ±0.157 | ms/op | 3.65 |
| Decode Uncompressed MLT Z2 | avgt | 5 | 1.187 | ±0.097 | ms/op | |
| Decode Compressed MVT Z3 | avgt | 5 | 5.530 | ±0.315 | ms/op | 7.04 |
| Decode Uncompressed MVT Z3 | avgt | 5 | 3.050 | ±0.263 | ms/op | 3.89 |
| Decode Uncompressed MLT Z3 | avgt | 5 | 0.785 | ±0.059 | ms/op | |
| Decode Compressed MVT Z4 | avgt | 5 | 17.724 | ±1.578 | ms/op | 11.20 |
| Decode Uncompressed MVT Z4 | avgt | 5 | 10.352 | ±0.469 | ms/op | 6.54 |
| Decode Uncompressed MLT Z4 | avgt | 5 | 1.583 | ±0.149 | ms/op | |
| Decode Compressed MVT Z5 | avgt | 5 | 15.161 | ±8.143 | ms/op | 9.84 |
| Decode Uncompressed MVT Z5 | avgt | 5 | 7.558 | ±1.774 | ms/op | 4.90 |
| Decode Uncompressed MLT Z5 | avgt | 5 | 1.541 | ±0.042 | ms/op | |
| Decode Compressed MVT Z6 | avgt | 5 | 4.238 | ±0.230 | ms/op | 5.07 |
| Decode Uncompressed MVT Z6 | avgt | 5 | 2.620 | ±0.201 | ms/op | 3.14 |
| Decode Uncompressed MLT Z6 | avgt | 5 | 0.835 | ±0.626 | ms/op | |
| Decode Compressed MVT Z7 | avgt | 5 | 8.825 | ±0.991 | ms/op | 6.00 |
| Decode Uncompressed MVT Z7 | avgt | 5 | 5.224 | ±0.274 | ms/op | 3.55 |
| Decode Uncompressed MLT Z7 | avgt | 5 | 1.470 | ±0.755 | ms/op | |
| Decode Compressed MVT Z8 | avgt | 5 | 4.738 | ±0.381 | ms/op | 6.33 |
| Decode Uncompressed MVT Z8 | avgt | 5 | 3.023 | ±0.400 | ms/op | 4.04 |
| Decode Uncompressed MLT Z8 | avgt | 5 | 0.748 | ±0.136 | ms/op | |
| Decode Compressed MVT Z9 | avgt | 5 | 6.589 | ±0.469 | ms/op | 4.79 |
| Decode Uncompressed MVT Z9 | avgt | 5 | 3.948 | ±0.173 | ms/op | 2.87 |
| Decode Uncompressed MLT Z9 | avgt | 5 | 1.377 | ±0.218 | ms/op | |
| Decode Compressed MVT Z10 | avgt | 5 | 2.452 | ±0.281 | ms/op | 3.50 |
| Decode Uncompressed MVT Z10 | avgt | 5 | 1.399 | ±0.231 | ms/op | 2.00 |
| Decode Uncompressed MLT Z10 | avgt | 5 | 0.701 | ±0.123 | ms/op | |
| Decode Compressed MVT Z11 | avgt | 5 | 1.817 | ±0.524 | ms/op | 5.08 |
| Decode Uncompressed MVT Z11 | avgt | 5 | 0.929 | ±0.060 | ms/op | 2.59 |
| Decode Uncompressed MLT Z11 | avgt | 5 | 0.358 | ±0.026 | ms/op | |
| Decode Compressed MVT Z12 | avgt | 5 | 2.833 | ±0.877 | ms/op | 4.51 |
| Decode Uncompressed MVT Z12 | avgt | 5 | 1.498 | ±0.148 | ms/op | 2.39 |
| Decode Uncompressed MLT Z12 | avgt | 5 | 0.628 | ±0.109 | ms/op | |
| Decode Compressed MVT Z13 | avgt | 5 | 1.446 | ±0.159 | ms/op | 3.92 |
| Decode Uncompressed MVT Z13 | avgt | 5 | 0.869 | ±0.264 | ms/op | 2.36 |
| Decode Uncompressed MLT Z13 | avgt | 5 | 0.369 | ±0.052 | ms/op | |
| Decode Compressed MVT Z14 | avgt | 5 | 11.727 | ±1.741 | ms/op | 5.45 |
| Decode Uncompressed MVT Z14 | avgt | 5 | 6.254 | ±1.322 | ms/op | 2.90 |
| Decode Uncompressed MLT Z14 | avgt | 5 | 2.153 | ±0.185 | ms/op | |

@springmeyer (Collaborator) commented Jun 28, 2024

@mactrem great to see this coming together! Huge change! I won't have a chance to fully review before the weekend but will look forward to taking a closer look next week. One thing for now, however: I recall asking you about the output format of jmh before. I do apologize, I seem to have forgotten, as I'm having trouble understanding the table above. What is faster? What is smaller? What is "score"? EDIT: never mind, found docs at https://medium.com/@AlexanderObregon/introduction-to-java-microbenchmarking-with-jmh-java-microbenchmark-harness-55af74b2fd38.

Maybe we should work to consolidate on a single reporting format between Java and JS benchmarks: I went with ops/sec which is documented at https://github.com/maplibre/maplibre-tile-spec/blob/main/js/bench/readme.md#benchmark-design, but I'm flexible. I also went with console output in JS but it would be easy to create an option to dump a table. Let's figure out how to be consistent across implementations. EDIT: looks like it will be easy to match formats, will just need a table view for the JS side of things...
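For what it's worth, converting between the two reporting units is mechanical. A throwaway sketch (not part of either benchmark suite): JMH's `avgt` mode reports average milliseconds per operation, while the JS benchmarks report operations per second.

```java
public class UnitConversion {
    // JMH "avgt" reports average milliseconds per operation;
    // the JS benchmarks report operations per second.
    static double opsPerSec(double msPerOp) {
        return 1000.0 / msPerOp;
    }

    public static void main(String[] args) {
        // E.g. the Z2 MLT score of 1.187 ms/op from the table above
        System.out.printf("%.0f ops/sec%n", opsPerSec(1.187)); // ~842 ops/sec
    }
}
```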

@springmeyer left a comment

Had a chance to quickly do a first pass review. Requesting some reworking of the code being benchmarked to be more comparable between MLT and MVT decoding (see inline comments).

Also:

  • Would be great to pull in org.springmeyer.vector-tile-java via a package rather than vendoring: I think gradle supports pulling direct from GitHub right? If not, I can package it to maven. Let me know your preference
  • I would really like to see, before merging, separation of the benchmark code and optimizations in different PRs. So we could have the benchmark land first and then see the benchmark results change in a followup PR to optimize the Java code. Does that sound good as the eventual path forward once you have time to work on this again?

```java
public FeatureTable[] decodeMltZ4() {
    var mlTile = encodedMltTiles.get(4);
    var mltMetadata = tileMetadata.get(4);
    return MltDecoder.decodeMlTileVectorized(mlTile, mltMetadata);
}
```
@springmeyer (Collaborator):

@mactrem if I understand correctly, decodeMlTileVectorized returns a FeatureTable[] but does not actually return features or decode geometries until iterator.next() is called on every FeatureTable. If just getting the FeatureTable is what you intend to benchmark (rather than feature + geometry decoding), then the closest equivalent for MVT would be just doing:

```java
VectorTile vectorTile = new VectorTile(pbf, pbf.length);
```

And nothing else. But I see below that the org.springmeyer.VectorTile library is being used to iterate through and decode all features.
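The lazy-decoding distinction above can be shown with a toy sketch (hypothetical types, not the actual vector-tile-java API): constructing the tile object is cheap, and the expensive per-feature decoding only runs once loadGeometry() is actually called.

```java
import java.util.function.Supplier;

public class LazySketch {
    static int decodeCalls = 0;

    // Toy stand-in for a lazily decoded feature: the expensive work
    // (varint/zigzag/delta decoding of commands into vertices) is
    // deferred behind the supplier until loadGeometry() is invoked.
    static final class LazyFeature {
        private final Supplier<int[]> decoder;
        LazyFeature(Supplier<int[]> decoder) { this.decoder = decoder; }
        int[] loadGeometry() { return decoder.get(); }
    }

    public static void main(String[] args) {
        LazyFeature f = new LazyFeature(() -> { decodeCalls++; return new int[]{0, 0, 5, 5}; });
        System.out.println(decodeCalls); // 0: constructing the feature decoded nothing
        f.loadGeometry();
        System.out.println(decodeCalls); // 1: decoding happened on demand
    }
}
```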

@mactrem (Author, Collaborator) commented Jun 29, 2024

The FeatureTable is the in-memory MLT format which I'm proposing as part of the spec. So basically the spec is divided into a storage and an in-memory format, as is the case for example with Parquet and Arrow. This MLT in-memory format can be processed directly, like the MVT in-memory representation, but in a more efficient way. For example, you can directly filter the data, process the geometries, or in some cases copy the geometries directly into GL buffers. So the geometries are fully decoded and usable by (next-generation) map renderers. By not calling loadGeometry, the most important and dominant part of the decoding is left out (such as transforming the command/parameter integers into vertices, as well as delta, zigzag, and varint decoding). Even if the data are lazily decoded, in my experience (nearly) all geometries get materialized at some point before rendering (at least with the data/styles I'm using), so you pay the cost anyway. Or am I missing something in relation to the lazy decoding, or do you have a different experience?

But of course I also understand the problem that we cannot currently use this column-oriented in-memory format directly in MapLibre, as this would mean too much refactoring effort, and we therefore have to convert to another row-oriented in-memory representation. What do you think about having two benchmarks: one which shows the potential of the format by decoding into the proposed in-memory format, which is representative for next-generation map renderers (or for map renderers using different in-memory representations), and one for the in-memory representation currently used by MapLibre GL JS?
In general, because of my research background I'm particularly interested in the "next generation" use case, so I'm also totally fine if we don't merge these benchmarks. On the other hand, I also think that only showing the second benchmark, where we have this additional transformation into a different in-memory representation, doesn't fully show the potential of the format. What do you think about it?

@mactrem (Author, Collaborator) commented Jun 29, 2024

Some additional comments on the proposed in-memory format. The in-memory format allows map rendering libraries, on the CPU as well as on the GPU, random (constant-time) access to every attribute in the tile. The step of transforming to a random-access representation is also currently quite expensive and has to be optimized. So the MLT in-memory format has basically the same capabilities as the Arrow or Velox format, but tailored to the map rendering use case. The GeoArrow format, for example, which is basically a spatial extension of Arrow, can also be directly processed and rendered by Deck.GL; see https://github.com/geoarrow/deck.gl-layers
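As a toy illustration of that constant-time access property (hypothetical stand-in types, not the actual FeatureTable API): in a columnar layout, each attribute lives in its own primitive array, so reading one attribute of one feature is a single array index, and a whole column can be filtered or copied to a buffer in one pass.

```java
public class ColumnarSketch {
    // One primitive array per attribute (struct-of-arrays), as in a
    // columnar in-memory format such as Arrow.
    static final class FeatureColumns {
        final long[] ids;
        final int[] adminLevel; // example attribute column

        FeatureColumns(long[] ids, int[] adminLevel) {
            this.ids = ids;
            this.adminLevel = adminLevel;
        }

        // O(1) random access to one attribute of one feature,
        // with no per-feature object allocation or decoding.
        int adminLevelOf(int featureIndex) {
            return adminLevel[featureIndex];
        }
    }

    public static void main(String[] args) {
        var cols = new FeatureColumns(new long[]{10, 11, 12}, new int[]{2, 4, 6});
        System.out.println(cols.adminLevelOf(1)); // prints 4
    }
}
```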

```java
    var feature = layer.feature(i);
    var geometry = feature.loadGeometry();
  }
}
```
@springmeyer (Collaborator):

Two thoughts on this:

1. Parity with MLT decoding in Java

See my comment above: I think this is doing significantly more decoding work than the MLT benchmark, which currently appears to be calling MltDecoder.decodeMlTileVectorized without actually iterating the features/geometries.

2. Parity with JS

This is roughly equivalent to what I'm benchmarking against in JS:

```ts
const decode = async (impl, earcut: boolean) => {
  const tile = await impl();
  const layerNames = Object.keys(tile.layers).sort((a, b) => a.localeCompare(b));
  let featureCount = 0;
  let triangleCount = 0;
  for (const layerName of layerNames) {
    const layer = tile.layers[layerName];
    for (let i = 0; i < layer.length; i++) {
      const feature = layer.feature(i);
      const geometry = feature.loadGeometry();
      if (geometry.length > 0) {
        featureCount++;
      }
      if (earcut && feature.type === GeometryType.Polygon) {
        const triangles = await tessellate(geometry);
        triangleCount += triangles.length;
      }
    }
  }
  return { featureCount, triangleCount };
}
```

The differences are:

A. I'm collecting the length of the top-level geometry array as a cheap way to ensure that dead-code elimination does not strike. I gather that jmh uses blackhole techniques to avoid the JVM eliminating this code, so that should not be needed? But maybe the Java benchmarks should collect a count of something and return it, just to be safe?
In JS I've also profiled to ensure that all these functions are being called as I would expect.

B. I'm sorting the layers first to work around #190. I don't have evidence that parsing layers in different orders results in different performance, but I could see it resulting in different memory allocations, so I think making sure the order is the same is a good idea to rule out any potential variability due to that. Could you add sorting before the test?
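Both suggestions together could look roughly like this in Java (toy Layer/Feature stand-ins, not the actual benchmark classes): iterate layers in a fixed, sorted order, and return a feature count so the JIT cannot dead-code-eliminate the decoding work.

```java
import java.util.*;

public class BenchSketch {
    // Hypothetical stand-ins for the vector-tile types used in the benchmark.
    record Feature(int[] geometry) {}
    record Layer(List<Feature> features) {}

    // Iterate layers in a deterministic (sorted) order and return a count
    // so the result is observably used and cannot be eliminated.
    static int decodeAll(Map<String, Layer> layers) {
        int featureCount = 0;
        List<String> names = new ArrayList<>(layers.keySet());
        Collections.sort(names); // stable order, analogous to the JS localeCompare sort
        for (String name : names) {
            for (Feature f : layers.get(name).features()) {
                if (f.geometry().length > 0) {
                    featureCount++;
                }
            }
        }
        return featureCount; // a JMH benchmark would return this (or sink it into a Blackhole)
    }

    public static void main(String[] args) {
        Map<String, Layer> tile = Map.of(
            "water", new Layer(List.of(new Feature(new int[]{0, 0, 10, 10}))),
            "roads", new Layer(List.of(new Feature(new int[]{1, 2}), new Feature(new int[]{})))
        );
        System.out.println(decodeAll(tile)); // prints 2: empty geometries are not counted
    }
}
```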

@mactrem (Author, Collaborator) commented Jun 29, 2024

But that's also the point of the proposed MLT storage and in-memory formats, and why they are so tightly coupled and can achieve better decoding performance: you have to do less work during decoding, and you can also work on compressed data.

```java
  }
}

return vectorTile.layers;
```
@springmeyer (Collaborator):

I propose iterating all features in both the MVT and MLT benchmarks and returning a count of some kind.

@mactrem (Author, Collaborator) commented Jun 29, 2024

> Would be great to pull in org.springmeyer.vector-tile-java via a package rather than vendoring: I think gradle supports pulling direct from GitHub right? If not, I can package it to maven. Let me know your preference

Yes, it is currently still a work in progress, so my plan was to access it from Maven. Do you think you could publish it to Maven? It could also be interesting for other developers, as it shows a clear performance advantage over the existing MVT libraries.

@springmeyer (Collaborator) commented:

Okay sounds good. I'll publish to maven and ping you back here when it is available.

@mactrem (Author, Collaborator) commented Sep 25, 2024

@springmeyer @nyurik I will now continue working on the integration of MLT in MapLibre GL JS. Can we merge this PR, so that I can build on top of it?

@mactrem mactrem force-pushed the feature/add_vectorization branch from 47a4bb2 to 4382ae4 Compare October 18, 2024 10:52
@mactrem mactrem changed the title [WIP] Improve performance and add additional benchmarks for the Java decoder Improve performance and add additional benchmarks for the Java decoder Oct 18, 2024
@mactrem mactrem force-pushed the feature/add_vectorization branch 2 times, most recently from e146b62 to 5c847a0 Compare October 18, 2024 11:39
fix

fix

refactor

save state

save state
@mactrem mactrem force-pushed the feature/add_vectorization branch from 5c847a0 to c5b9f00 Compare October 18, 2024 12:03
louwers pushed a commit that referenced this pull request Feb 7, 2025
* Add index

* Fix link

* Fix date

---------

Co-authored-by: duje <[email protected]>