Improve throughput of metrics collections in SimpleCollector #459

Falland · 2019-02-18T18:45:29Z

This change improves the throughput of labels method calls by adding overloaded versions of the labels method for 1 to 4 arguments. The vararg method is still there for cases with more labels and for backward compatibility.
Performance tests are included in the commit to demonstrate the improvement.

Note! The change is amending SimpleCollector children field type from ConcurrentHashMap<List, Child> to ConcurrentHashMap<LabelsTuple, Child> type, which might break some client code.
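
A minimal sketch of the idea described above, not the code from this pull request: fixed-arity overloads build a small key object directly, and the vararg method stays as the fallback. `MiniCollector`, `OneLabel`, `TwoLabels` and `ManyLabels` are hypothetical names chosen for illustration, `LongAdder` stands in for the real `Child`, and label values are assumed to be non-null.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a collector; not the real SimpleCollector. */
final class MiniCollector {

    /** Key for exactly one label value; equality compares the field directly. */
    static final class OneLabel {
        final String a;
        OneLabel(String a) { this.a = a; }
        @Override public boolean equals(Object o) {
            return o instanceof OneLabel && a.equals(((OneLabel) o).a);
        }
        @Override public int hashCode() { return a.hashCode(); }
    }

    /** Key for exactly two label values. */
    static final class TwoLabels {
        final String a, b;
        TwoLabels(String a, String b) { this.a = a; this.b = b; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof TwoLabels)) return false;
            TwoLabels other = (TwoLabels) o;
            return a.equals(other.a) && b.equals(other.b);
        }
        @Override public int hashCode() { return 31 * a.hashCode() + b.hashCode(); }
    }

    /** Fallback key for any other number of label values. */
    static final class ManyLabels {
        final String[] values;
        ManyLabels(String[] values) { this.values = values.clone(); }
        @Override public boolean equals(Object o) {
            return o instanceof ManyLabels && Arrays.equals(values, ((ManyLabels) o).values);
        }
        @Override public int hashCode() { return Arrays.hashCode(values); }
    }

    private final ConcurrentHashMap<Object, LongAdder> children = new ConcurrentHashMap<>();

    /** Fixed-arity overloads: chosen by the compiler when the argument count matches. */
    LongAdder labels(String a) { return child(new OneLabel(a)); }
    LongAdder labels(String a, String b) { return child(new TwoLabels(a, b)); }

    /** Vararg fallback; dispatches by length so array callers share the same children. */
    LongAdder labels(String... values) {
        switch (values.length) {
            case 1: return labels(values[0]);
            case 2: return labels(values[0], values[1]);
            default: return child(new ManyLabels(values));
        }
    }

    private LongAdder child(Object key) {
        LongAdder existing = children.get(key);
        if (existing != null) return existing;
        LongAdder created = new LongAdder();
        LongAdder raced = children.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }
}
```

With this shape, `collector.labels("GET")` never allocates a vararg array, while a caller that passes a `String[]` still reaches the same child.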

brian-brazil · 2019-02-19T11:00:47Z

What do the benchmarks show?

The change is amending SimpleCollector children field type from ConcurrentHashMap<List, Child> to ConcurrentHashMap<LabelsTuple, Child> type, which might break some client code.

No one should be depending on that; SimpleCollector is considered internal.

Falland · 2019-02-19T13:29:55Z

The CreationBenchmark shows that creating the Tuple objects is at least as fast as an Arrays.asList() call. I noticed a copy-paste artifact in the main method; I'll fix that later today.

The SearchBenchmark shows that search in HashMap with Tuple objects as a key is as fast if not faster than with List key.

But the real difference is in the equals implementation. Tuples (except MultipleLabels) do not create iterators for comparison, so they generate less garbage pressure. Also, the overloaded method call does not create an array object for varargs, which reduces garbage pressure and saves some cycles.
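
The allocation part of this argument can be illustrated with a small sketch. `VarargsCost` and its methods are hypothetical names, not part of the client library. Each vararg call site hands the method a freshly created array, and a List key built with Arrays.asList() adds a wrapper object on top of it. As far as I know, that wrapper inherits equals from AbstractList, which walks both lists with iterators. The JIT may remove some of these allocations through escape analysis, so this shows the language-level behaviour only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that makes the per-call objects of a vararg lookup visible. */
final class VarargsCost {

    /** Returns the array that was created at the call site to hold the arguments. */
    static String[] viaVarargs(String... values) {
        return values;
    }

    /** List-style key: the vararg array plus a List wrapper around it. */
    static List<String> listKey(String... values) {
        return Arrays.asList(values);
    }
}
```

Two calls such as `VarargsCost.viaVarargs("a", "b")` return distinct arrays with equal contents, which is the per-call garbage that the fixed-arity overloads avoid.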

I think I can add VarArg vs overloaded method call benchmarks to show the effect.

Regarding field type change, I just wanted to make it obvious. But I can remove this part from the comment.

brian-brazil · 2019-02-19T14:15:19Z

Can you share the benchmark results?

Falland · 2019-02-20T06:11:35Z

So I've added new benchmark, as I said in previous comment. The results are:
SimpleCollectorLabelsBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 thrpt 80 0.026 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 thrpt 80 0.024 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 thrpt 80 0.021 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 thrpt 80 0.019 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 thrpt 80 0.014 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.131 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 thrpt 80 0.082 ± 0.003 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 thrpt 80 0.065 ± 0.005 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 thrpt 80 0.077 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 thrpt 80 0.067 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 thrpt 80 0.015 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 avgt 80 37.963 ± 1.966 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 avgt 80 42.344 ± 2.171 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 avgt 80 48.198 ± 3.474 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 avgt 80 56.619 ± 3.971 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 avgt 80 61.957 ± 3.670 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 7.659 ± 0.061 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 avgt 80 12.302 ± 0.562 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 avgt 80 15.747 ± 1.161 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 avgt 80 13.121 ± 0.134 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 avgt 80 15.037 ± 0.139 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 avgt 80 66.698 ± 6.103 ns/op

SearchBenchmark:
Benchmark (labelNamesCount) Mode Samples Score Error Units
i.p.b.SearchBenchmark.arraySearch 1 thrpt 80 0.028 ± 0.001 ops/ns
i.p.b.SearchBenchmark.arraySearch 2 thrpt 80 0.022 ± 0.000 ops/ns
i.p.b.SearchBenchmark.arraySearch 3 thrpt 80 0.018 ± 0.000 ops/ns
i.p.b.SearchBenchmark.arraySearch 4 thrpt 80 0.015 ± 0.000 ops/ns
i.p.b.SearchBenchmark.arraySearch 5 thrpt 80 0.013 ± 0.000 ops/ns
i.p.b.SearchBenchmark.baseline 1 thrpt 80 0.119 ± 0.003 ops/ns
i.p.b.SearchBenchmark.baseline 2 thrpt 80 0.119 ± 0.003 ops/ns
i.p.b.SearchBenchmark.baseline 3 thrpt 80 0.116 ± 0.002 ops/ns
i.p.b.SearchBenchmark.baseline 4 thrpt 80 0.117 ± 0.002 ops/ns
i.p.b.SearchBenchmark.baseline 5 thrpt 80 0.116 ± 0.002 ops/ns
i.p.b.SearchBenchmark.tupleSearch 1 thrpt 80 0.044 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 2 thrpt 80 0.035 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 3 thrpt 80 0.027 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 4 thrpt 80 0.024 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 5 thrpt 80 0.019 ± 0.001 ops/ns
i.p.b.SearchBenchmark.arraySearch 1 avgt 80 35.894 ± 0.309 ns/op
i.p.b.SearchBenchmark.arraySearch 2 avgt 80 45.558 ± 0.386 ns/op
i.p.b.SearchBenchmark.arraySearch 3 avgt 80 57.248 ± 1.409 ns/op
i.p.b.SearchBenchmark.arraySearch 4 avgt 80 67.549 ± 3.120 ns/op
i.p.b.SearchBenchmark.arraySearch 5 avgt 80 76.340 ± 0.600 ns/op
i.p.b.SearchBenchmark.baseline 1 avgt 80 8.557 ± 0.170 ns/op
i.p.b.SearchBenchmark.baseline 2 avgt 80 8.616 ± 0.149 ns/op
i.p.b.SearchBenchmark.baseline 3 avgt 80 8.625 ± 0.123 ns/op
i.p.b.SearchBenchmark.baseline 4 avgt 80 8.640 ± 0.138 ns/op
i.p.b.SearchBenchmark.baseline 5 avgt 80 8.745 ± 0.063 ns/op
i.p.b.SearchBenchmark.tupleSearch 1 avgt 80 22.626 ± 0.142 ns/op
i.p.b.SearchBenchmark.tupleSearch 2 avgt 80 28.977 ± 0.244 ns/op
i.p.b.SearchBenchmark.tupleSearch 3 avgt 80 36.333 ± 0.339 ns/op
i.p.b.SearchBenchmark.tupleSearch 4 avgt 80 41.864 ± 0.338 ns/op
i.p.b.SearchBenchmark.tupleSearch 5 avgt 80 49.992 ± 2.223 ns/op

CreationBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.CreationBenchmark.baseline thrpt 4 0.330 ± 0.658 ops/ns
i.p.b.CreationBenchmark.doubleLabel thrpt 4 0.259 ± 0.425 ops/ns
i.p.b.CreationBenchmark.listCreation thrpt 4 0.244 ± 0.163 ops/ns
i.p.b.CreationBenchmark.multipleLabels thrpt 4 0.156 ± 0.131 ops/ns
i.p.b.CreationBenchmark.quadrupleLabel thrpt 4 0.214 ± 0.310 ops/ns
i.p.b.CreationBenchmark.singleLabel thrpt 4 0.315 ± 0.476 ops/ns
i.p.b.CreationBenchmark.tripleLabel thrpt 4 0.251 ± 0.244 ops/ns
i.p.b.CreationBenchmark.baseline avgt 4 11.386 ± 5.510 ns/op
i.p.b.CreationBenchmark.doubleLabel avgt 4 15.443 ± 23.386 ns/op
i.p.b.CreationBenchmark.listCreation avgt 4 17.439 ± 18.226 ns/op
i.p.b.CreationBenchmark.multipleLabels avgt 4 26.452 ± 45.732 ns/op
i.p.b.CreationBenchmark.quadrupleLabel avgt 4 19.426 ± 27.598 ns/op
i.p.b.CreationBenchmark.singleLabel avgt 4 13.065 ± 11.769 ns/op
i.p.b.CreationBenchmark.tripleLabel avgt 4 16.742 ± 17.646 ns/op

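These results show that the tuple-based calls are roughly 3x faster for one to four labels (for example, 12.302 vs 37.963 ns/op for one label), while the five-label case, which still goes through the vararg method, is about the same. Additionally, there's less garbage pressure.

My machine is somewhat old (i5-2520M CPU @ 2.50GHz). I can try to find a newer machine and run the tests there.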

Falland · 2019-02-20T07:57:07Z

I've also managed to run it on an i7-6700T CPU @ 2.80GHz.
SimpleCollectorLabelsBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 thrpt 8 0.069 ± 0.063 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 thrpt 8 0.047 ± 0.044 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 thrpt 8 0.042 ± 0.044 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 thrpt 8 0.042 ± 0.051 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 thrpt 8 0.031 ± 0.012 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 8 0.291 ± 0.346 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 thrpt 8 0.208 ± 0.226 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 thrpt 8 0.134 ± 0.138 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 thrpt 8 0.161 ± 0.140 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 thrpt 8 0.108 ± 0.026 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 thrpt 8 0.030 ± 0.029 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 avgt 8 93.746 ± 27.907 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 avgt 8 100.710 ± 61.149 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 avgt 8 110.253 ± 57.991 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 avgt 8 128.649 ± 28.330 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 avgt 8 213.785 ± 129.305 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 8 16.362 ± 7.389 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 avgt 8 26.895 ± 14.789 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 avgt 8 31.832 ± 20.529 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 avgt 8 26.871 ± 13.659 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 avgt 8 34.876 ± 10.481 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 avgt 8 130.360 ± 75.997 ns/op

SearchBenchmark:
Benchmark (labelNamesCount) Mode Samples Score Error Units
i.p.b.SearchBenchmark.arraySearch 1 thrpt 8 0.062 ± 0.079 ops/ns
i.p.b.SearchBenchmark.arraySearch 2 thrpt 8 0.037 ± 0.005 ops/ns
i.p.b.SearchBenchmark.arraySearch 3 thrpt 8 0.045 ± 0.047 ops/ns
i.p.b.SearchBenchmark.arraySearch 4 thrpt 8 0.037 ± 0.042 ops/ns
i.p.b.SearchBenchmark.arraySearch 5 thrpt 8 0.021 ± 0.005 ops/ns
i.p.b.SearchBenchmark.baseline 1 thrpt 8 0.291 ± 0.290 ops/ns
i.p.b.SearchBenchmark.baseline 2 thrpt 8 0.253 ± 0.121 ops/ns
i.p.b.SearchBenchmark.baseline 3 thrpt 8 0.290 ± 0.292 ops/ns
i.p.b.SearchBenchmark.baseline 4 thrpt 8 0.306 ± 0.284 ops/ns
i.p.b.SearchBenchmark.baseline 5 thrpt 8 0.281 ± 0.333 ops/ns
i.p.b.SearchBenchmark.tupleSearch 1 thrpt 8 0.085 ± 0.056 ops/ns
i.p.b.SearchBenchmark.tupleSearch 2 thrpt 8 0.052 ± 0.006 ops/ns
i.p.b.SearchBenchmark.tupleSearch 3 thrpt 8 0.062 ± 0.072 ops/ns
i.p.b.SearchBenchmark.tupleSearch 4 thrpt 8 0.050 ± 0.049 ops/ns
i.p.b.SearchBenchmark.tupleSearch 5 thrpt 8 0.042 ± 0.031 ops/ns
i.p.b.SearchBenchmark.arraySearch 1 avgt 8 79.659 ± 41.841 ns/op
i.p.b.SearchBenchmark.arraySearch 2 avgt 8 93.728 ± 37.567 ns/op
i.p.b.SearchBenchmark.arraySearch 3 avgt 8 113.678 ± 84.731 ns/op
i.p.b.SearchBenchmark.arraySearch 4 avgt 8 160.693 ± 88.091 ns/op
i.p.b.SearchBenchmark.arraySearch 5 avgt 8 149.526 ± 81.394 ns/op
i.p.b.SearchBenchmark.baseline 1 avgt 8 16.414 ± 9.125 ns/op
i.p.b.SearchBenchmark.baseline 2 avgt 8 15.617 ± 8.230 ns/op
i.p.b.SearchBenchmark.baseline 3 avgt 8 16.022 ± 6.462 ns/op
i.p.b.SearchBenchmark.baseline 4 avgt 8 15.938 ± 7.077 ns/op
i.p.b.SearchBenchmark.baseline 5 avgt 8 13.846 ± 10.344 ns/op
i.p.b.SearchBenchmark.tupleSearch 1 avgt 8 57.411 ± 29.927 ns/op
i.p.b.SearchBenchmark.tupleSearch 2 avgt 8 58.463 ± 44.570 ns/op
i.p.b.SearchBenchmark.tupleSearch 3 avgt 8 78.666 ± 28.918 ns/op
i.p.b.SearchBenchmark.tupleSearch 4 avgt 8 90.855 ± 47.938 ns/op
i.p.b.SearchBenchmark.tupleSearch 5 avgt 8 120.726 ± 62.394 ns/op

CreationBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.CreationBenchmark.baseline thrpt 4 0.454 ± 2.006 ops/ns
i.p.b.CreationBenchmark.doubleLabel thrpt 4 0.203 ± 0.421 ops/ns
i.p.b.CreationBenchmark.listCreation thrpt 4 0.160 ± 0.454 ops/ns
i.p.b.CreationBenchmark.multipleLabels thrpt 4 0.181 ± 0.413 ops/ns
i.p.b.CreationBenchmark.quadrupleLabel thrpt 4 0.185 ± 0.418 ops/ns
i.p.b.CreationBenchmark.singleLabel thrpt 4 0.340 ± 0.653 ops/ns
i.p.b.CreationBenchmark.tripleLabel thrpt 4 0.219 ± 0.684 ops/ns
i.p.b.CreationBenchmark.baseline avgt 4 17.966 ± 40.697 ns/op
i.p.b.CreationBenchmark.doubleLabel avgt 4 26.008 ± 61.156 ns/op
i.p.b.CreationBenchmark.listCreation avgt 4 25.439 ± 24.678 ns/op
i.p.b.CreationBenchmark.multipleLabels avgt 4 42.000 ± 91.016 ns/op
i.p.b.CreationBenchmark.quadrupleLabel avgt 4 30.455 ± 52.186 ns/op
i.p.b.CreationBenchmark.singleLabel avgt 4 19.178 ± 28.010 ns/op
i.p.b.CreationBenchmark.tripleLabel avgt 4 24.150 ± 64.414 ns/op

Here the improvement is even more evident.
P.S. I don't think the Search and Creation benchmarks make sense now; I should probably keep only SimpleCollectorLabelsBenchmark, as it covers the bigger picture.

brian-brazil · 2019-02-20T15:12:56Z

Do you have benchmarks with the increments? That's the most realistic.

Falland · 2019-02-20T16:50:03Z

Hi Brian, what do you mean by increments? But of course I can try to build another benchmark.

brian-brazil · 2019-02-20T18:38:15Z

Don't just fetch the Child, use it.

Falland · 2019-02-21T18:34:24Z

I've changed the benchmark and run it on my old machine; I will add results from the new machine tomorrow.

new version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.074 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 80 0.013 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 80 0.034 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 80 0.058 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 80 0.039 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 80 0.041 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 13.543 ± 0.073 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 80 77.987 ± 3.986 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 80 29.810 ± 1.092 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 80 17.273 ± 0.532 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 80 25.596 ± 0.850 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 80 24.743 ± 1.089 ns/op

old version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.074 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 80 0.014 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 80 0.015 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 80 0.023 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 80 0.017 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 80 0.022 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 13.597 ± 0.076 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 80 71.111 ± 4.641 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 80 68.208 ± 4.215 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 80 43.849 ± 3.143 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 80 62.715 ± 4.900 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 80 51.612 ± 3.816 ns/op

Falland · 2019-02-22T08:01:34Z

I've run this on the newer machine:
new version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 8 0.143 ± 0.142 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 8 0.024 ± 0.018 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 8 0.053 ± 0.067 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 8 0.083 ± 0.116 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 8 0.061 ± 0.070 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 8 0.056 ± 0.023 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 8 37.850 ± 18.198 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 8 212.125 ± 71.695 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 8 95.351 ± 25.224 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 8 62.378 ± 27.395 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 8 78.572 ± 42.919 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 8 73.998 ± 40.895 ns/op

old version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 8 0.147 ± 0.136 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 8 0.028 ± 0.038 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 8 0.016 ± 0.008 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 8 0.039 ± 0.044 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 8 0.033 ± 0.029 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 8 0.029 ± 0.011 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 8 36.050 ± 7.549 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 8 173.077 ± 115.384 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 8 210.191 ± 108.994 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 8 127.932 ± 20.187 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 8 108.859 ± 127.175 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 8 126.915 ± 100.411 ns/op

Falland · 2019-02-27T18:48:52Z

I've added one more method to allow GC-free use for clients that really want it; the old API is still in place.
With the new API, clients can pass a Labels object directly to the labels method, which avoids the internal object allocation.
If allocating even small objects is not acceptable, clients can reuse Labels objects to look up the Child with the labels method.

Here are the updated results with the no-GC method.
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.074 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 80 0.002 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabelsNoGC thrpt 80 0.073 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 80 0.035 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabelsNoGC thrpt 80 0.048 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 80 0.054 ± 0.003 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabelNoGC thrpt 80 0.072 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 80 0.037 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabelsNoGC thrpt 80 0.072 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 80 0.043 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabelsNoGC thrpt 80 0.073 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 13.602 ± 0.091 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 80 2212.967 ± 1187.174 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabelsNoGC avgt 80 13.677 ± 0.099 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 80 28.888 ± 1.065 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabelsNoGC avgt 80 21.480 ± 1.889 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 80 18.630 ± 1.059 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabelNoGC avgt 80 13.875 ± 0.141 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 80 27.301 ± 1.309 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabelsNoGC avgt 80 13.856 ± 0.130 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 80 24.067 ± 1.095 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabelsNoGC avgt 80 13.842 ± 0.114 ns/op

brian-brazil · 2019-02-27T18:55:52Z

If they can call OneLabel, then they can also cache the child and avoid this codepath altogether.
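
A minimal sketch of what caching the child means, using a hypothetical stand-in rather than the real client classes: `LabelledCounter` and the `lookups` field are invented for illustration, and `LongAdder` plays the role of the Child. In the Java client this would correspond to holding on to the Child returned by labels(...) and calling it directly on the hot path.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class CachedChildDemo {

    /** Hypothetical stand-in for a labelled counter. */
    static final class LabelledCounter {
        private final ConcurrentHashMap<String, LongAdder> children = new ConcurrentHashMap<>();
        int lookups; // number of map lookups, for illustration only (not thread-safe)

        LongAdder labels(String value) {
            lookups++;
            return children.computeIfAbsent(value, k -> new LongAdder());
        }
    }

    /** Pays for the labels-to-child lookup on every increment. */
    static void lookupEveryTime(LabelledCounter counter, int n) {
        for (int i = 0; i < n; i++) {
            counter.labels("GET").increment();
        }
    }

    /** Resolves the child once, then increments it without touching the map. */
    static void cachedChild(LabelledCounter counter, int n) {
        LongAdder child = counter.labels("GET");
        for (int i = 0; i < n; i++) {
            child.increment();
        }
    }
}
```

The second variant performs a single lookup no matter how many increments follow, which is why it sidesteps the labels codepath entirely when the label values are known up front.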

Falland · 2019-02-27T19:32:55Z

Yes, you are right. Then this method is probably not needed. My assumption was that this allocation is not a problem for most users. Only in the rare cases where allocation is really critical can users build caching on their side to reduce it further. In the end, Java is a language with managed memory; for a fully GC-free setup one should probably choose a different language.

Falland · 2019-02-27T19:33:49Z

Do you want me to remove the method? Then I will make all the Labels classes package private as well.

brian-brazil · 2019-02-27T19:40:26Z

I don't see a need for the method personally.

Falland · 2019-02-27T19:45:55Z

Ok, I don't think it is important either. And the fact that I've struggled to explain the use case in javadoc is a good sign that it's not very obvious. Let me remove it.

Falland · 2019-02-28T13:52:44Z

Hi Brian, I was thinking about caching the Child in client code. I think you are right, this should be the recommendation for the most latency-critical Java clients. But I still believe that my pull request makes sense and reduces the need to build this sophisticated client-side caching layer to a very small share of the use cases, especially given that the code change is not that big and does not affect the clarity too much.

brian-brazil · 2019-02-28T13:57:45Z

I didn't say otherwise, I've just not had time yet to dig into all these proposals and see which is best. If yours was the only one, it'd probably be merged by now.

This is an optimization of the SimpleCollector.labels(...) lookups with a similar goal to prometheus#445 and prometheus#459. It has some things in common with those PRs (including overridden fixed-args versions) but aims to provide best of all worlds - zero garbage and higher throughput for all label counts, without any reliance on thread reuse.

To achieve this, ConcurrentHashMap is abandoned in favour of a custom copy-on-write linear-probe hashtable.

Benchmark results

Before:
Benchmark Mode Cnt Score Error Units
baseline thrpt 20 84731357.558 ± 535745.023 ops/s
oneLabel thrpt 20 36415789.294 ± 441116.974 ops/s
twoLabels thrpt 20 33301282.259 ± 313669.132 ops/s
threeLabels thrpt 20 24560630.904 ± 2247040.286 ops/s
fourLabels thrpt 20 24424456.896 ± 288989.596 ops/s
fiveLabels thrpt 20 18356036.944 ± 949244.712 ops/s

After:
Benchmark Mode Cnt Score Error Units
baseline thrpt 20 84866162.495 ± 823753.503 ops/s
oneLabel thrpt 20 84554174.645 ± 804735.949 ops/s
twoLabels thrpt 20 85004332.529 ± 689559.035 ops/s
threeLabels thrpt 20 73395533.440 ± 3022384.940 ops/s
fourLabels thrpt 20 68736143.734 ± 1872048.923 ops/s
fiveLabels thrpt 20 53482207.003 ± 488751.990 ops/s

This benchmark, like the prior ones, only tests with a single sequence of labels for each count. It would be good to extend it to cover cases where the map is populated with a larger number of children.

Signed-off-by: nickhill <[email protected]>
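
For readers unfamiliar with the data structure named above, here is a rough sketch of a copy-on-write, linear-probe hash table. It is not the implementation from that pull request: `CowProbeMap`, `Table` and `getOrCreate` are hypothetical names, and unlike the described change this sketch makes no attempt to be garbage-free (the vararg call and the copied arrays all allocate). It only shows the general shape: readers probe an immutable snapshot without locking, and writers publish a modified copy with a compare-and-set.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical copy-on-write, linear-probe map from label values to a child. */
final class CowProbeMap<V> {

    /** Immutable snapshot; a published table is never modified again. */
    private static final class Table {
        final String[][] keys;  // null marks an empty slot; length is a power of two
        final Object[] values;
        final int size;
        Table(String[][] keys, Object[] values, int size) {
            this.keys = keys; this.values = values; this.size = size;
        }
    }

    private final AtomicReference<Table> current =
            new AtomicReference<>(new Table(new String[8][], new Object[8], 0));

    /** Lock-free read against the current snapshot. */
    V get(String... labels) {
        return find(current.get(), labels);
    }

    /** Returns the existing child, or publishes a copied table that contains a new one. */
    V getOrCreate(Supplier<V> factory, String... labels) {
        V created = null;
        while (true) {
            Table old = current.get();
            V existing = find(old, labels);
            if (existing != null) return existing;
            if (created == null) created = factory.get();
            // Keep the load factor at or below 1/2 so every probe reaches an empty slot.
            int capacity = old.keys.length;
            if ((old.size + 1) * 2 > capacity) capacity *= 2;
            String[][] keys = new String[capacity][];
            Object[] values = new Object[capacity];
            for (int i = 0; i < old.keys.length; i++) {
                if (old.keys[i] != null) insert(keys, values, old.keys[i], old.values[i]);
            }
            insert(keys, values, labels.clone(), created);
            if (current.compareAndSet(old, new Table(keys, values, old.size + 1))) return created;
            // Another writer published first; retry against the newer snapshot.
        }
    }

    @SuppressWarnings("unchecked")
    private V find(Table t, String[] labels) {
        int mask = t.keys.length - 1;
        for (int i = Arrays.hashCode(labels) & mask; t.keys[i] != null; i = (i + 1) & mask) {
            if (Arrays.equals(t.keys[i], labels)) return (V) t.values[i];
        }
        return null;
    }

    private static void insert(String[][] keys, Object[] values, String[] key, Object value) {
        int mask = keys.length - 1;
        int i = Arrays.hashCode(key) & mask;
        while (keys[i] != null) i = (i + 1) & mask;
        keys[i] = key;
        values[i] = value;
    }
}
```

Writes are expensive here because every insertion copies the table, which suits a workload where children are created rarely and looked up constantly.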

Falland force-pushed the master branch from 2b2d3aa to 0f3f25e Compare February 18, 2019 18:46

Falland force-pushed the master branch 2 times, most recently from faaf264 to 1681e2c Compare February 20, 2019 06:08

Falland force-pushed the master branch from 1681e2c to d7f1520 Compare February 21, 2019 18:21

njhill mentioned this pull request Feb 23, 2019

Yet another labels-to-Child lookup optimization #460

Closed

Falland force-pushed the master branch from d7f1520 to a99a38f Compare February 27, 2019 18:47

Falland force-pushed the master branch from a99a38f to 91b152d Compare February 27, 2019 19:23

Falland force-pushed the master branch 2 times, most recently from 41e6239 to b5833ff Compare February 27, 2019 19:51

Falland closed this Jun 9, 2019

Falland force-pushed the master branch from fbe7eb0 to 4e0e752 Compare June 9, 2019 15:59

brian-brazil mentioned this pull request Jun 19, 2019

Zero gc labels lookup #486

Open
