
Improve throughput of metrics collections in SimpleCollector #459


Closed
wants to merge 0 commits into from

Conversation

Falland

@Falland Falland commented Feb 18, 2019

This change improves the throughput of labels method calls by adding overloaded versions of the labels method for 1 to 4 arguments. The vararg method is still there for cases with more labels and for backward compatibility.
Performance tests are included in the commit to demonstrate the improvement.

Note! The change amends the type of the SimpleCollector children field from ConcurrentHashMap<List, Child> to ConcurrentHashMap<LabelsTuple, Child>, which might break some client code.
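To illustrate how fixed-arity overloads can sit next to a vararg method, here is a minimal sketch. The class `LabelsOverloadSketch` and its `lastPath` field are hypothetical and exist only to show which overload the compiler picks; this is not the code from the commit.

```java
/** Hypothetical sketch of overload dispatch; not the PR's actual code. */
class LabelsOverloadSketch {
    /** Records which overload handled the most recent call. */
    String lastPath = "";

    // Fixed-arity overloads: the call site passes the values directly,
    // so no String[] is created just to carry them.
    void labels(String v1) { lastPath = "fixed-1"; }
    void labels(String v1, String v2) { lastPath = "fixed-2"; }

    // Vararg fallback: the compiler wraps the arguments in a new String[]
    // at the call site. It stays for larger label counts and for existing
    // callers that already pass an array.
    void labels(String... values) { lastPath = "varargs-" + values.length; }

    public static void main(String[] args) {
        LabelsOverloadSketch c = new LabelsOverloadSketch();
        c.labels("GET");
        System.out.println(c.lastPath);
        c.labels("GET", "200");
        System.out.println(c.lastPath);
        c.labels("GET", "200", "eu");
        System.out.println(c.lastPath);
        c.labels(new String[] {"GET"});
        System.out.println(c.lastPath);
    }
}
```

Java tries the fixed-arity methods before considering vararg invocation, so existing one- and two-argument call sites would switch to the new overloads on recompilation, while calls that pass an explicit array keep using the vararg method.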

@brian-brazil
Contributor

What do the benchmarks show?

The change is amending SimpleCollector children field type from ConcurrentHashMap<List, Child> to ConcurrentHashMap<LabelsTuple, Child> type, which might break some client code.

No one should be depending on that; SimpleCollector is considered internal.

@Falland
Author

Falland commented Feb 19, 2019

The CreationBenchmark shows that creating the Tuple objects is at least as fast as an Arrays.asList() call. I noticed a copy-paste artifact in the main method; I'll fix that later today.

The SearchBenchmark shows that searching a HashMap with Tuple objects as keys is as fast as, if not faster than, searching with List keys.

But the difference is in the equals implementation. Tuples (except MultipleLabels) do not create iterators for comparison, so they generate less garbage pressure. Also, the overloaded method call does not create an array object for the varargs, which reduces garbage pressure and saves some cycles.
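As a rough illustration of a key that compares its fields directly, consider the sketch below. `TwoLabelsKey` is a hypothetical name and the real tuple classes in the commit may look different; the sketch also assumes label values are never null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical two-label key; the PR's real tuple classes may differ. */
final class TwoLabelsKey {
    private final String v1;
    private final String v2;

    TwoLabelsKey(String v1, String v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    // Compares the two fields directly; no Iterator objects are created.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TwoLabelsKey)) return false;
        TwoLabelsKey other = (TwoLabelsKey) o;
        return v1.equals(other.v1) && v2.equals(other.v2);
    }

    @Override
    public int hashCode() {
        return 31 * v1.hashCode() + v2.hashCode();
    }
}

class TupleKeyDemo {
    public static void main(String[] args) {
        // Tuple-keyed lookup: one small key object per call.
        ConcurrentMap<TwoLabelsKey, String> byTuple = new ConcurrentHashMap<>();
        byTuple.put(new TwoLabelsKey("GET", "200"), "child");
        System.out.println(byTuple.get(new TwoLabelsKey("GET", "200")));

        // List-keyed lookup, for comparison: a vararg array plus a List wrapper
        // per call, and an equals() inherited from AbstractList.
        ConcurrentMap<List<String>, String> byList = new ConcurrentHashMap<>();
        byList.put(Arrays.asList("GET", "200"), "child");
        System.out.println(byList.get(Arrays.asList("GET", "200")));
    }
}
```

Both lookups find the same entry; the difference is only in how much temporary garbage each call creates.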

I think I can add vararg vs. overloaded method call benchmarks to show the effect.

Regarding the field type change, I just wanted to make it obvious. But I can remove this part from the comment.

@brian-brazil
Contributor

Can you share the benchmark results?

@Falland Falland force-pushed the master branch 2 times, most recently from faaf264 to 1681e2c Compare February 20, 2019 06:08
@Falland
Author

Falland commented Feb 20, 2019

So I've added a new benchmark, as I said in my previous comment.
The results are:
SimpleCollectorLabelsBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 thrpt 80 0.026 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 thrpt 80 0.024 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 thrpt 80 0.021 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 thrpt 80 0.019 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 thrpt 80 0.014 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.131 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 thrpt 80 0.082 ± 0.003 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 thrpt 80 0.065 ± 0.005 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 thrpt 80 0.077 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 thrpt 80 0.067 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 thrpt 80 0.015 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 avgt 80 37.963 ± 1.966 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 avgt 80 42.344 ± 2.171 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 avgt 80 48.198 ± 3.474 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 avgt 80 56.619 ± 3.971 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 avgt 80 61.957 ± 3.670 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 7.659 ± 0.061 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 avgt 80 12.302 ± 0.562 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 avgt 80 15.747 ± 1.161 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 avgt 80 13.121 ± 0.134 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 avgt 80 15.037 ± 0.139 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 avgt 80 66.698 ± 6.103 ns/op

SearchBenchmark:
Benchmark (labelNamesCount) Mode Samples Score Error Units
i.p.b.SearchBenchmark.arraySearch 1 thrpt 80 0.028 ± 0.001 ops/ns
i.p.b.SearchBenchmark.arraySearch 2 thrpt 80 0.022 ± 0.000 ops/ns
i.p.b.SearchBenchmark.arraySearch 3 thrpt 80 0.018 ± 0.000 ops/ns
i.p.b.SearchBenchmark.arraySearch 4 thrpt 80 0.015 ± 0.000 ops/ns
i.p.b.SearchBenchmark.arraySearch 5 thrpt 80 0.013 ± 0.000 ops/ns
i.p.b.SearchBenchmark.baseline 1 thrpt 80 0.119 ± 0.003 ops/ns
i.p.b.SearchBenchmark.baseline 2 thrpt 80 0.119 ± 0.003 ops/ns
i.p.b.SearchBenchmark.baseline 3 thrpt 80 0.116 ± 0.002 ops/ns
i.p.b.SearchBenchmark.baseline 4 thrpt 80 0.117 ± 0.002 ops/ns
i.p.b.SearchBenchmark.baseline 5 thrpt 80 0.116 ± 0.002 ops/ns
i.p.b.SearchBenchmark.tupleSearch 1 thrpt 80 0.044 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 2 thrpt 80 0.035 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 3 thrpt 80 0.027 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 4 thrpt 80 0.024 ± 0.000 ops/ns
i.p.b.SearchBenchmark.tupleSearch 5 thrpt 80 0.019 ± 0.001 ops/ns
i.p.b.SearchBenchmark.arraySearch 1 avgt 80 35.894 ± 0.309 ns/op
i.p.b.SearchBenchmark.arraySearch 2 avgt 80 45.558 ± 0.386 ns/op
i.p.b.SearchBenchmark.arraySearch 3 avgt 80 57.248 ± 1.409 ns/op
i.p.b.SearchBenchmark.arraySearch 4 avgt 80 67.549 ± 3.120 ns/op
i.p.b.SearchBenchmark.arraySearch 5 avgt 80 76.340 ± 0.600 ns/op
i.p.b.SearchBenchmark.baseline 1 avgt 80 8.557 ± 0.170 ns/op
i.p.b.SearchBenchmark.baseline 2 avgt 80 8.616 ± 0.149 ns/op
i.p.b.SearchBenchmark.baseline 3 avgt 80 8.625 ± 0.123 ns/op
i.p.b.SearchBenchmark.baseline 4 avgt 80 8.640 ± 0.138 ns/op
i.p.b.SearchBenchmark.baseline 5 avgt 80 8.745 ± 0.063 ns/op
i.p.b.SearchBenchmark.tupleSearch 1 avgt 80 22.626 ± 0.142 ns/op
i.p.b.SearchBenchmark.tupleSearch 2 avgt 80 28.977 ± 0.244 ns/op
i.p.b.SearchBenchmark.tupleSearch 3 avgt 80 36.333 ± 0.339 ns/op
i.p.b.SearchBenchmark.tupleSearch 4 avgt 80 41.864 ± 0.338 ns/op
i.p.b.SearchBenchmark.tupleSearch 5 avgt 80 49.992 ± 2.223 ns/op

CreationBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.CreationBenchmark.baseline thrpt 4 0.330 ± 0.658 ops/ns
i.p.b.CreationBenchmark.doubleLabel thrpt 4 0.259 ± 0.425 ops/ns
i.p.b.CreationBenchmark.listCreation thrpt 4 0.244 ± 0.163 ops/ns
i.p.b.CreationBenchmark.multipleLabels thrpt 4 0.156 ± 0.131 ops/ns
i.p.b.CreationBenchmark.quadrupleLabel thrpt 4 0.214 ± 0.310 ops/ns
i.p.b.CreationBenchmark.singleLabel thrpt 4 0.315 ± 0.476 ops/ns
i.p.b.CreationBenchmark.tripleLabel thrpt 4 0.251 ± 0.244 ops/ns
i.p.b.CreationBenchmark.baseline avgt 4 11.386 ± 5.510 ns/op
i.p.b.CreationBenchmark.doubleLabel avgt 4 15.443 ± 23.386 ns/op
i.p.b.CreationBenchmark.listCreation avgt 4 17.439 ± 18.226 ns/op
i.p.b.CreationBenchmark.multipleLabels avgt 4 26.452 ± 45.732 ns/op
i.p.b.CreationBenchmark.quadrupleLabel avgt 4 19.426 ± 27.598 ns/op
i.p.b.CreationBenchmark.singleLabel avgt 4 13.065 ± 11.769 ns/op
i.p.b.CreationBenchmark.tripleLabel avgt 4 16.742 ± 17.646 ns/op

These results show that tuples are ~3x faster. Additionally, there's less garbage pressure.
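The ~3x figure can be checked against the avgt rows of the SimpleCollectorLabelsBenchmark table above. The helper below is a hypothetical convenience, not part of the commit; the numbers are copied from that table (arrayCall1..5 and tupleCall1..5, in ns/op).

```java
/** Hypothetical helper that derives speedups from the avgt scores above. */
class SpeedupFromTable {
    /** Ratio of average times: how many times faster the tuple path is. */
    static double speedup(double arrayNsPerOp, double tupleNsPerOp) {
        return arrayNsPerOp / tupleNsPerOp;
    }

    public static void main(String[] args) {
        // avgt scores (ns/op) for arrayCall1..5 and tupleCall1..5
        double[] array = {37.963, 42.344, 48.198, 56.619, 61.957};
        double[] tuple = {12.302, 15.747, 13.121, 15.037, 66.698};
        for (int i = 0; i < array.length; i++) {
            System.out.printf("%d label(s): %.2fx%n", i + 1, speedup(array[i], tuple[i]));
        }
    }
}
```

For one to four labels the ratio falls between roughly 2.7x and 3.8x. For five labels the two paths are about equal, which is expected because five labels still go through the vararg method.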

My machine is somewhat old (i5-2520M CPU @ 2.50GHz); I can try to find a newer machine and run the tests there.

@Falland
Author

Falland commented Feb 20, 2019

I've also managed to run the benchmarks on an i7-6700T CPU @ 2.80GHz.
SimpleCollectorLabelsBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 thrpt 8 0.069 ± 0.063 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 thrpt 8 0.047 ± 0.044 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 thrpt 8 0.042 ± 0.044 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 thrpt 8 0.042 ± 0.051 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 thrpt 8 0.031 ± 0.012 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 8 0.291 ± 0.346 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 thrpt 8 0.208 ± 0.226 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 thrpt 8 0.134 ± 0.138 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 thrpt 8 0.161 ± 0.140 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 thrpt 8 0.108 ± 0.026 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 thrpt 8 0.030 ± 0.029 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall1 avgt 8 93.746 ± 27.907 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall2 avgt 8 100.710 ± 61.149 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall3 avgt 8 110.253 ± 57.991 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall4 avgt 8 128.649 ± 28.330 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.arrayCall5 avgt 8 213.785 ± 129.305 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 8 16.362 ± 7.389 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall1 avgt 8 26.895 ± 14.789 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall2 avgt 8 31.832 ± 20.529 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall3 avgt 8 26.871 ± 13.659 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall4 avgt 8 34.876 ± 10.481 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.tupleCall5 avgt 8 130.360 ± 75.997 ns/op

SearchBenchmark:
Benchmark (labelNamesCount) Mode Samples Score Error Units
i.p.b.SearchBenchmark.arraySearch 1 thrpt 8 0.062 ± 0.079 ops/ns
i.p.b.SearchBenchmark.arraySearch 2 thrpt 8 0.037 ± 0.005 ops/ns
i.p.b.SearchBenchmark.arraySearch 3 thrpt 8 0.045 ± 0.047 ops/ns
i.p.b.SearchBenchmark.arraySearch 4 thrpt 8 0.037 ± 0.042 ops/ns
i.p.b.SearchBenchmark.arraySearch 5 thrpt 8 0.021 ± 0.005 ops/ns
i.p.b.SearchBenchmark.baseline 1 thrpt 8 0.291 ± 0.290 ops/ns
i.p.b.SearchBenchmark.baseline 2 thrpt 8 0.253 ± 0.121 ops/ns
i.p.b.SearchBenchmark.baseline 3 thrpt 8 0.290 ± 0.292 ops/ns
i.p.b.SearchBenchmark.baseline 4 thrpt 8 0.306 ± 0.284 ops/ns
i.p.b.SearchBenchmark.baseline 5 thrpt 8 0.281 ± 0.333 ops/ns
i.p.b.SearchBenchmark.tupleSearch 1 thrpt 8 0.085 ± 0.056 ops/ns
i.p.b.SearchBenchmark.tupleSearch 2 thrpt 8 0.052 ± 0.006 ops/ns
i.p.b.SearchBenchmark.tupleSearch 3 thrpt 8 0.062 ± 0.072 ops/ns
i.p.b.SearchBenchmark.tupleSearch 4 thrpt 8 0.050 ± 0.049 ops/ns
i.p.b.SearchBenchmark.tupleSearch 5 thrpt 8 0.042 ± 0.031 ops/ns
i.p.b.SearchBenchmark.arraySearch 1 avgt 8 79.659 ± 41.841 ns/op
i.p.b.SearchBenchmark.arraySearch 2 avgt 8 93.728 ± 37.567 ns/op
i.p.b.SearchBenchmark.arraySearch 3 avgt 8 113.678 ± 84.731 ns/op
i.p.b.SearchBenchmark.arraySearch 4 avgt 8 160.693 ± 88.091 ns/op
i.p.b.SearchBenchmark.arraySearch 5 avgt 8 149.526 ± 81.394 ns/op
i.p.b.SearchBenchmark.baseline 1 avgt 8 16.414 ± 9.125 ns/op
i.p.b.SearchBenchmark.baseline 2 avgt 8 15.617 ± 8.230 ns/op
i.p.b.SearchBenchmark.baseline 3 avgt 8 16.022 ± 6.462 ns/op
i.p.b.SearchBenchmark.baseline 4 avgt 8 15.938 ± 7.077 ns/op
i.p.b.SearchBenchmark.baseline 5 avgt 8 13.846 ± 10.344 ns/op
i.p.b.SearchBenchmark.tupleSearch 1 avgt 8 57.411 ± 29.927 ns/op
i.p.b.SearchBenchmark.tupleSearch 2 avgt 8 58.463 ± 44.570 ns/op
i.p.b.SearchBenchmark.tupleSearch 3 avgt 8 78.666 ± 28.918 ns/op
i.p.b.SearchBenchmark.tupleSearch 4 avgt 8 90.855 ± 47.938 ns/op
i.p.b.SearchBenchmark.tupleSearch 5 avgt 8 120.726 ± 62.394 ns/op

CreationBenchmark:
Benchmark Mode Samples Score Error Units
i.p.b.CreationBenchmark.baseline thrpt 4 0.454 ± 2.006 ops/ns
i.p.b.CreationBenchmark.doubleLabel thrpt 4 0.203 ± 0.421 ops/ns
i.p.b.CreationBenchmark.listCreation thrpt 4 0.160 ± 0.454 ops/ns
i.p.b.CreationBenchmark.multipleLabels thrpt 4 0.181 ± 0.413 ops/ns
i.p.b.CreationBenchmark.quadrupleLabel thrpt 4 0.185 ± 0.418 ops/ns
i.p.b.CreationBenchmark.singleLabel thrpt 4 0.340 ± 0.653 ops/ns
i.p.b.CreationBenchmark.tripleLabel thrpt 4 0.219 ± 0.684 ops/ns
i.p.b.CreationBenchmark.baseline avgt 4 17.966 ± 40.697 ns/op
i.p.b.CreationBenchmark.doubleLabel avgt 4 26.008 ± 61.156 ns/op
i.p.b.CreationBenchmark.listCreation avgt 4 25.439 ± 24.678 ns/op
i.p.b.CreationBenchmark.multipleLabels avgt 4 42.000 ± 91.016 ns/op
i.p.b.CreationBenchmark.quadrupleLabel avgt 4 30.455 ± 52.186 ns/op
i.p.b.CreationBenchmark.singleLabel avgt 4 19.178 ± 28.010 ns/op
i.p.b.CreationBenchmark.tripleLabel avgt 4 24.150 ± 64.414 ns/op

Here the improvement is even more evident.
P.S. I don't think the Search and Creation benchmarks make sense now; I should probably keep only SimpleCollectorLabelsBenchmark, as it covers the bigger picture.

@brian-brazil
Contributor

Do you have benchmarks with the increments? That's the most realistic.

@Falland
Author

Falland commented Feb 20, 2019

Hi Brian, what do you mean by increments? But of course I can try to build another benchmark.

@brian-brazil
Contributor

Don't just fetch the Child, use it.
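For illustration, the difference between fetching a child and actually using it might look like the sketch below. All names are hypothetical and the map-backed collector is a toy stand-in, not client_java code; a real measurement would put the two methods in a JMH benchmark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/** Toy stand-in for a labelled counter; all names here are hypothetical. */
class IncrementSketch {
    static final class Child {
        private final LongAdder value = new LongAdder();
        void inc() { value.increment(); }
        long get() { return value.sum(); }
    }

    private final ConcurrentMap<List<String>, Child> children = new ConcurrentHashMap<>();

    Child labels(String... values) {
        return children.computeIfAbsent(Arrays.asList(values), k -> new Child());
    }

    // Measures the lookup alone; the result is returned so that the
    // JIT cannot discard the work.
    Child fetchOnly() {
        return labels("GET");
    }

    // Measures what instrumented code really does: look up, then increment.
    void fetchAndUse() {
        labels("GET").inc();
    }

    public static void main(String[] args) {
        IncrementSketch c = new IncrementSketch();
        for (int i = 0; i < 1000; i++) c.fetchAndUse();
        System.out.println(c.labels("GET").get());
    }
}
```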

@Falland
Author

Falland commented Feb 21, 2019

I've changed the benchmark and run it on my old machine; I will add results from the new machine tomorrow.

new version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.074 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 80 0.013 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 80 0.034 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 80 0.058 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 80 0.039 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 80 0.041 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 13.543 ± 0.073 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 80 77.987 ± 3.986 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 80 29.810 ± 1.092 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 80 17.273 ± 0.532 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 80 25.596 ± 0.850 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 80 24.743 ± 1.089 ns/op

old version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.074 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 80 0.014 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 80 0.015 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 80 0.023 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 80 0.017 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 80 0.022 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 13.597 ± 0.076 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 80 71.111 ± 4.641 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 80 68.208 ± 4.215 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 80 43.849 ± 3.143 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 80 62.715 ± 4.900 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 80 51.612 ± 3.816 ns/op

@Falland
Author

Falland commented Feb 22, 2019

I've run this on the newer machine:
new version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 8 0.143 ± 0.142 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 8 0.024 ± 0.018 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 8 0.053 ± 0.067 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 8 0.083 ± 0.116 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 8 0.061 ± 0.070 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 8 0.056 ± 0.023 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 8 37.850 ± 18.198 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 8 212.125 ± 71.695 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 8 95.351 ± 25.224 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 8 62.378 ± 27.395 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 8 78.572 ± 42.919 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 8 73.998 ± 40.895 ns/op

old version
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 8 0.147 ± 0.136 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 8 0.028 ± 0.038 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 8 0.016 ± 0.008 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 8 0.039 ± 0.044 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 8 0.033 ± 0.029 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 8 0.029 ± 0.011 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 8 36.050 ± 7.549 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 8 173.077 ± 115.384 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 8 210.191 ± 108.994 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 8 127.932 ± 20.187 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 8 108.859 ± 127.175 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 8 126.915 ± 100.411 ns/op

@Falland
Author

Falland commented Feb 27, 2019

I've added one more method to allow GC-free use for clients that really want it; the old API is still in place.
With the new API, clients can pass a Labels object directly to the labels method, avoiding the internal object allocation.
If allocating small objects is really not feasible, clients can reuse Labels objects to look up the Child with the labels method.

Here are the updated results with the no-GC method.
Benchmark Mode Samples Score Error Units
i.p.b.SimpleCollectorLabelsBenchmark.baseline thrpt 80 0.074 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels thrpt 80 0.002 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabelsNoGC thrpt 80 0.073 ± 0.000 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels thrpt 80 0.035 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.fourLabelsNoGC thrpt 80 0.048 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel thrpt 80 0.054 ± 0.003 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.oneLabelNoGC thrpt 80 0.072 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels thrpt 80 0.037 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.threeLabelsNoGC thrpt 80 0.072 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels thrpt 80 0.043 ± 0.002 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.twoLabelsNoGC thrpt 80 0.073 ± 0.001 ops/ns
i.p.b.SimpleCollectorLabelsBenchmark.baseline avgt 80 13.602 ± 0.091 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabels avgt 80 2212.967 ± 1187.174 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fiveLabelsNoGC avgt 80 13.677 ± 0.099 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabels avgt 80 28.888 ± 1.065 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.fourLabelsNoGC avgt 80 21.480 ± 1.889 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabel avgt 80 18.630 ± 1.059 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.oneLabelNoGC avgt 80 13.875 ± 0.141 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabels avgt 80 27.301 ± 1.309 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.threeLabelsNoGC avgt 80 13.856 ± 0.130 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabels avgt 80 24.067 ± 1.095 ns/op
i.p.b.SimpleCollectorLabelsBenchmark.twoLabelsNoGC avgt 80 13.842 ± 0.114 ns/op

@brian-brazil
Contributor

If they can call OneLabel, then they can also cache the child and avoid this code path altogether.
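A minimal sketch of that caching pattern, using toy stand-in types rather than client_java classes (in the library the corresponding types would be a collector such as Counter and its Counter.Child). `Collector`, `Handler`, and the `lookups` counter are hypothetical names added only to make the effect visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Toy stand-in types; names are hypothetical, not client_java classes. */
class CachedChildSketch {
    static final class Child {
        private final AtomicLong value = new AtomicLong();
        void inc() { value.incrementAndGet(); }
        long get() { return value.get(); }
    }

    static final class Collector {
        final AtomicLong lookups = new AtomicLong(); // counts map lookups
        private final ConcurrentMap<List<String>, Child> children = new ConcurrentHashMap<>();

        Child labels(String... values) {
            lookups.incrementAndGet();
            return children.computeIfAbsent(Arrays.asList(values), k -> new Child());
        }
    }

    /** Resolves the child once, then reuses it on the hot path. */
    static final class Handler {
        private final Child okGets;

        Handler(Collector requests) {
            this.okGets = requests.labels("GET", "200"); // single lookup
        }

        void onRequest() {
            okGets.inc(); // no vararg array, no key object, no map lookup
        }
    }

    public static void main(String[] args) {
        Collector requests = new Collector();
        Handler handler = new Handler(requests);
        for (int i = 0; i < 5; i++) handler.onRequest();
        System.out.println(requests.lookups.get());
    }
}
```

This only works when the label values are known ahead of time; call sites whose label values vary per request still go through the lookup.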

@Falland
Author

Falland commented Feb 27, 2019

Yes, you are right. Then this method is probably not needed. My assumption was that this allocation is not a problem for most users. Only in rare cases where the allocation is really critical would users build caching on their side to reduce it further. In the end, Java is a language with managed memory; for a fully GC-free setup one should probably choose a different language.

@Falland
Author

Falland commented Feb 27, 2019

Do you want me to remove the method? Then I will make all the Labels classes package-private as well.

@brian-brazil
Contributor

I don't see a need for the method personally.

@Falland
Author

Falland commented Feb 27, 2019

OK, I don't think it is important either. And the fact that I've struggled to explain the use case in the Javadoc is a good sign that it's not very obvious. Let me remove it.

@Falland Falland force-pushed the master branch 2 times, most recently from 41e6239 to b5833ff Compare February 27, 2019 19:51
@Falland
Author

Falland commented Feb 28, 2019

Hi Brian, I was thinking about caching the Child in client code. I think you are right, this should be the recommendation for the most latency-critical Java clients. But I still believe that my pull request makes sense and reduces the need to build this sophisticated client caching layer to a very small share of use cases, especially given that the code change is not that big and does not affect clarity too much.

@brian-brazil
Contributor

I didn't say otherwise, I've just not had time yet to dig into all these proposals and see which is best. If yours was the only one, it'd probably be merged by now.

@Falland Falland closed this Jun 9, 2019
njhill added a commit to njhill/client_java that referenced this pull request Nov 6, 2019
This is an optimization of the SimpleCollector.labels(...) lookups with
a similar goal to prometheus#445 and prometheus#459.

It has some things in common with those PRs (including overloaded
fixed-args versions) but aims to provide best of all worlds - zero
garbage and higher throughput for all label counts, without any reliance
on thread reuse.

To achieve this, ConcurrentHashMap is abandoned in favour of a custom
copy-on-write linear-probe hashtable.
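As a rough picture of what a copy-on-write linear-probe table can look like, here is a minimal sketch. It is not the implementation from the referenced commit: `CowProbeMap` is a hypothetical name, keys are single strings rather than label-value sequences, writers are serialized with a lock, and the factory is assumed never to return null.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a copy-on-write, linear-probe table. */
class CowProbeMap<V> {
    /** One snapshot. Never modified after it is published to readers. */
    private static final class Table {
        final String[] keys;   // null marks an empty slot
        final Object[] values; // values[i] belongs to keys[i]
        final int size;

        Table(int capacity, int size) {
            this.keys = new String[capacity];
            this.values = new Object[capacity];
            this.size = size;
        }

        /** Used only while the snapshot is still private to the writer. */
        void insert(String key, Object value) {
            int mask = keys.length - 1;
            int i = key.hashCode() & mask;
            while (keys[i] != null) i = (i + 1) & mask;
            keys[i] = key;
            values[i] = value;
        }
    }

    // Capacity stays a power of two so that "hash & mask" picks a slot.
    private volatile Table table = new Table(8, 0);

    /** Read path: no locking and no allocation, just a linear probe. */
    @SuppressWarnings("unchecked")
    V get(String key) {
        Table t = table;
        int mask = t.keys.length - 1;
        for (int i = key.hashCode() & mask; t.keys[i] != null; i = (i + 1) & mask) {
            if (t.keys[i].equals(key)) return (V) t.values[i];
        }
        return null;
    }

    V getOrCreate(String key, Supplier<V> factory) {
        V found = get(key);
        return found != null ? found : create(key, factory);
    }

    /** Write path: copy everything into a fresh snapshot, then publish it. */
    private synchronized V create(String key, Supplier<V> factory) {
        V found = get(key); // another writer may have added it meanwhile
        if (found != null) return found;
        Table old = table;
        // Keep the table at most half full so every probe reaches an empty slot.
        int capacity = (old.size + 1) * 2 > old.keys.length
                ? old.keys.length * 2 : old.keys.length;
        Table next = new Table(capacity, old.size + 1);
        for (int i = 0; i < old.keys.length; i++) {
            if (old.keys[i] != null) next.insert(old.keys[i], old.values[i]);
        }
        V created = factory.get();
        next.insert(key, created);
        table = next; // volatile write publishes the new snapshot
        return created;
    }
}
```

The trade-off in this kind of design is that every insertion copies the whole table, which is acceptable when children are created rarely and looked up often.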

Benchmark results

Before:

Benchmark     Mode  Cnt         Score         Error  Units
baseline     thrpt   20  84731357.558 ±  535745.023  ops/s
oneLabel     thrpt   20  36415789.294 ±  441116.974  ops/s
twoLabels    thrpt   20  33301282.259 ±  313669.132  ops/s
threeLabels  thrpt   20  24560630.904 ± 2247040.286  ops/s
fourLabels   thrpt   20  24424456.896 ±  288989.596  ops/s
fiveLabels   thrpt   20  18356036.944 ±  949244.712  ops/s

After:

Benchmark     Mode  Cnt         Score         Error  Units
baseline     thrpt   20  84866162.495 ±  823753.503  ops/s
oneLabel     thrpt   20  84554174.645 ±  804735.949  ops/s
twoLabels    thrpt   20  85004332.529 ±  689559.035  ops/s
threeLabels  thrpt   20  73395533.440 ± 3022384.940  ops/s
fourLabels   thrpt   20  68736143.734 ± 1872048.923  ops/s
fiveLabels   thrpt   20  53482207.003 ±  488751.990  ops/s

This benchmark, like the prior ones, only tests with a single sequence
of labels for each count. It would be good to extend it to cover cases
where the map is populated with a larger number of children.

Signed-off-by: nickhill <[email protected]>