
Reduced garbage produced on label lookup #445


Closed
wants to merge 1 commit into from



@franz1981 franz1981 commented Dec 13, 2018

Introduced pooling of label names to reduce garbage
in the hot path, updated benchmarks to measure it,
improved SimpleCollector creation when it is used
with no label names or with a single label name.
Introduced a new ArrayList implementation with
faster hashCode/equals to allow faster lookups.

@franz1981 franz1981 force-pushed the zero_gc_labels_lookup branch from 0f730c1 to b1717c9 Compare December 13, 2018 16:20
@brian-brazil
Contributor

Can you share your benchmark results? I'm particularly interested in whether the single label case optimisation is worth it.

@franz1981
Author

Yes, sure!
I only have the case with 2 label names, i.e. pooled vs. not pooled, which is quite similar.
Wait a sec and I will post it 👍

@@ -164,9 +228,10 @@ protected SimpleCollector(Builder b) {
  checkMetricName(fullname);
  if (b.help.isEmpty()) throw new IllegalStateException("Help hasn't been set.");
  help = b.help;
- labelNames = Arrays.asList(b.labelNames);
+ labelNames = b.labelNames.length == 0 ? Collections.<String>emptyList() :
Contributor

This is only called on creation of a metric, it's not on a hot path so I'd not complicate things.

Author

Is labelNames not used in other contexts? If not, I can remove this one :)
Re the bench: I'm running it single-threaded so that contention does not make the results less reproducible (if I have made the lookup faster, it would hammer any contended data structure more often).

Author

@franz1981 franz1981 Dec 13, 2018

I need to run the benchmarks on a quiet machine because I'm getting very unstable results!
I will try to run them ASAP and will post them here :)

Contributor

The one here should be created and stay alive until the process terminates.

@franz1981 franz1981 force-pushed the zero_gc_labels_lookup branch from b1717c9 to 96063c2 Compare December 13, 2018 19:12
@franz1981
Author

franz1981 commented Dec 14, 2018

MASTER:

Benchmark                                                                                                  Mode  Cnt     Score     Error   Units
SummaryBenchmark.prometheusSimpleHistogramBenchmark                                                        avgt    4    44.351 ±  15.006   ns/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.alloc.rate                                         avgt    4   984.933 ± 327.968  MB/sec
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.alloc.rate.norm                                    avgt    4    48.000 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Eden_Space                                avgt    4   984.451 ± 248.087  MB/sec
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Eden_Space.norm                           avgt    4    48.000 ±   3.983    B/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Survivor_Space                            avgt    4     0.069 ±   0.071  MB/sec
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Survivor_Space.norm                       avgt    4     0.003 ±   0.003    B/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.count                                              avgt    4   276.000            counts
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.time                                               avgt    4   318.000                ms
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark                                        avgt    4    45.660 ±  11.235   ns/op
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.alloc.rate                         avgt    4   477.863 ± 114.849  MB/sec
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.alloc.rate.norm                    avgt    4    24.000 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.churn.PS_Eden_Space                avgt    4   476.789 ± 105.479  MB/sec
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.churn.PS_Eden_Space.norm           avgt    4    23.949 ±   1.579    B/op
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.churn.PS_Survivor_Space            avgt    4     0.015 ±   0.042  MB/sec
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.churn.PS_Survivor_Space.norm       avgt    4     0.001 ±   0.002    B/op
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.count                              avgt    4   262.000            counts
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.time                               avgt    4   308.000                ms
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark                                                   avgt    4    86.875 ±   2.624   ns/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.alloc.rate                                    avgt    4   501.808 ±  15.102  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.alloc.rate.norm                               avgt    4    48.000 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Eden_Space                           avgt    4   502.020 ±  17.032  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Eden_Space.norm                      avgt    4    48.021 ±   1.611    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Survivor_Space                       avgt    4     0.061 ±   0.110  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Survivor_Space.norm                  avgt    4     0.006 ±   0.011    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.count                                         avgt    4   324.000            counts
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.time                                          avgt    4   311.000                ms
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark                                   avgt    4    85.491 ±   0.773   ns/op
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.alloc.rate                    avgt    4   254.962 ±   2.280  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.alloc.rate.norm               avgt    4    24.000 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.churn.PS_Eden_Space           avgt    4   254.942 ±   3.027  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.churn.PS_Eden_Space.norm      avgt    4    23.998 ±   0.341    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.churn.PS_Survivor_Space       avgt    4     0.017 ±   0.060  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.churn.PS_Survivor_Space.norm  avgt    4     0.002 ±   0.006    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.count                         avgt    4   298.000            counts
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.time                          avgt    4   309.000                ms
SummaryBenchmark.prometheusSimpleSummaryBenchmark                                                          avgt    4    38.521 ±   6.949   ns/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.alloc.rate                                           avgt    4  1132.375 ± 204.787  MB/sec
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.alloc.rate.norm                                      avgt    4    48.000 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Eden_Space                                  avgt    4  1131.893 ± 184.905  MB/sec
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Eden_Space.norm                             avgt    4    47.982 ±   0.990    B/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Survivor_Space                              avgt    4     0.077 ±   0.029  MB/sec
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Survivor_Space.norm                         avgt    4     0.003 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.count                                                avgt    4   339.000            counts
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.time                                                 avgt    4   328.000                ms
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark                                          avgt    4    38.991 ±   9.484   ns/op
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.alloc.rate                           avgt    4   559.594 ± 131.568  MB/sec
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.alloc.rate.norm                      avgt    4    24.000 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.churn.PS_Eden_Space                  avgt    4   558.727 ± 135.295  MB/sec
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.churn.PS_Eden_Space.norm             avgt    4    23.962 ±   0.852    B/op
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.churn.PS_Survivor_Space              avgt    4     0.014 ±   0.033  MB/sec
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.churn.PS_Survivor_Space.norm         avgt    4     0.001 ±   0.001    B/op
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.count                                avgt    4   333.000            counts
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.time                                 avgt    4   335.000                ms

THIS PR:

Benchmark                                                                                     Mode  Cnt     Score      Error   Units
SummaryBenchmark.prometheusSimpleHistogramBenchmark                                           avgt    4    44.648 ±    1.256   ns/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.alloc.rate                            avgt    4   488.204 ±   13.789  MB/sec
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.alloc.rate.norm                       avgt    4    24.000 ±    0.001    B/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Eden_Space                   avgt    4   488.006 ±   27.472  MB/sec
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Eden_Space.norm              avgt    4    23.990 ±    1.099    B/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Survivor_Space               avgt    4     0.015 ±    0.042  MB/sec
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.churn.PS_Survivor_Space.norm          avgt    4     0.001 ±    0.002    B/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.count                                 avgt    4   349.000             counts
SummaryBenchmark.prometheusSimpleHistogramBenchmark:·gc.time                                  avgt    4   341.000                 ms
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark                           avgt    4    44.826 ±    0.786   ns/op
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.alloc.rate            avgt    4    ≈ 10⁻⁴             MB/sec
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.alloc.rate.norm       avgt    4    ≈ 10⁻⁶               B/op
SummaryBenchmark.prometheusSimpleHistogramPooledLabelNamesBenchmark:·gc.count                 avgt    4       ≈ 0             counts
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark                                      avgt    4    88.715 ±    1.072   ns/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.alloc.rate                       avgt    4   245.693 ±    2.951  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.alloc.rate.norm                  avgt    4    24.000 ±    0.001    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Eden_Space              avgt    4   245.905 ±    2.749  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Eden_Space.norm         avgt    4    24.021 ±    0.215    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Survivor_Space          avgt    4     0.016 ±    0.066  MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.churn.PS_Survivor_Space.norm     avgt    4     0.002 ±    0.006    B/op
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.count                            avgt    4   343.000             counts
SummaryBenchmark.prometheusSimpleHistogramTimerBenchmark:·gc.time                             avgt    4   327.000                 ms
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark                      avgt    4    90.161 ±    2.881   ns/op
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.alloc.rate       avgt    4    ≈ 10⁻⁴             MB/sec
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.alloc.rate.norm  avgt    4    ≈ 10⁻⁵               B/op
SummaryBenchmark.prometheusSimpleHistogramTimerPooledLabelNamesBenchmark:·gc.count            avgt    4       ≈ 0             counts
SummaryBenchmark.prometheusSimpleSummaryBenchmark                                             avgt    4    43.756 ±    1.004   ns/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.alloc.rate                              avgt    4   498.160 ±   11.422  MB/sec
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.alloc.rate.norm                         avgt    4    24.000 ±    0.001    B/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Eden_Space                     avgt    4   497.915 ±   53.488  MB/sec
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Eden_Space.norm                avgt    4    23.988 ±    2.172    B/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Survivor_Space                 avgt    4     0.018 ±    0.079  MB/sec
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.churn.PS_Survivor_Space.norm            avgt    4     0.001 ±    0.004    B/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.count                                   avgt    4   273.000             counts
SummaryBenchmark.prometheusSimpleSummaryBenchmark:·gc.time                                    avgt    4   297.000                 ms
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark                             avgt    4    42.355 ±    3.602   ns/op
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.alloc.rate              avgt    4    ≈ 10⁻⁴             MB/sec
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.alloc.rate.norm         avgt    4    ≈ 10⁻⁶               B/op
SummaryBenchmark.prometheusSimpleSummaryPooledLabelNamesBenchmark:·gc.count                   avgt    4       ≈ 0             counts

Looking at the numbers, and thanks to the changes that speed up ArrayList equals/hashCode, the performance is similar to the original version (often a little worse, TBH), but with cached label names (String[]) it is possible to achieve zero garbage on the hot path, which helps avoid latency spikes in the rest of the application that uses it.
In the other cases the garbage produced is roughly cut in half (24 B/op instead of 48 B/op).
I will come back soon with a bench that covers the single label case too 👍
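
For readers who want to see what a "faster hashCode/equals" list might look like, here is a minimal sketch. It is not the class from this PR (that code is not shown in the thread); the name IndexedArrayList is hypothetical, and the assumption is that the cost being avoided is the iterator-based equals/hashCode that ArrayList inherited from AbstractList on Java 8. The sketch walks the elements by index and keeps the hash formula required by the List contract, so it stays interchangeable with other List keys in a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical name; an illustration only, not the PR's implementation.
final class IndexedArrayList<E> extends ArrayList<E> {
    IndexedArrayList(int capacity) {
        super(capacity);
    }

    @Override
    public int hashCode() {
        // Same formula as the List contract, computed with an indexed loop.
        int h = 1;
        for (int i = 0, n = size(); i < n; i++) {
            E e = get(i);
            h = 31 * h + (e == null ? 0 : e.hashCode());
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof List)) return false;
        List<?> other = (List<?>) o;
        int n = size();
        if (other.size() != n) return false;
        // Indexed access is only cheap on RandomAccess lists; otherwise defer to the JDK.
        if (!(other instanceof RandomAccess)) return super.equals(o);
        for (int i = 0; i < n; i++) {
            Object a = get(i);
            Object b = other.get(i);
            if (a == null ? b != null : !a.equals(b)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        IndexedArrayList<String> key = new IndexedArrayList<String>(2);
        key.add("GET");
        key.add("/path");
        List<String> plain = Arrays.asList("GET", "/path");
        System.out.println(key.equals(plain) && key.hashCode() == plain.hashCode()); // true
    }
}
```

Whether this is actually faster depends on the JVM: escape analysis can already remove the iterator allocation in some cases, so it needs measuring, as the numbers above suggest.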

@franz1981
Author

Re label names lookup perf:
On Master:

Benchmark                                                                             (labelNamesCount)  Mode  Cnt     Score     Error   Units
LabelNamesLookupBenchmark.labelNamesLookupBenchmark                                                   1  avgt    8    27.037 ±   2.878   ns/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate                                    1  avgt    8  1616.415 ± 156.981  MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate.norm                               1  avgt    8    48.000 ±   0.001    B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Eden_Space                           1  avgt    8  1616.002 ± 149.652  MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Eden_Space.norm                      1  avgt    8    47.993 ±   0.352    B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Survivor_Space                       1  avgt    8     0.071 ±   0.034  MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Survivor_Space.norm                  1  avgt    8     0.002 ±   0.001    B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.count                                         1  avgt    8   614.000            counts
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.time                                          1  avgt    8   639.000                ms
LabelNamesLookupBenchmark.labelNamesLookupBenchmark                                                   2  avgt    8    29.165 ±   3.770   ns/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate                                    2  avgt    8   750.260 ±  93.636  MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate.norm                               2  avgt    8    24.000 ±   0.001    B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Eden_Space                           2  avgt    8   748.556 ±  91.635  MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Eden_Space.norm                      2  avgt    8    23.948 ±   0.457    B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Survivor_Space                       2  avgt    8     0.018 ±   0.011  MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.churn.PS_Survivor_Space.norm                  2  avgt    8     0.001 ±   0.001    B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.count                                         2  avgt    8   632.000            counts
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.time                                          2  avgt    8   622.000                ms

With this PR:

Benchmark                                                                (labelNamesCount)  Mode  Cnt   Score    Error   Units
LabelNamesLookupBenchmark.labelNamesLookupBenchmark                                      1  avgt    8  25.569 ±  2.521   ns/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate                       1  avgt    8  ≈ 10⁻⁴           MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate.norm                  1  avgt    8  ≈ 10⁻⁶             B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.count                            1  avgt    8     ≈ 0           counts
LabelNamesLookupBenchmark.labelNamesLookupBenchmark                                      2  avgt    8  41.671 ±  4.040   ns/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate                       2  avgt    8  ≈ 10⁻⁴           MB/sec
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.alloc.rate.norm                  2  avgt    8  ≈ 10⁻⁶             B/op
LabelNamesLookupBenchmark.labelNamesLookupBenchmark:·gc.count                            2  avgt    8     ≈ 0           counts

We pay more CPU-wise with label names count = 2 (about 41.7 vs 29.2 ns/op), but with no GC at all, while with a single label name the performance is slightly improved (about 25.6 vs 27.0 ns/op), again with no GC at all.

@franz1981 franz1981 force-pushed the zero_gc_labels_lookup branch from 96063c2 to e6eb364 Compare December 14, 2018 09:45
Contributor

@brian-brazil brian-brazil left a comment

Hmm, why is there still garbage for some of the benchmarks?

A few are also a bit slower, is that just noise (4 isn't many tests) or real?

package io.prometheus.benchmark;

import io.prometheus.client.Histogram;
import org.openjdk.jmh.annotations.*;
Contributor

Please be explicit in imports

@franz1981
Author

franz1981 commented Dec 14, 2018

> Hmm, why is there still garbage for some of the benchmarks?

The ones that still produce garbage are the ones with:

  • a new Timer allocated each time
  • a varargs String[] that is not pooled

> A few are also a bit slower, is that just noise (4 isn't many tests) or real?

Nope, I have verified it several times and they are slower for real, but for good reasons...
The original version was not creating a copy of the String[], because Arrays.asList just wraps the original String[], i.e. there is a chance that a change to the String[] would affect the pooled instance too. That absence of a copy is the main reason for the performance difference, because the new version performs a copy by adding the Strings one by one (to a pooled ArrayList).
Hence we have a trade-off here: somewhat more complex code that runs faster/cheaper for a single label name vs. a small perf difference (that would become ~0 under contention) but less/no GC.
I leave the choice to you :P
IMHO I prefer less garbage created on the same path that has to measure anything, to make it possible to use the lib in very hot paths.
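
The aliasing behaviour of Arrays.asList described above can be checked with a few lines. This is only an illustration: the class and method names (AsListAliasingDemo, wrap, copyInto) are made up for the example and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AsListAliasingDemo {
    // Wraps the caller's array: no per-element copy, but later writes
    // to the array are visible through the returned list.
    static List<String> wrap(String... labelValues) {
        return Arrays.asList(labelValues);
    }

    // Copies the values one by one into a reusable list, which costs
    // a little CPU but decouples the list from the caller's array.
    static List<String> copyInto(ArrayList<String> reusable, String... labelValues) {
        reusable.clear();
        for (String v : labelValues) {
            reusable.add(v);
        }
        return reusable;
    }

    public static void main(String[] args) {
        String[] values = {"GET", "/a"};
        List<String> wrapped = wrap(values);
        List<String> copied = copyInto(new ArrayList<String>(2), values);
        values[1] = "/b"; // the caller reuses its array
        System.out.println(wrapped); // [GET, /b]
        System.out.println(copied);  // [GET, /a]
    }
}
```

If the wrapped list had been stored as a map key, the write to the array would have silently changed the key's hash, which is the risk the copy avoids.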

Contributor

@brian-brazil brian-brazil left a comment

I think it makes sense to do.

One optimisation would be to move the label validation into tryCreateChild, as it's not needed on every lookup.
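
A rough sketch of that suggestion, under the assumption that a lookup hit can only match a key that was already validated when its child was created. Only the name tryCreateChild comes from the comment above; the class, the counter, and the exception messages are illustrative, and the child is a plain Object to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class LazyValidationSketch {
    private final int labelCount;
    private final ConcurrentMap<List<String>, Object> children =
            new ConcurrentHashMap<List<String>, Object>();
    int validations; // demo counter only, not thread-safe

    LazyValidationSketch(int labelCount) {
        this.labelCount = labelCount;
    }

    Object labels(String... labelValues) {
        List<String> key = Arrays.asList(labelValues);
        Object child = children.get(key);
        if (child != null) {
            return child; // hit: equal to a key that passed validation earlier
        }
        return tryCreateChild(key);
    }

    private Object tryCreateChild(List<String> key) {
        validations++;
        if (key.size() != labelCount) {
            throw new IllegalArgumentException("Incorrect number of labels.");
        }
        for (String v : key) {
            if (v == null) {
                throw new IllegalArgumentException("Label cannot be null.");
            }
        }
        Object created = new Object();
        // Store a private copy so the map key cannot alias the caller's array.
        Object existing = children.putIfAbsent(new ArrayList<String>(key), created);
        return existing == null ? created : existing;
    }
}
```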

Child c2 = newChild();
Child tmp = children.putIfAbsent(labels, c2);
if (tmp == null) {
labelNamesPool.set(null);
Contributor

Why do you throw away the pool here?

Author

@franz1981 franz1981 Dec 14, 2018

Because if labels has been put into children, it is better not to share it, by means of the pool, with lookups that come later: the risk is having the same list both in children and in the pool.

Contributor

Can you add a comment on that?

Author

Comment added 👍
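
The reasoning in this thread can be sketched as follows. The names children and labelNamesPool appear in the diff excerpt above; everything else is an assumption for illustration, in particular that the pool is a ThreadLocal holding one reusable list per thread and that the child is a plain Object. This is not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PooledLookupSketch {
    private final ConcurrentMap<ArrayList<String>, Object> children =
            new ConcurrentHashMap<ArrayList<String>, Object>();
    // Assumed shape of the pool: one reusable lookup key per thread.
    private final ThreadLocal<ArrayList<String>> labelNamesPool =
            new ThreadLocal<ArrayList<String>>();

    Object labels(String... labelValues) {
        ArrayList<String> key = labelNamesPool.get();
        if (key == null) {
            key = new ArrayList<String>(labelValues.length);
            labelNamesPool.set(key);
        }
        key.clear();
        for (String v : labelValues) {
            key.add(v);
        }

        Object child = children.get(key);
        if (child != null) {
            return child; // hot path: nothing allocated here beyond the caller's varargs array
        }

        Object created = new Object();
        Object existing = children.putIfAbsent(key, created);
        if (existing == null) {
            // The list is now a live map key. Drop it from the pool so the next
            // lookup on this thread does not clear() and refill a key the map owns.
            labelNamesPool.set(null);
            return created;
        }
        return existing;
    }
}
```

Without the set(null), the second distinct label combination on the same thread would mutate the first key in place and corrupt the map.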

@franz1981 franz1981 force-pushed the zero_gc_labels_lookup branch from e6eb364 to ac931a4 Compare December 14, 2018 16:12
Introduced pooling of label names to reduce garbage
in the hot path, updated benchmarks to measure it,
improved SimpleCollector creation when it is used
with no label names or with a single label name.
Introduced a new ArrayList implementation with
faster hashCode/equals to allow faster lookups.

Signed-off-by: Francesco Nigro <[email protected]>
@franz1981 franz1981 force-pushed the zero_gc_labels_lookup branch from ac931a4 to 03576e6 Compare December 18, 2018 15:25
@franz1981 franz1981 closed this Feb 26, 2019
njhill added a commit to njhill/client_java that referenced this pull request Nov 6, 2019
This is an optimization of the SimpleCollector.labels(...) lookups with
a similar goal to prometheus#445 and prometheus#459.

It has some things in common with those PRs (including overridden
fixed-args versions) but aims to provide best of all worlds - zero
garbage and higher throughput for all label counts, without any reliance
on thread reuse.

To achieve this, ConcurrentHashMap is abandoned in favour of a custom
copy-on-write linear-probe hashtable.

Benchmark results

Before:

Benchmark     Mode  Cnt         Score         Error  Units
baseline     thrpt   20  84731357.558 ±  535745.023  ops/s
oneLabel     thrpt   20  36415789.294 ±  441116.974  ops/s
twoLabels    thrpt   20  33301282.259 ±  313669.132  ops/s
threeLabels  thrpt   20  24560630.904 ± 2247040.286  ops/s
fourLabels   thrpt   20  24424456.896 ±  288989.596  ops/s
fiveLabels   thrpt   20  18356036.944 ±  949244.712  ops/s

After:

Benchmark     Mode  Cnt         Score         Error  Units
baseline     thrpt   20  84866162.495 ±  823753.503  ops/s
oneLabel     thrpt   20  84554174.645 ±  804735.949  ops/s
twoLabels    thrpt   20  85004332.529 ±  689559.035  ops/s
threeLabels  thrpt   20  73395533.440 ± 3022384.940  ops/s
fourLabels   thrpt   20  68736143.734 ± 1872048.923  ops/s
fiveLabels   thrpt   20  53482207.003 ±  488751.990  ops/s

This benchmark, like the prior ones, only tests with a single sequence
of labels for each count. It would be good to extend it to cover cases
where the map is populated with a larger number of children.

Signed-off-by: nickhill <[email protected]>
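
As a rough idea of what a "copy-on-write linear-probe hashtable" can look like, here is a generic sketch of the technique. It is not the code from the referenced commit: the class name CowProbeTable, the String[] key type, the load factor, and the synchronized writer are all assumptions made for the example. Readers take one volatile read of an immutable snapshot and probe linearly; writers build a new snapshot and publish it.

```java
import java.util.Arrays;

// Hypothetical sketch; null values are not supported (null means "absent").
final class CowProbeTable<V> {
    // Snapshot that is never mutated after being published.
    private static final class Table {
        final String[][] keys;
        final Object[] values;
        final int size;

        Table(int capacity, int size) {
            this.keys = new String[capacity][];
            this.values = new Object[capacity];
            this.size = size;
        }
    }

    private volatile Table table = new Table(8, 0); // capacity stays a power of two

    private static int slot(String[] key, int mask) {
        int h = Arrays.hashCode(key);
        return (h ^ (h >>> 16)) & mask;
    }

    @SuppressWarnings("unchecked")
    V get(String... key) {
        Table t = table; // single volatile read, then lock-free probing
        int mask = t.keys.length - 1;
        for (int i = slot(key, mask); ; i = (i + 1) & mask) {
            String[] k = t.keys[i];
            if (k == null) return null; // empty slot ends the probe sequence
            if (Arrays.equals(k, key)) return (V) t.values[i];
        }
    }

    synchronized V putIfAbsent(String[] key, V value) {
        V existing = get(key);
        if (existing != null) return existing;
        Table old = table;
        int newSize = old.size + 1;
        int capacity = old.keys.length;
        while (newSize * 2 > capacity) capacity <<= 1; // keep the table at most half full
        Table next = new Table(capacity, newSize);
        for (int i = 0; i < old.keys.length; i++) {
            if (old.keys[i] != null) insert(next, old.keys[i], old.values[i]);
        }
        insert(next, key.clone(), value); // clone so the caller may reuse its array
        table = next; // volatile write publishes the fully built snapshot
        return null;
    }

    private static void insert(Table t, String[] key, Object value) {
        int mask = t.keys.length - 1;
        int i = slot(key, mask);
        while (t.keys[i] != null) i = (i + 1) & mask;
        t.keys[i] = key;
        t.values[i] = value;
    }
}
```

The trade-off in this shape is that every insert copies the whole table, which is acceptable only when children are created rarely and looked up very often.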