-
Notifications
You must be signed in to change notification settings - Fork 815
Improve throughput of metrics collections in SimpleCollector #459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What do the benchmarks show?
Noone should be depending on that, SimpleCollector is considered internal. |
The CreationBenchmark shows that creation of the Tuple objects is at least as fast as Arrays.asList() call. Noticed a copy-paste artifact in main method, I'll fix that later today. The SearchBenchmark shows that search in HashMap with Tuple objects as a key is as fast if not faster than with List key. But the difference is in equals implementation. Tuples (except MultipleLabels) do not create iterators for comparison, so they generate less garbage pressure. Also the overloaded method call does not create array object for VarArgs which reduces garbage pressure and saves some cycles. I think I can add VarArg vs overloaded method call benchmarks to show the effect. Regarding field type change, I just wanted to make it obvious. But I can remove this part from the comment. |
Can you share the benchmark results? |
faaf264
to
1681e2c
Compare
So I've added new benchmark, as I said in previous comment. SearchBenchmark: CreationBenchmark: This results show that tuples are ~3x time faster. And additionally there's less garbage pressure. My machine is somewhat old (i5-2520M CPU @ 2.50GHz), I can try to find newer machine and run tests there |
I've managed to run also on i7-6700T CPU @ 2.80GHz SearchBenchmark: CreationBenchmark: Here the improvement is even more evident. |
Do you have benchmarks with the increments? That's the most realistic. |
Hi Brian, what do you mean by increments? But of course I can try to build another benchmark. |
Don't just fetch the Child, use it. |
I've changed the benchmark and run it on my old machine, will add results from new machine tomorrow new version old version |
I've run this on newer machine: old version |
I've added one more method to allow no gc use for clients that really want it, the old api is still in place. Here are the updated results with no GC method. |
If they can call OneLabel, then they can also cache the child and avoid this codepath all together. |
Yes, you are right. Then probably this method is not needed. My assumption was that this allocation is not a problem for most of the users. Only for rare cases where the allocation is really critical users can build caching on their side to reduce it more. In the end java is the language with managed memory, for full GC free set-up one should choose different language probably. |
Do you want me to remove the method? Then I will make all the Labels classes package private as well. |
I don't see a need for the method personally. |
Ok, I don't think it is important either. And the fact that I've struggled to explain the use case in javadoc is a good sign that it's not very obvious. Let me remove it. |
41e6239
to
b5833ff
Compare
Hi Brian, I was thinking about the caching Child in client code. I think you are right, this should be the the recommendation for the most latency critical java clients. But I still believe that my pull request makes sense and pushes the need of building this sophisticated client caching layer down to very small amount of all the use cases. Especially having that the code change is not that big and does not affect the clarity too much. |
I didn't say otherwise, I've just not had time yet to dig into all these proposals and see which is best. If yours was the only one, it'd probably be merged by now. |
This is an optimization of the SimpleCollector.labels(...) lookups with a similar goal to prometheus#445 and prometheus#459. It has some things in common with those PRs (including overridden fixed-args versions) but aims to provide best of all worlds - zero garbage and higher throughput for all label counts, without any reliance on thread reuse. To achieve this, ConcurrentHashMap is abandoned in favour of a custom copy-on-write linear-probe hashtable. Benchmark results Before: Benchmark Mode Cnt Score Error Units baseline thrpt 20 84731357.558 ± 535745.023 ops/s oneLabel thrpt 20 36415789.294 ± 441116.974 ops/s twoLabels thrpt 20 33301282.259 ± 313669.132 ops/s threeLabels thrpt 20 24560630.904 ± 2247040.286 ops/s fourLabels thrpt 20 24424456.896 ± 288989.596 ops/s fiveLabels thrpt 20 18356036.944 ± 949244.712 ops/s After: Benchmark Mode Cnt Score Error Units baseline thrpt 20 84866162.495 ± 823753.503 ops/s oneLabel thrpt 20 84554174.645 ± 804735.949 ops/s twoLabels thrpt 20 85004332.529 ± 689559.035 ops/s threeLabels thrpt 20 73395533.440 ± 3022384.940 ops/s fourLabels thrpt 20 68736143.734 ± 1872048.923 ops/s fiveLabels thrpt 20 53482207.003 ± 488751.990 ops/s This benchmark, like the prior ones, only tests with a single sequence of labels for each count. It would be good to extend it to cover cases where the map is populated with a larger number of children. Signed-off-by: nickhill <[email protected]>
With this change the throughput of labels method call is improved by creation of overloaded versions of labels method for 1 - 4 arguments. Vararg method is still there for cases with more labels and for backward compatibility.
Performance tests are included in the commit to prove the improvement.
Note! The change is amending SimpleCollector children field type from ConcurrentHashMap<List, Child> to ConcurrentHashMap<LabelsTuple, Child> type, which might break some client code.