Optimize metrics tracking on ingester v2Push() errors #3969

Merged: 6 commits, Mar 18, 2021

pracucci

What this PR does:
I'm profiling the ingester v2Push() to investigate why CPU and memory usage increase when a high number of errors occur in the write path (e.g. out-of-order samples, out-of-bound samples, per-user/per-metric series limit reached). I've improved the existing benchmark and found several inefficiencies.

This first PR fixes a single issue: metrics tracking. I will follow up with dedicated PRs for the other improvements.
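To illustrate why metrics tracking can be costly on an error path, here is a minimal sketch. It is not the code from this PR: `counterVec`, `trackPerSample` and `trackBatched` are hypothetical names, and `counterVec` is only a stand-in for a labelled counter metric. It assumes the cost comes from touching a shared, labelled metric once per failed sample, and contrasts that with tallying failures locally and updating the metric once per failure reason.

```go
package main

import (
	"fmt"
	"sync"
)

// counterVec is a hypothetical stand-in for a labelled counter metric.
// Every update takes a lock and builds a map key, which is the kind of
// per-call cost that adds up when paid once per failed sample.
type counterVec struct {
	mu      sync.Mutex
	values  map[string]float64
	lookups int // number of updates performed, kept for illustration only
}

func newCounterVec() *counterVec {
	return &counterVec{values: make(map[string]float64)}
}

// add increases the counter for the (reason, user) label pair by n.
func (c *counterVec) add(reason, user string, n float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lookups++
	c.values[reason+"/"+user] += n
}

// get returns the current value for the (reason, user) label pair.
func (c *counterVec) get(reason, user string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[reason+"/"+user]
}

// trackPerSample updates the shared metric once per failed sample.
func trackPerSample(c *counterVec, user string, failures []string) {
	for _, reason := range failures {
		c.add(reason, user, 1)
	}
}

// trackBatched tallies failures locally, then updates the shared metric
// once per distinct failure reason.
func trackBatched(c *counterVec, user string, failures []string) {
	local := make(map[string]int)
	for _, reason := range failures {
		local[reason]++
	}
	for reason, n := range local {
		c.add(reason, user, float64(n))
	}
}

func main() {
	failures := []string{"out-of-order", "out-of-order", "out-of-bounds", "out-of-order"}

	perSample, batched := newCounterVec(), newCounterVec()
	trackPerSample(perSample, "user-1", failures)
	trackBatched(batched, "user-1", failures)

	// Both approaches record the same totals; only the number of
	// updates to the shared metric differs.
	fmt.Println(perSample.get("out-of-order", "user-1"), perSample.lookups)
	fmt.Println(batched.get("out-of-order", "user-1"), batched.lookups)
}
```

In this sketch the batched variant performs two metric updates instead of four for the same recorded totals. Whether a local map or a few plain integer counters is the better tally depends on how many distinct failure reasons there are.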

Benchmark:

name                                                                                             old time/op    new time/op    delta
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  435µs ± 1%     297µs ± 3%  -31.65%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                12.6ms ± 5%     8.1ms ±15%  -35.29%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                144ms ±14%      81ms ±12%  -44.10%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                  466µs ± 4%     300µs ± 2%  -35.72%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                12.0ms ± 4%     6.6ms ±15%  -45.38%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12                134ms ± 8%      73ms ± 5%  -45.54%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12       39.1ms ± 2%    31.1ms ± 6%  -20.48%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       393ms ± 7%     359ms ±10%   -8.75%  (p=0.032 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12        1.40ms ± 9%    1.22ms ± 3%  -13.02%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12      1.53ms ±13%    1.48ms ± 9%     ~     (p=0.310 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12      107ms ± 4%     105ms ±11%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     1.15s ± 7%     1.04s ± 7%   -9.40%  (p=0.032 n=5+5)

name                                                                                             old alloc/op   new alloc/op   delta
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  131kB ± 0%      99kB ± 0%  -24.54%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                13.9MB ± 1%    10.4MB ± 0%  -25.30%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                198MB ±24%     141MB ±13%  -28.54%  (p=0.032 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                 34.4kB ± 0%     2.1kB ± 1%  -93.79%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                3.90MB ± 4%    0.35MB ±27%  -91.00%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12               96.5MB ±47%    43.0MB ± 2%  -55.42%  (p=0.016 n=5+4)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12       71.1MB ± 0%    67.6MB ± 0%   -4.96%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       841MB ±17%     824MB ±16%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12         687kB ± 0%     655kB ± 0%   -4.71%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12       688kB ± 0%     656kB ± 0%   -4.71%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12     71.2MB ± 7%    67.6MB ± 6%     ~     (p=0.056 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12    1.09GB ±38%    0.68GB ± 2%  -37.50%  (p=0.016 n=5+4)

name                                                                                             old allocs/op  new allocs/op  delta
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  3.04k ± 0%     2.04k ± 0%  -32.89%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                  307k ± 0%      206k ± 0%  -32.91%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                3.27M ± 5%     2.18M ± 3%  -33.12%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                  1.04k ± 0%     0.04k ± 0%  -96.15%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                  105k ± 1%        4k ± 8%  -95.82%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12                1.24M ±12%     0.17M ± 3%  -85.95%  (p=0.016 n=5+4)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12        1.01M ± 0%     0.91M ± 0%   -9.97%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       10.5M ± 5%      9.6M ± 5%   -8.92%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12         10.0k ± 0%      9.0k ± 0%   -9.96%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12       10.0k ± 0%      9.0k ± 0%   -9.95%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12      1.01M ± 2%     0.91M ± 2%   -9.97%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     11.3M ±13%      9.0M ± 0%  -20.14%  (p=0.016 n=5+4)

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <[email protected]>

@pstibrany pstibrany left a comment

Great find!

@pracucci pracucci merged commit 22f6690 into cortexproject:master Mar 18, 2021
@pracucci pracucci deleted the benchmark-ingester-on-errors branch March 18, 2021 10:26