
Optimize ingester Push() on blocks storage when the per-user or per-metric series limit is reached #3971

Merged · 4 commits into cortexproject:master · Mar 18, 2021

Conversation

@pracucci (Contributor) commented Mar 18, 2021

What this PR does:
Following up on PR #3969, this PR optimizes v2Push() when a high volume of per-user or per-metric series limit errors is hit.

In the profile I've seen that a lot of CPU and memory is consumed formatting the Limiter error. The idea in this PR is to return static errors from Limiter, offer a FormatError() function, and then format only the first partial error in v2Push().

Metadata and chunks storage have not been changed.

Benchmark:

The benchmarks were run on top of #3969 (#3969 was merged first, then this PR was built on it):

name                                                                                             old time/op    new time/op    delta
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12        1.19ms ± 0%    0.63ms ± 1%  -46.91%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12       29.9ms ± 8%    16.4ms ±11%  -45.24%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       360ms ±14%     194ms ± 9%  -46.12%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     1.04s ± 2%     0.33s ± 4%  -68.81%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12      1.31ms ± 1%    0.69ms ± 2%  -47.38%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12      105ms ±11%      30ms ± 7%  -71.19%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  362µs ± 3%     298µs ± 1%  -17.65%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                8.54ms ± 6%    8.54ms ±15%     ~     (p=0.690 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12               86.1ms ± 2%    87.9ms ±13%     ~     (p=0.841 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                  351µs ±11%     304µs ± 7%  -13.45%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                6.70ms ±10%    7.53ms ±15%     ~     (p=0.095 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12               74.3ms ±12%    72.9ms ±11%     ~     (p=0.690 n=5+5)

name                                                                                             old alloc/op   new alloc/op   delta
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12         655kB ± 0%     407kB ± 0%  -37.91%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12       67.3MB ± 0%    41.6MB ± 0%  -38.17%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       861MB ± 8%     495MB ± 9%  -42.51%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     686MB ± 2%     430MB ± 6%  -37.33%  (p=0.016 n=4+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12       655kB ± 0%     406kB ± 0%  -38.06%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12     67.2MB ± 3%    41.2MB ± 2%  -38.61%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                 99.0kB ± 0%    99.0kB ± 0%     ~     (p=0.103 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                10.4MB ± 1%    10.4MB ± 1%     ~     (p=0.841 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                106MB ± 2%     133MB ±20%     ~     (p=0.063 n=4+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                 2.15kB ± 1%    2.15kB ± 1%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                 447kB ±52%     433kB ±50%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12               29.5MB ±86%   26.9MB ±103%     ~     (p=1.000 n=5+5)

name                                                                                             old allocs/op  new allocs/op  delta
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12         9.03k ± 0%     5.04k ± 0%  -44.26%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12         910k ± 0%      507k ± 0%  -44.24%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       9.73M ± 2%     5.33M ± 3%  -45.18%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     9.05M ± 0%     5.10M ± 2%  -43.69%  (p=0.016 n=4+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12       9.04k ± 0%     5.04k ± 0%  -44.23%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12       909k ± 1%      507k ± 1%  -44.28%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  2.04k ± 0%     2.04k ± 0%     ~     (all equal)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                  206k ± 0%      206k ± 0%     ~     (p=0.841 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                2.06M ± 0%     2.15M ± 4%     ~     (p=0.063 n=4+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                   40.0 ± 0%      40.0 ± 0%     ~     (all equal)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                 4.74k ±16%     4.67k ±16%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12                 128k ±69%      119k ±81%     ~     (p=1.000 n=5+5)

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <[email protected]>
@pstibrany (Contributor) left a comment


LGTM. FormatError and global errors are not the idiomatic way of doing it, but using global error values helps avoid allocations unless they are actually needed. Good job!

```go
// Caused by limits.
if firstPartialErr == nil {
	firstPartialErr = ve
if cause == errMaxSeriesPerUserLimitExceeded || cause == errMaxSeriesPerMetricLimitExceeded {
```
@pstibrany (Contributor) commented:

We can simplify the logic on lines 787–820 with:

```go
updateFirstPartial := func(errFn func() error) {
	if firstPartialErr == nil {
		firstPartialErr = errFn()
	}
}

switch cause {
case storage.ErrOutOfBounds:
	sampleOutOfBoundsCount++
	updateFirstPartial(func() error { return wrappedTSDBIngestErr(err, model.Time(s.TimestampMs), ts.Labels) })
	continue
case storage.ErrOutOfOrderSample:
	sampleOutOfOrderCount++
	updateFirstPartial(func() error { return wrappedTSDBIngestErr(err, model.Time(s.TimestampMs), ts.Labels) })
	continue
case storage.ErrDuplicateSampleForTimestamp:
	newValueForTimestampCount++
	updateFirstPartial(func() error { return wrappedTSDBIngestErr(err, model.Time(s.TimestampMs), ts.Labels) })
	continue
case errMaxSeriesPerUserLimitExceeded:
	perUserSeriesLimitCount++
	updateFirstPartial(func() error { return makeLimitError(perUserSeriesLimit, i.limiter.FormatError(userID, cause)) })
	continue
case errMaxSeriesPerMetricLimitExceeded:
	perMetricSeriesLimitCount++
	updateFirstPartial(func() error { return makeMetricLimitError(perMetricSeriesLimit, copiedLabels, i.limiter.FormatError(userID, cause)) })
	continue
}
```

@pracucci (Author) replied:

Good idea, done.

```go
}

var ve *validationError
if errors.As(cause, &ve) {
```
@pracucci (Author) replied:

Removed because, following the code flow, I think the only errors that can be returned here are the per-user and per-metric series limit errors.

Signed-off-by: Marco Pracucci <[email protected]>
@pstibrany (Contributor) left a comment


LGTM, great job!

@pracucci pracucci merged commit c28d326 into cortexproject:master Mar 18, 2021
@pracucci pracucci deleted the optimize-limit-reached branch March 18, 2021 15:33
@pracucci pracucci mentioned this pull request Mar 22, 2021