Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics for enqueue/flush reason, and dequeue/flush outcome. #2818

Merged
merged 7 commits into from
Jul 28, 2020
Merged

Metrics for enqueue/flush reason, and dequeue/flush outcome. #2818

merged 7 commits into from
Jul 28, 2020

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Jun 30, 2020

What this PR does: Re-added cortex_ingester_flush_reasons under the name cortex_ingester_enqueued_series_total.
Added cortex_ingester_dequeued_series_total, with outcome label. Outcome is a super set of flush reason.

Follow-up to #2802 and #2811.

Which issue(s) this PR fixes:
Fixes #2287

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pstibrany pstibrany requested a review from bboreham June 30, 2020 15:13
@@ -187,8 +189,16 @@ func newIngesterMetrics(r prometheus.Registerer, createMetricsConflictingWithTSD
Name: "cortex_ingester_memory_chunks",
Help: "The total number of chunks in memory.",
}),
seriesEnqueuedForFlush: promauto.With(r).NewCounterVec(prometheus.CounterOpts{
Name: "cortex_ingester_enqueued_series_total",
Copy link
Contributor

@pracucci pracucci Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've the feeling the metric name is too much generic. Have you considered something like cortex_ingester_series_flushing_enqueued_total?

Same comment applies to cortex_ingester_dequeued_series_total

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with cortex_ingester_flushing_enqueued_series_total to keep series (unit of the metric) close to the end. Same for dequeued.

pstibrany added 4 commits July 3, 2020 17:08
…_enqueued_series_total.

Added cortex_ingester_dequeued_series_total, with outcome label.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Are metrics easy to test in existing unit tests?

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks even better to me :)

@pracucci
Copy link
Contributor

@bboreham Do you have any comment on the updated changes?

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@@ -146,6 +160,7 @@ func (i *Ingester) sweepSeries(userID string, fp model.Fingerprint, series *memo

flushQueueIndex := int(uint64(fp) % uint64(i.cfg.ConcurrentFlushes))
if i.flushQueues[flushQueueIndex].Enqueue(&flushOp{firstTime, userID, fp, immediate}) {
i.metrics.seriesEnqueuedForFlush.WithLabelValues(flush.String()).Inc()
util.Event().Log("msg", "add to flush queue", "userID", userID, "reason", flush, "firstTime", firstTime, "fp", fp, "series", series.metric, "nlabels", len(series.metric), "queue", flushQueueIndex)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small note: should we have a metric for the 'else' case here? Which would mean the series is already queued.
I can't see that it's tremendously interesting, but if it was way out of line with expectations that could be interesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could signal that chunks are not flushed fast enough, but one can also see that from queue length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a counter for chunks removed from the flush queue but not flushed
3 participants