-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics for enqueue/flush reason, and dequeue/flush outcome. #2818
Conversation
pkg/ingester/metrics.go
Outdated
@@ -187,8 +189,16 @@ func newIngesterMetrics(r prometheus.Registerer, createMetricsConflictingWithTSD | |||
Name: "cortex_ingester_memory_chunks", | |||
Help: "The total number of chunks in memory.", | |||
}), | |||
seriesEnqueuedForFlush: promauto.With(r).NewCounterVec(prometheus.CounterOpts{ | |||
Name: "cortex_ingester_enqueued_series_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've the feeling the metric name is too much generic. Have you considered something like cortex_ingester_series_flushing_enqueued_total
?
Same comment applies to cortex_ingester_dequeued_series_total
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going with cortex_ingester_flushing_enqueued_series_total
to keep series
(unit of the metric) close to the end. Same for dequeued.
…_enqueued_series_total. Added cortex_ingester_dequeued_series_total, with outcome label. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Are metrics easy to test in existing unit tests?
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks even better to me :)
@bboreham Do you have any comment on the updated changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
@@ -146,6 +160,7 @@ func (i *Ingester) sweepSeries(userID string, fp model.Fingerprint, series *memo | |||
|
|||
flushQueueIndex := int(uint64(fp) % uint64(i.cfg.ConcurrentFlushes)) | |||
if i.flushQueues[flushQueueIndex].Enqueue(&flushOp{firstTime, userID, fp, immediate}) { | |||
i.metrics.seriesEnqueuedForFlush.WithLabelValues(flush.String()).Inc() | |||
util.Event().Log("msg", "add to flush queue", "userID", userID, "reason", flush, "firstTime", firstTime, "fp", fp, "series", series.metric, "nlabels", len(series.metric), "queue", flushQueueIndex) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small note: should we have a metric for the 'else' case here? Which would mean the series is already queued.
I can't see that it's tremendously interesting, but if it was way out of line with expectations that could be interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could signal that chunks are not flushed fast enough, but one can also see that from queue length.
What this PR does: Re-added
cortex_ingester_flush_reasons
under the namecortex_ingester_enqueued_series_total
.Added
cortex_ingester_dequeued_series_total
, with outcome label. Outcome is a super set of flush reason.Follow-up to #2802 and #2811.
Which issue(s) this PR fixes:
Fixes #2287
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]