-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new query stats metrics to track prometheus querystats #6228
Add new query stats metrics to track prometheus querystats #6228
Conversation
0a571de
to
d4ddae7
Compare
pkg/frontend/transport/handler.go
Outdated
|
||
h.queryPeakSamples = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{ | ||
Name: "cortex_query_peak_samples_total", | ||
Help: "Highest count of samples considered to execute a query.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain the usecase of tracking peak samples using a counter metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think like promQL: sum by (user) (irate(cortex_query_peak_samples_total[5m]))
, the cortex user can track the rate of how many samples are loaded in memory for each tenant's overall queries.
One usecase is:
- The cortex user can track query memory usage (used by loaded samples) for each tenant and turn promQL to reduce query memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. For a counter metric I feel it is more useful to just track total samples, not peak samples.
Would it be more useful to use native histogram instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think the histogram is the best.
afea4c7
to
1e37e70
Compare
@yeya24 |
0cfa84b
to
797b6ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think I am ok. Would be good to have another approval from maintainer before merging this.
797b6ee
to
1ac4463
Compare
@alanprot @friedrichg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome tests. Small questions related to naming
CHANGELOG.md
Outdated
@@ -8,6 +8,7 @@ | |||
* [FEATURE] Ruler: Minimize chances of missed rule group evaluations that can occur due to OOM kills, bad underlying nodes, or due to an unhealthy ruler that appears in the ring as healthy. This feature is enabled via `-ruler.enable-ha-evaluation` flag. #6129 | |||
* [FEATURE] Store Gateway: Add an in-memory chunk cache. #6245 | |||
* [FEATURE] Chunk Cache: Support multi level cache and add metrics. #6249 | |||
* [ENHANCEMENT] Query Frontend: Add new query stats metrics `cortex_query_total_queryable_samples_total` and `cortex_query_peak_samples` to track totalQueryableSamples and peakSample per user. #6228 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the metric cortex_query_samples_processed_total or cortex_query_total_queryable_samples_total?
Can you double check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching it. cortex_query_samples_processed_total
is right.
pkg/querier/stats/stats.proto
Outdated
@@ -39,4 +39,10 @@ message Stats { | |||
// The total size of postings touched in store gateway for a specific query, in bytes. | |||
// Only successful requests from querier to store gateway are included. | |||
uint64 store_gateway_touched_posting_bytes = 12; | |||
// The total number of samples scanned while evaluating a query. | |||
// Equal to TotalSamples in https://github.com/prometheus/prometheus/blob/main/util/stats/query_stats.go | |||
uint64 processed_samples = 13; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand where the word "processed" comes from. Can you explain?
Maybe scanned_samples sounds better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processed
means the total number of considered samples when the query is executed.
But now, I feel like scanned
is more intuitive to understand. Thanks.
2853868
to
593582c
Compare
pkg/frontend/transport/handler.go
Outdated
querySeconds *prometheus.CounterVec | ||
querySeries *prometheus.CounterVec | ||
queryFetchedSamples *prometheus.CounterVec | ||
queryTotalQueryableSamples *prometheus.CounterVec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queryTotalQueryableSamples *prometheus.CounterVec | |
queryScannedSamples *prometheus.CounterVec |
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good for consistency. Thanks you!
593582c
to
a097955
Compare
Signed-off-by: SungJin1212 <[email protected]>
a097955
to
6c77531
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
What this PR does:
Add new query stats metrics
cortex_query_samples_scanned_total
andcortex_query_peak_samples
.These track
scannedSamples
andPeakSamples
for each in https://github.com/prometheus/prometheus/blob/main/util/stats/query_stats.goWhich issue(s) this PR fixes:
Fixes
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]