WIP: Support ingesting exemplars into TSDB when blocks storage is enabled #4104
…to tsdb when blocks storage is enabled. Signed-off-by: Martin Disibio <[email protected]>
…rded per reason Signed-off-by: Martin Disibio <[email protected]>
53819ac to 9f0b9d8
@@ -26,7 +26,8 @@ message WriteResponse {}
 message TimeSeries {
   repeated LabelPair labels = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "LabelAdapter"];
   // Sorted by time, oldest sample first.
-  repeated Sample samples = 2 [(gogoproto.nullable) = false];
+  repeated Sample samples = 2 [(gogoproto.nullable) = false];
+  repeated Exemplar exemplars = 3 [(gogoproto.nullable) = false];
We should check whether we need to sort the exemplars by timestamp as well (see the comment above the samples line).
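If it turns out exemplars do need the same oldest-first ordering as samples, one option is to sort them on receipt. This is a sketch only: it assumes a simplified, hypothetical `Exemplar` struct carrying just the `Value` and `TimestampMs` fields seen elsewhere in this thread, and `sortExemplarsByTime` / `isSortedByTime` are illustrative names, not existing Cortex functions.

```go
package main

import (
	"fmt"
	"sort"
)

// Exemplar is a hypothetical stand-in for the wire type; only the fields
// referenced in this thread (Value, TimestampMs) are modelled here.
type Exemplar struct {
	Value       float64
	TimestampMs int64
}

// sortExemplarsByTime orders exemplars oldest first, mirroring the
// "Sorted by time, oldest sample first" contract on samples. A stable sort
// keeps the incoming order of exemplars that share a timestamp.
func sortExemplarsByTime(exs []Exemplar) {
	sort.SliceStable(exs, func(i, j int) bool {
		return exs[i].TimestampMs < exs[j].TimestampMs
	})
}

// isSortedByTime reports whether exs is already oldest first, so a receiver
// can skip the sort in the common case.
func isSortedByTime(exs []Exemplar) bool {
	return sort.SliceIsSorted(exs, func(i, j int) bool {
		return exs[i].TimestampMs < exs[j].TimestampMs
	})
}

func main() {
	exs := []Exemplar{{Value: 3, TimestampMs: 300}, {Value: 1, TimestampMs: 100}, {Value: 2, TimestampMs: 200}}
	if !isSortedByTime(exs) {
		sortExemplarsByTime(exs)
	}
	fmt.Println(exs) // [{1 100} {2 200} {3 300}]
}
```

Checking first and sorting only when needed keeps the cost negligible if senders already emit exemplars in order.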
pkg/ingester/ingester_v2.go (outdated)
// app.AppendExemplar currently doesn't create the series, it must
// already exist. If it does not then drop. TODO(mdisibio) - better way to handle?
if ref == 0 {
In Prometheus we skip the exemplar and increment a counter; it's probably best to do the same here. Because right now a TimeSeries contains only a sample OR an exemplar, continuing here is valid IMO. If we included samples and exemplars in the same TimeSeries, I think the only case where we could reach this line (after appending the sample) and still not have a valid reference ID is if appending the sample itself failed, which AFAICT only happens in the event of an invalid label set.
Yes, that is true. We have two choices: we could count them as failed, or as discarded due to validation. In this case I think failed is more appropriate, because there isn't anything wrong with the exemplar itself; it just couldn't be ingested due to a limitation in the current TSDB implementation. If/when TSDB's AppendExemplar is updated to create the series, the same data would be ingested successfully.
Update: went with the failed approach.
e := exemplar.Exemplar{
	Value: ex.Value,
	Ts:    ex.TimestampMs,
	HasTs: true,
Since we're ingesting via remote write, as long as all exemplars have the same value for this field (whether it's true or false), we should be able to dedupe properly.
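To illustrate why a consistent `HasTs` value matters for deduplication, here is a simplified equality check. It assumes (as we understand Prometheus's exemplar comparison to work, not verified against this PR) that timestamps are compared only when at least one side has `HasTs` set. The `Exemplar` struct is a hypothetical stand-in with labels flattened to a string.

```go
package main

import "fmt"

// Exemplar is a simplified, hypothetical stand-in for exemplar.Exemplar;
// labels are a plain string here to keep the sketch self-contained.
type Exemplar struct {
	Labels string
	Value  float64
	Ts     int64
	HasTs  bool
}

// equal treats two exemplars as duplicates when labels and value match and,
// if either side has an explicit timestamp, the timestamps match too.
func equal(a, b Exemplar) bool {
	if a.Labels != b.Labels || a.Value != b.Value {
		return false
	}
	if (a.HasTs || b.HasTs) && a.Ts != b.Ts {
		return false
	}
	return true
}

func main() {
	a := Exemplar{Labels: `{trace_id="abc"}`, Value: 1, Ts: 1000, HasTs: true}
	b := Exemplar{Labels: `{trace_id="abc"}`, Value: 1, Ts: 2000, HasTs: true}
	c := Exemplar{Labels: `{trace_id="abc"}`, Value: 1, Ts: 2000, HasTs: false}
	d := Exemplar{Labels: `{trace_id="abc"}`, Value: 1, Ts: 1000, HasTs: false}
	fmt.Println(equal(a, a), equal(a, b), equal(c, d)) // true false true
}
```

With remote write always setting `HasTs: true`, a resent exemplar with the same labels, value and timestamp compares equal, while one with a different timestamp does not.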
labelSetLen := 0
for _, l := range e.Labels {
	labelSetLen += len(l.Name)
Do we need to update our exemplar memory estimate, since the `"`, `,` and `=` characters aren't included in the character count for the label set?
Just checked the Cortex exemplars doc; looks like you got it right there. Just have to double-check the Prometheus doc :)
Was going off https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exemplars, and there is a link to this at the top of validate.go where the const is defined. Will add some comments here too.
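A sketch of the OpenMetrics rule referenced above: the combined length of an exemplar's label names and values must not exceed 128 UTF-8 character code points, and label syntax such as quotes, commas and `=` is not counted. `maxExemplarLabelSetLength`, `LabelPair` and both functions are hypothetical names. Note that the sketch counts runes with `utf8.RuneCountInString`, whereas `len` on a Go string counts bytes, so the two differ for non-ASCII label names or values.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxExemplarLabelSetLength mirrors the OpenMetrics limit of 128 UTF-8
// character code points; the constant name is hypothetical.
const maxExemplarLabelSetLength = 128

// LabelPair is a hypothetical stand-in for the wire label type.
type LabelPair struct {
	Name, Value string
}

// labelSetLength counts code points in label names and values only; syntax
// characters (quotes, commas, braces, '=') are not part of the count.
func labelSetLength(ls []LabelPair) int {
	n := 0
	for _, l := range ls {
		n += utf8.RuneCountInString(l.Name) + utf8.RuneCountInString(l.Value)
	}
	return n
}

// validateExemplarLabels rejects label sets longer than the limit.
func validateExemplarLabels(ls []LabelPair) error {
	if n := labelSetLength(ls); n > maxExemplarLabelSetLength {
		return fmt.Errorf("exemplar label set length %d exceeds limit %d", n, maxExemplarLabelSetLength)
	}
	return nil
}

func main() {
	ls := []LabelPair{{Name: "trace_id", Value: "abc123"}}
	fmt.Println(labelSetLength(ls), validateExemplarLabels(ls)) // 14 <nil>
}
```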
…lars Signed-off-by: Martin Disibio <[email protected]>
Moved source branch to the
What this PR does:
Support for an in-memory buffer of exemplars was recently added to TSDB. This PR takes the first steps toward supporting the same in Cortex's ingest path by enabling the feature in TSDB and storing exemplars from remote write data. Future PRs will add query support and integration with per-tenant limits.
This PR is marked WIP because there are several parts that could use some consideration:
- This is enabled with a new `-blocks-storage.tsdb.max-exemplars=<n>` command-line argument. It is available only to the ingester, but ideally there would be a way for the distributor to be aware of it and skip validation of exemplars (currently the distributor always validates any exemplars, even if they are then discarded by the ingester). Is there a recommended config location for a parameter shared by both distributors and ingesters?
- Exemplars are counted in the distributor's rate limiting. This seems good since exemplars have processing overhead, but I wanted to double-check whether something else should be done.
- There are 5 exemplar metrics in TSDB, and they are exposed per tenant. However, a new global `cortex_ingester_ingested_exemplars_total` metric was also added, which follows the pattern for samples; although partially redundant, it seems worthwhile.
- The PR for remote write of exemplars in Prometheus (Add Exemplar Remote Write support prometheus/prometheus#8296) is not merged yet, so the proto is still subject to change.
Which issue(s) this PR fixes:
n/a
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`