
WIP: Support ingesting exemplars into TSDB when blocks storage is enabled #4104

Closed
wants to merge 8 commits

Conversation

mdisibio (Contributor) commented Apr 22, 2021

What this PR does:
Support for an in-memory buffer of exemplars was added to TSDB recently. This PR takes the first steps toward supporting the same in Cortex's ingest path by enabling the feature in TSDB and storing exemplars from the remote-write data. Future PRs will add query support and integration with per-tenant limits.

This PR is marked WIP because there are several parts that could use some consideration:

  1. This is enabled with a new -blocks-storage.tsdb.max-exemplars=<n> command line argument. It is available only to the ingester, but ideally there would be a way for the distributor to be aware of it and skip validation of exemplars (currently it always validates exemplars, even if they will be discarded by the ingester). Is there a recommended config location for sharing the param between distributors and ingesters? See the flag-registration sketch after this list.

  2. Exemplars are counted toward rate limiting in the distributor. This seems right, since exemplars have processing overhead, but I wanted to double-check whether anything else should be done (see the rate-limiter sketch after this list).

  3. There are 5 exemplar metrics in TSDB, and they are exposed per tenant. However, a new cortex_ingester_ingested_exemplars_total global metric was also added, following the pattern for samples; although partially redundant, it seems worthwhile.

  4. The PR for remote write of exemplars in Prometheus (Add Exemplar Remote Write support prometheus/prometheus#8296) is not merged yet, so the proto is still subject to change.
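
Regarding item 1, a minimal sketch of how the flag could be registered in a shared config block, assuming a hypothetical TSDBConfig struct; only the -blocks-storage.tsdb.max-exemplars flag name comes from this PR:

```go
package example

import "flag"

// TSDBConfig is an illustrative stand-in for the ingester's blocks-storage
// TSDB configuration block; names here are assumptions, not the PR's code.
type TSDBConfig struct {
	MaxExemplars int
}

// RegisterFlags wires the new setting into a flag set. Registering it in a
// config block that both components load would let distributors see the
// same value as ingesters.
func (cfg *TSDBConfig) RegisterFlags(f *flag.FlagSet) {
	f.IntVar(&cfg.MaxExemplars, "blocks-storage.tsdb.max-exemplars", 0,
		"Maximum number of exemplars held in memory by the TSDB head. 0 disables exemplar ingestion.")
}
```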
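
Regarding item 2, a generic illustration of charging exemplars against the ingestion rate limit together with samples; this uses golang.org/x/time/rate as a stand-in, not Cortex's actual per-tenant limiter API:

```go
package example

import (
	"time"

	"golang.org/x/time/rate"
)

// allowWrite charges samples and exemplars against a single ingestion rate
// limit. Cortex uses its own per-tenant limiter; this stand-in only shows
// exemplars being counted the same way samples are.
func allowWrite(limiter *rate.Limiter, numSamples, numExemplars int) bool {
	return limiter.AllowN(time.Now(), numSamples+numExemplars)
}
```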

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

…to tsdb when blocks storage is enabled.

Signed-off-by: Martin Disibio <[email protected]>
@@ -26,7 +26,8 @@ message WriteResponse {}
 message TimeSeries {
   repeated LabelPair labels = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "LabelAdapter"];
   // Sorted by time, oldest sample first.
   repeated Sample samples = 2 [(gogoproto.nullable) = false];
+  repeated Exemplar exemplars = 3 [(gogoproto.nullable) = false];
Contributor:

We should check if we need to sort the exemplars by timestamp as well (see the comment above the samples line)
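
If sorting does turn out to be required, a minimal sketch of ordering exemplars oldest-first before appending; the Exemplar type here is a stand-in that mirrors the TimestampMs field of the proto message above:

```go
package example

import "sort"

// Exemplar is an illustrative stand-in for the write-request proto message.
type Exemplar struct {
	TimestampMs int64
}

// sortExemplars orders exemplars oldest-first, mirroring the contract
// documented for samples in the TimeSeries message.
func sortExemplars(exemplars []Exemplar) {
	sort.Slice(exemplars, func(i, j int) bool {
		return exemplars[i].TimestampMs < exemplars[j].TimestampMs
	})
}
```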


// app.AppendExemplar currently doesn't create the series, it must
// already exist. If it does not then drop. TODO(mdisibio) - better way to handle?
if ref == 0 {
Contributor:

In prometheus we skip the exemplar and increment a counter, so it's probably best to just do the same here. Because right now a TimeSeries only contains a sample OR an exemplar, continuing here is valid imo. If we included samples and exemplars in the same TimeSeries, I think the only case where we could reach this line (post appending the sample) and still not have a valid reference ID is if appending the sample itself failed, which afaict only happens in the event of an invalid labelset.

mdisibio (Contributor, Author) replied Apr 26, 2021:

Yes, that is true. We have two choices: we could count them as failed, or as discarded due to validation. In this case I think failed is more appropriate, because there isn't anything wrong with the exemplar itself; it's more that it couldn't be ingested due to a limitation in the current TSDB implementation. If/when TSDB AppendExemplar is updated to create the series, the same data would be ingested successfully.

Update: went with the failed approach.
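
A minimal sketch of the "count as failed" approach described above; the metric and function names are hypothetical, not the PR's actual identifiers:

```go
package example

import "github.com/prometheus/client_golang/prometheus"

// failedExemplarsTotal is a hypothetical counter; the metric name here is
// illustrative and may not match what the PR actually registers.
var failedExemplarsTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "cortex_ingester_failed_exemplars_total",
	Help: "Total number of exemplars that could not be appended to TSDB.",
})

// shouldAppendExemplar shows the "count as failed" pattern: when the series
// reference is unknown (ref == 0), the exemplar is dropped and counted.
func shouldAppendExemplar(ref uint64) bool {
	if ref == 0 {
		failedExemplarsTotal.Inc()
		return false // skip this exemplar
	}
	return true
}
```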

e := exemplar.Exemplar{
Value: ex.Value,
Ts: ex.TimestampMs,
HasTs: true,
Contributor:

Since we're ingesting via remote write, as long as all exemplars have the same value for this field (whether it's true or false), we would be able to dedupe properly.


labelSetLen := 0
for _, l := range e.Labels {
labelSetLen += len(l.Name)
Contributor:

Do we need to update our exemplar memory estimate, since the quote ("), comma (,) and equals (=) characters aren't included in the character count for the label set?

Contributor:

Just checked the cortex exemplars doc; looks like you got it right there, just have to double-check the prometheus doc :)

mdisibio (Contributor, Author):

Was going off https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exemplars, and there is a link to this at the top of validate.go where the const is defined. Will add some comments here too.
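
For reference, a sketch of the OpenMetrics rule under discussion: the combined length of an exemplar's label names and values must not exceed 128 UTF-8 code points, and the separator characters are not counted. The spec counts code points, so this sketch uses rune counts; the constant, types, and function here are illustrative, not Cortex's actual validate.go code:

```go
package example

import "unicode/utf8"

// ExemplarMaxLabelSetLength mirrors the 128-code-point limit from the
// OpenMetrics specification linked above.
const ExemplarMaxLabelSetLength = 128

// LabelPair is an illustrative stand-in for the proto label type.
type LabelPair struct {
	Name, Value string
}

// exemplarLabelSetTooLong counts UTF-8 code points of label names and values
// only; the surrounding quote, comma, and equals characters are not counted.
func exemplarLabelSetTooLong(labels []LabelPair) bool {
	labelSetLen := 0
	for _, l := range labels {
		labelSetLen += utf8.RuneCountInString(l.Name)
		labelSetLen += utf8.RuneCountInString(l.Value)
	}
	return labelSetLen > ExemplarMaxLabelSetLength
}
```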

mdisibio (Contributor, Author):

Moved source branch to the grafana/cortex repo so a new PR was opened here: #4124

mdisibio closed this Apr 26, 2021