Skip to content

handle conflits in prw v2 #39570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

perebaj
Copy link
Contributor

@perebaj perebaj commented Apr 22, 2025

Description

Handle conflicts in prw v2

Link to tracking issue

Partially fixes #33661

Copy link
Contributor

@jmichalek132 jmichalek132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank you for picking this up.,

Copy link
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, nice job. I have a comment on comments :) Also it would be nice to test the collision handling somehow, either find a conflict (I've started to run a simple finder on 4 cores, maybe get lucky) or make the signature function be possible to influence.

if len(ts1.LabelsRefs) != len(ts2.LabelsRefs) {
return false
}
// As the labels are sorted as name, value, name, value, ... we can compare the labels by index jumping 2 steps at a time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz update the comment on (new) line 134:

// TODO: Read the PRW spec to see if labels need to be sorted. If it is, then we need to sort in export code. If not, we can sort in the test. (@dashpole have more context on this)

since we're going to depend on this, it should say something like "We need to sort labels for..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know if I got it. Here we are comparing the LabelsRefs from different TSs, that are a list of integer arranged to represent: name, value, name, value....

The sort that we are doing on L132-L135 is related to sort the []prompb.Label. The confusion here was made because I used the sort word?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the sort word is a bit ambiguous here, let's say

Suggested change
// As the labels are sorted as name, value, name, value, ... we can compare the labels by index jumping 2 steps at a time
// As the labels are ordered as name, value, name, value, ... we can compare the labels by index jumping 2 steps at a time

But anyway, isSameMetricV2 will only work correctly if the order of labels is consistent, since otherwise it will say "not the same" for {a="1", b="2"} vs {b="2", a="1"} label sets.

On lines 132-135 we're sorting the labels []prompb.Label , but then we're converting these labels into the references on lines 137-143. So the order of the references is the same as the order of the sorted labels. Which means that we ensure consistent order for isSameMetricV2 with the sort on 132-135.

I did take a look at where the labels come from and it's createAttributes which will return consistent ordering probably - config changes not withstanding. So we could probably get away without making the sort, but I feel like that's brittle.

@@ -131,9 +143,39 @@ func (c *prometheusConverterV2) addSample(sample *writev2.Sample, lbls []prompb.
off = c.symbolTable.Symbolize(l.Value)
buf = append(buf, off)
}
ts := writev2.TimeSeries{

sig := timeSeriesSignature(lbls)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a future PR: timeSeriesSignature also does a sort by label name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I added it here because we are triggering a flaky test. I added more details here. I was trying to debug it with @dashpole before his leaving. This is a TODO task to me figure out how to fix🙃

require.Len(t, converter.unique, 1)
require.Len(t, converter.unique[timeSeriesSignature(labels)].Samples, 2)
})
// TODO: Test 3 Conflict - different metrics with same hash
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the current set of tests were a little bit poor. Struggling to implement this one... Maybe these that I added could be a good start point. Let me know what you think.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see a bug in that the case is not handled when a conflicting time series already exists. Please instead follow the example of the PRW v1 code, and introduce a method getOrCreateTimeSeries for obtaining the time series to add the sample to. Implementing this method correctly should solve the bug and make for more understandable/performant code.

@@ -131,9 +141,40 @@ func (c *prometheusConverterV2) addSample(sample *writev2.Sample, lbls []prompb.
off = c.symbolTable.Symbolize(l.Value)
buf = append(buf, off)
}
ts := writev2.TimeSeries{

sig := timeSeriesSignature(lbls)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should here follow the same logic as in prometheusConverter.addSample, and use a method getOrCreateTimeSeries to obtain the time series to add the sample to. Creating a new time series for every sample even if the time series already exists is bad for performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Let's keep the code consistent between v1 and v2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this review. Now I had time to come back to it.

Fixed here: 2afc6dc

// if the time series is already in the unique map, check if it is the same metric
if !isSameMetricV2(existingTS, ts) {
// if the time series is not the same metric, add it to the conflicts map
c.conflicts[sig] = append(c.conflicts[sig], ts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug since you're not checking whether c.conflicts[sig] already has a time series with the same labels.

@perebaj
Copy link
Contributor Author

perebaj commented May 13, 2025

I'm leaving this PR in standby, meanwhile I'm focused on the receiver... I have plans to come back to it soon.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 10, 2025
@jmichalek132
Copy link
Contributor

Hey @perebaj I was wondering if you have an idea when you will come back to this pr :) ?

@github-actions github-actions bot removed the Stale label Jun 22, 2025
@perebaj
Copy link
Contributor Author

perebaj commented Jun 24, 2025

Hey @perebaj I was wondering if you have an idea when you will come back to this pr :) ?

Hey, Im getting some trouble to finish this one. Before that, my entire focus will be histograms...

Would you like to continue this one?

@jmichalek132
Copy link
Contributor

jmichalek132 commented Jun 25, 2025

Hey @perebaj I was wondering if you have an idea when you will come back to this pr :) ?

Hey, Im getting some trouble to finish this one. Before that, my entire focus will be histograms...

Would you like to continue this one?

If you don't get to it within the next two weeks (I have some travel planned) I can pick it up when I am back :).

@perebaj perebaj marked this pull request as ready for review June 30, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[exporter/prometheusremotewrite] Support Prometheus Remote-Write v2
8 participants