[RFC] Combine Appender.Add and AddFast into a single Append method. #8489
Force-pushed from ae843e2 to 9db7cf0.
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <[email protected]>
Yeah, no impact on benchmarks - 4.49s vs 4.91s is within the noise for my computer.
/prombench master
Thanks! I think it makes sense, as Add was calling AddFast anyway. cc @bwplotka as another consumer of the TSDB.
/prombench cancel
Prombench is OK.
Benchmark cancel is in progress.
Thanks!
This change will introduce a major performance penalty in Cortex ingesters. The way it works in Cortex is:
After this change, we don't know whether the labels will be retained or not by the TSDB.
I am happy to revert this for now and reconsider how it can be merged into a single method without hurting Cortex performance.
To give more context: when we introduced the ref cache in Cortex ingesters (using labels as YOLO strings to look up the cache), memory allocations were reduced by 25%.
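The ref cache described above can be sketched as follows. This is a minimal, illustrative stand-in, not the real Cortex code: the actual implementation keys the map on an unsafe ("YOLO") string view of the marshalled labels to avoid allocations, while this sketch uses a plain string for clarity, and all names here are hypothetical.

```go
package main

import "fmt"

// refCache maps a string form of a series' labels to its TSDB series ref,
// so repeated appends for the same series can skip the labels lookup.
type refCache struct {
	refs map[string]uint64 // labels key -> series ref
}

func newRefCache() *refCache {
	return &refCache{refs: make(map[string]uint64)}
}

func (c *refCache) get(labels string) (uint64, bool) {
	ref, ok := c.refs[labels]
	return ref, ok
}

func (c *refCache) set(labels string, ref uint64) {
	// The map stores its own copy of the string key, so the cache does not
	// depend on the caller retaining the labels memory -- the concern
	// raised in this thread.
	c.refs[labels] = ref
}

func main() {
	c := newRefCache()
	c.set(`{job="api"}`, 7)
	ref, ok := c.get(`{job="api"}`)
	fmt.Println(ref, ok) // 7 true
}
```

On a cache hit the ingester can append by ref alone; on a miss it falls back to the full labels path and stores the returned ref for next time.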
…#8489)" This reverts commit 7369561. Signed-off-by: Ganesh Vernekar <[email protected]>
PR to revert this: #8519
Can you store another copy of the labels in the ref cache? This is effectively what Prometheus does (store a copy of the labels in the scrape cache, and use them on subsequent calls to Add).
Good point. Yes, we should be able to solve it this way.
Let me know if this works! If not, we can find another way. I'd note that if you know the ref, you can pass nil for the labels... but this is abusing the code a bit...
Yes, this is impacting Thanos a bit as well; I believe this PR could have been done more carefully instead of being merged so quickly. Let me schedule some experiments with this on the Thanos side.
The main branch tests are not passing because prometheus#8489 was not rebased on top of prometheus#8007. Signed-off-by: Julien Pivotto <[email protected]>
Things needed to be updated to accommodate this change in interface: prometheus/prometheus#8489 Signed-off-by: Goutham Veeramachaneni <[email protected]>
I hadn't noticed this change, but came up with an idea to remove the ref cache - see #8600 and cortexproject/cortex#3951
* Update Prometheus vendoring. Things needed to be updated to accommodate this change in interface: prometheus/prometheus#8489
* Update to main again, to pull in prometheus/prometheus#8558
* Handle the case of the ref changing.
* Put things back; was removed in error.
* Update Prometheus again. OOh, Prometheus is exciting again :D

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Am I right in thinking the only way a reference can become invalid is via head compaction? |
Yes |
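Given that a ref can only go stale via head compaction, the caller-side pattern is to keep the labels next to the cached ref, pass both to Append, and refresh the cached ref from the return value. The sketch below uses a fake appender and hypothetical names to illustrate the pattern; it is not the real Prometheus API.

```go
package main

import "fmt"

// fakeAppender stands in for a TSDB appender: known refs take the fast
// path, while stale or zero refs fall back to the labels path (series
// lookup/creation elided) and return a fresh ref.
type fakeAppender struct {
	valid map[uint64]bool
	next  uint64
}

func (a *fakeAppender) Append(ref uint64, labels string, t int64, v float64) (uint64, error) {
	if ref != 0 && a.valid[ref] {
		return ref, nil // cached ref is still valid
	}
	a.next++
	a.valid[a.next] = true
	return a.next, nil
}

// cachedSeries keeps the labels alongside the cached ref so a stale ref
// can always be recovered from.
type cachedSeries struct {
	ref    uint64
	labels string
}

func appendSample(app *fakeAppender, s *cachedSeries, t int64, v float64) error {
	ref, err := app.Append(s.ref, s.labels, t, v)
	if err != nil {
		return err
	}
	s.ref = ref // may differ from the old ref if it was invalidated
	return nil
}

func main() {
	app := &fakeAppender{valid: map[uint64]bool{}}
	s := &cachedSeries{labels: `{job="api"}`}
	appendSample(app, s, 1000, 1.0)
	fmt.Println(s.ref) // 1
	delete(app.valid, s.ref) // simulate head compaction invalidating the ref
	appendSample(app, s, 2000, 2.0)
	fmt.Println(s.ref) // 2
}
```

The caller never has to detect invalidation itself: it simply trusts whatever ref the last Append returned.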
for _, ts := range req.Timeseries {
    labels := labelProtosToLabels(ts.Labels)
    for _, s := range ts.Samples {
-       _, err = app.Add(labels, s.Timestamp, s.Value)
+       _, err = app.Append(0, labels, s.Timestamp, s.Value)
        if err != nil {
            return err
        }
Hello, I have a question: why not reuse the SeriesRef returned by app.Append? And if the SeriesRef is invalid (e.g. after a compactHead GC), will a new memSeries be created?
Each time this line is entered, labels is different. We can't reuse the value from the previous loop iteration, and there is nowhere convenient to store it for longer.
for _, ts := range req.Timeseries {
    labels := labelProtosToLabels(ts.Labels)
    // Start at 0 so the first Append looks up memSeries by labels.
    var ref storage.SeriesRef
    // Within this inner loop the labels never change, and the ts may carry
    // multiple samples, so ref can be reused to avoid a memSeries lookup
    // on every iteration.
    for _, s := range ts.Samples {
        ref, err = app.Append(ref, labels, s.Timestamp, s.Value)
        if err != nil {
            switch errors.Cause(err) {
            case storage.ErrOutOfOrderSample, storage.ErrOutOfBounds, storage.ErrDuplicateSampleForTimestamp:
                level.Error(h.logger).Log("msg", "Out of order sample from remote write", "err", err.Error(), "series", labels.String(), "timestamp", s.Timestamp)
            }
            return err
        }
    }
}
Would there be any problem with this? Multiple samples of a time series have the same labels.
Oh I see what you mean.
It’s not typical to send multiple samples for the same timeseries, but your version is more efficient if it did happen.

In this scenario, the time series data is generated by aggregating logs and is written to Prometheus through remote write, so there will be multiple samples per series.
I grabbed a CPU profile through pprof, and it looks like labelProtosToLabels and seriesHashmap account for about 7.61 percent of the total (seriesHashmap alone for about 3.51 percent). If the SeriesRef is reused, it should be much better. Can I submit a PR to optimize this code?
it looks like labelProtosToLabels and seriesHashmap account for about 7.61 percent of the total. If reuse is SeriesRef, it should be much better.
I don't see where labelProtosToLabels would change.
I will note that remote-write handling has been optimised in various downstream projects (Cortex, Thanos, Mimir), using many tricks to avoid allocating memory.
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. Effectively we merge the two functions into one that is given both the labels and any cached ref. If the cached ref is valid, it is used; otherwise a (potentially new) series and ref are found or created using the labels.
This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.
I don't believe this should have any performance impact, and am running benchmarks to confirm right now.
Signed-off-by: Tom Wilkie [email protected]
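The merged semantics described above (valid cached ref wins, labels are the fallback) can be sketched against a simplified appender. All types and names below are illustrative stand-ins, not the real Prometheus storage API; series deduplication and creation details are elided.

```go
package main

import (
	"errors"
	"fmt"
)

type SeriesRef uint64

var errUnknownRef = errors.New("unknown series ref")

// appender is a toy stand-in for a TSDB appender.
type appender struct {
	series map[SeriesRef]string // ref -> labels
	next   SeriesRef
}

// Append mirrors the merged API: if ref is non-zero and known, the labels
// argument may be empty and is ignored (the fast path that replaces
// AddFast); otherwise the labels are used to find or create the series
// (the path that replaces Add), and the possibly-new ref is returned.
func (a *appender) Append(ref SeriesRef, labels string, t int64, v float64) (SeriesRef, error) {
	if ref != 0 {
		if _, ok := a.series[ref]; ok {
			return ref, nil // fast path: cached ref is still valid
		}
		if labels == "" {
			return 0, errUnknownRef // stale ref and no labels to recover with
		}
	}
	// Slow path: look up or create the series by labels (dedup elided).
	a.next++
	a.series[a.next] = labels
	return a.next, nil
}

func main() {
	app := &appender{series: map[SeriesRef]string{}}
	ref, _ := app.Append(0, `{job="api"}`, 1000, 1.0) // first call: labels required
	ref2, _ := app.Append(ref, "", 2000, 2.0)         // later calls: ref alone suffices
	fmt.Println(ref == ref2) // true
}
```

This is why the merged method is easier to consume: callers carry one code path and simply thread the returned ref back into the next call.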