Reduce ingester memory by using TSDB's index #3951
Conversation
This looks like a nice improvement, but I think we should get the change into Prometheus first, to avoid using a forked version.
Yeah, I thought I would get some traction here before opening the Prometheus PR.
I opened the Prometheus PR; let's see what comments it gets.
Force-pushed ed0a59e to 569eeaf
Updated after Prometheus PR merged.
Force-pushed e5cd7ce to a4a468a
pkg/ingester/ingester_v2.go (Outdated)

    // The labels must be sorted (in our case, it's guaranteed a write request
    // has sorted labels once hit the ingester).
    cachedRef, copiedLabels, cachedRefExists := db.refCache.Ref(startAppend, cortexpb.FromLabelAdaptersToLabels(ts.Labels))

Suggested comment:

    // Look up a reference for this series. Holding the appendLock ensures that no compaction will happen while we use it.
> Holding the appendLock ensures that no compaction will happen while we use it.

Are you talking about the `acquireAppendLock()` we have in the Cortex ingester? It only ensures that forced compaction will not happen while there are ongoing appends, not the regular head compaction (see the `userDB.Compact()` call in `compactBlocks()`).
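The locking distinction being debated can be sketched with a small, self-contained model (the names `userTSDB`, `acquireAppendLock`, and `forceCompactHead` follow the Cortex identifiers mentioned in this thread, but the types are simplified stand-ins, not the real code): appends hold a read lock, forced compaction takes the write lock and therefore waits for in-flight appends, while regular head compaction never touches this lock at all.

```go
package main

import (
	"fmt"
	"sync"
)

// userTSDB is a simplified stand-in for the Cortex per-tenant TSDB wrapper.
type userTSDB struct {
	appendMtx   sync.RWMutex // coordinates forced compaction with in-flight appends
	compactions int
}

// acquireAppendLock mirrors the helper discussed above: each append takes a
// read lock, so many appends can proceed concurrently. The caller invokes the
// returned function to release the lock when the append is done.
func (db *userTSDB) acquireAppendLock() func() {
	db.appendMtx.RLock()
	return db.appendMtx.RUnlock
}

// forceCompactHead takes the write lock, so it waits until no appends are in
// flight. Note that regular head compaction does NOT go through this lock,
// which is exactly the gap pointed out in the comment above.
func (db *userTSDB) forceCompactHead() {
	db.appendMtx.Lock()
	defer db.appendMtx.Unlock()
	db.compactions++
}

func main() {
	db := &userTSDB{}
	release := db.acquireAppendLock()
	// ... append samples while holding the read lock ...
	release()
	db.forceCompactHead() // waits until release() has been called
	fmt.Println("compactions:", db.compactions) // → compactions: 1
}
```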
What if TSDB's `GetRef()` also returned the interned series labels, so we could just re-pass them to `Append()`? It would also allow us to reuse the same interned strings in the active series tracking.
Yes, I meant what `acquireAppendLock()` does. Well, that's a bit of a problem then.
Making `GetRef()` return labels would work, but it shifts a bit more responsibility onto TSDB to work in a particular way.
E.g. it couldn't store the strings as a highly compressed trie.
I guess that's not a huge problem; Prometheus can just break the interface and then Cortex needs to find another solution.
I made a new version with `GetRef()` returning labels.
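For illustration, here is a minimal sketch of the shape of that change (the `Labels` and `headIndex` types are toy stand-ins, not the real Prometheus `labels.Labels` or head index): `getRef()` returns both the series reference and the label set the index already holds, so on a hit the caller can re-pass the interned labels instead of copying its own.

```go
package main

import "fmt"

// Labels stands in for Prometheus' labels.Labels, simplified to a string.
type Labels string

// headIndex is a toy map from label sets to series refs, standing in for
// TSDB's head index.
type headIndex struct {
	refs map[Labels]uint64
	next uint64
}

// getRef mirrors the shape discussed above: it returns the series reference
// AND the labels already stored (interned) in the index, so the caller never
// needs its own copy when the series exists.
func (h *headIndex) getRef(lset Labels) (uint64, Labels) {
	if ref, ok := h.refs[lset]; ok {
		return ref, lset // in the real index these would be the stored labels
	}
	return 0, ""
}

// append creates the series on a miss (ref == 0); only then does the index
// need to take ownership of a copy of the labels.
func (h *headIndex) append(ref uint64, lset Labels) uint64 {
	if ref != 0 {
		return ref // fast path: no label copy needed
	}
	h.next++
	h.refs[lset] = h.next
	return h.next
}

func main() {
	h := &headIndex{refs: map[Labels]uint64{}}
	ref, _ := h.getRef(`{job="api"}`)       // miss: ref == 0
	ref = h.append(ref, `{job="api"}`)      // creates the series
	ref2, stored := h.getRef(`{job="api"}`) // hit: interned labels come back
	fmt.Println(ref, ref2, stored)          // → 1 1 {job="api"}
}
```

The returned labels are what would let Cortex reuse the same interned strings for active series tracking, as suggested above.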
Force-pushed 0377e54 to 130b2b7
Force-pushed 48570c3 to bef9df0
Good job @bboreham! Faster with less code ❤️ I just have a comment about a possible edge case.
I'm running this branch in production now; the Go heap is 20% smaller, as predicted.
Force-pushed 52a7946 to bc9e4fa
So we can check the efficiency of changes to `v2Push()`. Broadly copied from `BenchmarkIngesterPush()`. Signed-off-by: Bryan Boreham <[email protected]>
prometheus/prometheus#8600 adds a method to `TSDB.Appender` which allows us to avoid building a parallel cache, reducing ingester heap by about 20%. We depend on values from `GetRef()` remaining valid while `v2Push()` uses them. Currently the only way a ref can be invalidated is by a head compaction, which cannot happen while `v2Push()` holds the append lock. Signed-off-by: Bryan Boreham <[email protected]>
Now we only need to make a copy if `GetRef()` returns zero. Note: the Prometheus update brings in JSON marshalling of `model.Duration` (prometheus/common#280). Signed-off-by: Bryan Boreham <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
Pinned gRPC and other dependencies changed by the update in Prometheus to avoid taking so much change on this PR. Signed-off-by: Bryan Boreham <[email protected]>
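Pinning like this is normally expressed in `go.mod`; a hypothetical fragment (the module versions here are invented for illustration, not the actual pins from this PR) might look like:

```
require (
	// Pinned: the Prometheus update would otherwise pull in a newer release,
	// adding unrelated churn to this PR.
	google.golang.org/grpc v1.29.1
)
```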
bc9e4fa
to
29c20d2
Compare
I've now pinned all the changed dependencies in this PR except Prometheus.
Solid work @bboreham! Sorry for my late reply 👏
What this PR does:
`RefCache` is doing almost exactly the same work as TSDB, with a sharded map of series by labels. prometheus/prometheus#8600 and prometheus/prometheus#8641 add a method to `TSDB.Appender` which allows us to avoid building a parallel cache, reducing ingester heap by about 20%.
We depend on values from `GetRef()` being valid to retain elsewhere.
Also added a benchmark, broadly copied from `BenchmarkIngesterPush()`.
Tests show the memory reduction is around 20% when TSDB's index has been created from the WAL, in which case `RefCache` has a copy of all the labels for series that have received more data after restart.
Not much CPU reduction overall, since series are typically looked up many more times than they are added.
Checklist
- `CHANGELOG.md` updated