Level 0 merge #4697
base: release/v22.3.0
Conversation
Pull Request Overview
This PR specializes level 0 Bucket merges to operate entirely in-memory, reducing addBatch time under high load. Key changes include:
- Updating upgrade tests to reflect new merge counter expectations.
- Adding in-memory merge support in LiveBucket (including new state mEntries and associated APIs) and updating related merging logic.
- Adjusting tests (BucketManagerTests, BucketIndexTests) and index construction (InMemoryIndex) to account for in-memory entries.
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/herder/test/UpgradesTests.cpp | Updates upgrade test assertions to reflect revised merge counter values. |
| src/bucket/test/BucketManagerTests.cpp | Adds conditional index file existence checks based on configured page size. |
| src/bucket/test/BucketIndexTests.cpp | Modifies cache hit/miss expectations to subtract in-memory entry counts. |
| src/bucket/LiveBucket{.h,.cpp} | Introduces new in-memory merging support and related APIs (hasInMemoryEntries etc.). |
| src/bucket/InMemoryIndex{.h,.cpp} | Extends in-memory index construction to support new merge behaviors. |
| src/bucket/BucketOutputIterator{.h,.cpp} | Updates getBucket to optionally use in-memory index state during bucket adoption. |
| src/bucket/BucketListBase{.h,.cpp} | Introduces prepareFirstLevel for level 0 in-memory merges with fallback logic. |
| src/bucket/BucketBase{.h,.cpp} | Refactors mergeInternal and related merging logic to support new in-memory flows. |
| src/bucket/BucketMergeAdapter.h and HotArchiveBucket{.h,.cpp} | Minor updates to support passing putFunc instead of output iterators. |
Comments suppressed due to low confidence (2)
src/bucket/HotArchiveBucket.h:67
- Typo in comment: 'te' should be 'the'.
// This function always writes te given entry to the output
src/bucket/test/BucketIndexTests.cpp:312
- Ensure that subtracting 'sumOfInMemoryEntries' from 'numLoads' accurately accounts for in-memory entries in cache hit/miss metrics, and that tests remain robust across different environments.
REQUIRE(hitMeter.count() == startingHitCount + numLoads - sumOfInMemoryEntries);
Nice work, I'm a fan of this change! I added a few suggestions for safer code, plus a couple of clarification questions and requests for comments. Could you also share a bit more about the performance gains you've observed while running this? I imagine a fully in-memory addBatch flow would remove most of the addBatch overhead.
```diff
@@ -2103,13 +2103,12 @@ TEST_CASE("upgrade to version 11", "[upgrades]")
     // Check several subtle characteristics of the post-upgrade
     // environment:
     // - Old-protocol merges stop happening (there should have
-    //   been 6 before the upgrade, but we re-use a merge we did
-    //   at ledger 1 for ledger 2 spill, so the counter is at 5)
+    //   been 6 before the upgrade)
```
I'm not sure why this test changed. We should still be able to re-use merges, shouldn't we?
```cpp
@@ -234,7 +234,15 @@ TEST_CASE_VERSIONS("bucketmanager ownership", "[bucket][bucketmanager]")
    std::string indexFilename =
        app->getBucketManager().bucketIndexFilename(b->getHash());
    CHECK(fs::exists(filename));
    CHECK(fs::exists(indexFilename));

    if (b->getIndexForTesting().getPageSize() == 0)
```
Curious, why did this change?
```cpp
    std::vector<BucketInputIterator<BucketT>> const& shadowIterators)
{
    // Don't count shadow metrics for Hot Archive BucketList
    if constexpr (std::is_same_v<BucketT, HotArchiveBucket>)
```
We shouldn't be calling this function on HotArchiveBuckets at all, right? (shadows were dropped in protocol 12). Could we enforce that? I realize this was here before, but since we're already making changes, this is a good opportunity to harden the code.
```cpp
@@ -36,10 +36,87 @@ InMemoryBucketState::scan(IterT start, LedgerKey const& searchKey) const
    return {IndexReturnT(), mEntries.begin()};
}

InMemoryIndex::InMemoryIndex(BucketManager& bm,
```
Looks like there's a lot of code duplication with the other InMemoryIndex constructor, would it be possible to refactor this? (Maybe MergeAdapter is really just an iterator rather than a container for two bucket inputs?)
```cpp
constexpr std::streamoff xdrOverheadBetweenEntries = 4;

// 4 bytes of BucketEntry overhead for METAENTRY
constexpr std::streamoff xdrOverheadForMetaEntry = 4;
```
How much do we win from in-memory indexing at commit time? The manual offset managing worries me a bit, as it's error-prone. We now need to maintain two different ways of managing offsets.
```cpp
{
    // If we have an in-memory merged result, use that
    setCurr(mNextCurrInMemory);
```
Could we add a level 0 assert here as well?
```diff
@@ -363,6 +363,7 @@ template <class BucketT> class BucketLevel
     FutureBucket<BucketT> mNextCurr;
     std::shared_ptr<BucketT> mCurr;
     std::shared_ptr<BucketT> mSnap;
+    std::shared_ptr<BucketT> mNextCurrInMemory;
```
Could we use std::variant to represent "next curr" in a single variable? (it'd be safer and less error-prone)
```cpp
auto snap = LiveBucket::fresh(
    app.getBucketManager(), currLedgerProtocol, inputVectors...,
    countMergeEvents, app.getClock().getIOContext(), doFsync,
    /*storeInMemory=*/true, /*shouldIndex=*/false);
```
`shouldIndex` is false here because indexing is redundant: we merge this newly created bucket right away and create a new (persistent) index anyway. Is that correct? I think a comment with some explanation would be nice here.
```cpp
    out.put(e);
}

// Store the merged entries in memory in the new bucket in case this is
```
I'm a bit confused about this comment: why do we merge level 1 in-memory?
Description
Resolves #4696
This change special-cases level 0 Bucket merges to happen entirely in-memory, without file IO. To accomplish this, all level 0 buckets retain an in-memory vector of their contents, and merges consume these vectors instead of file iterators. I've also reduced the number of times we index buckets during addBatch, using the in-memory vectors for indexing as well.
This reduces addBatch time by about 50% under high load as seen in max tps tests.
Checklist
- clang-format v8.0.0 (via `make format` or the Visual Studio extension)