
feat: 8841: Need a way to change the number of object key buckets in MerkleDb #18910


Merged
14 commits merged into main from 8841-M-hdhm-bucket-index-resize on May 27, 2025

Conversation

@artemananiev (Contributor) commented Apr 18, 2025

Fix summary:

  • A new method is introduced to HalfDiskHashMap: resizeIfNeeded()
  • This method is called after every flush to the MerkleDb data source. It checks whether the average bucket entry count in the HDHM exceeds a threshold and doubles the number of buckets if needed (see the sketch after this list)
  • A new method is introduced to Bucket: sanitize()
  • This method checks the bucket's index and also removes all stale entries: entries that were valid for this bucket before the HDHM resize but now belong to a different bucket
  • Changed how MerkleDb path and bucket index sizes are calculated when a new MerkleDb instance is created
  • Many VirtualMap size hints in Hedera services are updated to fit better with this new calculation
  • New unit tests are introduced to cover HDHM.resizeIfNeeded()
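
A minimal sketch of the resize check described in the first two bullets; the class, fields, and threshold source here are illustrative assumptions, not the actual HalfDiskHashMap implementation:

```java
// Illustrative sketch only, not the actual HalfDiskHashMap code: doubles the
// bucket count when buckets get too full on average.
final class ResizeCheckSketch {

    private long entryCount;       // total number of entries currently stored in the map
    private int numOfBuckets;      // current number of buckets, assumed to be a power of two
    private final int goodAverageBucketEntryCount; // hypothetical threshold, e.g. from config

    ResizeCheckSketch(final int initialBuckets, final int goodAverageBucketEntryCount) {
        this.numOfBuckets = initialBuckets;
        this.goodAverageBucketEntryCount = goodAverageBucketEntryCount;
    }

    // Bookkeeping stand-in for a flush that stored one more entry
    void entryAdded() {
        entryCount++;
    }

    // Called after every flush: if the average bucket entry count exceeds the
    // threshold, double the number of buckets (one more hash bit selects the bucket).
    void resizeIfNeeded() {
        final double averageEntriesPerBucket = (double) entryCount / numOfBuckets;
        if (averageEntriesPerBucket > goodAverageBucketEntryCount) {
            numOfBuckets *= 2;
            // ...the on-disk bucket index would be expanded to match the new count
        }
    }
}
```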

Fixes: #8841
Signed-off-by: Artem Ananev [email protected]

@artemananiev added the Platform Virtual Map, Platform Data Structures, Platform (Tickets pertaining to the platform), and Scalability (Issues related to the scalability problems) labels Apr 18, 2025
@artemananiev self-assigned this Apr 18, 2025
@jasperpotts (Contributor) commented

I think I get the idea. I was puzzled about how you double the size of the index with concurrent access, but I think I understand now: index expansion only happens on startup? And because it is fast, it should not have a big impact on start-up time. I wonder if there is a way to expand while reading from disk without being much slower, rather than needing enough RAM for both the new and old indexes.

It would be really nice to key this off heuristics, so that we record the number of buckets that have overflowed or the number of buckets that have reached >= 90% full. Then, if more than 20% of buckets have passed that threshold, we double the map size on the next restart. That way no operations monitoring or config changes are needed to maintain size. This would be great for non-mainnet networks, as they could be configured with minimal sizes and expand as needed, as long as they are restarted. We could add an atomic reference to the index and do a compare-and-swap to allow for live index doubling, but I'm not sure if that is too performance-expensive.
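
A minimal sketch of the heuristic proposed above, assuming per-bucket fill statistics were available (as noted in the reply below, they currently are not); the class, method, and parameter names are hypothetical:

```java
// Hypothetical helper, not part of MerkleDb: decides on restart whether the
// HalfDiskHashMap should be doubled, based on per-bucket fill statistics.
final class RestartResizeHeuristic {

    static boolean shouldDoubleOnRestart(final int[] bucketEntryCounts, final int bucketCapacity) {
        int nearlyFull = 0;
        for (final int entries : bucketEntryCounts) {
            // Count buckets that have overflowed or reached >= 90% of capacity
            if (entries >= 0.9 * bucketCapacity) {
                nearlyFull++;
            }
        }
        // Double the map size on the next restart if more than 20% of buckets crossed the threshold
        return nearlyFull > 0.2 * bucketEntryCounts.length;
    }
}
```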

@jasperpotts (Contributor) commented

I think we should strive for the goal of automatic management, with minimum operations effort needed. Or what Oracle would call "Autonomous" 😀

@artemananiev (Contributor, Author) commented

> It would be really nice to key this off heuristics, so that we record the number of buckets that have overflowed or the number of buckets that have reached >= 90% full. Then, if more than 20% of buckets have passed that threshold, we double the map size on the next restart. That way no operations monitoring or config changes are needed to maintain size. This would be great for non-mainnet networks, as they could be configured with minimal sizes and expand as needed, as long as they are restarted. We could add an atomic reference to the index and do a compare-and-swap to allow for live index doubling, but I'm not sure if that is too performance-expensive.

Automatic index resizing on startup looks doable, but I'd like to track it in a separate ticket. We currently don't collect stats on individual buckets, for example, the number of buckets that are >= 90% full. All we have is the total number of entries and the current number of buckets. With a really bad hashing function, it's possible that the map is only 25% full, yet more than 20% of buckets have too many entries.
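
For an illustrative example with made-up numbers (not from the PR): with 1,000 buckets sized for 100 entries each, a badly skewed hash could place 25,000 entries into just 250 of the buckets. The map would be only 25% full overall (25,000 of 100,000 entries), yet 25% of the buckets would be over their per-bucket limit, which is exactly the case a simple total-entries / total-buckets average cannot detect.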

@artemananiev (Contributor, Author) commented

> I think I get the idea. I was puzzled about how you double the size of the index with concurrent access, but I think I understand now: index expansion only happens on startup? And because it is fast, it should not have a big impact on start-up time. I wonder if there is a way to expand while reading from disk without being much slower, rather than needing enough RAM for both the new and old indexes.

Yes, the idea is to resize at startup only. There will be no point in time when both the old and new indexes are in memory. HDHM should detect that a resize is needed before its index is loaded from a snapshot.

Resizing on the fly is more complicated but doesn't provide any benefits. When a bucket is loaded from disk (e.g., for an update during a flush) and isn't found, is that because it's an empty bucket, or because its entries are still in the N-1-bit "old" bucket? The same checks need to be done on reads, too. When a bucket is finally updated, its data location in the index is set, and according to our current implementation the whole index up to that point is allocated in memory anyway.
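
To make the stale-entry question above concrete, here is a minimal sketch that assumes a power-of-two bucket count where the low bits of the key hash select the bucket; it illustrates the idea, not the actual Bucket.sanitize() code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows why entries become stale after the bucket count
// doubles, and how a sanitize-style pass can drop them.
final class BucketResizeSketch {

    // With a power-of-two bucket count, the low bits of the key hash select the bucket.
    static int bucketIndexFor(final int keyHash, final int numOfBuckets) {
        return keyHash & (numOfBuckets - 1);
    }

    // After doubling from N to 2N buckets, an entry stored under (hash & (N - 1)) may
    // now belong to (hash & (2N - 1)). Keep only the entries that still map to bucketIndex.
    static List<Integer> sanitize(final List<Integer> entryHashes, final int bucketIndex, final int numOfBuckets) {
        final List<Integer> kept = new ArrayList<>();
        for (final int hash : entryHashes) {
            if (bucketIndexFor(hash, numOfBuckets) == bucketIndex) {
                kept.add(hash);
            }
        }
        return kept;
    }
}
```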


codacy-production bot commented May 15, 2025

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation      | Diff coverage |
| ----------------------- | ------------- |
| +0.05% (target: -1.00%) | 84.92%        |

Coverage variation details

|                                  | Coverable lines | Covered lines | Coverage        |
| -------------------------------- | --------------- | ------------- | --------------- |
| Common ancestor commit (1e9b28d) | 100740          | 73848         | 73.31%          |
| Head commit (27a133f)            | 100830 (+90)    | 73963 (+115)  | 73.35% (+0.05%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

|                       | Coverable lines | Covered lines | Diff coverage |
| --------------------- | --------------- | ------------- | ------------- |
| Pull request (#18910) | 179             | 152           | 84.92%        |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%



codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 77.65363% with 40 lines in your changes missing coverage. Please review.

| Files with missing lines                              | Patch % | Lines                       |
| ----------------------------------------------------- | ------- | --------------------------- |
| ...wirlds/merkledb/files/hashmap/HalfDiskHashMap.java | 81.57%  | 11 Missing and 3 partials ⚠️ |
| ...ava/com/swirlds/merkledb/files/hashmap/Bucket.java | 78.57%  | 9 Missing and 3 partials ⚠️  |
| ...java/com/swirlds/merkledb/MerkleDbTableConfig.java | 70.00%  | 2 Missing and 4 partials ⚠️  |
| ...wirlds/merkledb/MerkleDbCompactionCoordinator.java | 0.00%   | 1 Missing and 1 partial ⚠️   |
| .../java/com/swirlds/merkledb/MerkleDbDataSource.java | 75.00%  | 1 Missing and 1 partial ⚠️   |
| ...ava/com/swirlds/merkledb/files/DataFileCommon.java | 0.00%   | 2 Missing ⚠️                 |
| ...rlds/demo/migration/MigrationTestingToolState.java | 0.00%   | 1 Missing ⚠️                 |
| ...m/swirlds/merkledb/files/hashmap/ParsedBucket.java | 90.00%  | 0 Missing and 1 partial ⚠️   |


@@             Coverage Diff              @@
##               main   #18910      +/-   ##
============================================
+ Coverage     69.34%   69.39%   +0.04%     
- Complexity    23296    23327      +31     
============================================
  Files          2648     2647       -1     
  Lines        100835   100925      +90     
  Branches      10435    10448      +13     
============================================
+ Hits          69926    70035     +109     
+ Misses        26940    26915      -25     
- Partials       3969     3975       +6     
| Files with missing lines                              | Coverage Δ      | Complexity Δ     |
| ----------------------------------------------------- | --------------- | ---------------- |
| ...hedera/node/app/hints/schemas/V059HintsSchema.java | 100.00% <ø> (ø) | 4.00 <0.00> (ø)  |
| ...hedera/node/app/hints/schemas/V060HintsSchema.java | 84.21% <ø> (ø)  | 8.00 <0.00> (ø)  |
| ...e/consensus/impl/schemas/V0490ConsensusSchema.java | 100.00% <ø> (ø) | 4.00 <0.00> (ø)  |
| ...app/service/file/impl/schemas/V0490FileSchema.java | 86.54% <ø> (ø)  | 56.00 <0.00> (ø) |
| ...ice/schedule/impl/schemas/V0490ScheduleSchema.java | 100.00% <ø> (ø) | 7.00 <0.00> (ø)  |
| ...ice/schedule/impl/schemas/V0570ScheduleSchema.java | 100.00% <ø> (ø) | 8.00 <0.00> (ø)  |
| ...ice/contract/impl/schemas/V0490ContractSchema.java | 100.00% <ø> (ø) | 6.00 <0.00> (ø)  |
| ...p/service/token/impl/schemas/V0490TokenSchema.java | 95.45% <ø> (ø)  | 4.00 <0.00> (ø)  |
| ...p/service/token/impl/schemas/V0530TokenSchema.java | 100.00% <ø> (ø) | 3.00 <0.00> (ø)  |
| ...va/com/swirlds/merkledb/config/MerkleDbConfig.java | 92.30% <ø> (ø)  | 5.00 <0.00> (ø)  |

... and 11 more

... and 11 files with indirect coverage changes



@artemananiev requested review from akugal, thenswan, and anthony-swirldslabs and removed the review request for jasperpotts May 17, 2025 00:11
@artemananiev added this to the v0.63 milestone May 17, 2025
@artemananiev changed the title from "draft: 8841: Need a way to change the number of object key buckets in MerkleDb" to "feat: 8841: Need a way to change the number of object key buckets in MerkleDb" May 17, 2025
@artemananiev marked this pull request as ready for review May 17, 2025 00:12
@artemananiev requested review from a team and tinker-michaelj as code owners May 17, 2025 00:12
@OlegMazurov previously approved these changes May 23, 2025
@Neeharika-Sompalli (Contributor) left a comment

LGTM from execution changes. Thanks @artemananiev!

@jasperpotts (Contributor) left a comment

Looks good in principle; I reviewed the design and had a light look at the code. Very clever idea :-)

@tinker-michaelj (Contributor) left a comment

LGTM, tyvm @artemananiev !

@poulok (Contributor) left a comment

Approved for consensus module related changes

@artemananiev merged commit d5acbd8 into main May 27, 2025
56 of 57 checks passed
@artemananiev deleted the 8841-M-hdhm-bucket-index-resize branch May 27, 2025 23:54
Labels
Platform Data Structures, Platform Virtual Map, Platform (Tickets pertaining to the platform), Scalability (Issues related to the scalability problems)
Development

Successfully merging this pull request may close these issues.

Need a way to change the number of object key buckets in MerkleDb
8 participants