-
Notifications
You must be signed in to change notification settings - Fork 160
feat: 8841: Need a way to change the number of object key buckets in MerkleDb #18910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… MerkleDb Signed-off-by: Artem Ananev <[email protected]>
I think I get the idea. I was puzzled how you doubled the size of index with concurrent access but I think I understand index expansion only happens on startup? And because it is fast it should not be a big impact on start-up time. I wonder if there is a way to expand while reading from disk without being much slower. Rather than needing enough RAM for new and old indexes. It would be really nice to key this off heuristics so that we record amount of buckets that have overflowed or number of buckets that have reached >= 90% full. Then if more then 20% of buckets have passed that threshold we double map size on next restart. That way there is no operations monitoring and config changes needed to maintain size. This would be great for non-mainnet networks as they could be configured with minimal sizes and expand as needed, as long as they are restarted. We could add atomic reference to index and do compare and swap to allow for live index doubling but not sure if that is too performance expensive. |
I think we should strive for goal of automatic management, minimum operations effort needed. Or what Oracle would call "Autonomous" 😀 |
Automatic index resizing on startup looks doable, but I'd like to track it in a separate ticket. We currently don't collect stats on individual buckets, for example, the number of buckets >= 90% full. All we have is total number of entries and current number of buckets. With really bad hashing function, it's possible that the map is only 25% full, but more than 20% buckets have too many entries. |
Yes, the idea is to resize at startup only. There will be no single point in time, when both old and new indexes are in memory. HDHM should detect that a resize is needed before its index is loaded from a snapshot. Resizing on the fly is more complicated, but doesn't provide any benefits. When a bucket is loaded from disk (e.g. for update during flush), and it isn't found - is it because it's an empty bucket, or because its entries are still in the N-1-bit "old" bucket? Same checks need to be done on reads, too. When a bucket is finally updated, its data location in the index is set, and according to our current implementation the whole index up to the point is allocated in memory anyway. |
Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
… size hints Signed-off-by: Artem Ananev <[email protected]>
…HM.GOOD_AVERAGE_BUCKET_ENTRY_COUNT Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
...m-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/HalfDiskHashMap.java
Outdated
Show resolved
Hide resolved
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/MerkleDbTableConfig.java
Show resolved
Hide resolved
...vice-impl/src/main/java/com/hedera/node/app/service/token/impl/schemas/V0490TokenSchema.java
Show resolved
Hide resolved
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/config/MerkleDbConfig.java
Show resolved
Hide resolved
Signed-off-by: Artem Ananev <[email protected]>
...ervice-impl/src/main/java/com/hedera/node/app/service/file/impl/schemas/V0490FileSchema.java
Outdated
Show resolved
Hide resolved
...sdk/swirlds-merkledb/src/timingSensitive/java/com/swirlds/merkledb/MerkleDbSnapshotTest.java
Show resolved
Hide resolved
platform-sdk/swirlds-benchmarks/src/jmh/java/com/swirlds/benchmark/VirtualMapBaseBench.java
Outdated
Show resolved
Hide resolved
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/Bucket.java
Show resolved
Hide resolved
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/Bucket.java
Outdated
Show resolved
Hide resolved
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/Bucket.java
Show resolved
Hide resolved
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/Bucket.java
Outdated
Show resolved
Hide resolved
...m-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/HalfDiskHashMap.java
Outdated
Show resolved
Hide resolved
...m-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/HalfDiskHashMap.java
Outdated
Show resolved
Hide resolved
...m-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/MerkleDbCompactionCoordinator.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Artem Ananev <[email protected]>
Signed-off-by: Artem Ananev <[email protected]>
platform-sdk/swirlds-merkledb/src/main/java/com/swirlds/merkledb/files/hashmap/Bucket.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from execution changes. Thanks @artemananiev!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in principle, reviewed design and had light look at code. Very cleaver idea :-)
Signed-off-by: Artem Ananev <[email protected]>
27a133f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tyvm @artemananiev !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved for consensus module related changes
Fix summary:
HalfDiskHashMap
:resizeIfNeeded()
Bucket
:sanitize()
HDHM.resizeIfNeeded()
Fixes: #8841
Signed-off-by: Artem Ananev [email protected]