-
Notifications
You must be signed in to change notification settings - Fork 293
clean up and commit state trie for snapsync #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -126,6 +126,31 @@ func (s *Store) RebuildEvmSnapshot(root common.Hash) { | |||
s.Snaps.Rebuild(root) | |||
} | |||
|
|||
// Commit changes. | |||
func (s *Store) CleanCommit(block iblockproc.BlockState) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may change it so that CleanCommit
would get executed inside evmstore.Store.Commit
if flush=true
. It seems that we should either have 2 functions (CleanCommit
and Commit
) but without the flush
argument, or only Commit
function with the flush
argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But with the current code if flush=true
mean we does not clean any things but commit everything to the db like archive mode
gossip/evmstore/store.go
Outdated
current := uint64(block.LastBlock.Idx) | ||
// Garbage collect all below the current block | ||
for !s.triegc.Empty() { | ||
root, number := s.triegc.Pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we dereference all roots but the latest one, then it conflicts a little with evmstore.Store.Flush
. evmstore.Store.Flush
tries to commit head
, head-1
, head-31
roots. All of them but head
will already be dereferenced, so we can simplify it and commit only the head
inside Flush
. Flush
also won't need the getBlock
argument anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we dereference all roots bit the latest one, it will make them inaccessible by API calls (if I understand correctly). Then, we don't respect the TriesInMemory
constant. As far as I understand, we should either make TriesInMemory
=1 or not dereference roots inside CleanCommit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uprendis Yes, I think in our case, we only need to commit only the head
inside Flush
due to go-opera don't need to re-org if I understand correctly. We might change TriesInMemory
=1. I will consolidate all the related things and commit the code afterward. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uprendis by the way, on the 2nd thought, if we periodic call the commit every 10 seconds with period: 10 * time.Millisecond
in the PeriodicFlusher
then I think we don't have any effective way to pruning the state on mem-cache. Or I mis-understand smth on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't prune every 10 ms, we check if a commit is needed every 10ms. flushable.Store tracks the size of in-memory diff. We sum those sizes for every DB and compare it to a configured threshold. Practically we commit every ~30-100 blocks on mainnet.
But yes, it does mean that geth's pruning strategy may be not very effective for us, because it's based only on avoiding writing MPT data (unless node stops, or TrieTimeLimit
passed since prev flush, or Cap
is called and dirties map is too large). All we can archive with geth's approach is that we will flush every ~30-100th root rather than every root (which is still better than nothing, but not ideal in terms of data pruning). To make a more greedy pruning, we would need to not only avoid writing MPT data, but also erase previously written MPT data if it's not referenced anymore by latest root(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it seems that the s.dbs.NotFlushedSizeEst()
quickly return a big number then the commit happen very frequency
I tried to set MaxNonFlushedSize=256
but the NotFlushedSizeEst quickly become larger than that then the commit is call in every seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uprendis I debug log and found that with current snapsync, the latest block state is always at block number 1
// GetBlockState retrieves the latest block state
func (s *Store) GetBlockState() iblockproc.BlockState {
return *s.getBlockEpochState().BlockState
}
So the below commitEVM invoke has no meaning for state commit for the current code on develop branch
// TODO: prune old MPTs in beginnings of committed sections
if !s.store.cfg.EVM.Cache.TrieDirtyDisabled {
s.store.commitEVM(true)
}
And further more I think we also cannot make the trie state pruning by using the current triegc due to in that way, we need to record every block's state root into the triegc queue then doing the pruning later. but with snapsync, we didn't record the state root into the triegc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debug log and found that with current snapsync, the latest block state is always at block number 1
Yeah, that's right. While snapsync hasn't finished, current block number won't move. After snapsync, we'll jump from an epoch which we had before the snapsync, to the new epoch
Yes, Service.commit during snapsync will only flush the content of flushable.Store's. s.store.commitEVM will practically do nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And further more I think we also cannot make the trie state pruning by using the current triegc due to in that way, we need to record every block's state root into the triegc queue then doing the pruning later. but with snapsync, we didn't record the state root into the triegc.
Hmm, I don't quite understand. After snapsync, the MPTs will already be pruned, because we downloaded only latest roots. After snapsync, triegc should work (not effectively as it'll commit every ~30-100th root)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't quite understand. After snapsync, the MPTs will already be pruned, because we downloaded only latest roots. After snapsync, triegc should work (not effectively as it'll commit every ~30-100th root)?
Yes, I were thinking about when the snapsync in-progress so I though like that. After snapsync it be work like you said.
cmd/opera/launcher/launcher.go
Outdated
@@ -116,6 +116,7 @@ func initFlags() { | |||
validatorPubkeyFlag, | |||
validatorPasswordFlag, | |||
SyncModeFlag, | |||
utils.GCModeFlag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make a copy of utils.GCModeFlag
but with default value = archive
. It's because some existing nodes rely on the absence of GC, and will be overwhelmed if we implicitly enable GC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I got it
it might prune the old state trie more effectively on the disk also by running with Fantom-foundation/go-ethereum#29 |
@uprendis I found that currently, we have a cap call on every commitEVM with very low max default size, it's just 22 MB Line 244 in 10ca669
So the pruning cannot reach the most effective. Of course, we can increase it relatively with the total if we synced mainnet from the latest snapshot at block How do you think about it and can we increase the default value to 1024 MB or at least to 512 MB like default trie dirty size? |
Got you! Let's keep MaxNonFlushedSize the same but introduce a separate config field for |
Upd: we already have the |
okay, I change the code like in this commit hadv@48e8d28, refactoring the
Yes, I will check the difference sizes for it. Actually, I already tested some difference values at default, 256 MB and 1024 MB. 256 MB seems good enough with |
@uprendis Basically, the more cache we use, the more dirty trie we can clean. But base on the below metrics table I think that the default value for old code (MaxNonFlushedSize)
new code (TrieDirtyLimit)
|
Previously, by default opera only support
archive
gcmode, this PR will enable the--gcmode
for opera node with two new options after introducing thesnapsync
--gcmode=light
: prune trie state in memory cache--gcmode=full
: prune trie state both in memory cache and in persistent dbAnd add new command to run compact db on non-running node.