clean up and commit state trie for snapsync #210


Merged: 37 commits into Fantom-foundation:develop, Jun 11, 2022

Conversation

@hadv (Contributor) commented Dec 29, 2021:

Previously, Opera only supported the archive gcmode by default. After introducing snapsync, this PR enables the --gcmode option for Opera nodes with two new modes:

  1. --gcmode=light: prune the trie state in the in-memory cache
  2. --gcmode=full: prune the trie state both in the in-memory cache and in the persistent DB

It also adds a new command to run DB compaction on a non-running node.
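
For illustration, a minimal sketch of how the three gcmode values could map onto the EVM-store cache configuration. The struct and field names below are assumptions for the example, not the PR's actual wiring (the thread later mentions a real TrieDirtyDisabled field):

package config

import "fmt"

// CacheConfig is an illustrative stand-in for the node's EVM-store cache config.
type CacheConfig struct {
	TrieDirtyDisabled bool // archive behavior: never drop trie nodes from memory
	PersistentGC      bool // additionally erase stale nodes from the persistent DB
}

// applyGCMode translates the --gcmode value into cache behavior.
func applyGCMode(gcmode string, cfg *CacheConfig) error {
	switch gcmode {
	case "archive":
		cfg.TrieDirtyDisabled = true
	case "light":
		cfg.TrieDirtyDisabled = false // prune the trie state in the memory cache
	case "full":
		cfg.TrieDirtyDisabled = false
		cfg.PersistentGC = true // prune in memory and in the persistent DB
	default:
		return fmt.Errorf(`--gcmode must be "archive", "light" or "full", got %q`, gcmode)
	}
	return nil
}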

@hadv requested review from uprendis and rus-alex on December 29, 2021 04:08
@hadv self-assigned this on Dec 29, 2021
@hadv requested a review from andrecronje as a code owner on December 29, 2021 04:08
@hadv requested a review from cyberbono3 on December 29, 2021 04:09
@@ -126,6 +126,31 @@ func (s *Store) RebuildEvmSnapshot(root common.Hash) {
	s.Snaps.Rebuild(root)
}

// Commit changes.
func (s *Store) CleanCommit(block iblockproc.BlockState) error {
Contributor:
We may change it so that CleanCommit gets executed inside evmstore.Store.Commit if flush=true. It seems we should either have two functions (CleanCommit and Commit) without the flush argument, or only a Commit function with the flush argument.

hadv (Contributor Author):
But with the current code, flush=true means we don't clean anything and instead commit everything to the DB, like archive mode.

	current := uint64(block.LastBlock.Idx)
	// Garbage collect all below the current block
	for !s.triegc.Empty() {
		root, number := s.triegc.Pop()
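		// (A plausible continuation of this loop, modeled on geth's garbage
		// collection in core/blockchain.go; a sketch, not necessarily the PR's
		// exact code. Priority-queue sign conventions are glossed over.)
		if uint64(number) >= current {
			// Reached the current block: keep this root referenced.
			s.triegc.Push(root, number)
			break
		}
		// Drop the stale root from the in-memory trie database
		// (trie.Database.Dereference from go-ethereum).
		s.EvmState.TrieDB().Dereference(root.(common.Hash))
	}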
Contributor:
If we dereference all roots but the latest one, then it conflicts a little with evmstore.Store.Flush: evmstore.Store.Flush tries to commit the head, head-1, and head-31 roots. All of them but head will already be dereferenced, so we can simplify it and commit only the head inside Flush. Flush also won't need the getBlock argument anymore.
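
A sketch of the simplified Flush this suggests, assuming the go-ethereum 1.10 trie.Database API (the wrapper shape below is illustrative, not the PR's actual code):

// Flush persists only the head root; older roots are assumed to have
// been dereferenced already by CleanCommit.
func (s *Store) Flush(headRoot common.Hash) error {
	// trie.Database.Commit writes the trie rooted at headRoot to disk.
	return s.EvmState.TrieDB().Commit(headRoot, false, nil)
}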

Contributor:
Also, if we dereference all roots but the latest one, it will make them inaccessible to API calls (if I understand correctly). Then we don't respect the TriesInMemory constant. As far as I understand, we should either make TriesInMemory=1 or not dereference roots inside CleanCommit.

hadv (Contributor Author):
@uprendis Yes, I think in our case we only need to commit the head inside Flush, since go-opera doesn't need to re-org, if I understand correctly. We might change TriesInMemory=1. I will consolidate all the related changes and commit the code afterward. Thanks!

hadv (Contributor Author):
@uprendis By the way, on second thought: if we periodically call the commit every 10 ms (period: 10 * time.Millisecond in the PeriodicFlusher), then I think we don't have any effective way of pruning the state in the memory cache. Or am I misunderstanding something here?

Contributor:
We don't prune every 10 ms; we check whether a commit is needed every 10 ms. flushable.Store tracks the size of the in-memory diff. We sum those sizes across every DB and compare the total to a configured threshold. Practically, we commit every ~30-100 blocks on mainnet.
But yes, it does mean that geth's pruning strategy may not be very effective for us, because it's based only on avoiding writing MPT data (unless the node stops, or TrieTimeLimit has passed since the previous flush, or Cap is called and the dirties map is too large). All we can achieve with geth's approach is flushing only every ~30-100th root rather than every root (which is still better than nothing, but not ideal in terms of data pruning). To make pruning greedier, we would need to not only avoid writing MPT data, but also erase previously written MPT data once it's no longer referenced by the latest root(s).
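
In other words, the check loop is roughly this (a sketch; NotFlushedSizeEst and MaxNonFlushedSize are the names used in this thread, the rest is illustrative glue):

// periodicFlusher checks every 10 ms whether the accumulated in-memory
// diffs have outgrown the configured threshold, and commits if so.
func (s *Service) periodicFlusher(stop <-chan struct{}) {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if s.dbs.NotFlushedSizeEst() > s.cfg.MaxNonFlushedSize {
				s.commit() // on mainnet this fires every ~30-100 blocks
			}
		}
	}
}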

hadv (Contributor Author), Jan 5, 2022:
Yes, but it seems that s.dbs.NotFlushedSizeEst() quickly returns a big number, so the commit happens very frequently.

I tried to set MaxNonFlushedSize=256, but NotFlushedSizeEst quickly becomes larger than that, so the commit is called every second.

hadv (Contributor Author), Jan 5, 2022:
@uprendis I added debug logging and found that with the current snapsync, the latest block state is always at block number 1:

// GetBlockState retrieves the latest block state
func (s *Store) GetBlockState() iblockproc.BlockState {
	return *s.getBlockEpochState().BlockState
}

So the commitEVM invocation below has no effect on the state commit in the current code on the develop branch:

	// TODO: prune old MPTs in beginnings of committed sections
	if !s.store.cfg.EVM.Cache.TrieDirtyDisabled {
		s.store.commitEVM(true)
	}

Furthermore, I think we also cannot prune the trie state using the current triegc, because that approach requires recording every block's state root into the triegc queue and pruning later; but with snapsync, we don't record the state roots into triegc.

Contributor:
> I added debug logging and found that with the current snapsync, the latest block state is always at block number 1

Yeah, that's right. While snapsync hasn't finished, the current block number won't move. After snapsync, we'll jump from the epoch we had before the snapsync to the new epoch.
Yes, Service.commit during snapsync will only flush the content of the flushable.Stores; s.store.commitEVM will practically do nothing.

Contributor:
> Furthermore, I think we also cannot prune the trie state using the current triegc, because that approach requires recording every block's state root into the triegc queue and pruning later; but with snapsync, we don't record the state roots into triegc.

Hmm, I don't quite understand. After snapsync, the MPTs will already be pruned, because we downloaded only the latest roots. After snapsync, triegc should work (though not effectively, as it'll commit every ~30-100th root)?

hadv (Contributor Author):
> Hmm, I don't quite understand. After snapsync, the MPTs will already be pruned, because we downloaded only the latest roots. After snapsync, triegc should work (though not effectively, as it'll commit every ~30-100th root)?

Yes, I was thinking about when the snapsync is in progress, which is why I thought that. After snapsync it will work like you said.

@@ -116,6 +116,7 @@ func initFlags() {
	validatorPubkeyFlag,
	validatorPasswordFlag,
	SyncModeFlag,
	utils.GCModeFlag,
Contributor:
Let's make a copy of utils.GCModeFlag but with the default value set to archive, because some existing nodes rely on the absence of GC and would be overwhelmed if we implicitly enabled it.
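
Presumably something like this, mirroring utils.GCModeFlag from go-ethereum's cmd/utils but with an archive default, using urfave/cli v1 (the usage text is illustrative):

// A local copy of utils.GCModeFlag whose default keeps the old archive
// behavior, so garbage collection stays opt-in for existing nodes.
GCModeFlag = cli.StringFlag{
	Name:  "gcmode",
	Usage: `Blockchain garbage collection mode ("light", "full", "archive")`,
	Value: "archive",
}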

hadv (Contributor Author):
okay, I got it

@hadv requested a review from uprendis on January 6, 2022 16:03
@hadv (Contributor Author) commented Jan 10, 2022:

With the cleanup code (a.k.a. --gcmode=full), the disk I/O is better than with the archive gcmode.

FULL: average ~40 MB/s
[screenshot: disk I/O metrics, 2022-01-10]

ARCHIVE: average 50+ MB/s
[screenshot: disk I/O metrics, 2022-01-10]

@hadv (Contributor Author) commented Jan 24, 2022:

After snapsync (which is the same for both --gcmode=archive and --gcmode=full), --gcmode=full still saves a little disk I/O and total disk space compared to --gcmode=archive.

Below are metrics for the last 6 hours.

FULL: disk I/O ~45 MB/s, total disk space ~55 GB
[screenshot: disk metrics, 2022-01-24]

ARCHIVE: disk I/O ~65 MB/s, total disk space ~58 GB
[screenshot: disk metrics, 2022-01-24]

@hadv (Contributor Author) commented Feb 23, 2022:

It might also prune the old state trie on disk more effectively when running with Fantom-foundation/go-ethereum#29.

@hadv (Contributor Author) commented Apr 1, 2022:

@uprendis I found that we currently have a Cap call on every commitEVM with a very low default max size; it's just 22 MB:

MaxNonFlushedSize: 17*opt.MiB + scale.I(5*opt.MiB),

So the pruning cannot reach full effectiveness. Of course, the value scales with the total --cache size, but I think it's still small. I tried increasing the MaxNonFlushedSize default value to 1024 MB and ran a test on mainnet, and it made the pruning more effective.

Syncing mainnet from the latest snapshot at block 32661715 to block 33800000, it saved more than 50 GB of disk space compared to the current default.

What do you think: can we increase the default value to 1024 MB, or at least to 512 MB like the default trie dirty size?

@uprendis (Contributor) commented Apr 1, 2022:

Got you! Let's keep MaxNonFlushedSize the same, but introduce a separate config field for s.evm.Cap(s.cfg.MaxNonFlushedSize/3, s.cfg.MaxNonFlushedSize/4), instead of using s.cfg.MaxNonFlushedSize/3 and s.cfg.MaxNonFlushedSize/4.
Could you please check different sizes for this field in the range of 32 MB to 1024 MB?

@uprendis (Contributor) commented Apr 1, 2022:

Update: we already have the TrieDirtyLimit field; we should use it.
I think we should remove the func (s *Store) Cap(max, min int) method and just call s.EvmState.TrieDB().Cap.
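
That is, roughly this (a sketch; the exact config path to TrieDirtyLimit is an assumption):

// Cap the dirty trie cache directly through the trie database, using
// TrieDirtyLimit as the threshold (trie.Database.Cap from go-ethereum).
limit := common.StorageSize(s.cfg.Cache.TrieDirtyLimit)
if err := s.EvmState.TrieDB().Cap(limit); err != nil {
	s.Log.Error("Failed to cap dirty trie cache", "err", err)
}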

@hadv (Contributor Author) commented Apr 4, 2022:

> Update: we already have the TrieDirtyLimit field; we should use it. I think we should remove the func (s *Store) Cap(max, min int) method and just call s.EvmState.TrieDB().Cap.

Okay, I changed the code as in this commit: hadv@48e8d28, refactoring func (s *Store) Cap() for reuse in different places.

> Could you please check different sizes for this field in the range of 32 MB to 1024 MB?

Yes, I will check different sizes for it. Actually, I already tested some different values: the default, 256 MB, and 1024 MB. 256 MB seems good enough with --cache=160000 (16 GB), but I tested the same block range at different block heights, so the comparison is a bit improper. I will check different sizes at the same block height to avoid comparing apples to oranges.

@hadv (Contributor Author) commented Apr 8, 2022:

@uprendis Basically, the more cache we use, the more dirty trie we can clean. But based on the metrics table below, I think a default value of 256 MB for TrieDirtyLimit is good enough, because using much more cache than that does not significantly improve the effectiveness of dirty-trie cleaning. If node operators need it to be more effective (though not by much), they can increase the total cache size.

old code (MaxNonFlushedSize):

	limit    block     db size
	22 MB    32800000  378 GB
	22 MB    33000000  400 GB

new code (TrieDirtyLimit):

  • default total cache = 3096 MB

	limit    block     db size
	32 MB    32800000  377 GB
	32 MB    33000000  398 GB
	160 MB   32800000  373 GB
	160 MB   33000000  388 GB
	256 MB   32800000  372 GB
	256 MB   33000000  386 GB

  • total cache = 16000 MB

	limit    block     db size
	32 MB    32800000  372 GB
	32 MB    33000000  388 GB
	64 MB    32800000  372 GB
	64 MB    33000000  385 GB
	128 MB   32800000  371 GB
	128 MB   33000000  385 GB
	256 MB   32800000  371 GB
	256 MB   33000000  385 GB
	512 MB   32800000  371 GB (388875540)
	512 MB   33000000  385 GB (403147896)
	1024 MB  32800000  371 GB (388874108)
	1024 MB  33000000  385 GB (403149576)

@uprendis merged commit b8e0fc6 into Fantom-foundation:develop on Jun 11, 2022