clean up and commit state trie for snapsync #210


Merged: 37 commits into Fantom-foundation:develop, Jun 11, 2022

Conversation

@hadv (Contributor) commented Dec 29, 2021:

Previously, Opera only supported the archive gcmode by default. After introducing snapsync, this PR enables the --gcmode option for Opera nodes with two new modes:

  1. --gcmode=light: prune the trie state in the in-memory cache
  2. --gcmode=full: prune the trie state both in the in-memory cache and in the persistent DB

It also adds a new command to run DB compaction on a non-running node.
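
For illustration, a minimal sketch of how the three gcmode values could map onto the EVM-store cache configuration. The struct and field names below are assumptions for the example, not the PR's actual wiring (the thread later mentions a real TrieDirtyDisabled field):

package config

import "fmt"

// CacheConfig is an illustrative stand-in for the node's EVM-store cache config.
type CacheConfig struct {
	TrieDirtyDisabled bool // archive behavior: never drop trie nodes from memory
	PersistentGC      bool // additionally erase stale nodes from the persistent DB
}

// applyGCMode translates the --gcmode value into cache behavior.
func applyGCMode(gcmode string, cfg *CacheConfig) error {
	switch gcmode {
	case "archive":
		cfg.TrieDirtyDisabled = true
	case "light":
		cfg.TrieDirtyDisabled = false // prune the trie state in the memory cache
	case "full":
		cfg.TrieDirtyDisabled = false
		cfg.PersistentGC = true // prune in memory and in the persistent DB
	default:
		return fmt.Errorf(`--gcmode must be "archive", "light" or "full", got %q`, gcmode)
	}
	return nil
}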

@hadv requested review from uprendis and rus-alex on December 29, 2021 04:08
@hadv self-assigned this on Dec 29, 2021
@hadv requested a review from andrecronje as a code owner on December 29, 2021 04:08
@hadv requested a review from cyberbono3 on December 29, 2021 04:09
@@ -126,6 +126,31 @@ func (s *Store) RebuildEvmSnapshot(root common.Hash) {
	s.Snaps.Rebuild(root)
}

// Commit changes.
func (s *Store) CleanCommit(block iblockproc.BlockState) error {
Contributor:
We may change it so that CleanCommit gets executed inside evmstore.Store.Commit if flush=true. It seems we should either have two functions (CleanCommit and Commit) without the flush argument, or only a Commit function with the flush argument.

hadv (Contributor Author):
But with the current code, flush=true means we don't clean anything and instead commit everything to the DB, like archive mode.

	current := uint64(block.LastBlock.Idx)
	// Garbage collect all below the current block
	for !s.triegc.Empty() {
		root, number := s.triegc.Pop()
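		// (A plausible continuation of this loop, modeled on geth's garbage
		// collection in core/blockchain.go; a sketch, not necessarily the PR's
		// exact code. Priority-queue sign conventions are glossed over.)
		if uint64(number) >= current {
			// Reached the current block: keep this root referenced.
			s.triegc.Push(root, number)
			break
		}
		// Drop the stale root from the in-memory trie database
		// (trie.Database.Dereference from go-ethereum).
		s.EvmState.TrieDB().Dereference(root.(common.Hash))
	}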
Contributor:
If we dereference all roots but the latest one, then it conflicts a little with evmstore.Store.Flush: evmstore.Store.Flush tries to commit the head, head-1, and head-31 roots. All of them but head will already be dereferenced, so we can simplify it and commit only the head inside Flush. Flush also won't need the getBlock argument anymore.
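
A sketch of the simplified Flush this suggests, assuming the go-ethereum 1.10 trie.Database API (the wrapper shape below is illustrative, not the PR's actual code):

// Flush persists only the head root; older roots are assumed to have
// been dereferenced already by CleanCommit.
func (s *Store) Flush(headRoot common.Hash) error {
	// trie.Database.Commit writes the trie rooted at headRoot to disk.
	return s.EvmState.TrieDB().Commit(headRoot, false, nil)
}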

Contributor:
Also, if we dereference all roots but the latest one, it will make them inaccessible to API calls (if I understand correctly). Then we don't respect the TriesInMemory constant. As far as I understand, we should either make TriesInMemory=1 or not dereference roots inside CleanCommit.

hadv (Contributor Author):
@uprendis Yes, I think in our case we only need to commit the head inside Flush, since go-opera doesn't need to re-org, if I understand correctly. We might change TriesInMemory=1. I will consolidate all the related changes and commit the code afterward. Thanks!

hadv (Contributor Author):
@uprendis By the way, on second thought: if we periodically call the commit every 10 ms (period: 10 * time.Millisecond in the PeriodicFlusher), then I think we don't have any effective way of pruning the state in the memory cache. Or am I misunderstanding something here?

Contributor:
We don't prune every 10 ms; we check whether a commit is needed every 10 ms. flushable.Store tracks the size of the in-memory diff. We sum those sizes across every DB and compare the total to a configured threshold. Practically, we commit every ~30-100 blocks on mainnet.
But yes, it does mean that geth's pruning strategy may not be very effective for us, because it's based only on avoiding writing MPT data (unless the node stops, or TrieTimeLimit has passed since the previous flush, or Cap is called and the dirties map is too large). All we can achieve with geth's approach is flushing only every ~30-100th root rather than every root (which is still better than nothing, but not ideal in terms of data pruning). To make pruning greedier, we would need to not only avoid writing MPT data, but also erase previously written MPT data once it's no longer referenced by the latest root(s).
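
In other words, the check loop is roughly this (a sketch; NotFlushedSizeEst and MaxNonFlushedSize are the names used in this thread, the rest is illustrative glue):

// periodicFlusher checks every 10 ms whether the accumulated in-memory
// diffs have outgrown the configured threshold, and commits if so.
func (s *Service) periodicFlusher(stop <-chan struct{}) {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if s.dbs.NotFlushedSizeEst() > s.cfg.MaxNonFlushedSize {
				s.commit() // on mainnet this fires every ~30-100 blocks
			}
		}
	}
}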

hadv (Contributor Author), Jan 5, 2022:
Yes, but it seems that s.dbs.NotFlushedSizeEst() quickly returns a big number, so the commit happens very frequently.

I tried to set MaxNonFlushedSize=256, but NotFlushedSizeEst quickly becomes larger than that, so the commit is called every second.

hadv (Contributor Author), Jan 5, 2022:
@uprendis I added debug logging and found that with the current snapsync, the latest block state is always at block number 1:

// GetBlockState retrieves the latest block state
func (s *Store) GetBlockState() iblockproc.BlockState {
	return *s.getBlockEpochState().BlockState
}

So the commitEVM invocation below has no effect on the state commit in the current code on the develop branch:

	// TODO: prune old MPTs in beginnings of committed sections
	if !s.store.cfg.EVM.Cache.TrieDirtyDisabled {
		s.store.commitEVM(true)
	}

Furthermore, I think we also cannot prune the trie state using the current triegc, because that approach requires recording every block's state root into the triegc queue and pruning later; but with snapsync, we don't record the state roots into triegc.

Contributor:
> I added debug logging and found that with the current snapsync, the latest block state is always at block number 1

Yeah, that's right. While snapsync hasn't finished, the current block number won't move. After snapsync, we'll jump from the epoch we had before the snapsync to the new epoch.
Yes, Service.commit during snapsync will only flush the content of the flushable.Stores; s.store.commitEVM will practically do nothing.

Contributor:
> Furthermore, I think we also cannot prune the trie state using the current triegc, because that approach requires recording every block's state root into the triegc queue and pruning later; but with snapsync, we don't record the state roots into triegc.

Hmm, I don't quite understand. After snapsync, the MPTs will already be pruned, because we downloaded only the latest roots. After snapsync, triegc should work (though not effectively, as it'll commit every ~30-100th root)?

hadv (Contributor Author):
> Hmm, I don't quite understand. After snapsync, the MPTs will already be pruned, because we downloaded only the latest roots. After snapsync, triegc should work (though not effectively, as it'll commit every ~30-100th root)?

Yes, I was thinking about when the snapsync is in progress, which is why I thought that. After snapsync it will work like you said.

@@ -116,6 +116,7 @@ func initFlags() {
	validatorPubkeyFlag,
	validatorPasswordFlag,
	SyncModeFlag,
	utils.GCModeFlag,
Contributor:
Let's make a copy of utils.GCModeFlag but with the default value set to archive, because some existing nodes rely on the absence of GC and would be overwhelmed if we implicitly enabled it.
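
Presumably something like this, mirroring utils.GCModeFlag from go-ethereum's cmd/utils but with an archive default, using urfave/cli v1 (the usage text is illustrative):

// A local copy of utils.GCModeFlag whose default keeps the old archive
// behavior, so garbage collection stays opt-in for existing nodes.
GCModeFlag = cli.StringFlag{
	Name:  "gcmode",
	Usage: `Blockchain garbage collection mode ("light", "full", "archive")`,
	Value: "archive",
}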

hadv (Contributor Author):
okay, I got it

@hadv requested a review from uprendis on January 6, 2022 16:03
@hadv (Contributor Author) commented Jan 10, 2022:

With the cleanup code (a.k.a. --gcmode=full), the disk I/O is better than with the archive gcmode.

FULL: average ~40 MB/s
[screenshot: disk I/O metrics, 2022-01-10]

ARCHIVE: average 50+ MB/s
[screenshot: disk I/O metrics, 2022-01-10]

@hadv (Contributor Author) commented Jan 24, 2022:

After snapsync (which is the same for both --gcmode=archive and --gcmode=full), --gcmode=full still saves a little disk I/O and total disk space compared to --gcmode=archive.

Below are metrics for the last 6 hours.

FULL: disk I/O ~45 MB/s, total disk space ~55 GB
[screenshot: disk metrics, 2022-01-24]

ARCHIVE: disk I/O ~65 MB/s, total disk space ~58 GB
[screenshot: disk metrics, 2022-01-24]

@hadv (Contributor Author) commented Feb 23, 2022:

It might also prune the old state trie on disk more effectively when running with Fantom-foundation/go-ethereum#29.

@hadv (Contributor Author) commented Apr 1, 2022:

@uprendis I found that we currently have a Cap call on every commitEVM with a very low default max size; it's just 22 MB:

MaxNonFlushedSize: 17*opt.MiB + scale.I(5*opt.MiB),

So the pruning cannot reach full effectiveness. Of course, the value scales with the total --cache size, but I think it's still small. I tried increasing the MaxNonFlushedSize default value to 1024 MB and ran a test on mainnet, and it made the pruning more effective.

Syncing mainnet from the latest snapshot at block 32661715 to block 33800000, it saved more than 50 GB of disk space compared to the current default.

What do you think: can we increase the default value to 1024 MB, or at least to 512 MB like the default trie dirty size?

@uprendis (Contributor) commented Apr 1, 2022:

Got you! Let's keep MaxNonFlushedSize the same, but introduce a separate config field for s.evm.Cap(s.cfg.MaxNonFlushedSize/3, s.cfg.MaxNonFlushedSize/4), instead of using s.cfg.MaxNonFlushedSize/3 and s.cfg.MaxNonFlushedSize/4.
Could you please check different sizes for this field in the range of 32 MB to 1024 MB?

@uprendis (Contributor) commented Apr 1, 2022:

Update: we already have the TrieDirtyLimit field; we should use it.
I think we should remove the func (s *Store) Cap(max, min int) method and just call s.EvmState.TrieDB().Cap.
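
That is, roughly this (a sketch; the exact config path to TrieDirtyLimit is an assumption):

// Cap the dirty trie cache directly through the trie database, using
// TrieDirtyLimit as the threshold (trie.Database.Cap from go-ethereum).
limit := common.StorageSize(s.cfg.Cache.TrieDirtyLimit)
if err := s.EvmState.TrieDB().Cap(limit); err != nil {
	s.Log.Error("Failed to cap dirty trie cache", "err", err)
}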

@hadv (Contributor Author) commented Apr 4, 2022:

> Update: we already have the TrieDirtyLimit field; we should use it. I think we should remove the func (s *Store) Cap(max, min int) method and just call s.EvmState.TrieDB().Cap.

Okay, I changed the code as in this commit: hadv@48e8d28, refactoring func (s *Store) Cap() for reuse in different places.

> Could you please check different sizes for this field in the range of 32 MB to 1024 MB?

Yes, I will check different sizes for it. Actually, I already tested some different values: the default, 256 MB, and 1024 MB. 256 MB seems good enough with --cache=160000 (16 GB), but I tested the same block range at different block heights, so the comparison is a bit improper. I will check different sizes at the same block height to avoid comparing apples to oranges.

@hadv (Contributor Author) commented Apr 8, 2022:

@uprendis Basically, the more cache we use, the more dirty trie we can clean. But based on the metrics table below, I think a default value of 256 MB for TrieDirtyLimit is good enough, because using much more cache than that does not significantly improve the effectiveness of dirty-trie cleaning. If node operators need it to be more effective (though not by much), they can increase the total cache size.

old code (MaxNonFlushedSize):

	limit    block     db size
	22 MB    32800000  378 GB
	22 MB    33000000  400 GB

new code (TrieDirtyLimit):

  • default total cache = 3096 MB

	limit    block     db size
	32 MB    32800000  377 GB
	32 MB    33000000  398 GB
	160 MB   32800000  373 GB
	160 MB   33000000  388 GB
	256 MB   32800000  372 GB
	256 MB   33000000  386 GB

  • total cache = 16000 MB

	limit    block     db size
	32 MB    32800000  372 GB
	32 MB    33000000  388 GB
	64 MB    32800000  372 GB
	64 MB    33000000  385 GB
	128 MB   32800000  371 GB
	128 MB   33000000  385 GB
	256 MB   32800000  371 GB
	256 MB   33000000  385 GB
	512 MB   32800000  371 GB (388875540)
	512 MB   33000000  385 GB (403147896)
	1024 MB  32800000  371 GB (388874108)
	1024 MB  33000000  385 GB (403149576)

@uprendis merged commit b8e0fc6 into Fantom-foundation:develop on Jun 11, 2022