Skip to content

Allow Checkpoint Sync on State Beyond Finalization #5128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

ethDreamer
Copy link
Member

@ethDreamer ethDreamer commented Jan 25, 2024

Issue Addressed

Proposed Changes

  • Add --checkpoint-sync-state-id to enable user to specify a checkpoint StateId on CLI
  • Prevent checkpoint syncing from a state outside the DA boundary if the chain is past the deneb fork
  • Add AnchorState enum to ForkChoice to specify if the finalized state is truly finalized or just a non-revertible checkpiont state
  • Detect if checkpoint state is newer than finalization automatically when using --checkpoint-sync-url
  • Verify above automatic detection works with checkpointz
  • Don't filter out sync peers with younger finalization if our finalized state is a non-revertible checkpoint state
  • Properly persist AnchorState during shutdown

Additional Info

This is my first time messing around with fork-choice (been avoiding it..). I could use some extra scrutiny on the fork-choice modifications specifically.

@ethDreamer ethDreamer added work-in-progress PR is a work-in-progress deneb labels Jan 25, 2024
@ethDreamer ethDreamer force-pushed the checkpoint_sync_state_target branch from a33f25e to 8c071ce Compare January 29, 2024 09:19
@realbigsean realbigsean added the v5.0.0 Q1 2024 label Jan 31, 2024
@ethDreamer
Copy link
Member Author

So this PR doesn't work with checkpointz but I have made @samcm aware of the need to implement this. That's the last thing on the checklist. Not sure what else should be added to this PR.

@ethDreamer ethDreamer added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Feb 7, 2024
@realbigsean
Copy link
Member

So this PR doesn't work with checkpointz but I have made @samcm aware of the need to implement this. That's the last thing on the checklist. Not sure what else should be added to this PR.

default behavior still works right? is it just that checkpointz won't serve non-finalized?

@@ -447,6 +456,12 @@ impl<T: BeaconChainTypes> BeaconChain<T> {
self.canonical_head.cached_head_read_lock().snapshot.clone()
}

/// Returns the `AnchorState` of the finalized checkpoint
/// TODO: determine if this puts a burdon on the fork choice lock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is used every time we add a peer who's finalized is lower than ours. Might not be a big deal in practice. But if it is, we could track "has our anchor state ever been set to AnchorState::Finalized" because it will never go from AnchorState::NonRevertible => AnchorState::Finalized. If we track that, we won't need to bother fork choice as soon as finalization advances at any point after startup

pub type PersistedForkChoice = PersistedForkChoiceV20;

#[superstruct(
variants(V19, V20),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the versioning scheme we've got outlined here, the struct should be named based on the version it was introduced, so I think we want variants(V1, V20) here since PersistedForkChoice has always existed: https://github.com/sigp/lighthouse/blob/stable/beacon_node/beacon_chain/src/schema_change/README.md#naming-conventions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked @michaelsproul earlier about versioning this inner PersistedForkChoice because I noticed we had a separate outer PersistedForkChoice and I was worried that the outer one was created to shield the inner one from versioning or something. He told me we've already had different versions of the inner PersistedForkChoice. They just get removed eventually as we stop supporting super old DB versions.

So it wouldn't make sense to name it V1 since it hasn't actually been the same since V1. I suppose we could look at old code to try and figure out the last time this struct was modified. But before we go there let's back up & consider this comment from the guide you linked (which I'm going to slightly edit to adhere to the discussion at hand):

Previously the schema migration code would name types by the last version at which they were valid. For example if PersistentForkChoice changed in V20 then we would name the two variants PersistentForkChoiceV19 and PersistentForkChoiceV20. The problem with this scheme is that if PersistentForkChoice changes again in the future at say v24 then PersistentForkChoiceV20 would need to be renamed to PersistentForkChoiceV23, which is annoying. Using the first valid version as described above does not have this issue.

I agree that being forced to rename is very annoying. But there's actually no reason we have to rename PersistentForkChoiceV20 to PersistentForkChoiceV23. We could just as easily carry on with PersistentForkChoiceV19,PersistentForkChoiceV20, and PersistentForkChoiceV24. After all, these structs are still named after the versions in which they are used.

The only issue is communicating this easily in naming conventions guide.

Perhaps instead of:

Prefer to name versions of structs by the version at which the change was introduced.

we might write something along the lines of:

When introducing new versions of a struct, adhere to the following convention:

If the struct has no previous versions in the codebase, the original version should be named by the last version in which it was valid. All subsequent versions should be named after the version in which they were introduced. For example, if the struct Foo has no previous migrations, and you added a field to Foo in v9, then you would call the original version FooV8 and write a schema change that migrates from FooV8 to FooV9. If at some later date, another field is added to Foo in v12, you would add a schema change that migrates from FooV9 to FooV12.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're on the subject, I had another thought for the schema migrations guide. We should encourage the developer to add impl Into<FooV9> for FooV8 and impl Into<FooV8> for FooV9 just below the definition of the struct. Note that this encourages them to be explicit about which versions are being transformed.

Implementing this adds a significant amount of clarity about how versions are migrated and which versions are transformed into which. I previously got tripped up about this when I didn't implement these and introduced bugs in the migration code. You can leverage these impls in the schema migration code to make that simple as well. Also by being explicit about the versions you avoid having to clutter your diff with these lines when adding a new version. I already started to adopt this convention which is the only reason I made changes to beacon_fork_choice_store.rs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

He told me we've already had different versions of the inner PersistedForkChoice. They just get removed eventually as we stop supporting super old DB versions.

Right that would make it tough to track introduced version. I like the suggestion to use the previous version if there are no versions defined in the super struct.

I'm on board with what you're suggesting re migration guide. And explicit intos as well

@realbigsean realbigsean added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Feb 7, 2024
@realbigsean realbigsean added the backwards-incompat Backwards-incompatible API change label Feb 18, 2024
@realbigsean
Copy link
Member

labeling this one as backwards incompat because it has a db migration

paulhauner added a commit that referenced this pull request Feb 18, 2024
Squashed commit of the following:

commit 36a5d49
Author: ethDreamer <[email protected]>
Date:   Fri Feb 9 11:27:33 2024 +0800

    Update beacon_node/client/src/builder.rs

    Co-authored-by: realbigsean <[email protected]>

commit 9b8b953
Author: Mark Mackey <[email protected]>
Date:   Mon Feb 5 12:15:55 2024 +0800

    fix up some loose ends

commit 2b82413
Author: Mark Mackey <[email protected]>
Date:   Tue Jan 30 15:00:55 2024 +0800

    Properly Persist AnchorState in ForkChoice

commit 8c071ce
Author: Mark Mackey <[email protected]>
Date:   Mon Jan 29 17:18:34 2024 +0800

    Fix Beacon Chain Tests

commit 433aac5
Author: Mark Mackey <[email protected]>
Date:   Thu Jan 25 16:42:33 2024 +0800

    Enforce Checkpoint State Inside DA Boundary

commit 76a386f
Author: Mark Mackey <[email protected]>
Date:   Thu Jan 25 15:03:27 2024 +0800

    Cleanup some TODOs

commit 117ee3c
Author: Mark Mackey <[email protected]>
Date:   Thu Jan 18 10:43:02 2024 +0100

    Add Peers with Lower Finalization on NonRevertible

commit 3ef3af3
Author: Mark Mackey <[email protected]>
Date:   Wed Jan 17 21:33:38 2024 +0100

    Added AnchorState to ForkChoice

commit aebc66d
Author: Mark Mackey <[email protected]>
Date:   Wed Jan 10 11:22:46 2024 +0100

    Allow Checkpoint Sync from Non-Finalized States
@paulhauner paulhauner mentioned this pull request Feb 18, 2024
paulhauner added a commit that referenced this pull request Feb 18, 2024
Squashed commit of the following:

commit 36a5d49
Author: ethDreamer <[email protected]>
Date:   Fri Feb 9 11:27:33 2024 +0800

    Update beacon_node/client/src/builder.rs

    Co-authored-by: realbigsean <[email protected]>

commit 9b8b953
Author: Mark Mackey <[email protected]>
Date:   Mon Feb 5 12:15:55 2024 +0800

    fix up some loose ends

commit 2b82413
Author: Mark Mackey <[email protected]>
Date:   Tue Jan 30 15:00:55 2024 +0800

    Properly Persist AnchorState in ForkChoice

commit 8c071ce
Author: Mark Mackey <[email protected]>
Date:   Mon Jan 29 17:18:34 2024 +0800

    Fix Beacon Chain Tests

commit 433aac5
Author: Mark Mackey <[email protected]>
Date:   Thu Jan 25 16:42:33 2024 +0800

    Enforce Checkpoint State Inside DA Boundary

commit 76a386f
Author: Mark Mackey <[email protected]>
Date:   Thu Jan 25 15:03:27 2024 +0800

    Cleanup some TODOs

commit 117ee3c
Author: Mark Mackey <[email protected]>
Date:   Thu Jan 18 10:43:02 2024 +0100

    Add Peers with Lower Finalization on NonRevertible

commit 3ef3af3
Author: Mark Mackey <[email protected]>
Date:   Wed Jan 17 21:33:38 2024 +0100

    Added AnchorState to ForkChoice

commit aebc66d
Author: Mark Mackey <[email protected]>
Date:   Wed Jan 10 11:22:46 2024 +0100

    Allow Checkpoint Sync from Non-Finalized States
@paulhauner paulhauner added v5.1.0 Q2 2024 and removed v5.0.0 Q1 2024 labels Feb 18, 2024
@pawanjay176 pawanjay176 removed the v5.1.0 Q2 2024 label Mar 7, 2024
@pawanjay176 pawanjay176 added the v5.2.0 Q2 2024 label Mar 7, 2024
@pawanjay176
Copy link
Member

Pushing this back to v5.2 after talking to @ethDreamer . Since this PR is almost ready, we can always make a patch release with this if really required.
Feel free to change it back if you feel this should go into 5.1.0

@michaelsproul
Copy link
Member

This needs some updates to address the merge conflict caused by downloading checkpoint blobs

@michaelsproul
Copy link
Member

Trying this on Goerli, I can't get any peers:

Mar 22 02:09:24.756 DEBG Handshake Failure, reason: Different finalized chain, peer: 16Uiu2HAmTyzbanJu6xJRi7efeUGMdDu1ae4qNm4aBfViNjfeZmLv, module: network::network_beacon_processor::rpc_methods:112

My node gets insta-banned after syncing off an unfinalized state. We had some fix for this I thought? Or does it require the peer to be running this PR as well?

@ethDreamer
Copy link
Member Author

Trying this on Goerli, I can't get any peers:

Mar 22 02:09:24.756 DEBG Handshake Failure, reason: Different finalized chain, peer: 16Uiu2HAmTyzbanJu6xJRi7efeUGMdDu1ae4qNm4aBfViNjfeZmLv, module: network::network_beacon_processor::rpc_methods:112

My node gets insta-banned after syncing off an unfinalized state. We had some fix for this I thought? Or does it require the peer to be running this PR as well?

Concerning.. I had only tested it locally with kurtosis. I guess I'll have to dig into this :/

@michaelsproul
Copy link
Member

Dropping this from v5.2.0

@michaelsproul
Copy link
Member

Closing in favour of a new from-scratch impl based on recent learnings:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards-incompat Backwards-incompatible API change deneb waiting-on-author The reviewer has suggested changes and awaits thier implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants