feat: merge unstable to peerDAS #7698


Closed · wants to merge 46 commits into peerDAS from te/peerDAS_after_merge_apr_14
Conversation

@twoeths (Contributor) commented Apr 14, 2025

Motivation

[screenshot: Screenshot 2025-04-14 at 11 06 06]

Description

  • merge unstable to peerDAS to fix the issue

twoeths and others added 29 commits March 25, 2025 09:25
**Motivation**

- fix misuse of noble as a hasher, see
#7608 (comment)

**Description**

- use the latest `persistent-merkle-tree` and `as-sha256` everywhere
- this is a prerequisite for #7620
- also confirm the correct hasher in beacon-node

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

Document the PR workflow based on the last team
[discussion](#7554 (comment)).

**Description**

- Explain the different natures of PRs
- Explain the steps to open and manage those PRs.

**Steps to test or reproduce**

- Run all tests

---------

Co-authored-by: Matthew Keil <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
…nel (#7627)

**Motivation**


![image](https://github.com/user-attachments/assets/a8411db4-1d93-48dc-8841-9bdfd332758f)

**Description**

Add missing closing `}` bracket to promql query in "Heap Allocations -
Beacon Node and Validator" panel
…ion data calls (#7626)

**Motivation**

We already ignore `committee_index` in post-electra produce attestation
data calls and always set it to 0 in the returned `AttestationData`.
However, we can be less strict on the beacon node side and only reject API
calls that do not provide the committee index pre-electra.

**Description**

This changes the way we enforce that `committee_index` is provided by the
caller: it moves the check from static schema validation to fork-based
validation, which allows us to handle requests without the committee
index post-electra.

The validator client will still pass the committee index in any case, as
per spec it is a required property as of right now. However, there was
some discussion around this in
ethereum/beacon-APIs#511, and a long-term plan
could be to make the `committee_index` optional post-electra in the spec
as well, but first beacon nodes need to be more relaxed in how they
enforce the parameter requirement.
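The fork-based check described above can be sketched roughly as follows. This is an illustrative TypeScript sketch, not Lodestar's actual code; the names `resolveCommitteeIndex` and `isForkPostElectra` are made up for the example.

```typescript
// Illustrative sketch (hypothetical names): move the committee_index requirement
// from static schema validation into fork-aware validation.
type ForkName = "phase0" | "altair" | "bellatrix" | "capella" | "deneb" | "electra";

function isForkPostElectra(fork: ForkName): boolean {
  return fork === "electra";
}

// Pre-electra: committee_index is required and used as-is.
// Post-electra: the parameter may be omitted; AttestationData.index is always 0.
function resolveCommitteeIndex(fork: ForkName, committeeIndex?: number): number {
  if (isForkPostElectra(fork)) {
    return 0;
  }
  if (committeeIndex === undefined) {
    throw new Error("committee_index is required pre-electra");
  }
  return committeeIndex;
}
```

The point of the restructuring is that the same endpoint handler can accept both shapes of request and decide based on the fork rather than rejecting at the schema layer.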
Similar to #7566, this introduces a lazy-load mechanism for
`processPendingConsolidations`.
**Motivation**

Electra `getPendingConsolidations` endpoint implemented:
ethereum/beacon-APIs#512

**Description**

- `GET /eth/v1/beacon/states/{state_id}/pending_consolidations` added

Closes #7575

---------

Co-authored-by: Attila Cseh <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
…ion slot post-deneb (#7636)

**Motivation**

[EIP-7045](https://eips.ethereum.org/EIPS/eip-7045) allows including
attestations from the current and previous epoch in a block. While we
already account for the increased max attestation inclusion slot post-deneb
when packing the block (#6522), we
did not update our pruning, so as of right now we remove all
attestations older than `clockSlot - SLOTS_PER_EPOCH`.

This is not problematic during healthy network conditions, but as
EIP-7045 notes this is important especially during non-finality, where
those attestations could become critical to include:
> As discussed in the Motivation, extending this max inclusion slot to
the end of the next epoch is critical for LMD-GHOST security proofs and
confirmation rule.


**Description**

Update pruning to account for the increased max attestation inclusion slot
post-deneb, i.e. we will retain attestations for the current and previous
epoch instead of the previous behavior, which removed attestations older
than 32 slots.
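The difference between the two retention rules can be sketched as follows (illustrative TypeScript, assuming mainnet's `SLOTS_PER_EPOCH = 32`; function names are hypothetical, not Lodestar's):

```typescript
// Hypothetical sketch of the pruning cutoffs described above.
const SLOTS_PER_EPOCH = 32;

function computeEpochAtSlot(slot: number): number {
  return Math.floor(slot / SLOTS_PER_EPOCH);
}

function computeStartSlotAtEpoch(epoch: number): number {
  return epoch * SLOTS_PER_EPOCH;
}

// Old behavior: prune attestations older than 32 slots.
function lowestRetainedSlotPreDeneb(clockSlot: number): number {
  return clockSlot - SLOTS_PER_EPOCH;
}

// Post-deneb (EIP-7045): retain everything from the start of the previous
// epoch onwards, i.e. up to ~2 epochs of attestations depending on the
// position of clockSlot within its epoch.
function lowestRetainedSlotPostDeneb(clockSlot: number): number {
  const previousEpoch = Math.max(computeEpochAtSlot(clockSlot) - 1, 0);
  return computeStartSlotAtEpoch(previousEpoch);
}
```

For example, at clock slot 100 (epoch 3) the old rule would already prune everything before slot 68, while the epoch-based rule keeps everything from slot 64 (start of epoch 2).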
**Motivation**

As part of our contribution to Ethereum's resilience, we deployed
mainnet bootnodes to support peer discovery in diverse geolocations and
providers. The bootnodes can be found in this PR:
https://github.com/eth-clients/mainnet/pull/2/files

**Description**

This PR:
- Includes our Lodestar bootnodes by default with `bootEnrs` for mainnet.
- Adds the ENRs under test to ensure parsing works with the new bootnodes.
- Updates old references which existed before the cleanup/migration of
the eth-clients repo for mainnet; these now point to the updated
references.
**Motivation**

- track new on-chain seen attesters and their total effective balance
increment per block
- see
#7646 (comment)

**Description**

- at every block processing, when processing attestations, we track them

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

- replacement for #7077 #7202

**Description**

- Upgrade libp2p to 2.0
- ~~Do NOT upgrade gossipsub to support IDONTWANT (suspected perf
degradation with it)~~
- Enable `IDONTWANT`

---------

Co-authored-by: Nico Flaig <[email protected]>
**Motivation**

- we have 2 bugs in the electra logic:
  - one piece of phase0 logic cannot be applied to electra
  - a for loop in the wrong place
- one bug across all forks: an incorrect dependent root used when checking
attestation data, see #7651

**Description**

extract fixes from #7646:
- add more metrics for the pool
- remove this logic for electra: "after 2 slots, there is a good chance
that we have 2 * MAX_ATTESTATIONS_ELECTRA attestations and can break the
for loop early"
- move the for loop to the correct place to limit attestation consolidations
- fix `isValidShuffling()` for #7651
- unit tests are done in #7646; since the logic changes there, I cannot
bring them here

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
**Motivation**

Test all available suites on the CI

**Description**

It seems that the spec tests for the validator package were skipped in the
CI unintentionally. Bringing them back.


**Steps to test or reproduce**

Run all tests
…e misses (#7667)

**Motivation**

- we lack a metric to track `getAggregate` cache misses and [need to use
logs](#7548 (comment))
to determine it, which is not ideal
- aggregate participation metric buckets lack granularity and just show
`+Inf`, which makes the metric less useful since most / all aggregates
fall into this category
- allows us to better investigate
#7548

**Description**

- adjust aggregate participation metric buckets to get more meaningful
data
- add metrics to track `getAggregate` cache misses, i.e. the total number
of `getAggregate` calls with no aggregate for slot, attestation data root,
and committee index
- slight refactoring to group attestation pool metrics, similar to
#7656
- reorder function parameters and logs to better map to how data is
stored internally in the attestation pool cache

We mostly have dashboard panels for those already but will add a few more
in a separate PR.
…ur (#7668)

**Motivation**

I don't see a good reason why we need to retain validators we monitor
for longer than an hour. The registration is updated every epoch (~6.4
minutes) if the validator client is connected, and downtime due to
maintenance should only be a few minutes, at most 2-3 epochs if
doppelganger protection is enabled (and not even that since we have
#6012) or keys are moved
around.

This long retention period was noticeable during the holesky rescue: we
offered our beacon node to the community, but because the validator
monitor keeps track of validators for 12 hours even if they are no longer
connected, it was harder to estimate how many validators were actually
connected and managed by the beacon node(s).

There was also a [complaint on
discord](https://discord.com/channels/593655374469660673/593655641445367808/1248614428329513072)
regarding this before.

**Description**

Reduce time to retain registered validators in monitor from previously
12 hours to only 1 hour.

We can re-evaluate this number in the future if it turns out to be
problematic but I don't see a scenario right now where it would be
useful to retain validators for longer which are not even connected
anymore.
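The retention change described above amounts to pruning entries whose last registration is older than the window. A minimal sketch, assuming a hypothetical map of validator index to last-registration timestamp (the names and structure are illustrative, not Lodestar's actual monitor implementation):

```typescript
// Illustrative sketch of the new retention window: was 12 hours, now 1 hour.
const RETAIN_REGISTERED_VALIDATORS_MS = 60 * 60 * 1000;

// Drop monitored validators that have not re-registered within the window.
// Registration normally refreshes every epoch (~6.4 min) while the validator
// client is connected, so active validators are never pruned.
function pruneStaleValidators(lastRegisteredMs: Map<number, number>, nowMs: number): void {
  for (const [index, registeredAt] of lastRegisteredMs) {
    if (nowMs - registeredAt > RETAIN_REGISTERED_VALIDATORS_MS) {
      lastRegisteredMs.delete(index);
    }
  }
}
```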
…ationPool` (#7672)

Slight improvement on `onScrapeMetrics` introduced in #7656. 

We loop over every attestation group of `previousSlot` twice, which can
be reduced to a single pass.

Co-authored-by: Nico Flaig <[email protected]>
**Motivation**

This information might be useful to diagnose why we might not be packing
as many aggregates into a block as we would expect.

Similar reasoning to #7550.

**Description**

Adds two metrics to track the insert outcome of aggregated attestations in
the oppool:
-
`lodestar_oppool_aggregated_attestation_pool_gossip_insert_outcome_total`
- `lodestar_oppool_aggregated_attestation_pool_api_insert_outcome_total`

Dashboard panels come later :)

---------

Co-authored-by: twoeths <[email protected]>
**Motivation**

The VS Code settings file has linting and formatting enabled on save,
but not organizing imports. I think enabling this would help fix some
Biome errors before they're caught in CI.

**Description**

Adds `"source.organizeImports.biome": "explicit"` to
`.vscode/settings.json`
**Motivation**

- closes #7677

**Description**

Schedule electra on gnosis

- as per gnosischain/specs#74
…7676)

The current buckets `[32, 64, 96, 128]` are no longer meaningful
post-electra since we can only include a max of 8 attestations per block.
As I don't think we need this metric between now and electra going live on
mainnet, we should adjust it already to give us more meaningful data for
electra networks like hoodi or holesky.
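To illustrate why the old buckets lose all signal post-electra: with at most 8 attestations per block, every observation falls into (or below) the first bucket of `[32, 64, 96, 128]`. The sketch below shows bucket assignment with assumed replacement buckets; the post-electra values here are an assumption for illustration, not the PR's actual choice.

```typescript
// Illustrative histogram bucket assignment (cumulative, Prometheus-style):
// a value lands in the smallest bucket whose upper bound is >= the value.
const PRE_ELECTRA_BUCKETS = [32, 64, 96, 128];
const POST_ELECTRA_BUCKETS = [1, 2, 4, 8]; // assumed example, max 8 atts/block

function bucketFor(value: number, buckets: number[]): number | "+Inf" {
  for (const upperBound of buckets) {
    if (value <= upperBound) return upperBound;
  }
  return "+Inf";
}
```

With the old buckets, a post-electra block with 5 attestations is indistinguishable from one with 1; the finer buckets recover that granularity.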
**Motivation**

Launching Hoodi testnet which replaces Holesky.

**Description**

Replace the Holesky testnet with Hoodi as the default testnet in the docs,
and in code where the `holesky` flag is used as the default testnet.

#7586

**Steps to test or reproduce**
- Check documentation
- Run a Hoodi testnet using the new flag

---------

Co-authored-by: Nico Flaig <[email protected]>
…7688)

Just read this part of the code again and the long metric variable name
makes it a bit too verbose imo; didn't wanna push changes to
#7673 though.
**Motivation**

- packed attestations are currently sorted by new seen attesters, but
should be sorted by effective balance
- not enough metrics when producing packed attestations

**Description**

- for electra, retain up to 8 attestations per group. When getting
attestations, we also get up to 8 of them (instead of 3)
- track packed attestations by newly seen effective balance
- reevaluate every remaining attestation after an attestation is included,
see `getMostValuableAttestation()`. This means no useless attestations
are included
- add a lot of metrics:
  - committee bits for each packed attestation
  - total committee members of each packed attestation
  - non-participation of each packed attestation
- distance from block slot to attestation slot for each packed
attestation
  - total effective balance for each packed attestations
  - scanned slots and termination reason
  - scanned attestations per scanned slot
  - returned attestations per scanned slot
  - total slots in pool at the time of producing block
  - total consolidations

Closes #7544
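The re-evaluation strategy described above is essentially a greedy selection: after each pick, the remaining attestations are re-scored by the effective balance of attesters not yet covered. A simplified sketch under assumed types (this is not Lodestar's `getMostValuableAttestation()`, just an illustration of the idea):

```typescript
// Illustrative greedy packing: re-score remaining attestations after every
// inclusion by the effective balance of their not-yet-seen attesters.
type Att = {id: string; attesters: Set<number>};

function selectAttestations(
  candidates: Att[],
  effectiveBalance: Map<number, number>,
  max: number
): string[] {
  const seen = new Set<number>();
  const remaining = [...candidates];
  const picked: string[] = [];

  // Value of an attestation = total effective balance of its unseen attesters.
  const newBalance = (att: Att): number => {
    let total = 0;
    for (const v of att.attesters) {
      if (!seen.has(v)) total += effectiveBalance.get(v) ?? 0;
    }
    return total;
  };

  while (picked.length < max && remaining.length > 0) {
    // Re-evaluate every remaining attestation against the updated seen set.
    remaining.sort((a, b) => newBalance(b) - newBalance(a));
    const best = remaining.shift()!;
    if (newBalance(best) === 0) break; // no useless attestations included
    picked.push(best.id);
    for (const v of best.attesters) seen.add(v);
  }
  return picked;
}
```

The re-sort after each pick is what guarantees an attestation fully covered by earlier picks is never included.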

**Testing**

- see some metrics here
#7646 (comment)
- performance is the same on the hoodi prod node

<img width="846" alt="Screenshot 2025-04-08 at 16 09 17"
src="https://github.com/user-attachments/assets/7ffb3bcb-7774-4383-b2e7-da1065ff8f97"
/>

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
**Motivation**

It is not perfectly clear how slashing protection works in Lodestar,
so a short section should be added to explain the details.

**Description**

This PR adds a short section to the page `Starting a Validator Client`
which describes slashing protection behavior and how to migrate to and
from a different client.

---------

Co-authored-by: towo <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
@twoeths (Contributor, Author) commented Apr 14, 2025

there is a performance issue with `peerIdFromString`, see ChainSafe/js-libp2p-gossipsub#519.
Not sure if it's the root cause of this issue, but I got this error a lot:

debug: peer score adjusted scoreChange=-10, newScore=-100, peerId=16...wMzkk8, actionName=REQUEST_ERROR_DIAL_ERROR

**Motivation**

The ValidatorMonitor should be moved out of metrics into the BeaconChain
object.

**Description**
- Move ValidatorMonitor from the Metrics to the BeaconChain Object.
- Delete `registerValidatorStatuses()` from the
BeaconStateTransitionMetrics type.

Closes #7624

**Steps to test or reproduce**

---------

Co-authored-by: Nico Flaig <[email protected]>
twoeths added a commit that referenced this pull request Apr 16, 2025
**Motivation**

- we're using noble on peerDAS and need to merge unstable to fix this, but
got a libp2p dial issue there (see #7698), so cherry-pick #7621 to unblock
peerDAS

**Description**

- use the latest `persistent-merkle-tree` and `as-sha256` everywhere

Co-authored-by: Tuyen Nguyen <[email protected]>
@twoeths force-pushed the te/peerDAS_after_merge_apr_14 branch from c8a2b84 to d0d2f9d on April 17, 2025 02:13
wemeetagain and others added 5 commits April 17, 2025 03:54
**Motivation**

See
https://discord.com/channels/593655374469660673/1353810292202799154/1362013907668963439

**Description**

- set the peerstore expiry options to Infinity, practically disabling the
feature

Closes #7702

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
…nSlotRange (#7711)

Post-deneb, we no longer use `earliestPermissibleSlot` when checking an
attestation's slot; instead we check the slot's epoch.
Because in post-deneb, we moved from
```
[IGNORE] aggregate.data.slot is within the last ATTESTATION_PROPAGATION_SLOT_RANGE slots 
(with a MAXIMUM_GOSSIP_CLOCK_DISPARITY allowance) 
-- i.e. aggregate.data.slot + ATTESTATION_PROPAGATION_SLOT_RANGE >= current_slot >= aggregate.data.slot
```
to 
```
[IGNORE] the epoch of aggregate.data.slot is either the current or previous epoch 
(with a MAXIMUM_GOSSIP_CLOCK_DISPARITY allowance) 
-- i.e. compute_epoch_at_slot(aggregate.data.slot) in (get_previous_epoch(state), get_current_epoch(state))
```


In our current code, we mix the pre-deneb calculation, i.e.
`earliestPermissibleSlot`, into the post-deneb calculation:
```const earliestPermissiblePreviousEpoch = computeEpochAtSlot(earliestPermissibleSlot);```
We should separate the two calculations to make the code more readable.
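The two gossip-time checks quoted above can be sketched separately like this (illustrative TypeScript with assumed mainnet constants; function names are made up, not Lodestar's):

```typescript
// Illustrative sketch of the pre- vs post-deneb timeliness checks
// (ignoring MAXIMUM_GOSSIP_CLOCK_DISPARITY for simplicity).
const SLOTS_PER_EPOCH = 32;
const ATTESTATION_PROPAGATION_SLOT_RANGE = 32;

const computeEpochAtSlot = (slot: number): number => Math.floor(slot / SLOTS_PER_EPOCH);

// Pre-deneb: slot-window check built around earliestPermissibleSlot.
function isTimelyPreDeneb(attSlot: number, currentSlot: number): boolean {
  const earliestPermissibleSlot = currentSlot - ATTESTATION_PROPAGATION_SLOT_RANGE;
  return attSlot >= earliestPermissibleSlot && attSlot <= currentSlot;
}

// Post-deneb (EIP-7045): epoch-based check; earliestPermissibleSlot is not involved.
function isTimelyPostDeneb(attSlot: number, currentSlot: number): boolean {
  const attEpoch = computeEpochAtSlot(attSlot);
  const currentEpoch = computeEpochAtSlot(currentSlot);
  return attEpoch === currentEpoch || attEpoch === currentEpoch - 1;
}
```

Keeping the two paths separate avoids deriving the post-deneb epoch bound from the pre-deneb `earliestPermissibleSlot`, which is the mixing the description calls out.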
**Motivation**

- to know the reason why we cannot dial a peer; in `peerdas-devnet-6` it
happens a lot cc @matthewkeil

**Description**

- model all possible libp2p errors
- do not dial a peer if it has no multiaddr; this follows up #7708
- new metric `notDialReason` in Discovery, which I brought over from the
`peerDAS` branch

**Testing**
This is how it looked for `peerdas-devnet-6`:

<img width="1540" alt="Screenshot 2025-04-17 at 13 16 09"
src="https://github.com/user-attachments/assets/5fca773c-3ba6-42e3-a41e-be963d25d774"
/>

Co-authored-by: Tuyen Nguyen <[email protected]>
@matthewkeil (Member) commented:
@twoeths I merged the peerDAS and unstable changes back into your update. Before I merge to our devnet branch, I wanted to give you a chance to make sure things look OK. I did notice that there are some check-types errors in the perf and e2e tests that maybe @dguenther can help us knock out. I think some are already on his second lint branch, but I wanted to give him a chance to see them before doing double work to resolve ones that are already fixed.

matthewkeil and others added 2 commits April 18, 2025 02:56
codecov bot commented Apr 18, 2025

Codecov Report

Attention: Patch coverage is 20.64935% with 611 lines in your changes missing coverage. Please review.

Please upload report for BASE (peerDAS@97104b7). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             peerDAS    #7698   +/-   ##
==========================================
  Coverage           ?   42.43%           
==========================================
  Files              ?      733           
  Lines              ?    53055           
  Branches           ?     2259           
==========================================
  Hits               ?    22515           
  Misses             ?    30498           
  Partials           ?       42           

@twoeths (Contributor, Author) commented Apr 21, 2025

@matthewkeil peerdas-devnet-6 is in a non-finality period and nodes stay out of sync, so I don't feel comfortable merging this PR.
We need to debug the sync issue in the peerDAS branch first.

@matthewkeil (Member) commented:

> @matthewkeil peerdas-devnet-6 is in a non-finality period and nodes stay out of sync, so I don't feel comfortable merging this PR. We need to debug the sync issue in the peerDAS branch first.

I agree. As a note, the whole devnet-6 is down for all clients; none can sync. From what I gather, several clients have bugs in the validator custody implementation, and I think there were also a bad block or two. I'm guessing it will be the primary topic on the peerDAS call tomorrow.

@twoeths twoeths marked this pull request as ready for review May 13, 2025 03:44
@twoeths twoeths requested a review from a team as a code owner May 13, 2025 03:44
@twoeths (Contributor, Author) commented May 13, 2025

works fine in peerdas-devnet-7 after this fix 658d7cf

[screenshot: Screenshot 2025-05-13 at 10 44 57]

@matthewkeil (Member) left a comment:

LGTM!! 🚀

@twoeths (Contributor, Author) commented May 14, 2025

closing in favor of #7827

@twoeths twoeths closed this May 14, 2025
@twoeths twoeths deleted the te/peerDAS_after_merge_apr_14 branch May 14, 2025 10:08
matthewkeil added a commit that referenced this pull request May 20, 2025
**Motivation**

merge `unstable` to `peerDAS`

**Description**
replaces #7698

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
Co-authored-by: Nazar Hussain <[email protected]>
Co-authored-by: Matthew Keil <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
Co-authored-by: NC <[email protected]>
Co-authored-by: csehatt741 <[email protected]>
Co-authored-by: Attila Cseh <[email protected]>
Co-authored-by: Phil Ngo <[email protected]>
Co-authored-by: Cayman <[email protected]>
Co-authored-by: Derek Guenther <[email protected]>
Co-authored-by: Ekaterina Riazantseva <[email protected]>
Co-authored-by: Tobias Wohland <[email protected]>
Co-authored-by: towo <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sylvia <[email protected]>
Co-authored-by: Matthew Keil <[email protected]>
Co-authored-by: Vickish <[email protected]>
Co-authored-by: VictoriaAde <adedayovicky123@gmail>
Co-authored-by: Kolby Moroz Liebl <[email protected]>
Co-authored-by: Mercy Boma Naps Nkari <[email protected]>