Skip to content

Gossip: Add dynamic stake weighting based on fraction of unstaked nodes #7070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gregcusack
Copy link

@gregcusack gregcusack commented Jul 22, 2025

Problem

This PR addresses this Issue: #6903
Unstaked and low staked nodes in gossip do not receive many messages on testnet due to the testnet stake distribution and the aggressive quadratic weighting of peers by stake. As a result, unstaked nodes have a poor and inconsistent view of the peers in the network.

Linear weighting is not aggressive enough and causes a disproportionate number of messages to be sent to unstaked nodes on mainnet.

We need weighting that falls in between linear and quadratic.

Summary of Changes

Add a dynamic weighting element alpha. alpha fluctuates between 1 and 2 at runtime and is linearly proportional to the number of unstaked nodes in the cluster.
For example, on mainnet, there are 82% unstaked nodes, so alpha will converge on 1.82. On testnet, there are 4% unstaked nodes, so alpha will converge on 1.04.

We use a first-order, low-pass filter to update alpha to avoid constant alpha thrashing and overshooting the new target. Unit tests show that alpha converges on the target alpha within approximately 4-5 push active set rotations (30-38s).

Note: floating point operations are expensive but rotate() is called once every 7.5s, so should not add any significant overhead.

Testing has shown that flattening out the weight curve when their aren't many unstaked nodes (as on testnet), increases an unstaked nodes ingress gossip traffic as desired. On the other hand, high staked nodes don't see a significant reduction in received traffic.

All calculations and data that backup and support this PR here:

  1. Statistical Gossip Propagation Analysis
  2. Low Pass Filter Discussion on Discord
  3. Testing data on mininet and using statistical analysis on Discord

@gregcusack gregcusack force-pushed the gossip-adaptive-stake-weighting branch from 3a3b813 to d67d4d8 Compare July 22, 2025 04:37
@gregcusack gregcusack marked this pull request as ready for review July 22, 2025 04:43
@gregcusack gregcusack force-pushed the gossip-adaptive-stake-weighting branch from d67d4d8 to 509b595 Compare July 22, 2025 04:44
// (small d(alpha_target)/dt). If future code changes make `alpha_target` jump
// larger, we must retune `TC`/`K` or use a higher‑order filter to avoid
// lag/overshoot.
let alpha = K * alpha_target + (1.0 - K) * self.alpha;
Copy link

@alexpyattaev alexpyattaev Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extract the IIR filter into its own module and stick it in net-utils or something like it? Would make it far easier to test and also to guard against possible invalid inputs and such.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya i think that's a good idea. locally, i have it pulled out into a separate file within the gossip module. not sure it fits in net-utils since its not really a network-related method. maybe we leave in gossip module for now. but if anyone wants to reuse the filter in the future, we can pull it into it's own standalone module?

// 30_000ms convergence time
const TC: f64 = 30_000.0;
// Low pass filter smoothing constant
// K ~ 0.611

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does rust-analyzer not compute consts? Also would be nice to get an idea of how long this filter takes to reach 0.5 in response to a step input from 0 to 1.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh it does on my end, i can remove the comment: // K ~ 0.611

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call. will add a comment in the code. But this is what it is:

/// - From a step change in target from 0 to 1, `alpha` reaches:
///   - ~61% of the way to target after 1 update
///   - ~85% after 2
///   - ~94% after 3
///   - ~98% after 4
///   - ~99% after 5
Each update occurs every 7.5s. So we reach 94% of the way to the target in 22.5 seconds

if (1.0..=2.0).contains(&alpha) {
self.alpha = alpha;
} else {
// We should never see this, but if we do, log and reset to safe default.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be better to have an explicit is_finite check followed by a clamp to the target range? Then we would not need to have an error here.

@alexpyattaev
Copy link

Overall the PR looks good, thank you!

@gregcusack
Copy link
Author

@alexpyattaev Does it make sense to keep floating point math or switch over to fixed point? I have a fixed point implementation. Sounds like trent would prefer fixed point to avoid possible bugs even though we can try and make them as unlikely as possible.

@gregcusack
Copy link
Author

i suggest we switch over to this pr with fixed point math: #7098

@alexpyattaev
Copy link

@alexpyattaev Does it make sense to keep floating point math or switch over to fixed point? I have a fixed point implementation. Sounds like trent would prefer fixed point to avoid possible bugs even though we can try and make them as unlikely as possible.

In this case it is not critically important, but floats require some extra caution around possible NANs. So if we can avoid them let us do so.

@gregcusack
Copy link
Author

moving to: #7098

@gregcusack gregcusack closed this Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants