Gossip: Add dynamic stake weighting based on fraction of unstaked nodes - Fixed Point Math #7098

gregcusack · 2025-07-23T04:12:19Z

Problem

This PR addresses this Issue: #6903
This achieves the same goal as PR: #7070 but does it with fixed point instead of floating point math.

Unstaked and low staked nodes in gossip do not receive many messages on testnet due to the testnet stake distribution and the aggressive quadratic weighting of peers by stake. As a result, unstaked nodes have a poor and inconsistent view of the peers in the network.

Linear weighting is not aggressive enough and causes a disproportionate number of messages to be sent to unstaked nodes on mainnet.

We need weighting that falls in between linear and quadratic.

Summary of Changes

Add a dynamic weighting element alpha. alpha fluctuates between 1 and 2 at runtime and is linearly proportional to the number of unstaked nodes in the cluster.
For example, on mainnet, there are 82% unstaked nodes, so alpha will converge on 1.82. On testnet, there are 4% unstaked nodes, so alpha will converge on 1.04.

We use a first-order, low-pass filter to update alpha to avoid constant alpha thrashing and overshooting the new target. Unit tests show that alpha converges on the target alpha within approximately 4-5 push active set rotations (30-38s).

Note: floating point operations are expensive but rotate() is called once every 7.5s, so should not add any significant overhead.

Testing has shown that flattening out the weight curve when their aren't many unstaked nodes (as on testnet), increases an unstaked nodes ingress gossip traffic as desired. On the other hand, high staked nodes don't see a significant reduction in received traffic.

All calculations and data that backup and support this PR here:

codecov-commenter · 2025-07-23T05:18:04Z

Codecov Report

❌ Patch coverage is 95.41885% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.3%. Comparing base (f6e6ff2) to head (97ecb64).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #7098    +/-   ##
========================================
  Coverage    83.2%    83.3%            
========================================
  Files         800      802     +2     
  Lines      362783   363517   +734     
========================================
+ Hits       302195   302884   +689     
- Misses      60588    60633    +45

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

gregcusack · 2025-07-23T15:41:00Z

per the discussion here: https://discord.com/channels/428295358100013066/478692221441409024/1397296872984678506, going to implement account control over TC, FS, Static/Dynamic weighting

alexpyattaev · 2025-07-24T20:58:45Z

gossip/src/stake_weighting_config.rs

+
+#[derive(Serialize, Deserialize, Debug, Default, Clone)]
+#[repr(C)]
+pub struct WeightingConfig {


I'm not sure but would you not get some additional header written here by record program? Or are you using a different program to write onchain state?

It may also be nice to comment why we are deriving Serialize here as we are only deserializing this one.

oh right, we do not need deserialize here. Also, ya i wrote my own program to update onchain state since I figure it would be good to learn how to do it. Not sure either tbh, don't have any issues deserializing or updating push active set values.

alexpyattaev · 2025-07-24T21:01:30Z

gossip/src/push_active_set.rs

+    // alpha = 2.0 -> Quadratic
+    Static,
+    // alpha in [1.0, 2.0], smoothed over time, scaled up by 1000 to avoid floating-point math
+    Dynamic,


Would it not be better to wrap alpha into the Dynamic enum variant to avoid carrying it around when mode is set to Static?

yesss much better idea.

alexpyattaev · 2025-07-24T21:03:37Z

gossip/src/low_pass_filter.rs

Would this be better placed in net-utils? We may want to reuse this for other protocols later, and it is clearly not gossip-specific.

ya definitely not gossip specific. was trying to figure out where to put it. net-utils didn't feel like the best place either since it's not net specific. other idea is leave it in gossip for now and if someone wants to use it in the future, we can find a better place for it?

Yeah but by then it will be part of public API =) net-utils is also small enough and pulls almost no other deps, unlike gossip which is rather huge already. It is also somewhat unlikely this kind of filter will be used outside of traffic shaping applications.

crates are free. give it its own. hide everything behind an agave-unstable-api cargo feature

ya we can do this. just makes the push active set setup a little harder to read. would it be best to have this crate behind a feature and then define some dummy low pass filter api that is compiled when the feature is not enabled? otherwise we end up having a ton of gated definitions throughout push_active_set.rs

I believe you only need to activate the feature in Cargo.toml, the code in push_active_set.rs will be effectively the same, just pulling lpf from another crate.

ok so when we activate the feature, we pull lpf from the actual crate. but then when it's not activated, we just use some dummy lpf which never gets used since we'd be running WeightingMode::Static? or am i misunderstanding?

we always activate the feature in agave, kinda like DCOU. but external users of our repo do not get to use that code without activating the "i will not complain about broken public API" feature. Think of it as DCOU but for prod code.

but then when it's not activated

this doesn't exist. if the feature isn't specified, the crate exports no symbols. we're forcing consumers into a contract where they acknowledge consuming an unstable api

gossip/src/cluster_info.rs

alexpyattaev · 2025-07-24T22:05:54Z

gossip/src/low_pass_filter.rs

+// Fixed point scale for K and `alpha` calculation
+pub const SCALE: NonZeroU64 = NonZeroU64::new(1000).unwrap();
+// 2 * pi * SCALE
+const TWO_PI_SCALED: u64 = 6_283;


Suggested change

const TWO_PI_SCALED: u64 = 6_283;

const TWO_PI_SCALED: u64 = (std::f64::consts::PI * 2.0 * 1000.0) as u64;

this suggestion leaves rounding to the platform. prefer a literal

This does not round on the platform. The rounding is done at compile time and is 100% reproducible, else it would not build. https://godbolt.org/z/vonv18Kos both asm outputs return literal 3141592 .

this is true if we ship a binary, right? but if we ship the source code and people build from source, the platform still has to build this and round at compile time.

the rounding is 100% deterministic.

the rounding is 100% deterministic.

thanks for completely ignoring "to the platform"

alexpyattaev · 2025-07-24T22:06:27Z

gossip/src/low_pass_filter.rs

+use std::num::NonZeroU64;
+
+// Fixed point scale for K and `alpha` calculation
+pub const SCALE: NonZeroU64 = NonZeroU64::new(1000).unwrap();


What is the benefit of having this as NonZero type if this is a constant anyway?

It may further be preferable to have this as a power of two for performance reasons.

i think cross the PoT claim when slowlana tells us to. in the meantime, base-10 is easier to reason about

alexpyattaev · 2025-07-24T22:09:43Z

gossip/src/low_pass_filter.rs

+/// If future code changes make `alpha_target` jump larger, we must retune
+/// `TC`/`K` or use a higher‑order filter to avoid lag/overshoot.
+/// Returns `alpha_new = K * target + (1 - K) * prev`, rounded and clamped.
+pub fn filter_alpha(prev: u64, target: u64, k: u64, min: u64, max: u64) -> u64 {


Would it not be better to supply a range such as 1..2 rather than min and max parameters? A function taking 4 u64 parameters is asking to get misused.

ya good call. can do something like:

pub fn filter_alpha(alpha_prev: u64, alpha_target: u64, config: FilterConfig) -> u64 {...}

where:

pub struct FilterConfig { output_range: Range<u64>, k: u64, }

alexpyattaev · 2025-07-24T22:10:31Z

gossip/src/low_pass_filter.rs

@@ -0,0 +1,62 @@
+//! Fixed-point IIR filter for smoothing `alpha` updates.


Would it make sense to note that this is an order 1 Butterworth filter in case someone greps for it?

t-nelson

first pass looking good. didn't review any of the tests yet

t-nelson · 2025-07-25T00:48:44Z

gossip/src/low_pass_filter.rs

+use std::num::NonZeroU64;
+
+// Fixed point scale for K and `alpha` calculation
+pub const SCALE: NonZeroU64 = NonZeroU64::new(1000).unwrap();


are we sure millies is sufficient precision? i usually try to take as many orders as the expected math will allow. we can always truncate in place if necessary

t-nelson · 2025-07-25T00:49:36Z

gossip/src/low_pass_filter.rs

+pub fn compute_k(fs_ms: u64, tc_ms: u64) -> u64 {
+    if tc_ms == 0 {
+        return 0;
+    } // disabled / passthrough


i'm not sure this consumer detail belongs down here

t-nelson · 2025-07-25T00:50:13Z

gossip/src/low_pass_filter.rs

+        return 0;
+    } // disabled / passthrough
+    let scale = SCALE.get();
+    let wc_scaled = (TWO_PI_SCALED * fs_ms) / tc_ms;


should we fix all of the math in this module to saturating variants?

t-nelson · 2025-07-25T00:54:31Z

gossip/src/low_pass_filter.rs

+    debug_assert!(t <= scale, "interpolation t={} > SCALE={}", t, scale);
+    let base_squared = base.saturating_mul(base);
+    ((base * (scale - t) + base_squared * t) + scale / 2) / scale
+}


this module could use some tests. especially at the boundaries of mathematical operations

t-nelson · 2025-07-25T00:57:44Z

gossip/src/stake_weighting_config.rs

+}
+
+mod weighting_config_control_pubkey {
+    solana_pubkey::declare_id!("maCwV5aXW6srR4kppBWTeNdizYhyEwVcVh6mSWnFZxo");


consider grinding a vanity key for this

Would we need multisig for this same as for all2all?

this is just the address to load the record account from, no? multisig is not applicable here

Yes, the validator part of things does not care how the account is handled on chain.

t-nelson · 2025-07-25T01:04:13Z

gossip/src/push_active_set.rs

+    tc_ms: u64,
+}
+
+impl Default for PushActiveSet {


Default is usually bug. prefer literal instantiation. add a fn default_for_tests() under dcou if that's where the "convenience" of Default is desired

t-nelson · 2025-07-25T01:05:10Z

gossip/src/push_active_set.rs

+}
+
+impl PushActiveSet {
+    #[cfg(test)]


this will only be visible within the crate

t-nelson · 2025-07-25T01:06:08Z

gossip/src/push_active_set.rs

+    }
+
+    fn apply_cfg(&mut self, cfg: WeightingConfig) {
+        match cfg.weighting_mode {


deserializer can handle this for us, no?

t-nelson · 2025-07-25T01:07:16Z

gossip/src/push_active_set.rs

+        }
+
+        // Only recompute K if tc_ms changed
+        let new_tc_ms = if cfg.tc_ms != 0 {


if we're going to use a sentinel value, prefer an enum with self-documenting variant names

t-nelson · 2025-07-25T01:10:57Z

gossip/src/push_active_set.rs

-                .collect();
-            entry.rotate(rng, size, num_bloom_filter_items, nodes, &weights);
+
+        match self.mode {


can this block be refactored to avoid replicode? seems like the map closure is the only difference

alexpyattaev

Looks good generally, left a couple of nits, mostly in low-pass-filter crate.

gossip/src/crds_gossip_push.rs

low-pass-filter/Cargo.toml

alexpyattaev · 2025-08-06T17:28:09Z

low-pass-filter/src/lib.rs

+    // Fixed point scale for K and `alpha` calculation
+    pub const SCALE: NonZeroU64 = NonZeroU64::new(1_000_000).unwrap();
+    // 2 * pi * SCALE
+    const TWO_PI_SCALED: u64 = 6_283_185;


we have const eval and it works!

Suggested change

const TWO_PI_SCALED: u64 = 6_283_185;

const TWO_PI_SCALED: u64 = (std::f64::consts::PI * SCALE as f64) as u64;

trent thought we should define the literal. see: #7098 (comment)

Apologies, missed it. Consts are evaluated at compile time and are 100% consistent across builds on any architecture. You can trivially test it on a mac vs devbox.

low-pass-filter/src/lib.rs

alexpyattaev · 2025-08-06T17:30:39Z

low-pass-filter/src/lib.rs

+    /// Note: This function is most accurate when `base` is small e.g. < ~25.
+    #[inline]
+    #[allow(clippy::arithmetic_side_effects)]
+    pub fn interpolate(base: u64, t: u64) -> u64 {


in comment parameter is called alpha and in signature it is t. Also a bit unclear why this function is called interpolate and which function is it actually approximating

good catch. will update parameters to be uniform.

also, this probably should be in push_active_set. it's not related to the filter at all. we need interpolation here in order to avoid floating point math:

In Agave, we currently weight nodes by: (bucket + 1)^2. But the square is too severe and underweights unstaked nodes, especially when there aren't many unstaked in the network (aka testnet). So we are going to replace the square with alpha where alpha is in the range [1,2] and is based on the fraction of unstaked nodes in the network. so if there are 60% unstaked nodes, then alpha will be 1.6.

However, we can't just do (bucket + 1)^alpha since this would require floating point math. So, we scale alpha by 1_000_000. so now alpha is in [1000000, 2000000]. and we still can't do (bucket + 1)^alpha since we will overflow. We need to approximate (bucket + 1)^alpha by linearly interpolating between (bucket + 1)^1 and (bucket + 1)^2.

also, maybe i can call it linearly_interpolate() or something.

Ok now I fully understand what it does, thank you for explaining!

You could directly sidestep the overflow concern by using u128 (yes it will be slower but this does not have to be ridiculously fast).

I think the core cause of confusion is that this function's signature does not match what one would expect from a "normal" interpolator (since it is gossip-specific). Normally one would expect it to be defined like

fn linear_interpolate (x1:u64 /* start */, x2:u64 /* stop */, t:u64 /* parameter */ ); // t < SCALE (i.e. between 0 and 1)

So extracting the interpolator from the gossip-specific callsite where we call

interpolate ((bucket + 1), (bucket+1)^2, (alpha *SCALE) as u64 - SCALE)

would go a long way towards making it obvious. Then you could keep the interpolator in this crate (maybe call crate fixed-point-math or something generic like that) and call it from gossip with gossip-specific parameters as in example above.

Alternatively, this can be moved to gossip crate as in that context it would be more clear, just call it gossip_bucket_interpolate or something like it, so people grepping for a linear interpolator would not get confused & you need not make any changes to the implementation/tests. Probably better way to ship this in 3.0.

ya think the last option is best for now. will rename to make more clear

gossip/src/stake_weighting_config.rs

alexpyattaev

LGTM, left a few nits but nothing blocking at this point.

alexpyattaev · 2025-08-11T09:31:30Z

gossip/src/stake_weighting_config.rs

@@ -0,0 +1,76 @@
+use {serde::Deserialize, solana_account::ReadableAccount, solana_runtime::bank::Bank};


Maybe hide this whole module behind agave-unstable-api? Then we can safely axe it once tests are done without breaking semver. Does not have to be this PR.

what's the best way to do this? with the low-pass-filter crate i could gate it behind agave-unstable-api and then pull it into gossip crate in cargo.toml and activate the agave-unstable-api feature. but here, this file is already in the gossip crate...

Well actually you could do better and just not mark any items here "pub". They are not used outside of gossip anyway so no need for them to be pub, right?

alexpyattaev · 2025-08-11T09:34:08Z

gossip/src/stake_weighting_config.rs

+}
+
+mod weighting_config_control_pubkey {
+    solana_pubkey::declare_id!("goSwVUizoqNYKEaaiTjkgdN2RgLpvsTvFt1MEVGibY9");


Please allocate this account on testnet before merging the changes and write all zeros in there so we can be sure this does not activate accidentally as we are setting up the multisigs.

ok so i can deploy this account without setting up multisig first?

Yes, you can set authority later.

why tf does this need a control account? these are not toys to be used at the slightest whim. where is the cluster gate? there is no way in hell we can deploy this logic to mb. please think before you merge

you're right. I forgot to add the cluster gate.

the thing we're changing with the control account is the filter parameter tc_ms and then whether we want the dynamic weighting or the fixed weighting. Math says dynamic weighting should work as expected + filter tests show filter converging pretty quickly (~4 iterations) with the current 30000 tc_ms. Downside if we don't have the control is that if something goes wrong testnet may go down (i guess that's the point of testnet) or we have to make a new PR with a different tc_ms (not expected).

Are you suggesting we just axe the control account and just roll out dynamic weighting and just see if it matches expectations on testnet?

alexpyattaev · 2025-08-11T09:45:44Z

gossip/src/push_active_set.rs

@@ -212,21 +409,20 @@ mod tests {
    }

    #[test]
-    fn test_push_active_set() {
+    fn test_push_active_set_static_weighting() {
        const CLUSTER_SIZE: usize = 117;


Unrelated to this PR, but should we not make this ~4000 for this test to be more representative?

ya maybe in a future PR. it doesn't really matter. we're just testing rotating nodes into our active set and pruning. cluster size doesn't have an impact here beyond like 20

ok then, nevermind=)

alexpyattaev · 2025-08-11T09:47:14Z

gossip/src/push_active_set.rs

+    #[test]
+    fn test_alpha_converges_to_expected_target() {
+        const CLUSTER_SIZE: usize = 415;
+        const TOLERANCE_MILLI: u64 = 10_000; // ±1% of alpha


As I understand, scale is hardcoded here. This will explode if we change the scale, right? Maybe worth updating this one? Doe not have to be this PR.

alexpyattaev · 2025-08-11T09:50:49Z

low-pass-filter/Cargo.toml

+path = "src/lib.rs"
+
+[features]
+default = []


FYI by default all features are disabled https://doc.rust-lang.org/cargo/reference/features.html#the-default-feature. Does not matter here of course.

…t math

…factor, etc

alexpyattaev

Thank you!

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch from c0d5a77 to cf6ebe0 Compare July 23, 2025 04:17

gregcusack mentioned this pull request Jul 23, 2025

Gossip: Add dynamic stake weighting based on fraction of unstaked nodes #7070

Closed

gregcusack marked this pull request as ready for review July 23, 2025 15:05

gregcusack marked this pull request as draft July 23, 2025 15:41

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch 7 times, most recently from 57ef75a to 1c58bc0 Compare July 24, 2025 19:20

gregcusack marked this pull request as ready for review July 24, 2025 19:21

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch from 1c58bc0 to 23eaef6 Compare July 24, 2025 19:32

gregcusack requested review from alexpyattaev and t-nelson July 24, 2025 20:26

alexpyattaev reviewed Jul 24, 2025

View reviewed changes

gossip/src/cluster_info.rs Show resolved Hide resolved

alexpyattaev reviewed Jul 24, 2025

View reviewed changes

t-nelson reviewed Jul 25, 2025

View reviewed changes

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch 2 times, most recently from 17baf80 to d93b7da Compare August 4, 2025 20:09

alexpyattaev self-requested a review August 4, 2025 20:31

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch from d93b7da to 320c570 Compare August 5, 2025 05:33

alexpyattaev reviewed Aug 6, 2025

View reviewed changes

gregcusack requested a review from a team as a code owner August 7, 2025 21:46

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch 3 times, most recently from 756e0fc to c56f65a Compare August 7, 2025 22:06

gregcusack removed the request for review from a team August 7, 2025 22:07

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch from c56f65a to acc661b Compare August 7, 2025 22:20

alexpyattaev reviewed Aug 8, 2025

View reviewed changes

gossip/src/stake_weighting_config.rs Outdated Show resolved Hide resolved

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch 3 times, most recently from 41a83ee to 9686c5d Compare August 8, 2025 21:35

alexpyattaev reviewed Aug 11, 2025

View reviewed changes

gregcusack added 10 commits August 11, 2025 17:40

add dynamic weighting based on fraction of unstaked nodes. fixed poin…

8e51ba6

…t math

Add setup for account control

5de4dfd

address comments - add explicit types, bundle config into Dynamic, re…

34c0167

…factor, etc

update to vanity key: gosW... (gossip weight)

fd4d222

agave-unstable-api implementation

7fdf9cf

refactor add apply_cfg tests

ba67237

refactor and address comments

f2ea0b4

update interpolate. change lpf

10ed24f

fix deser bug. dont forget to rm logs

811613b

update WeightingConfig to match record program

60dc858

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch 2 times, most recently from 8490937 to 3ca64a7 Compare August 11, 2025 19:04

address comments and make default setting static for PushActiveSet

97ecb64

gregcusack force-pushed the gossip-adaptive-stake-weighting-fixed-point branch from 3ca64a7 to 97ecb64 Compare August 11, 2025 20:12

gregcusack requested a review from alexpyattaev August 11, 2025 23:17

alexpyattaev approved these changes Aug 12, 2025

View reviewed changes

gregcusack merged commit 299fd87 into anza-xyz:master Aug 12, 2025
51 checks passed

gregcusack deleted the gossip-adaptive-stake-weighting-fixed-point branch August 12, 2025 18:17

	const TWO_PI_SCALED: u64 = 6_283;
	const TWO_PI_SCALED: u64 = (std::f64::consts::PI * 2.0 * 1000.0) as u64;

		@@ -0,0 +1,62 @@
		//! Fixed-point IIR filter for smoothing `alpha` updates.

	const TWO_PI_SCALED: u64 = 6_283_185;
	const TWO_PI_SCALED: u64 = (std::f64::consts::PI * SCALE as f64) as u64;

		@@ -0,0 +1,76 @@
		use {serde::Deserialize, solana_account::ReadableAccount, solana_runtime::bank::Bank};

Gossip: Add dynamic stake weighting based on fraction of unstaked nodes - Fixed Point Math #7098

Gossip: Add dynamic stake weighting based on fraction of unstaked nodes - Fixed Point Math #7098

Conversation

gregcusack commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Summary of Changes

Uh oh!

codecov-commenter commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gregcusack commented Jul 23, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

t-nelson Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexpyattaev Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

t-nelson Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

t-nelson left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gregcusack commented Jul 23, 2025 •

edited

Loading

codecov-commenter commented Jul 23, 2025 •

edited

Loading

t-nelson Jul 30, 2025 •

edited

Loading

alexpyattaev Aug 7, 2025 •

edited

Loading

t-nelson Aug 12, 2025 •

edited

Loading