feat: Gossipsub interop testing #648
Conversation
git-subtree-dir: gossipsub-interop
git-subtree-split: fa401a8099eb0b290317b0289513deaadbc46db7
Nice idea! I couldn't yet try it or go through it in detail, but let me add a few things here from past experience (maybe some of them are already implemented):
This was a bug on my end. Specifically, the message ID function in the go implementation was incorrect: it would use the string form of an integer instead of the byte form (e.g. "0" (ascii for …)).
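As a minimal illustration of that kind of mismatch (illustrative Rust, not the actual go code):

```rust
fn main() {
    let seqno: u64 = 0;
    let string_form = seqno.to_string().into_bytes(); // b"0", i.e. [0x30]
    let byte_form = seqno.to_be_bytes().to_vec();     // [0, 0, 0, 0, 0, 0, 0, 0]
    // Two peers that disagree on this encoding derive different message IDs
    // for the same message, so deduplication between them breaks down.
    assert_ne!(string_form, byte_form);
}
```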
Awesome work Marco, thanks for getting this started!
Left a couple of nits, and left a review of the rust code on https://github.com/libp2p/test-plans/pull/649/files to make it easier.
Also @jxs, what do you think about adding some more detailed tracing on receiving and sending the RPC? I'm jumping through some awkward hoops to properly log duplicates, and I'm not getting all the info I want there either.
makes sense to me!
Very psyched to try this simulation with sigp/rust-libp2p#570
Thanks for the feedback @cskiraly!
We should discuss this synchronously, but I agree that this definition might be a bit too constraining and too vague. For example, if an implementation significantly improved the resulting latency and bandwidth-efficiency metrics while keeping the same reliability, it should be considered interoperable. I think the inverse, an implementation that significantly degrades dissemination latency and bandwidth efficiency despite maintaining reliability, should not be considered interoperable (maybe "practically interoperable" is a better word here?). A more accurate definition of an interoperable implementation is one that faithfully implements the specification (assuming the specification is well defined and correct). This testing framework tests that by proxy, staying agnostic to the details of the specification by testing the higher-level results that we practically care about. That said, I'm open to suggestions for a better name.
Yep. Individual machine logs are in
I just checked, and for all-rust and all-go the results aren't exactly the same, but they are very close. I'm not sure why they aren't exactly the same (maybe a new schedule order introduced from more calls to …). Running the same test without changing the log level yields the same results.
Thanks! I intentionally deferred visualizations for later, but this is a good note! One nice thing about this system is that the visualizations will be reusable across implementations without having to change any implementation.
👍
👍 I'll add
By using the git commit hash as mentioned above, we should get this for free.
Agreed. This should be easy to add on top of this with a simple bash/whatever script.
Co-authored-by: João Oliveira <[email protected]>
I agree with @cskiraly that different implementations don't have to give the same performance. Concretely, I think there are two objectives of interoperability:
config_builder
    .validation_mode(ValidationMode::Anonymous)
    // Custom message ID function similar to Go implementation
    .message_id_fn(|message| MessageId::from(&message.data[0..8]));
Marco, I was speaking with @dknopik, who has also simulated gossipsub with Shadow. According to his report:
Issue A: We send out IDONTWANT pretty early, before most validation. IRL, this means we do this before spending ~5ms validating the blob. In the simulation, the validation takes ~0ms (compare this shadow feature request), so blobs propagate more quickly, leaving less opportunity to receive an IDONTWANT before validation finishes. Therefore, we don't save bandwidth.
Regarding the delay, Daniel suggests:
Suggested change:
-.message_id_fn(|message| MessageId::from(&message.data[0..8]));
+.message_id_fn(|message| {
+    std::thread::sleep(std::time::Duration::from_millis(5));
+    MessageId::from(&message.data[0..8])
+});
Yes, that makes sense. I agree it would be good to add some validation delay in the system.
I'm not sure about putting it in the message ID function. Doesn't that seem wrong? Is there a way we can do validation explicitly, maybe with a future instead of a thread sleep?
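A rough sketch of what explicit validation could look like with rust-libp2p's manual validation mode (the 5ms figure and the surrounding wiring are assumptions):

```rust
use std::time::Duration;
use libp2p::gossipsub::{Behaviour, MessageAcceptance, MessageId};
use libp2p::PeerId;

// Assumes the behaviour was built with ConfigBuilder::validate_messages(),
// so gossipsub holds a received message until a verdict is reported.
async fn validate_and_report(
    behaviour: &mut Behaviour,
    message_id: &MessageId,
    propagation_source: &PeerId,
) {
    // Stand-in for real validation cost. Under Shadow this sleep advances
    // simulated time, giving IDONTWANTs a window in which to arrive.
    tokio::time::sleep(Duration::from_millis(5)).await;
    let _ = behaviour.report_message_validation_result(
        message_id,
        propagation_source,
        MessageAcceptance::Accept,
    );
}
```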
Thanks for the link to @dknopik's experiment. I was trying to reach out to him to ask him some questions; could you connect us, please?
We send out IDONTWANT pretty early, before most validation
What does "most validation" mean? Shouldn't we send an IDONTWANT only when we have finished all relevant validation and we are sure we can accept the message?
No, we should send IDONTWANT before application validation.
rust-libp2p deletes the message from the cache in this case: https://docs.rs/libp2p-gossipsub/0.48.0/libp2p_gossipsub/struct.Behaviour.html#method.report_message_validation_result
Could it also allow another kind of attack, as a single message triggers the sending of an IDONTWANT to mesh peers and, if flood publish is enabled, to many more (I don't know if IDONTWANT is sent in this case, though)?
Doesn't this open an attack vector where a malicious peer can send an invalid message for a specific message ID, thereby delaying or even preventing a peer from receiving the valid message?
Message ID is a local calculation when you have a message. I can't give you a message for some other ID.
Could it also allow another kind of attack, as a single message triggers the sending of an IDONTWANT to mesh peers and, if flood publish is enabled, to many more (I don't know if IDONTWANT is sent in this case, though)?
For each invalid message, you send an unwanted IDONTWANT message to each peer. An attacker can use this at worst (I think) to cause a peer to receive D unwanted IDONTWANT messages, if they can find all of the target's mesh peers. This isn't that attractive because a message needs to be pretty large to trigger the IDONTWANT, so you may not even see amplification.
Flood publish in gossipsub only applies when you are publishing a new message. Not forwarding.
submitted #660
Thanks for the explanation.
Flood publish in gossipsub only applies when you are publishing a new message. Not forwarding.
IDONTWANT is a published message, isn't it?
Edge(australia, australia, 2),
Edge(australia, east_asia, 110),
Edge(australia, europe, 165),
Edge(australia, na_west, 110),
Edge(australia, na_east, 150),
Edge(australia, south_america, 190),
Edge(australia, south_africa, 220),
Edge(australia, west_asia, 180),
Edge(east_asia, australia, 110),
Edge(east_asia, east_asia, 4),
Edge(east_asia, europe, 125),
Edge(east_asia, na_west, 100),
Edge(east_asia, na_east, 140),
Edge(east_asia, south_america, 175),
Edge(east_asia, south_africa, 175),
Edge(east_asia, west_asia, 110),
Edge(europe, australia, 165),
Edge(europe, east_asia, 125),
Edge(europe, europe, 2),
Edge(europe, na_west, 110),
Edge(europe, na_east, 70),
Edge(europe, south_america, 140),
Edge(europe, south_africa, 95),
Edge(europe, west_asia, 60),
Edge(na_west, australia, 110),
Edge(na_west, east_asia, 100),
Edge(na_west, europe, 110),
Edge(na_west, na_west, 2),
Edge(na_west, na_east, 60),
Edge(na_west, south_america, 100),
Edge(na_west, south_africa, 160),
Edge(na_west, west_asia, 150),
Edge(na_east, australia, 150),
Edge(na_east, east_asia, 140),
Edge(na_east, europe, 70),
Edge(na_east, na_west, 60),
Edge(na_east, na_east, 2),
Edge(na_east, south_america, 100),
Edge(na_east, south_africa, 130),
Edge(na_east, west_asia, 110),
Edge(south_america, australia, 190),
Edge(south_america, east_asia, 175),
Edge(south_america, europe, 140),
Edge(south_america, na_west, 100),
Edge(south_america, na_east, 100),
Edge(south_america, south_america, 7),
Edge(south_america, south_africa, 195),
Edge(south_america, west_asia, 145),
Edge(south_africa, australia, 220),
Edge(south_africa, east_asia, 175),
Edge(south_africa, europe, 95),
Edge(south_africa, na_west, 160),
Edge(south_africa, na_east, 130),
Edge(south_africa, south_america, 190),
Edge(south_africa, south_africa, 7),
Edge(south_africa, west_asia, 110),
Edge(west_asia, australia, 180),
Edge(west_asia, east_asia, 110),
Edge(west_asia, europe, 60),
Edge(west_asia, na_west, 150),
Edge(west_asia, na_east, 110),
Edge(west_asia, south_america, 145),
Edge(west_asia, south_africa, 110),
Edge(west_asia, west_asia, 5),
According to Daniel:
Issue B: Shadow does not support network latency jitter, and my current network topology does not differentiate between nodes in a region/reliability. This means that if a node in Europe sends a message to all peers, the message will arrive at exactly the same time for nodes in the same region/reliability: after 70ms in US East, 110ms in US West, etc. So nodes in the same region will never have the opportunity to receive an IDONTWANT from nodes in the same region, as they process the message exactly simultaneously, and instantly (see Issue A).
Quick fix: drop intra-region latency low enough so that some IDONTWANTs will be received before the above 5ms elapses.
We already have most intra-region latencies under 5ms; only West Asia is at 5ms, and South America and South Africa are set to 7ms. Should we bring them all under 5?
How does this help the issue? If all nodes in a region get the message at the same time, why does the intra-region latency matter?
Also, we connect nodes randomly in the network; your peers won't necessarily be in your region.
The intra-region latency matters, as lower latency allows IDONTWANTs to arrive before validation is done.
e.g.:
Let's suppose node A has latency 100ms to nodes B and C.
Node A sends a message to B and C. If there is no validation delay, B and C will immediately send IDONTWANT and the actual message to each other. If there is a validation delay of e.g. 5ms, B and C will immediately send IDONTWANT. Now, if the latency between B and C is below 5ms, the IDONTWANT will always arrive before validation finishes, and the sending of the message is prevented. If the latency is above 5ms, the IDONTWANT has not arrived when validation is finished and both nodes always send the message.
As you can see, the effectiveness of IDONTWANT is very dependent on the configuration parameters in this simple case, as IDONTWANTs are either always on time or always late. Shadow does not support jitter in latency, which would let us make this a matter of chance.
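A toy calculation of that all-or-nothing effect (values are illustrative, mirroring the example above):

```rust
fn main() {
    let validation_delay_ms = 5.0; // assumed fixed validation cost at B and C
    for latency_bc_ms in [2.0, 7.0] {
        // With no jitter, this predicate is identical for every message,
        // so IDONTWANT either always suppresses the duplicate or never does.
        let suppressed = latency_bc_ms < validation_delay_ms;
        println!("B<->C latency {latency_bc_ms} ms: duplicate suppressed = {suppressed}");
    }
}
```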
Regarding the impact of the Shadow simulation on IDONTWANT effectiveness: is it a Shadow issue, or is it because we don't run validation code in the simulations? Apparently, Shadow runs the application code normally, and real time passes accordingly. The difference is that the simulated time isn't advanced. This shouldn't interfere with IDONTWANT sends and checks, as they aren't time-based, should it?
Thanks for the explanation @dknopik. I understand now. In these simulations we still see IDONTWANT benefits because all nodes in a mesh are not in the same region.
We also don't currently account for validation time, although I would like to. Doing something simple like sleeping for some time would let us model some expensive operation during validation in simulation time.
@diegomrsantos IDONTWANT checks are timing-based. They aren't effective if you receive them after you've already sent the message.
Validation time likely depends heavily on the application. If we choose to run the actual validation code or simulate it with sleeps, it might make more sense to handle that externally when simulating specific systems, rather than hardcoding a fixed value here. What do you think?
In the real world we take advantage of the delay in validation to make IDONTWANT more effective. Imagine the case where the validation delay was really big: you would see IDONTWANT being effective in reducing duplicates.
Sorry, I might’ve commented in the wrong thread. Just to give a bit more context—my point is about Shadow not modeling CPU time. That’s often seen as a limitation when simulating scenarios involving IDONTWANT. What I’m trying to say is that if we run the actual validation code during simulations, it will still take the same amount of real time as in a real system, since Shadow executes the application code as-is. The only difference is that this time isn’t reflected in the simulated time. But I don’t think that’s an issue for evaluating the effectiveness of IDONTWANT.
I asked a question on a shadow discussion shadow/shadow#1675 (comment)
For this interop tester, we'll simulate validation code by simply sleeping for some amount of time. That will cause the simulation time to advance.
We agree that all implementations should get the same reliability (100% for most scenarios). For the performance aspect, this is much looser. I'm not saying they need the same performance, just a ballpark. If an implementation is 10x slower and that degrades the whole network, then its performance becomes an interoperability issue (technically a network health issue, but "interoperability" is a convenient umbrella term).
I agree with this.
Regarding delays, I was using the RIPE Atlas database here: https://ethresear.ch/t/pppt-fighting-the-gossipsub-overhead-with-push-pull-phase-transition/22118#p-53773-simulation-scenario-11. I would recommend using the same. You need this in the …
Then you need a function to assign nodes to IDs in the dataset.
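A hypothetical sketch of such an assignment function (the PRNG mapping is an assumption; the real mapping depends on how the RIPE Atlas dataset is keyed):

```rust
// Deterministically map simulated node indices onto latency-dataset IDs,
// seeded so that runs are reproducible.
fn assign_dataset_ids(num_nodes: usize, dataset_len: usize, seed: u64) -> Vec<usize> {
    assert!(dataset_len > 0);
    let mut state = seed.max(1); // xorshift state must be nonzero
    (0..num_nodes)
        .map(|_| {
            // xorshift64: a tiny PRNG, good enough for an illustrative mapping
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state as usize) % dataset_len
        })
        .collect()
}
```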
LGTM.
How would we go about extending this framework for various stress testing and benchmarking scenarios? For example:
- Publish messages at a given production rate from a group of publishers (or all nodes) using a preset size distribution, over X topics, with a concrete interval +/- fuzz.
- Publish the same message from a group of publishers at regular intervals. Repeat with fresh messages until the test ends. (e.g. this is the blob sidecar scenario).
- Randomize subscriptions over specific topics and publish messages via fanout (e.g. this is the attestations scenario).
Or are such scenarios pushing the framework beyond its intended scope? In principle, they should all be doable with Shadow; it's just tricky with the statically-defined instruction file being the only input. Adding more control-flow directives is probably a no-go: soon we'd end up with a fully fledged DSL on our hands.
One possible path is to extend the framework to receive instructions from a controller process, either via stdin or via an IPC socket (unless they're instrumented by Shadow — I wonder how we'd work around this). Not sure if the controller would need to "watch" Shadow: backpressure from stdin or IPC socket might be enough to pace the instruction stream. Some inspiration here: https://shadow.github.io/docs/guide/getting_started_tgen.html.
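A hypothetical sketch of that controller loop (the instruction names are made up): each node reads one instruction per line from stdin and acknowledges it, so pipe backpressure alone paces the stream:

```rust
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    for line in stdin.lock().lines() {
        let instruction = line?;
        match instruction.split_whitespace().next() {
            Some("publish") => { /* publish a message to the mesh */ }
            Some("subscribe") => { /* join a topic */ }
            _ => eprintln!("unknown instruction: {instruction}"),
        }
        // The controller waits for this ack before sending the next instruction.
        writeln!(stdout, "ack {instruction}")?;
    }
    Ok(())
}
```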
D: int | None = None  # Optimal degree for a GossipSub topic mesh
Dlo: int | None = None  # Lower bound on the number of peers in a topic mesh
Dhi: int | None = None  # Upper bound on the number of peers in a topic mesh
Dscore: int | None = None  # Number of high-scoring peers to retain when pruning
Dout: int | None = (
    None  # Quota for outbound connections to maintain in a topic mesh
)
nit: If we use Field() we can include descriptions that appear in API docs, and can attach additional validation rules if we'd like (greater than, min length, etc.):
D: int | None = Field(None, description="Optimal degree for a GossipSub topic mesh")
Dlo: int | None = Field(None, description="Lower bound on the number of peers in a topic mesh")
Dhi: int | None = Field(None, description="Upper bound on the number of peers in a topic mesh")
Dscore: int | None = Field(None, description="Number of high-scoring peers to retain when pruning")
Dout: int | None = Field(None, description="Quota for outbound connections to maintain in a topic mesh")
Can you explain a bit more? I don't see the description argument in the Field: https://docs.python.org/3/library/dataclasses.html
	panic(err)
}
addr := fmt.Sprintf("/ip4/%s/tcp/9000/p2p/%s", addrs[0], peerId)
// TODO support QUIC in Shadow
What else is needed here?
pub opportunistic_graft_peers: Option<i32>,
#[serde(rename = "opportunisticGraftTicks")]
pub opportunistic_graft_ticks: Option<i32>,
#[serde(rename = "opprotunisticGraftTicksBackoff")]
Suggested change:
-#[serde(rename = "opprotunisticGraftTicksBackoff")]
+#[serde(rename = "opportunisticGraftTicksBackoff")]
You could also use #[serde(rename_all = "camelCase")], and then override individual fields where necessary (e.g. the Ds, TTL, etc.) to avoid manual errors, if you'd like.
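For example (a sketch; the field names follow the struct above, and the `D` override is an assumption about the JSON):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct GossipSubParams {
    pub opportunistic_graft_peers: Option<i32>, // serialized as "opportunisticGraftPeers"
    pub opportunistic_graft_ticks: Option<i32>, // serialized as "opportunisticGraftTicks"
    // Field-level renames override rename_all for names that aren't camelCase.
    #[serde(rename = "D")]
    pub d: Option<i32>,
}
```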
These are good scenarios. I'll add support for them.
For these scenarios, I don't think it's too hard. We can get pretty far by expanding/unrolling control flow in the step before running the test (see the sketch below).
Let's cross that bridge when we need to. I've found having a well-defined static sequence of steps easy to understand.
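As a sketch of that unrolling step (the instruction shape is hypothetical, not the framework's actual schema):

```rust
struct PublishInstruction {
    node: usize,
    at_ms: u64,
    size_bytes: usize,
}

// Expand a "publish every interval" scenario into a static instruction list
// before the run, so the runtime only ever consumes straight-line steps.
fn unroll_periodic_publish(
    publishers: &[usize],
    interval_ms: u64,
    rounds: u64,
    size_bytes: usize,
) -> Vec<PublishInstruction> {
    let mut instructions = Vec::new();
    for round in 0..rounds {
        for &node in publishers {
            instructions.push(PublishInstruction {
                node,
                at_ms: round * interval_ms,
                size_bytes,
            });
        }
    }
    instructions
}
```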
* introduce sleep to simulate blob verification time
* rust: async gossipsub validation
* Add SetTopicValidationDelayInstruction

Co-authored-by: Marco Munizaga <[email protected]>
Should we merge this and continue evolving it on master? If there are outstanding items/blockers preventing merge, it would be great to capture them in a checklist for visibility.
Merging this now and we can iterate on smaller PRs
This PR adds a way to test interop between different GossipSub implementations. Check the README.md for a deeper explanation.
At a very high level this lets you simulate a large network composed of nodes with different implementations and collect results. Interoperability is defined as having roughly the same results across varying compositions; e.g., an all-rust network should behave the same as a 50/50 rust/go network.
This can also support different kinds of tests (called scenarios) to run under simulation without changing the implementations.
This tool is useful to gain confidence in interoperability and inform protocol and implementation changes.
@jxs, I would appreciate a review (or feel free to make any changes) to the rust-libp2p implementation here. All issues are my own, any clever tricks are the LLM's.
One curious thing stood out to me when running this. It seems that when I run a network of 50/50 go/rust nodes, I see more duplicates being sent than in a network of just go or just rust nodes. ~~This may be a bug on my end~~ Indeed a bug on my end. See my comment below.
Thanks to @ppopth for the initial Shadow+GossipSub base.