-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix(kad): enforce a timeout for inbound substreams #6009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
925b8ac
to
36ce9b5
Compare
I'm getting an incorrect CI error:
We need a new patch version, because https://crates.io/crates/libp2p-kad is at 0.47.0. I've updated |
Thank you for the PR @teor2345! For outbound substreams, the timeout is implemented by using the That would also match the implementation in other protocols, as you already stated in #5981:
|
Unfortunately not (or not without a larger refactor). rust-libp2p/protocols/kad/src/handler.rs Lines 906 to 907 in 5377558
It would be possible to hold the streams in a But we can't iterate through a rust-libp2p/protocols/kad/src/handler.rs Lines 557 to 582 in 5377558
So the only solution I could find is to add a timeout field to some of the inbound substream states. ( The underlying issue is that the code couples the state of the substream with the stream of items from it. A refactor could put Is this a change that would be acceptable in a bug fix? Particularly one that other users might want backported? |
Could we not use EDIT: Or maybe poll the stream in the future for the set? |
Sure, but that doesn't deal with substream reuse, because the There are two pieces of functionality that this code needs:
Here's one possible way to implement that:
This will involve some quite weird types, like Is this a change that would be acceptable in a bug fix? Particularly one that other users might want backported? I also can't guarantee I'll have time to work on this any time soon, because the current PR code works, and fixes our downstream bug. |
Thank you for the follow-ups and detailed explanation @teor2345.
Opened thomaseizinger/rust-futures-bounded#8 to see if we can implement iterator for |
Description
In the
kad
substream handler, outbound substreams have a 10s timeout, but inbound substreams don't have any timeout.This results in large numbers of warnings under specific heavy load conditions, which we have encountered in subspace:
Fixes #5981
Notes & open questions
Should the substream be closed on a timeout?
The existing code doesn't close them on (most) substream errors, so this PR handles timeouts the same way.
I have been unable to replicate the specific conditions leading to this bug in a test or my local subspace node, but we've confirmed the fix works on multiple nodes in the subspace network.
Change checklist