Skip to content

swarm: connection management behaviours can lead to inconsistent state #4870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
thomaseizinger opened this issue Nov 16, 2023 · 2 comments

Comments

@thomaseizinger
Copy link
Contributor

In #4777, we discovered that a "connection management" behaviour like libp2p-connection-limits can lead to inconsistent state if it is not applied "first" in the tree of behaviours.

That is because the handle_ functions take &mut self and thus allow modifying the state of a NetworkBehaviour, potentially pre-loading a handler with state. If a behaviour deeper in the tree ends up rejecting the connection, that state is lost.

Preloading handlers is useful to ensure they don't "instant timeout" based on keep-alive after a connection has been established.

The underlying design issue here is that the composition of handle_ functions leads to an inconsistent view across the state of NetworkBehaviours. By having them take &self, this issue would not occur (unless the user users internal mutability but that we can't guard against).

Somehow, we'd need to find a way to still pass data to the handler but only in case the connection is fully established. Perhaps we should find a way to mitigate the "instant shutdown" in a different way by not starting the Connection task until the ConnectionEstablished event has been dispatched to the behaviour?

@mxinden
Copy link
Member

mxinden commented Nov 22, 2023

Thank you for documenting this. Unfortunate that we did not catch this earlier.

Two alternative suggestions:

  1. Only allow denying a pending connection, not an established connection. In other words change the signature of handle_established_inbound_connection and handle_established_outbound_connection. Arguably a recess for connection management, though potentially worth the removal of this footgun.
    diff --git a/swarm/src/behaviour.rs b/swarm/src/behaviour.rs
    index c25b14e75..434fe5c3c 100644
    --- a/swarm/src/behaviour.rs
    +++ b/swarm/src/behaviour.rs
    @@ -149,13 +149,13 @@ pub trait NetworkBehaviour: 'static {
         fn handle_established_inbound_connection(
             &mut self,
             _connection_id: ConnectionId,
             peer: PeerId,
             local_addr: &Multiaddr,
             remote_addr: &Multiaddr,
    -    ) -> Result<THandler<Self>, ConnectionDenied>;
    +    ) -> THandler<Self>;
     
         /// Callback that is invoked for every outbound connection attempt.
         ///
         /// We have access to:
         ///
         /// - The [`PeerId`], if known. Remember that we can dial without a [`PeerId`].
    @@ -185,13 +185,13 @@ pub trait NetworkBehaviour: 'static {
         fn handle_established_outbound_connection(
             &mut self,
             _connection_id: ConnectionId,
             peer: PeerId,
             addr: &Multiaddr,
             role_override: Endpoint,
    -    ) -> Result<THandler<Self>, ConnectionDenied>;
    +    ) -> THandler<Self>;
  2. At a first glance, one could call ConnectionHandler::poll_close on a denied ConnectionHandler. This would allow the ConnectionHandler to pass the pre-loaded events back to the NetworkBehaviour. Though that is not possible, given that in Swarm we don't have access to each ConnectionHandler, but only the root ConnectionHandler, which never comes into existence, given that the connection is being denied. Documenting here for completeness sake.

@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Nov 22, 2023

I don't think (1) is a suitable option as it makes it impossible to implement block lists for example.

In combination with a default, non-zero idle timeout, perhaps just passing data to the handler in ConnectionEstablished is the best option?

Or, we simply specialise the "connection just got established"-case. Arguably, a user should be able to react to the SwarmEvent::ConnectionEstablished and use it, even if no NetworkBehaviour "needs" the connection.

I think you suggested something like this at some point @mxinden when we talked about the idle-connection-timeout.

What do you think of a 1(?)-second, unconfigurable delay before considering a just-established connection idle? It feels a bit wrong because picking a delay is inventiably sensitive to the environment we are running it. Yet, it seems the current situation is not good and I'd like to change the above to &self.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants