-
Notifications
You must be signed in to change notification settings - Fork 17
Close connections where streams haven't been opened since some time #77
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem with this approach: QUIC imposes a limit on how many streams can be opened at the same time. Once that limit is reached, calls to OpenStream
will block.
It might be cleaner to add a PunchHole(remote *net.UDPAddr)
method to the QUIC transport. This method would send out a single UDP packet (with random payload) to the remote addr. That would create the NAT binding we need, and we'd only end up with a single QUIC connection.
We also need logic in the swarm to always select a direct connection over a relay connection, otherwise we may end up with clown shoes. |
@vyzo We already do that. Please see libp2p/go-libp2p-swarm#233. |
connmgr.go
Outdated
if !info.lastStreamOpen.IsZero() && time.Since(info.lastStreamOpen) > maxStreamOpenDuration { | ||
selected = append(selected, c) | ||
target-- | ||
seen[c] = struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should only do this if there are no active streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And because it might be expensive to count on demand, we can keep a running count using the StreamOpened/StreamClosed events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current heuristic is:-
"Cleanup connections that vane't seen a stream since 10 Minutes (but could have open streams older than that)".
Are you saying that we should clean-up streams that:
"Don't have any open streams at all inspite of being older than 10 minutes" ?
Any reason to prefer the latter ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That heuristic is incorrect, it will break applications with long lived stream.
Imagine a pubsub client, who simply connects and opens the pubsub streams and does nothing else afterwards.
If we reap the connection because there are no new streams, we just broke the app.
So yeah, we should only close connections if they have been idle for 10min without any streams present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense.
Have addressed your review. Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some small coding issues that need to be addressed.
connmgr.go
Outdated
@@ -19,6 +19,8 @@ var SilencePeriod = 10 * time.Second | |||
|
|||
var log = logging.Logger("connmgr") | |||
|
|||
var maxStreamOpenDuration = 10 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name... this is poorly named.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
connmgr.go
Outdated
} | ||
|
||
// connections that haven't seen a new stream since 10 minutes and have NO streams open. | ||
if !info.lastStreamOpen.IsZero() && time.Since(info.lastStreamOpen) > maxStreamOpenDuration && info.nStreams == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the count check before the time.Since check as it is a lot cheaper and can short-circuit; you can even put it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, cache the current time and use that instead of doing gettimeofday 1ktimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
connmgr.go
Outdated
_, ok = cinf.conns[c] | ||
if !ok { | ||
log.Error("received stream close notification for conn we are not tracking: ", p) | ||
return | ||
} | ||
|
||
cinf.conns[c].nStreams-- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clown shoes... you are doing the map lookup twice, and it's not free. Just capture it in a variable in the first lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
connmgr.go
Outdated
_, ok = cinf.conns[c] | ||
if !ok { | ||
log.Error("received stream open notification for conn we are not tracking: ", p) | ||
return | ||
} | ||
|
||
cinf.conns[c].lastStreamOpen = time.Now() | ||
cinf.conns[c].nStreams++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing here, you are doing the lookup 3(!!!!) times; just capture it in a variable and use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Also, we need to make sure we are not closing a protected connection with this logic. |
@vyzo Have addressed your changes. We don't close protected connections with this logic as we filter out protected connections from the candidate set in the |
As discussed offline with @vyzo , this is quickly getting messy for now. We don't need this for now because: i. Limited Relays will anyways close conns after a grace period. |
For libp2p/go-libp2p#1039.
As discussed with @Stebalien , once hole punching is in place, we could end up with multiple connections between peers often. That's because we'd have the Relayed connection on which the successful hole punch was co-ordinated and for QUIC, we'd simply end up with two connections between the peers during a hole punch.
However, in the Swarm, we'll ONLY open stream on the newest direct connection with the most streams going ahead. So, in the connection manager, we should close Connections that haven't seen a stream in some time.
Note that for opening a Stream, the Swarm will always prefer a direct connection over a Relayed connection as implemented in libp2p/go-libp2p-swarm#233.