Bug: rendezvous server panics in response to libp2p_request_response::Event::Message #5997
Comments
Additionally, I saw an interesting note from a previous issue: @dariusc93 Perhaps this is similar?
It's very possible. Could you provide any logs and your swarm configuration?
Perhaps this helps @dariusc93. The logs I have at the moment don't show much more insight beyond the panic itself. I can enable debug logging and see if there is anything notable. Note that I did test a deployment where I set the connection limits to default, which I believe is just a max u32, so I don't think that is causing the issue. Additionally, does it make sense for a panic to occur here? I would expect connections to potentially expire or close at any time, but I wouldn't expect that to crash the swarm.
Thanks. Though it should not be the issue, you might want to move the connection limit behaviour to the top so it is handled first when the swarm polls the behaviours. Polling it later in the tree can cause an inconsistent state if the connection is later denied while other behaviours have already accepted it. See #4870. Besides that, I don't see an exact cause in the code you provided, so it may just be some inconsistency in the state of the behaviours.
Perhaps I should also move the rendezvous server behaviour to the top, with the connection limit behaviour above it? That way rendezvous server events get handled first?
Moving the connection limit and rendezvous server behaviours to the top has not fixed the panic so far.
Could you provide a log with
From the panic before we forked libp2p
We updated the rendezvous logic to not panic if a connection dropped, and we see this (we added our own log that states "failed to send response" rather than panicking):
I redacted a couple of things, but it's all the same peer and the same IP. Here is the change we made as a PR: #6002. With these changes we are no longer panicking.
Summary
We are currently running a rendezvous server that helps nodes bootstrap into a set of peers under a topic.
The rendezvous server handled up to about 1000 connections with no issue, but we have now seen it panic from within the libp2p-rendezvous crate.
The number of connections has increased to about 3k-5k and this issue started to show up. We removed the pending and established connection limits and started to see this behavior.
thread 'tokio-runtime-worker' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-rendezvous-0.14.0/src/server.rs:184:38:
This is where the panic is occurring:
rust-libp2p/protocols/rendezvous/src/server.rs
Line 184 in 74e3157
I'm unsure if the issue is related to the number of connections, but that's the only thing that has changed.
The comments suggest that `self.inner.send_response` will bubble up an error: "If the [`ResponseChannel`] is already closed due to a timeout or the connection being closed, the response is returned as an `Err` for further handling."
https://github.com/libp2p/rust-libp2p/blob/master/protocols/request-response/src/lib.rs#L484
Questions:
Is it expected for `self.inner.send_response` to panic if a connection timed out or closed?
Additional Note:
Expected behavior
I would expect the response to fail for that specific connection only, and the rendezvous server to continue handling all the other connections rather than crashing.
Actual behavior
It panics frequently, after a couple minutes of handling connections.
Relevant log output
`thread 'tokio-runtime-worker' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-rendezvous-0.14.0/src/server.rs:184:38:`
Possible Solution
Perhaps the error should just be handled in a way that doesn't panic the entire swarm?
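To illustrate the kind of handling I mean, here is a minimal standalone sketch. It does not use libp2p at all: `MockResponseChannel`, `mock_channel`, `send_response` and `respond_without_panic` are hypothetical names, and a `std::sync::mpsc` channel stands in for the real response channel. The only thing it assumes from the docs quoted above is the contract that a closed channel hands the response back as an `Err`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a response channel (not libp2p's type).
struct MockResponseChannel<T> {
    tx: Sender<T>,
}

/// Creates a mock channel plus the receiving end, which models the live connection.
fn mock_channel<T>() -> (MockResponseChannel<T>, Receiver<T>) {
    let (tx, rx) = channel();
    (MockResponseChannel { tx }, rx)
}

/// Models the documented contract: if the receiving side is gone
/// (timeout or closed connection), the response comes back as `Err`.
fn send_response<T>(ch: MockResponseChannel<T>, response: T) -> Result<(), T> {
    // `SendError` wraps the unsent value; unwrap it so the caller gets it back.
    ch.tx.send(response).map_err(|e| e.0)
}

/// Sends a response and logs on failure instead of panicking.
/// Returns `true` if the response was handed to the channel.
fn respond_without_panic(ch: MockResponseChannel<String>, response: String) -> bool {
    match send_response(ch, response) {
        Ok(()) => true,
        Err(unsent) => {
            // Only this one response is lost; the caller keeps running.
            eprintln!("failed to send response ({} bytes dropped)", unsent.len());
            false
        }
    }
}
```

Calling `.expect(...)` on the result of `send_response` would instead panic as soon as a single peer disconnects before its response is sent, which matches what we are seeing. With the `match`, a dropped receiver only produces a log line for that one peer.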
Version
libp2p=version = "0.53.2"
Would you like to work on fixing this bug?
Maybe