Substream limits exceeded #1504
It would be tremendously helpful if you could provide some sort of test case that makes it possible to reproduce your problems. Could you also provide the specific errors you see or some logs? And about the panic: do you see this with both multiplexer implementations or only with mplex?
Hi, thank you for the comment, but unfortunately I can't share the project right now, and extracting a PoC would take a lot of time. I thought that, given that there don't seem to be any projects that use the get providers feature explicitly (maybe I am wrong, but I can't find any on GitHub or in the list of notable users) and that there are some quickly detectable inconveniences (#1507, #1526), the problem here might also be easily discoverable... With default libp2p at debug log level I see that the panic appears only with mplex, and I now believe it isn't linked to reaching the maximum substream count.
Issues libp2p#1504 and libp2p#1523 reported panics caused by polling the sink of `secio::LenPrefixCodec` after it had entered its terminal state, i.e. after it had previously encountered an error or was closed. According to the reports this happened only when using mplex as a stream multiplexer. It seems that because mplex always stores and keeps the `Waker` when polling, a wakeup of any of those wakers will resume the polling even for those cases where the previous poll did not return `Poll::Pending` but resolved to a value. To prevent polling after the connection was closed or an error happened we check for those conditions prior to every poll.
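To illustrate the pattern described above, here is a simplified, hypothetical sketch (not the actual mplex code): the connection remembers its terminal state and checks it before every poll of the underlying codec, so a late wake-up from a stored `Waker` can no longer poll the codec after it has already resolved.

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::{ready, Stream};

/// Illustrative terminal states; the names are not taken from mplex.
enum Status {
    Open,
    Closed,
    Errored,
}

/// Hypothetical wrapper around a length-prefixed codec such as
/// `secio::LenPrefixCodec`.
struct Connection<C> {
    codec: C,
    status: Status,
}

impl<C> Connection<C>
where
    C: Stream<Item = Result<Vec<u8>, io::Error>> + Unpin,
{
    fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<Result<Vec<u8>, io::Error>> {
        // The essence of the fix: check the terminal state *before* polling
        // the codec, so a spurious wake-up cannot poll it after completion.
        match self.status {
            Status::Closed => {
                return Poll::Ready(Err(io::Error::new(io::ErrorKind::Other, "connection is closed")))
            }
            Status::Errored => {
                return Poll::Ready(Err(io::Error::new(io::ErrorKind::Other, "connection has errored")))
            }
            Status::Open => {}
        }
        match ready!(Pin::new(&mut self.codec).poll_next(cx)) {
            Some(Ok(frame)) => Poll::Ready(Ok(frame)),
            Some(Err(e)) => {
                // Remember the error so that subsequent polls short-circuit.
                self.status = Status::Errored;
                Poll::Ready(Err(e))
            }
            None => {
                self.status = Status::Closed;
                Poll::Ready(Err(io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended")))
            }
        }
    }
}
```

The actual fix described above applies this kind of check prior to every poll of the connection, including the send path where the reported panic occurred.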
@michaelvoronov: Would you mind testing #1529 to see if the panic is gone?
@twittner thanks, will try to do so today or tomorrow.
* mplex: Check for error and shutdown. Issues #1504 and #1523 reported panics caused by polling the sink of `secio::LenPrefixCodec` after it had entered its terminal state, i.e. after it had previously encountered an error or was closed. According to the reports this happened only when using mplex as a stream multiplexer. It seems that because mplex always stores and keeps the `Waker` when polling, a wakeup of any of those wakers will resume the polling even for those cases where the previous poll did not return `Poll::Pending` but resolved to a value. To prevent polling after the connection was closed or an error happened we check for those conditions prior to every poll.
* Keep error when operations fail.

Co-authored-by: Pierre Krieger <[email protected]>
@twittner I've run this complex test several times with the fix and there weren't any panics on mplex.
I guess this is resolved? 🎉
I have stumbled upon the problem of reaching the maximum number of substreams inside the multiplexers.

I have a complex test and will try to describe it: from 20 nodes using libp2p-kad for connectivity I constructed 176 pairs (with `tuple_combinations`), then created 176 clients connected to these nodes. In each pair one client sends some info to the other client. Each node has two separate services based on different libp2p swarms: one for clients, the other for nodes in the network. The node swarm behaviour combines kademlia, identify, ping and my own protocol. My own protocol is pretty simple and is based on the `OneShotHandler`. The peer swarm just controls clients and forwards requests from them to the node swarm. Clients are separate entities and are also built on rust-libp2p.

After receiving a request from a client, the node swarm uses the `get providers` capability of libp2p-kad to find the other client (the second one in the corresponding pair). After that it tries to connect to it via my own protocol and send a short message. So the flow looks like this: `client_1` -> `node_1` {`client swarm` -> `node swarm` (FindProviders-Connect-Send)} -> `node_2` {`node swarm` -> `client swarm`} -> `client_2`.
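To make the flow above more concrete, here is a purely illustrative sketch. `NodeSwarm`, `ProviderKey`, `PeerId` and the async helpers are hypothetical stand-ins, not the actual rust-libp2p API or the project's code; they only mirror the FindProviders-Connect-Send sequence:

```rust
use std::io;

/// Hypothetical stand-in for the node swarm; the real thing would wrap a
/// libp2p `Swarm` combining kademlia, identify, ping and the custom protocol.
struct NodeSwarm;

/// Hypothetical key and peer id types.
struct ProviderKey(Vec<u8>);
struct PeerId(String);

impl NodeSwarm {
    /// Stand-in for issuing a kademlia get providers query and awaiting its result.
    async fn find_providers(&mut self, _key: &ProviderKey) -> io::Result<Vec<PeerId>> {
        Ok(Vec::new())
    }
    /// Stand-in for dialing the peer.
    async fn connect(&mut self, _peer: &PeerId) -> io::Result<()> {
        Ok(())
    }
    /// Stand-in for the custom `OneShotHandler`-based protocol sending one message.
    async fn send_oneshot(&mut self, _peer: &PeerId, _payload: Vec<u8>) -> io::Result<()> {
        Ok(())
    }

    /// The FindProviders -> Connect -> Send sequence described above.
    async fn forward(&mut self, key: ProviderKey, payload: Vec<u8>) -> io::Result<()> {
        // 1. FindProviders: ask libp2p-kad which peer provides the target client's key.
        let providers = self.find_providers(&key).await?;
        let peer = providers
            .into_iter()
            .next()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no providers"))?;
        // 2. Connect to that peer (the node of the pair's second client).
        self.connect(&peer).await?;
        // 3. Send the short message over the custom one-shot protocol.
        self.send_oneshot(&peer, payload).await
    }
}
```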
The first client from each pair sends 100 messages to the second client. All pairs are placed in a `FuturesUnordered`, so sending between pairs is parallel but inside one pair it is sequential: the test waits for the second client to receive each message, with a timeout of 5 seconds.

All of this works fine (without any timeouts) on my 176 pairs only if I increase the maximum number of substreams on the multiplexers (both mplex and yamux). But after several runs it also gets stuck, apparently after reaching the configured maximum of substreams or something like that; I have seen several corresponding errors about reaching the substream limit from mplex in the logs. For example, if I increase the substream count from the default 128 for mplex to 1024, the test completes successfully the first time. Also, from the logs it can be seen that the message is successfully passed to the node swarm of the `node_1` connected to `client_1`.
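For concreteness, a rough sketch of how such a test harness might be wired up, assuming `itertools`' `tuple_combinations`, `FuturesUnordered` and an `async-std` style timeout; `Client` and its methods are placeholders, not the project's actual client code:

```rust
use std::time::Duration;

use futures::stream::{FuturesUnordered, StreamExt};
use itertools::Itertools;

/// Placeholder for the real client, which is itself built on rust-libp2p.
#[derive(Clone)]
struct Client;

impl Client {
    /// Stand-in: send one message through the node's client swarm.
    async fn send(&self, _msg: String) {}
    /// Stand-in: resolve once this client has received a message.
    async fn wait_for_message(&self) {}
}

/// One pair: 100 messages, sent sequentially, each awaited with a 5 s timeout.
async fn run_pair(sender: Client, receiver: Client) {
    for i in 0..100 {
        sender.send(format!("msg-{}", i)).await;
        async_std::future::timeout(Duration::from_secs(5), receiver.wait_for_message())
            .await
            .expect("second client did not receive the message within 5 s");
    }
}

/// All pairs (built with `tuple_combinations`) run in parallel via `FuturesUnordered`.
async fn run_test(clients: Vec<Client>) {
    let mut pairs: FuturesUnordered<_> = clients
        .into_iter()
        .tuple_combinations::<(_, _)>()
        .map(|(a, b)| run_pair(a, b))
        .collect();

    while pairs.next().await.is_some() {}
}
```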
I have instrumented mplex a little and it seems that finding providers creates several new substreams (5-6 on my network), but only one of them gets closed.
All of this looks like the root cause is in the stream management on the libp2p-kad side, but after a very quick analysis I couldn't find it.
Also, very rarely, I get a panic like this: