Skip to content

refactor(tcp): use SelectAll for driving listener streams #3361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 30, 2023

Conversation

vnermolaev
Copy link
Contributor

@vnermolaev vnermolaev commented Jan 20, 2023

Description

The PR optimizes polling of the listeners in the TCP transport by using futures::SelectAll instead of storing them in a queue and polling manually.

Resolves #2781.

Notes

I observed that the following branches are never triggered

Poll::Ready(None) => {
return Poll::Ready(TransportEvent::ListenerClosed {
listener_id: listener.listener_id,
reason: Ok(()),
});
}
Poll::Ready(Some(Err(err))) => {
return Poll::Ready(TransportEvent::ListenerClosed {
listener_id: listener.listener_id,
reason: Err(err),
});
}

This is because the underlying ListenStream always returns Poll::Ready(Some(Ok(..))) and neither Poll::Ready(Some(Err(...))) (in-line documentation says such errors are not fatal), nor Poll::Ready(None).

I ended up following the approach found in

https://github.com/vnermolaev/rust-libp2p/blob/acf84113cff5843234ba6ea20c79b079e9d1195c/protocols/relay/src/priv_client/transport.rs#L232-L235

Please advise. Is this the desired behaviour? Otherwise, what approach should I follow? @elenaf9

Links to any relevant issues

#2781

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@vnermolaev vnermolaev changed the title transports/tcp: use futures::stream::SelectAll for driving listener streams refactor(transports/tcp: use futures::stream::SelectAll for driving listener streams Jan 20, 2023
@vnermolaev vnermolaev changed the title refactor(transports/tcp: use futures::stream::SelectAll for driving listener streams refactor(transports/tcp): use futures::stream::SelectAll for driving listener streams Jan 20, 2023
@vnermolaev vnermolaev changed the title refactor(transports/tcp): use futures::stream::SelectAll for driving listener streams refactor(tcp): use SelectAll for driving listener streams Jan 20, 2023
@vnermolaev vnermolaev marked this pull request as ready for review January 20, 2023 15:14
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, and thanks for tackling this! Some notes left

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, otherwise LGTM.

Clean work, thank you!

Comment on lines 685 to 749
if let Some(event) = me.queued_events.pop_front() {
return Poll::Ready(Some(event));
}

if me.is_closed {
// Terminate the stream if the listener closed and all remaining events have been reported.
return Poll::Ready(None);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have two return paths here that don't register their own waker. You'll need to store a waker in the listener before you return Poll::Pending at the bottom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lack expertise on this particular issue. Instead I have aligned the impl for the tcp transport with the impl for the quic transport, they are pretty similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elenaf9 Did you consider this case when implementing QUIC? I think we may have a potential unwoken task here:

  • Assume 1 active listener
  • Assume the listener had no work and returned Pending
  • We call close on the listener
  • Without an explicit waker, I believe the close will only happen once the listener gets polled again due to another event

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it appears this PR can't be merged before this question is cleared out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move forward here by just adding the waker. Worst case, we have an unnecessary wake-up.

  • Make a new field inside the listener, holding an Option<Waker>
  • Call it close_listener_waker
  • Set it before you return Poll::Pending at the bottom
  • Use take to remove the listener within close and if it is present, call wake on it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jxs @mxinden perhaps you want to take a look at this too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my eyes, the suggestion above by @thomaseizinger is the way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @thomaseizinger. Yes you are right, we need the explicit waker here.

@vnermolaev vnermolaev force-pushed the feature/select-all branch 3 times, most recently from c1eba3d to 8980a45 Compare January 24, 2023 10:56
Comment on lines 685 to 749
if let Some(event) = me.queued_events.pop_front() {
return Poll::Ready(Some(event));
}

if me.is_closed {
// Terminate the stream if the listener closed and all remaining events have been reported.
return Poll::Ready(None);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my eyes, the suggestion above by @thomaseizinger is the way to go.

@vnermolaev vnermolaev force-pushed the feature/select-all branch 4 times, most recently from 44b83d9 to ab06ff8 Compare January 27, 2023 12:26
@vnermolaev
Copy link
Contributor Author

vnermolaev commented Jan 27, 2023

@mxinden @thomaseizinger
I have applied the requested changes, i.e.,

We can move forward here by just adding the waker. Worst case, we have an unnecessary wake-up.

* Make a new field inside the listener, holding an `Option<Waker>`

* Call it `close_listener_waker`

* Set it before you return `Poll::Pending` at the bottom

* Use `take` to remove the listener within `close` and if it is present, call `wake` on it

However,

  1. now, there are linting errors coming from parts other than the TCP transport. Should they be addressed separately?
    Also the following lint
error: lint `clippy::derive_hash_xor_eq` has been renamed to `clippy::derived_hash_with_manual_eq`
Error:   --> muxers/mplex/src/codec.rs:68:14
   |
68 |     #![allow(clippy::derive_hash_xor_eq)]
   |              ^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use the new name: `clippy::derived_hash_with_manual_eq`
   |
   = note: `-D renamed-and-removed-lints` implied by `-D warnings`

if applied will also break the build stating that such a lint is not known... can someone confirm?
2. The issue with sending the last event on closing and thus setting up a waker, I assume, also exists in other transports as it was the case and this one.

@mergify
Copy link
Contributor

mergify bot commented Jan 27, 2023

This pull request has merge conflicts. Could you please resolve them @vnermolaev? 🙏

Copy link
Member

@elenaf9 elenaf9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this @vnermolaev!

👍 LGTM

  1. now, there are linting errors coming from parts other than the TCP transport. Should they be addressed separately?

They are already addressed in #3389. Would you mind reverting ab06ff8?

  1. The issue with sending the last event on closing and thus setting up a waker, I assume, also exists in other transports as it was the case and this one.

Yes you are right. Would you be interested in doing patches to fix them there as well (in separate PRs; namely for libp2p-quic and libp2p-webrtc)?

Comment on lines 685 to 749
if let Some(event) = me.queued_events.pop_front() {
return Poll::Ready(Some(event));
}

if me.is_closed {
// Terminate the stream if the listener closed and all remaining events have been reported.
return Poll::Ready(None);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @thomaseizinger. Yes you are right, we need the explicit waker here.

@vnermolaev
Copy link
Contributor Author

vnermolaev commented Jan 27, 2023

  1. now, there are linting errors coming from parts other than the TCP transport. Should they be addressed separately?

They are already addressed in #3389. Would you mind reverting ab06ff8?

@elenaf9 I have rebased on top of the latest master, excluding the mentioned commit, clippy still fails on sources outside the TCP transport, i.e., swarm, see the failing check.

@elenaf9
Copy link
Member

elenaf9 commented Jan 27, 2023

  1. now, there are linting errors coming from parts other than the TCP transport. Should they be addressed separately?

They are already addressed in #3389. Would you mind reverting ab06ff8?

@elenaf9 I have rebased on top of the latest master, excluding the mentioned commit, clippy still fails on sources outside the TCP transport, i.e., swarm, see the failing check.

Yes, #3389 is not merged yet. The failing check however is only the clippy beta one, which is not required for merging the PR. Ideally #3389 will be merged soon, but it shouldn't block this PR, i.e. we can also merge without.

@vnermolaev
Copy link
Contributor Author

Yes, #3389 is not merged yet. The failing check however is only the clippy beta one, which is not required for merging the PR. Ideally #3389 will be merged soon, but it shouldn't block this PR, i.e. we can also merge without.

then I think it is ready to be merged :)

Yes you are right. Would you be interested in doing patches to fix them there as well (in separate PRs; namely for libp2p-quic and libp2p-webrtc)?

Yes, I am interested.

@@ -17,9 +17,12 @@

- Update `rust-version` to reflect the actual MSRV: 1.60.0. See [PR 3090].

- Optimize listeners polling behaviour in the tcp transport, fixing [2781](https://github.com/libp2p/rust-libp2p/issues/2781). See [PR 3361].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? It is mostly a refactoring that is not user facing.

Could mention the bug fix but otherwise I'd suggest not adding a changelog entry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

Victor Ermolaev and others added 2 commits January 29, 2023 18:17
@vnermolaev
Copy link
Contributor Author

All right, all suggestions/requests are satisfied.
Unfortunately, some CI checks fail, not triggered by changes in this PR.

Checking libp2p-request-response v0.24.0 (/home/runner/work/rust-libp2p/rust-libp2p/protocols/request-response)
error[E0369]: cannot multiply `&&EdwardsBasepointTable` by `&Scalar`
   --> /home/runner/.cargo/registry/src/g.yxqyang.asia-1ecc6299db9ec823/snow-0.9.0/src/resolvers/default.rs:132:47
    |
132 |         let point = (&ED25519_BASEPOINT_TABLE * &self.privkey).to_montgomery();
    |                      ------------------------ ^ ------------- &Scalar
    |                      |
    |                      &&EdwardsBasepointTable
    |
help: `*` can be used on `&EdwardsBasepointTable` if you dereference the left-hand side
    |
132 |         let point = (*&ED25519_BASEPOINT_TABLE * &self.privkey).to_montgomery();
    |                      +

and clippy-beta as before.

@thomaseizinger
Copy link
Contributor

See mcginty/snow#146 for the build failure.

@thomaseizinger
Copy link
Contributor

See mcginty/snow#146 for the build failure.

I am afraid we cannot fix this on our end. We'd have to insert cargo update statements in our CI pipeline which doesn't help our users either. Let's see how responsive the snow maintainers are.

@vnermolaev
Copy link
Contributor Author

Do I understand correctly that this PR is on hold for now?

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Jan 30, 2023

Do I understand correctly that this PR is on hold for now?

Yes, we can't merge PRs at the moment due to the breakage in snow.

I have a proposal up to track Cargo.lock in Git which allows us to pin an earlier version of curve25519-dalek: #3399

thomaseizinger
thomaseizinger previously approved these changes Jan 30, 2023
@mergify mergify bot dismissed thomaseizinger’s stale review January 30, 2023 12:06

Approvals have been dismissed because the PR was updated after the send-it label was applied.

@mergify mergify bot merged commit c15e651 into libp2p:master Jan 30, 2023
@vnermolaev vnermolaev deleted the feature/select-all branch January 31, 2023 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

transports/tcp: use futures::stream::SelectAll for driving listener streams
5 participants