Skip to content

fix(netwatch): BSD rebind socket on errors #2913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 12, 2024

Conversation

dignifiedquire
Copy link
Contributor

@dignifiedquire dignifiedquire commented Nov 8, 2024

also uses AbortOnDropHandle to better cleanup tasks

Closes #2909

@dignifiedquire dignifiedquire requested a review from flub November 8, 2024 09:02
Copy link

github-actions bot commented Nov 8, 2024

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2913/docs/iroh/

Last updated: 2024-11-12T21:34:12Z

Copy link

github-actions bot commented Nov 8, 2024

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: a0561b3

Comment on lines 32 to 33
#[allow(dead_code)]
handle: AbortOnDropHandle<()>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[allow(dead_code)]
handle: AbortOnDropHandle<()>,
_handle: AbortOnDropHandle<()>,

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's actually used, which is why I don't want to use _handle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it doesn't need #[allow(dead_code)]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but rust is too stupid..it gets constructed and it has drop semantics.. but the compiler doesn't understand that as usage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are we having a philosophical disagreement on what unused means?

My interpretation is the opposite in that case, to me it is definitely not dead code. _thing does not mean something is unused, let _guard = thing(); is an entirely common pattern, this is just the same but in a struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, will change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

actor.run().await;
if let Err(err) = actor.run().await {
warn!("netmon actor died: {:?}", err);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially a supervisor. The problem with this version is that you get a single WARN in the logfile and things just carry on. I wonder if maybe it might as well restart the actor - maybe with a small delay. That is still not proper supervision - if it keeps dying we should bring down the entire Endpoint - but it's a start and we'll flood the logs with WARN messages which could be a bit more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally yes, but currently there is no error case where this happens (and it is not super easy to restart the actor currently)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines 129 to 131
Some(event) = self.mon_receiver.recv() => {
match event {
NetworkMessage::Change => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some(NetworkMessage::Change) = self.mon_receiver.recv() might be a nicer way to write this?

I also noticed that NetworkMessage::Change is marked with #[allow(dead_code)]. This seems suspicious, can we remove that marker as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems suspicious, can we remove that marker as well?

no because this message is not used on android, and I didn't want to start adding cfgs to lint hints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some(NetworkMessage::Change) = self.mon_receiver.recv() might be a nicer way to write this?

maybe, we usually always put the match inside, so I wrote it like that for consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see issues with tokio::select, I cleaned this up to not use an elsebranch

@@ -102,7 +101,7 @@ impl Actor {
self.actor_sender.clone()
}

pub(super) async fn run(mut self) {
pub(super) async fn run(mut self) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, the signature is changed but there's not a single error return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a version where it propagated an error. and then I left this change in, can revert, no strong feelings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's revert as then the supervisor discussion goes away as well :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 15 to 16
#[allow(dead_code)]
handle: AbortOnDropHandle<()>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[allow(dead_code)]
handle: AbortOnDropHandle<()>,
_handle: AbortOnDropHandle<()>,

?

I'm not a fan of #[allow(dead_code)] as it easily stays around too long. The _ is much easier to keep track of as the compiler will tell you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -52,12 +55,25 @@ impl RouteMonitor {
}
Err(err) => {
warn!("AF_ROUTE: error reading: {:?}", err);
// recreate socket, as it is likely in an invalid state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a small delay in case this is hot-looping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear, the reported error doesn't because the socket is already dead

@dignifiedquire dignifiedquire force-pushed the fix-netmon-bsd-socket-error branch from 270457c to ec61b9b Compare November 12, 2024 16:40
@dignifiedquire dignifiedquire added this to the v0.29.0 milestone Nov 12, 2024
also uses AbortOnDropHandle to better cleanup tasks

Closes #2909
@dignifiedquire dignifiedquire force-pushed the fix-netmon-bsd-socket-error branch from ec61b9b to e3c3455 Compare November 12, 2024 20:04
@dignifiedquire dignifiedquire added this pull request to the merge queue Nov 12, 2024
Merged via the queue into main with commit c451750 Nov 12, 2024
26 of 27 checks passed
matheus23 pushed a commit that referenced this pull request Nov 14, 2024
also uses `AbortOnDropHandle` to better cleanup tasks

Closes #2909

---------

Co-authored-by: Divma <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Nov 15, 2024
## Description

With #2913, the netmon actor properly exits when the channel sender is
dropped, but the android implementation never kept the sender around.
The magicsock actor would then error when trying to subscribe to the
network monitor, and the `Endpoint` would be unusable.

## Change checklist

- [x] Self-review.
- [ ] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
@dignifiedquire dignifiedquire deleted the fix-netmon-bsd-socket-error branch November 28, 2024 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

BSD RouteMonitor does not handle AF_ROUTE socket disconnection well
3 participants