-
Notifications
You must be signed in to change notification settings - Fork 290
fix(netwatch): BSD rebind socket on errors #2913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2913/docs/iroh/ Last updated: 2024-11-12T21:34:12Z |
net-tools/netwatch/src/netmon.rs
Outdated
#[allow(dead_code)] | ||
handle: AbortOnDropHandle<()>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[allow(dead_code)] | |
handle: AbortOnDropHandle<()>, | |
_handle: AbortOnDropHandle<()>, |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's actually used, which is why I don't want to use _handle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it doesn't need #[allow(dead_code)]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but rust is too stupid..it gets constructed and it has drop semantics.. but the compiler doesn't understand that as usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are we having a philosophical disagreement on what unused means?
My interpretation is the opposite in that case, to me it is definitely not dead code. _thing
does not mean something is unused, let _guard = thing();
is an entirely common pattern, this is just the same but in a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, will change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
net-tools/netwatch/src/netmon.rs
Outdated
actor.run().await; | ||
if let Err(err) = actor.run().await { | ||
warn!("netmon actor died: {:?}", err); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essentially a supervisor. The problem with this version is that you get a single WARN in the logfile and things just carry on. I wonder if maybe it might as well restart the actor - maybe with a small delay. That is still not proper supervision - if it keeps dying we should bring down the entire Endpoint - but it's a start and we'll flood the logs with WARN messages which could be a bit more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally yes, but currently there is no error case where this happens (and it is not super easy to restart the actor currently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Some(event) = self.mon_receiver.recv() => { | ||
match event { | ||
NetworkMessage::Change => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some(NetworkMessage::Change) = self.mon_receiver.recv()
might be a nicer way to write this?
I also noticed that NetworkMessage::Change
is marked with #[allow(dead_code)]
. This seems suspicious, can we remove that marker as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems suspicious, can we remove that marker as well?
no because this message is not used on android, and I didn't want to start adding cfgs to lint hints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some(NetworkMessage::Change) = self.mon_receiver.recv()
might be a nicer way to write this?
maybe, we usually always put the match inside, so I wrote it like that for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see issues with tokio::select, I cleaned this up to not use an else
branch
@@ -102,7 +101,7 @@ impl Actor { | |||
self.actor_sender.clone() | |||
} | |||
|
|||
pub(super) async fn run(mut self) { | |||
pub(super) async fn run(mut self) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, the signature is changed but there's not a single error return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a version where it propagated an error. and then I left this change in, can revert, no strong feelings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let's revert as then the supervisor discussion goes away as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
net-tools/netwatch/src/netmon/bsd.rs
Outdated
#[allow(dead_code)] | ||
handle: AbortOnDropHandle<()>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[allow(dead_code)] | |
handle: AbortOnDropHandle<()>, | |
_handle: AbortOnDropHandle<()>, |
?
I'm not a fan of #[allow(dead_code)]
as it easily stays around too long. The _
is much easier to keep track of as the compiler will tell you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -52,12 +55,25 @@ impl RouteMonitor { | |||
} | |||
Err(err) => { | |||
warn!("AF_ROUTE: error reading: {:?}", err); | |||
// recreate socket, as it is likely in an invalid state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a small delay in case this is hot-looping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear, the reported error doesn't because the socket is already dead
270457c
to
ec61b9b
Compare
also uses AbortOnDropHandle to better cleanup tasks Closes #2909
ec61b9b
to
e3c3455
Compare
also uses `AbortOnDropHandle` to better cleanup tasks Closes #2909 --------- Co-authored-by: Divma <[email protected]>
## Description With #2913, the netmon actor properly exits when the channel sender is dropped, but the android implementation never kept the sender around. The magicsock actor would then error when trying to subscribe to the network monitor, and the `Endpoint` would be unusable. ## Change checklist - [x] Self-review. - [ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [ ] Tests if relevant. - [ ] All breaking changes documented.
also uses
AbortOnDropHandle
to better cleanup tasksCloses #2909