Skip to content

connmgr: fix transport association bug #3221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 21, 2025
Merged

connmgr: fix transport association bug #3221

merged 1 commit into from
Mar 21, 2025

Conversation

sukunrt
Copy link
Member

@sukunrt sukunrt commented Mar 4, 2025

The second listen for the same IP:Port combination doesn't get tagged with the association. So if you listen on webtransport after quic, the wt dials won't reuse the transport.

We should refactor this is bit more, but any change will probably change the refCountedTransport interface which is returned from one public method, so I'd like to do that in a separate PR, and use this for a patch release.

This probably impacts: #3165

@sukunrt sukunrt requested a review from MarcoPolo March 4, 2025 16:45
@sukunrt sukunrt force-pushed the push-ykywklwlxkmz branch 2 times, most recently from 91c2947 to 377b411 Compare March 4, 2025 17:58
@sukunrt sukunrt requested a review from guillaumemichel March 10, 2025 15:09
@ethan-gallant
Copy link

@MarcoPolo I have confirmed this resolves the issue and enables the hole punch to succeed.

@ethan-gallant
Copy link

@sukunrt can we merge this one?

@MarcoPolo
Copy link
Collaborator

I’ll review this on Monday. Thanks for tracking this down @sukunrt

@@ -360,6 +368,35 @@ func (r *reuse) AddTransport(tr *refcountedTransport, laddr *net.UDPAddr) error
return nil
}

func (r *reuse) AssociateListenTransport(association any, tr refCountedQuicTransport) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method? It's doing some sanity checks, but ultimately it just
calls tr.associate

Comment on lines 194 to 205
} else if c.enableReuseport && association != nil {
reuse, err := c.getReuse(netw)
if err != nil {
return nil, fmt.Errorf("reuse error: %w", err)
}
err = reuse.AssociateListenTransport(association, entry.ln.transport)
if err != nil {
return nil, fmt.Errorf("reuse associate failed: %w", err)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if c.enableReuseport && association != nil {
reuse, err := c.getReuse(netw)
if err != nil {
return nil, fmt.Errorf("reuse error: %w", err)
}
err = reuse.AssociateListenTransport(association, entry.ln.transport)
if err != nil {
return nil, fmt.Errorf("reuse associate failed: %w", err)
}
} else if c.enableReuseport {
if tr, ok := entry.ln.transport.(*refcountedTransport); ok {
tr.associate(association)
}

Copy link
Member Author

@sukunrt sukunrt Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with sanity checks?

  • The reuse type isn't exported, so this method is effectively private to this package.

I'd argue for one of the two:

  1. We move all associate inside reuse which may still require a similar method.
    or
  2. We add Associate on the refCountedTransport interface with a NOOP implementation in singleOwnerTransport to avoid the type assertion.

But for a patch release, I think this method with sanity checks is fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity checks are fine in general. These seem to be in the wrong place and thus add noise to the change. You are checking the reuse struct's invariants. Is there a reason you don't trust these invariants? Would it be better to do these checks where we would break the invariants instead (here)?

If you do trust these invariants, then the sanity checks here are just noise. and the above suggestion is a clearer change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like this would also be okay:

Suggested change
} else if c.enableReuseport && association != nil {
reuse, err := c.getReuse(netw)
if err != nil {
return nil, fmt.Errorf("reuse error: %w", err)
}
err = reuse.AssociateListenTransport(association, entry.ln.transport)
if err != nil {
return nil, fmt.Errorf("reuse associate failed: %w", err)
}
} else if c.enableReuseport {
// ... get reuse
err := reuse.AssertTransportExists(tr) // This method _only_ checks invariants
// ... handle err
if tr, ok := entry.ln.transport.(*refcountedTransport); ok {
tr.associate(association)
}

This split seems clearer to me because you have your sanity check step, and then your normal association step.

Note that I believe I would have also made the same initial design, and am mostly able to make this point as a fresh reviewer of this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you don't trust these invariants?

The input transport is provided from outside reuse. That user might pass in anything.

On the optimal design though, I've changed my mind. I think we should put associate on refCountedTransport and have a NO-Op implementation in singleOwnerTransport. Then we move the associate bit out of transportForListen in to this function. So associate always happens on all code paths no matter the input.
I'm not sure how semver compliant that is so I'll do that in a separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input transport is provided from outside reuse. That user might pass in anything.

I don't follow. The transport here is from the quicListeners map which is controlled by this quicreuse package.

On the optimal design though, I've changed my mind. I think we should put associate on refCountedTransport and have a NO-Op implementation in singleOwnerTransport.

No strong opinion here, but I think it's fine to keep this method only on refCountedTransport as it's only meaningful there.

@MarcoPolo
Copy link
Collaborator

I want to add an actual hole punching test using the simconn for this as well.

@sukunrt sukunrt force-pushed the push-ykywklwlxkmz branch from 38fa860 to 3ea686a Compare March 18, 2025 12:00
reuse port didn't work for the second transport, either QUIC or 
WebTransport, that was used for listening. This change fixes 
it by calling associate on all paths. 

This impacted hole punching for some users since you cannot hole 
punch without reuse port. There's a test in holepunch package
to prevent regressions. 

Fixes #3165

Co-authored-by: Marco Munizaga <[email protected]>
@sukunrt sukunrt force-pushed the push-ykywklwlxkmz branch from 3ea686a to f457284 Compare March 18, 2025 12:04
@sukunrt sukunrt merged commit 7bdeeb4 into master Mar 21, 2025
9 of 10 checks passed
sukunrt added a commit that referenced this pull request Mar 21, 2025
reuse port didn't work for the second transport, either QUIC or 
WebTransport, that was used for listening. This change fixes 
it by calling associate on all paths. 

This impacted hole punching for some users since you cannot hole 
punch without reuse port. There's a test in holepunch package
to prevent regressions. 

Fixes #3165

Co-authored-by: Marco Munizaga <[email protected]>
sukunrt added a commit that referenced this pull request Mar 24, 2025
reuse port didn't work for the second transport, either QUIC or 
WebTransport, that was used for listening. This change fixes 
it by calling associate on all paths. 

This impacted hole punching for some users since you cannot hole 
punch without reuse port. There's a test in holepunch package
to prevent regressions. 

Fixes #3165

Co-authored-by: Marco Munizaga <[email protected]>
sukunrt added a commit that referenced this pull request Mar 24, 2025
reuse port didn't work for the second transport, either QUIC or 
WebTransport, that was used for listening. This change fixes 
it by calling associate on all paths. 

This impacted hole punching for some users since you cannot hole 
punch without reuse port. There's a test in holepunch package
to prevent regressions. 

Fixes #3165

Co-authored-by: Marco Munizaga <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants