Consider reducing MAX_PATH_PROBES with ECN #2490

Open
mxinden opened this issue Mar 11, 2025 · 6 comments · May be fixed by #2560

Comments

@mxinden
Collaborator

mxinden commented Mar 11, 2025

Currently we probe a path up to 6 times, i.e. 3 times with ECN, 3 times without ECN.

/// The number of times that a path will be probed before it is considered failed.
///
/// Note that with [`crate::ecn`], a path is probed [`MAX_PATH_PROBES`] with ECN
/// marks and [`MAX_PATH_PROBES`] without.
pub const MAX_PATH_PROBES: usize = 3;

On Firefox Nightly, about 2% of connection attempts hit an ECN black hole. In other words, for about 2% of connection attempts the first 3 path probes with ECN fail and one of the subsequent path probes without ECN succeeds.

https://yardstick.mozilla.org/d/aeak3dvriig3kd/http3?orgId=1&from=now-7d&to=now&timezone=browser&viewPanel=panel-3

If I understand correctly, our initial PTO should be ~100ms. Thus ~2% of HTTP/3 connections get delayed by 300ms. I would assume that most of these ~2% therefore lose the race to a concurrent HTTP/2 connection attempt.

Given the insights from Firefox Nightly, should we reduce the first MAX_PATH_PROBES with ECN from 3 to e.g. 1? Or do we consider ~2% not significant enough?
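To make the proposal concrete, here is a minimal sketch of a probe budget that is split between ECN-marked and unmarked probes. It is not neqo's actual implementation: `MAX_ECN_PATH_PROBES`, `ProbeOutcome`, and `next_probe` are hypothetical names introduced only for illustration, and only `MAX_PATH_PROBES` comes from the snippet quoted above.

```rust
/// Probes sent without ECN marks before the path is considered failed
/// (same value as the constant quoted above).
const MAX_PATH_PROBES: usize = 3;

/// Hypothetical separate budget for ECN-marked probes (assumption, not in neqo).
const MAX_ECN_PATH_PROBES: usize = 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProbeOutcome {
    /// Send another probe; `ecn` says whether it carries an ECN mark.
    Probe { ecn: bool },
    /// Every probe went unanswered; treat the path as failed.
    Failed,
}

/// Decide the next step after `lost` probes have gone unanswered,
/// given a budget of `ecn_probes` ECN-marked probes.
fn next_probe(lost: usize, ecn_probes: usize) -> ProbeOutcome {
    if lost < ecn_probes {
        ProbeOutcome::Probe { ecn: true }
    } else if lost < ecn_probes + MAX_PATH_PROBES {
        ProbeOutcome::Probe { ecn: false }
    } else {
        ProbeOutcome::Failed
    }
}
```

Calling `next_probe(lost, MAX_PATH_PROBES)` models the current behavior (3 + 3 probes), while `next_probe(lost, MAX_ECN_PATH_PROBES)` drops the ECN mark after a single lost probe and still allows 3 unmarked probes.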

@larseggert
Collaborator

Should we do an experiment?

@martinthomson
Member

What metric would you seek to optimize? HTTP/3 usage? ECN usage? Connection establishment time?

@mxinden
Collaborator Author

mxinden commented Mar 12, 2025

Should we do an experiment?

We can. That said, having done one myself now, I am not sure it is worth the bureaucratic overhead.

Given that Firefox Nightly is only a small population, and given the overhead of an experiment, how about we enable ECN on Firefox Early Beta first? Rolling out to more devices gives us higher confidence in the percentage of ECN black holes seen by Firefox users. If the percentage of black holes is still relatively high, we either (a) do an experiment or (b) make a change like the one suggested above.

What metric would you seek to optimize? HTTP/3 usage? ECN usage? Connection establishment time?

Metrics worth monitoring:

@mxinden
Collaborator Author

mxinden commented Mar 27, 2025

If I understand correctly, our initial PTO should be ~100ms. Thus ~2% of HTTP/3 connections get delayed by 300ms. I would assume that most of these ~2% therefore lose the race to a concurrent HTTP/2 connection attempt.

Once we no longer do time-threshold-based loss detection before the first ACK (see #2492), the estimate quoted above is wrong. Our initial RTT is 100ms. Thus the first PTO is 100ms + 4*rttvar, where rttvar is RTT/2 = 50ms, i.e. 300ms. The PTO doubles on each expiry, so the second PTO is 600ms and the third is 1200ms. In sum, it takes 300ms + 600ms + 1200ms = 2100ms before we detect an ECN black hole and thus stop marking with Ect0.

See changes to handshake_delay_with_ecn_blackhole test in #2492.
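The arithmetic above can be checked with a short sketch. It assumes the PTO is simply RTT + 4 * rttvar with rttvar = RTT / 2 and that it doubles on every expiry; `max_ack_delay` and timer granularity are ignored. The function names are made up for this example and are not neqo APIs.

```rust
use std::time::Duration;

/// Initial RTT estimate, as stated in the comment above.
const INITIAL_RTT: Duration = Duration::from_millis(100);

/// First PTO before any RTT sample exists: RTT + 4 * rttvar, rttvar = RTT / 2.
/// Simplifying assumption: max_ack_delay and timer granularity are left out.
fn initial_pto(initial_rtt: Duration) -> Duration {
    let rttvar = initial_rtt / 2;
    initial_rtt + 4 * rttvar
}

/// Time until `probes` consecutive PTOs have expired, with the PTO
/// doubling after each expiry (exponential backoff).
fn time_to_detect(initial_rtt: Duration, probes: u32) -> Duration {
    let pto = initial_pto(initial_rtt);
    (0..probes).map(|i| pto * 2u32.pow(i)).sum()
}
```

Under these assumptions, `time_to_detect(INITIAL_RTT, 3)` gives the 2100ms from the comment, and a single ECN-marked probe (`time_to_detect(INITIAL_RTT, 1)`) would cut the detection delay to 300ms.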

We are currently rolling out ECN support to Firefox Early Beta. Unless that reveals numbers significantly different from those shared above, I think we need to take action before going to Firefox Release.

Options thus far:

  • Reduce the number of MAX_PATH_PROBES with ECN.
  • Reduce the initial PTO time with ECN.
  • Only start marking packets after the handshake is done.

@larseggert
Collaborator

As a quick fix, I would suggest doing either or both of your first two bullets. If we then still see issues, maybe do the third bullet.

@mxinden
Collaborator Author

mxinden commented Apr 5, 2025

I haven't found reports by others of such high ECN black-hole rates (i.e. > 2%). Thus I slightly doubt the metrics I introduced.

We previously reduced the set of connection failures we consider an ECN blackhole via https://phabricator.services.mozilla.com/D239884.

I propose restricting this set even further: only consider a path to be an ECN black hole if the connection handshake succeeds after ECN black-hole detection, i.e. once ECN marking has been turned off. Linking https://phabricator.services.mozilla.com/D244507 here for the record.
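As an illustration of the stricter rule, the sketch below counts an attempt as an ECN black hole only when ECN marking was disabled during the handshake and the handshake then completed. The types and field names are hypothetical and are not taken from neqo or from the linked patches.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EcnVerdict {
    NotBlackhole,
    Blackhole,
}

/// Hypothetical summary of one connection attempt (illustrative fields only).
struct ConnectionAttempt {
    /// ECN marking was turned off during the handshake because all
    /// ECN-marked probes were lost.
    ecn_disabled_during_handshake: bool,
    /// The handshake eventually completed.
    handshake_completed: bool,
}

/// Stricter classification: losing the ECN-marked probes is not enough;
/// the handshake must also succeed once marking is off.
fn classify(attempt: &ConnectionAttempt) -> EcnVerdict {
    if attempt.ecn_disabled_during_handshake && attempt.handshake_completed {
        EcnVerdict::Blackhole
    } else {
        EcnVerdict::NotBlackhole
    }
}
```

With this rule, an attempt that loses every probe, marked or not, is counted as a plain connection failure rather than as an ECN black hole.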
