
fix: Add some crate features for performance #2477

Draft: wants to merge 6 commits into main


larseggert (Collaborator)

Let's see if they do.

Also, @mxinden, I was wondering why we went with a multi-threaded `tokio` client and server. I'm wondering if the thread-management overheads are worth it compared to using just the `rt` scheduler?


codecov bot commented Mar 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.43%. Comparing base (8b4a9c9) to head (c532c76).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
+ Coverage   95.41%   95.43%   +0.01%     
==========================================
  Files         115      115              
  Lines       36996    36996              
  Branches    36996    36996              
==========================================
+ Hits        35301    35306       +5     
+ Misses       1689     1686       -3     
+ Partials        6        4       -2     
| Components | Coverage Δ |
|---|---|
| neqo-common | 97.53% <ø> (+0.35%) ⬆️ |
| neqo-crypto | 90.44% <ø> (ø) |
| neqo-http3 | 94.50% <ø> (ø) |
| neqo-qpack | 96.29% <ø> (ø) |
| neqo-transport | 96.24% <ø> (ø) |
| neqo-udp | 95.29% <ø> (ø) |
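The headline coverage numbers follow from the Coverage Diff counters: in both columns Hits + Misses + Partials equals Lines (36996), and coverage is Hits ÷ Lines, so partially covered lines count as not covered. The sketch below recomputes both percentages. Rounding *down* to two decimals is an assumption on our part; it reproduces 95.41% for the base, whereas round-to-nearest would give 95.42%. `coverage_bp` and `fmt_pct` are hypothetical helpers, not Codecov code.

```rust
/// Coverage in hundredths of a percent (basis points), rounded down.
/// Assumption: partials count as uncovered and the result is truncated.
fn coverage_bp(hits: u64, misses: u64, partials: u64) -> u64 {
    let lines = hits + misses + partials; // 36996 in both columns of the diff
    hits * 10_000 / lines // integer division truncates, i.e. rounds down
}

/// Formats basis points as a percentage with two decimals.
fn fmt_pct(bp: u64) -> String {
    format!("{}.{:02}%", bp / 100, bp % 100)
}

fn main() {
    let base = coverage_bp(35_301, 1_689, 6); // main (8b4a9c9)
    let head = coverage_bp(35_306, 1_686, 4); // this PR (c532c76)
    println!("main: {}, PR: {}", fmt_pct(base), fmt_pct(head));
}
```

Integer basis points avoid any floating-point rounding ambiguity when truncating.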


github-actions bot commented Mar 6, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 9354a53.

neqo-latest as client

neqo-latest as server


Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server


github-actions bot commented Mar 6, 2025

Benchmark results

Performance differences relative to 9354a53.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💔 Performance has regressed.
       time:   [726.77 ms 730.91 ms 735.12 ms]
       thrpt:  [136.03 MiB/s 136.82 MiB/s 137.59 MiB/s]
change:
       time:   [+1.3001% +2.1349% +2.9678%] (p = 0.00 < 0.05)
       thrpt:  [-2.8823% -2.0903% -1.2835%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [347.60 ms 349.24 ms 350.83 ms]
       thrpt:  [28.503 Kelem/s 28.634 Kelem/s 28.769 Kelem/s]
change:
       time:   [-0.5483% +0.1348% +0.8050%] (p = 0.69 > 0.05)
       thrpt:  [-0.7986% -0.1346% +0.5514%]

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [24.918 ms 25.093 ms 25.276 ms]
       thrpt:  [39.564  elem/s 39.852  elem/s 40.132  elem/s]
change:
       time:   [-0.9261% +0.0203% +0.9563%] (p = 0.97 > 0.05)
       thrpt:  [-0.9473% -0.0203% +0.9348%]

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.
       time:   [1.8193 s 1.8420 s 1.8669 s]
       thrpt:  [53.565 MiB/s 54.287 MiB/s 54.967 MiB/s]
change:
       time:   [-6.1021% -4.6359% -3.1475%] (p = 0.00 < 0.05)
       thrpt:  [+3.2498% +4.8613% +6.4987%]

Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
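For the four client benchmarks above, the `thrpt:` change line is a direct transform of the `time:` change line. With a fixed amount of work, throughput is work ÷ time, so a relative time change Δt becomes a throughput change of 1/(1 + Δt) - 1, and the interval endpoints swap (the largest slowdown gives the lowest throughput). The sketch below reproduces the Upload benchmark's throughput interval from its time interval to the printed precision. The relation is inferred from these numbers, not copied from criterion's source; `thrpt_change` and `thrpt_interval` are hypothetical names.

```rust
/// Maps a relative time change (a fraction, e.g. 0.021349 for +2.1349%)
/// to the relative throughput change for the same fixed amount of work.
fn thrpt_change(time_change: f64) -> f64 {
    // throughput = work / time, so thrpt_new / thrpt_old = t_old / t_new = 1 / (1 + Δt)
    1.0 / (1.0 + time_change) - 1.0
}

/// Converts a [lower, estimate, upper] time-change interval into the
/// matching throughput-change interval; the bounds swap places.
fn thrpt_interval(time: [f64; 3]) -> [f64; 3] {
    [thrpt_change(time[2]), thrpt_change(time[1]), thrpt_change(time[0])]
}

fn main() {
    // Time change of "1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client" above.
    let thrpt = thrpt_interval([-0.061021, -0.046359, -0.031475]);
    for t in thrpt {
        print!("{:+.4}% ", t * 100.0);
    }
    println!();
}
```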

decode 4096 bytes, mask ff: Change within noise threshold.
       time:   [12.091 µs 12.135 µs 12.186 µs]
       change: [+0.2052% +0.8190% +1.3927%] (p = 0.01 < 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
12 (12.00%) high severe
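The "Found N outliers" summaries split samples into low/high and mild/severe buckets. Criterion.rs documents this as Tukey's fences: a sample is a mild outlier beyond 1.5 × IQR from the nearest quartile and a severe one beyond 3 × IQR. The sketch below implements that rule. The quartile estimator (linear interpolation between closest ranks) is an assumption and may differ from criterion's, and the sample values in `main` are made up for illustration, not taken from the results here.

```rust
/// Counts for the four buckets criterion prints under "Found N outliers".
#[derive(Debug, Default, PartialEq)]
struct Outliers {
    low_severe: usize,
    low_mild: usize,
    high_mild: usize,
    high_severe: usize,
}

/// Percentile of sorted data by linear interpolation (assumed estimator).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let (lo, hi) = (rank.floor() as usize, rank.ceil() as usize);
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Tukey's fences: mild beyond 1.5 × IQR, severe beyond 3 × IQR.
fn classify_outliers(samples: &[f64]) -> Outliers {
    let mut out = Outliers::default();
    if samples.is_empty() {
        return out;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(f64::total_cmp);
    let (q1, q3) = (percentile(&sorted, 25.0), percentile(&sorted, 75.0));
    let iqr = q3 - q1;
    for &x in samples {
        if x < q1 - 3.0 * iqr {
            out.low_severe += 1;
        } else if x < q1 - 1.5 * iqr {
            out.low_mild += 1;
        } else if x > q3 + 3.0 * iqr {
            out.high_severe += 1;
        } else if x > q3 + 1.5 * iqr {
            out.high_mild += 1;
        }
    }
    out
}

fn main() {
    // Hypothetical per-iteration times (µs), one per bucket plus normal samples.
    let samples = [2.0, 20.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 47.0, 120.0];
    println!("{:?}", classify_outliers(&samples));
}
```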

decode 1048576 bytes, mask ff: 💔 Performance has regressed.
       time:   [3.1285 ms 3.1381 ms 3.1494 ms]
       change: [+5.9457% +6.3947% +6.7803%] (p = 0.00 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
9 (9.00%) high severe

decode 4096 bytes, mask 7f: Change within noise threshold.
       time:   [20.176 µs 20.234 µs 20.297 µs]
       change: [+0.0833% +0.6802% +1.2624%] (p = 0.03 < 0.05)

Found 23 outliers among 100 measurements (23.00%)
3 (3.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
14 (14.00%) high severe

decode 1048576 bytes, mask 7f: 💔 Performance has regressed.
       time:   [5.2493 ms 5.2620 ms 5.2753 ms]
       change: [+9.1723% +9.5764% +9.9807%] (p = 0.00 < 0.05)

Found 15 outliers among 100 measurements (15.00%)
15 (15.00%) high severe

decode 4096 bytes, mask 3f: 💔 Performance has regressed.
       time:   [7.0189 µs 7.0476 µs 7.0836 µs]
       change: [+10.951% +11.742% +12.875%] (p = 0.00 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask 3f: 💚 Performance has improved.
       time:   [1.7915 ms 1.7983 ms 1.8055 ms]
       change: [-16.985% -16.575% -16.150%] (p = 0.00 < 0.05)

Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
2 (2.00%) high mild
8 (8.00%) high severe

1 streams of 1 bytes/multistream: 💔 Performance has regressed.
       time:   [73.609 µs 74.271 µs 75.368 µs]
       change: [+2.8166% +3.8819% +5.5308%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

1000 streams of 1 bytes/multistream: 💔 Performance has regressed.
       time:   [26.014 ms 26.051 ms 26.087 ms]
       change: [+2.5204% +2.7118% +2.9083%] (p = 0.00 < 0.05)

10000 streams of 1 bytes/multistream: Change within noise threshold.
       time:   [1.7156 s 1.7172 s 1.7188 s]
       change: [+0.8176% +0.9578% +1.0934%] (p = 0.00 < 0.05)

Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low severe
7 (7.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe

1 streams of 1000 bytes/multistream: 💔 Performance has regressed.
       time:   [76.209 µs 77.434 µs 79.001 µs]
       change: [+4.9857% +6.5620% +8.4877%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe

100 streams of 1000 bytes/multistream: 💔 Performance has regressed.
       time:   [3.5038 ms 3.5105 ms 3.5177 ms]
       change: [+4.2404% +4.5382% +4.8690%] (p = 0.00 < 0.05)

Found 23 outliers among 100 measurements (23.00%)
23 (23.00%) high severe

1000 streams of 1000 bytes/multistream: Change within noise threshold.
       time:   [144.22 ms 144.30 ms 144.38 ms]
       change: [+0.6443% +0.7165% +0.7991%] (p = 0.00 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [94.836 ns 95.147 ns 95.472 ns]
       change: [-0.7577% -0.1797% +0.4022%] (p = 0.55 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [112.59 ns 112.92 ns 113.29 ns]
       change: [-0.7826% -0.3833% -0.0220%] (p = 0.06 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
10 (10.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [112.30 ns 112.78 ns 113.35 ns]
       change: [-0.7880% -0.2955% +0.2215%] (p = 0.25 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe

coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.
       time:   [93.152 ns 93.639 ns 94.148 ns]
       change: [-2.6795% -1.5546% -0.4511%] (p = 0.01 < 0.05)

Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [116.20 ms 116.26 ms 116.32 ms]
       change: [-0.3799% -0.3049% -0.2332%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

SentPackets::take_ranges: No change in performance detected.
       time:   [8.2624 µs 8.5294 µs 8.7698 µs]
       change: [-2.9035% +0.9469% +4.9587%] (p = 0.64 > 0.05)

Found 19 outliers among 100 measurements (19.00%)
9 (9.00%) low severe
8 (8.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [35.769 ms 35.829 ms 35.889 ms]
       change: [+0.2099% +0.4616% +0.7330%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [36.114 ms 36.168 ms 36.220 ms]
       change: [-0.5396% -0.3204% -0.1051%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [35.715 ms 35.783 ms 35.850 ms]
       change: [-0.6779% -0.4005% -0.1595%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [36.235 ms 36.284 ms 36.332 ms]
       change: [-0.6700% -0.4931% -0.3147%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
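Every verdict in these results is consistent with the following rule, which is our reading of how criterion.rs labels a comparison against its baseline: if p is not below the significance level, there is "No change"; otherwise the change is called an improvement or regression only when the whole confidence interval of the change lies beyond the noise threshold, and "within noise threshold" otherwise. The values 0.05 and 0.01 are assumed to be criterion's defaults; neqo's bench setup could configure them differently, although they match every verdict above. `Verdict` and `classify` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

const SIGNIFICANCE: f64 = 0.05; // assumed criterion default
const NOISE: f64 = 0.01; // assumed criterion default (1%)

/// `ci` is the (lower, upper) bound of the relative time change, as fractions.
fn classify(p_value: f64, ci: (f64, f64), significance: f64, noise: f64) -> Verdict {
    if p_value >= significance {
        return Verdict::NoChange;
    }
    let (lo, hi) = ci;
    if lo < -noise && hi < -noise {
        Verdict::Improved
    } else if lo > noise && hi > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // "decode 1048576 bytes, mask 3f" from the results above (p printed as 0.00).
    println!("{:?}", classify(0.00, (-0.16985, -0.16150), SIGNIFICANCE, NOISE));
}
```

This explains, for example, why "decode 4096 bytes, mask ff" is significant (p = 0.01) yet only "within noise": its interval starts at +0.2052%, inside the ±1% band.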

Client/server transfer results

Performance differences relative to 9354a53.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

| Client | Server | CC | Pacing | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
|---|---|---|---|---|---|---|---|---|---|
| neqo | neqo | reno | on | 421.1 ± 44.9 | 389.6 | 629.0 | 76.0 ± 0.7 | 7.1 | 1.7% |
| neqo | neqo | reno | | 449.2 ± 100.6 | 389.0 | 911.0 | 71.2 ± 0.3 | -26.3 | -5.5% |
| neqo | neqo | cubic | on | 418.4 ± 36.5 | 388.7 | 550.0 | 76.5 ± 0.9 | 8.8 | 2.1% |
| neqo | neqo | cubic | | 413.8 ± 38.7 | 387.4 | 595.2 | 77.3 ± 0.8 | 9.3 | 2.3% |
| google | neqo | reno | on | 759.0 ± 92.2 | 559.0 | 944.2 | 42.2 ± 0.3 | -6.6 | -0.9% |
| google | neqo | reno | | 771.9 ± 91.7 | 559.0 | 977.3 | 41.5 ± 0.3 | 2.3 | 0.3% |
| google | neqo | cubic | on | 762.6 ± 91.1 | 546.3 | 970.5 | 42.0 ± 0.4 | 3.1 | 0.4% |
| google | neqo | cubic | | 759.4 ± 91.6 | 541.7 | 996.7 | 42.1 ± 0.3 | 0.5 | 0.1% |
| google | google | | | 572.8 ± 41.4 | 548.4 | 774.8 | 55.9 ± 0.8 | -0.1 | -0.0% |
| neqo | msquic | reno | on | 272.9 ± 33.2 | 243.9 | 409.6 | 117.2 ± 1.0 | 6.9 | 2.6% |
| neqo | msquic | reno | | 267.3 ± 24.0 | 246.4 | 366.3 | 119.7 ± 1.3 | -0.4 | -0.2% |
| neqo | msquic | cubic | on | 264.7 ± 13.0 | 246.1 | 305.7 | 120.9 ± 2.5 | -2.3 | -0.8% |
| neqo | msquic | cubic | | 267.9 ± 29.1 | 241.8 | 410.8 | 119.4 ± 1.1 | 3.7 | 1.4% |
| msquic | msquic | | | 172.6 ± 20.9 | 152.1 | 252.8 | 185.4 ± 1.5 | -8.1 | -4.5% |
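Two derived columns in the transfer table can be recomputed from the others. The MiB/s value matches the 32 MiB transfer (33554432 bytes) divided by the *mean* time, and the percentage Δ matches the millisecond Δ divided by `main`'s mean, assuming Δ = this PR's mean minus `main`'s mean. Both relations are inferred from the numbers in the table rather than from the benchmark script, and the "± σ" of the MiB/s column is not reproduced here. `mib_per_s` and `delta_pct` are hypothetical helpers.

```rust
/// Transfer size from the table caption: 33554432 bytes = 32 MiB.
const TRANSFER_BYTES: f64 = 33_554_432.0;

/// Throughput in MiB/s for a mean transfer time given in milliseconds.
fn mib_per_s(mean_ms: f64) -> f64 {
    (TRANSFER_BYTES / (1024.0 * 1024.0)) / (mean_ms / 1000.0)
}

/// Percentage change vs. `main`. Assumes delta_ms = mean_pr - mean_main,
/// so main's mean is mean_ms - delta_ms.
fn delta_pct(mean_ms: f64, delta_ms: f64) -> f64 {
    100.0 * delta_ms / (mean_ms - delta_ms)
}

fn main() {
    // (row, mean ms, Δ main ms) for a few rows of the table above.
    let rows = [
        ("neqo/neqo reno on", 421.1, 7.1),
        ("neqo/neqo reno", 449.2, -26.3),
        ("msquic/msquic", 172.6, -8.1),
    ];
    for (name, mean, delta) in rows {
        println!("{name}: {:.1} MiB/s, {:+.1}% vs main", mib_per_s(mean), delta_pct(mean, delta));
    }
}
```

Dividing by the current mean instead would give -5.9% for the second row rather than the reported -5.5%, which is why the sketch assumes `main`'s mean as the denominator.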


mxinden (Collaborator) commented Mar 6, 2025

> Also, @mxinden, I was wondering why we went with a multi-threaded `tokio` client and server.

I chose multi-threaded as it is the de-facto default. No other reason.

> I'm wondering if the thread-management overheads are worth it compared to using just the `rt` scheduler?

👍 Worth experimenting. Intuitively, given that it is a single future only, there is no cross-thread communication and thus no significant overhead.
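To make the "single future, no cross-thread communication" intuition concrete, here is a std-only toy `block_on` that drives one future on the calling thread, which is conceptually what a current-thread scheduler does. This is *not* tokio's implementation. With tokio, the equivalent switch would be `#[tokio::main(flavor = "current_thread")]` or `tokio::runtime::Builder::new_current_thread()`, which need only the `rt` feature, while the default `#[tokio::main]` flavor is the multi-threaded runtime from `rt-multi-thread`. Whether the multi-threaded runtime adds measurable overhead for neqo is an empirical question this sketch does not answer; `ThreadWaker` and `YieldOnce` are hypothetical names.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives one future to completion on the calling thread: no worker pool,
/// no work stealing, no hand-off of the task to another thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks us; a spurious wakeup just re-polls.
            Poll::Pending => thread::park(),
        }
    }
}

/// Toy future that returns `Pending` once, after scheduling its own wakeup.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.0 {
            Poll::Ready(7)
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    println!("{}", block_on(async { 40 + 2 }));
    println!("{}", block_on(YieldOnce(false)));
}
```

Because the waker only unparks the same thread that polls, nothing in this loop synchronizes with other threads beyond the park/unpark token.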
