fix: Add some crate features for performance #2477

larseggert · 2025-03-06T10:20:18Z

Let's see if they do.

Also, @mxinden, I was wondering why we went with a multi-threaded `tokio` client and server. Are the thread-management overheads worth it compared to using just the `rt` scheduler?

codecov · 2025-03-06T10:30:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.43%. Comparing base (8b4a9c9) to head (c532c76).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
+ Coverage   95.41%   95.43%   +0.01%     
==========================================
  Files         115      115              
  Lines       36996    36996              
  Branches    36996    36996              
==========================================
+ Hits        35301    35306       +5     
+ Misses       1689     1686       -3     
+ Partials        6        4       -2

Components	Coverage Δ
neqo-common	`97.53% <ø> (+0.35%)`	⬆️
neqo-crypto	`90.44% <ø> (ø)`
neqo-http3	`94.50% <ø> (ø)`
neqo-qpack	`96.29% <ø> (ø)`
neqo-transport	`96.24% <ø> (ø)`
neqo-udp	`95.29% <ø> (ø)`

github-actions · 2025-03-06T10:46:11Z

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 9354a53.

neqo-latest as client

neqo-latest vs. aioquic: Z
neqo-latest vs. go-x-net: BP BA
neqo-latest vs. haproxy: BP BA
neqo-latest vs. kwik: run cancelled after 20 min
neqo-latest vs. lsquic: L1 C1
neqo-latest vs. msquic: ⚠️R Z A L1 C1 ⚠️C2
neqo-latest vs. mvfst: A L1 C1 BA
neqo-latest vs. nginx: BP BA
neqo-latest vs. ngtcp2: CM
neqo-latest vs. picoquic: A ⚠️L1
neqo-latest vs. quic-go: A
neqo-latest vs. quiche: BP BA
neqo-latest vs. s2n-quic: BP BA CM
neqo-latest vs. tquic: S BP BA
neqo-latest vs. xquic: A

neqo-latest as server

aioquic vs. neqo-latest: CM
go-x-net vs. neqo-latest: CM
kwik vs. neqo-latest: BP BA CM
lsquic vs. neqo-latest: CM
msquic vs. neqo-latest: Z U CM
mvfst vs. neqo-latest: Z A L1 C1 CM
openssl vs. neqo-latest: LR M CM
quic-go vs. neqo-latest: run cancelled after 20 min
quiche vs. neqo-latest: ⚠️C1 CM
quinn vs. neqo-latest: V2 CM
s2n-quic vs. neqo-latest: CM
tquic vs. neqo-latest: CM
xquic vs. neqo-latest: M CM

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. linuxquic: H DC LR C20 M S R Z 3 B U E A L1 L2 🚀C1 C2 6 V2 BP BA CM
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L2 C2 6 V2 BP BA
neqo-latest vs. msquic: H DC LR C20 M S ⚠️R B U L2 ⚠️C2 6 V2 BP BA
neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6 BP
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E ⚠️L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U L1 L2 C1 C2 6 BP BA
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6
neqo-latest vs. tquic: H DC LR C20 M R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6 BP BA

neqo-latest as server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2 BP BA
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6 BP BA
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
linuxquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
lsquic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 V2 BP BA
msquic vs. neqo-latest: H DC LR C20 M S R B A L1 L2 C1 C2 6 V2 BP BA
mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6 BP BA
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
ngtcp2 vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
openssl vs. neqo-latest: H DC C20 S R 3 B A L2 C2 6 BP BA
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
quiche vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 ⚠️C1 C2 6 BP BA
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 BP BA
tquic vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6 BP BA

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E CM
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2 CM
neqo-latest vs. haproxy: E CM
neqo-latest vs. lsquic: CM
neqo-latest vs. msquic: 3 E CM
neqo-latest vs. mvfst: C20 S E V2 CM
neqo-latest vs. nginx: E V2 CM
neqo-latest vs. picoquic: CM
neqo-latest vs. quic-go: E V2 CM
neqo-latest vs. quiche: E V2 CM
neqo-latest vs. quinn: V2 CM
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. tquic: E V2 CM
neqo-latest vs. xquic: S E V2 CM

neqo-latest as server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2 BP BA CM
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
lsquic vs. neqo-latest: C20 Z U
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
openssl vs. neqo-latest: Z U E L1 C1 V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
tquic vs. neqo-latest: C20 U E V2
xquic vs. neqo-latest: E V2

github-actions · 2025-03-06T11:25:52Z

Benchmark results

Performance differences relative to 9354a53.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💔 Performance has regressed.

       time:   [726.77 ms 730.91 ms 735.12 ms]
       thrpt:  [136.03 MiB/s 136.82 MiB/s 137.59 MiB/s]
change:
       time:   [+1.3001% +2.1349% +2.9678%] (p = 0.00 < 0.05)
       thrpt:  [-2.8823% -2.0903% -1.2835%]
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild
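
For readers checking the report by hand, the `thrpt` lines follow from the `time` lines. The sketch below assumes that "100mb" in the benchmark name means 100 MiB (the reported figures are consistent with that) and that the throughput change is simply the inverse of the time change. The function names are illustrative and not part of the benchmark harness.

```rust
/// Throughput in MiB/s for `bytes` transferred in `seconds`.
fn mib_per_sec(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / seconds
}

/// Relative throughput change implied by a relative time change.
/// Both values are fractions, e.g. 0.021349 for +2.1349%.
/// Throughput is proportional to 1/time, so a time factor of (1 + t)
/// becomes a throughput factor of 1 / (1 + t).
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

With the Download numbers above, `mib_per_sec(100 * 1024 * 1024, 0.73091)` gives roughly 136.82 MiB/s and `thrpt_change(0.021349)` gives roughly -2.09%. Because of the inversion, the upper bound of the time interval maps to the lower bound of the throughput interval.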

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.

       time:   [347.60 ms 349.24 ms 350.83 ms]
       thrpt:  [28.503 Kelem/s 28.634 Kelem/s 28.769 Kelem/s]
change:
       time:   [-0.5483% +0.1348% +0.8050%] (p = 0.69 > 0.05)
       thrpt:  [-0.7986% -0.1346% +0.5514%]

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.

       time:   [24.918 ms 25.093 ms 25.276 ms]
       thrpt:  [39.564  elem/s 39.852  elem/s 40.132  elem/s]
change:
       time:   [-0.9261% +0.0203% +0.9563%] (p = 0.97 > 0.05)
       thrpt:  [-0.9473% -0.0203% +0.9348%]

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.

       time:   [1.8193 s 1.8420 s 1.8669 s]
       thrpt:  [53.565 MiB/s 54.287 MiB/s 54.967 MiB/s]
change:
       time:   [-6.1021% -4.6359% -3.1475%] (p = 0.00 < 0.05)
       thrpt:  [+3.2498% +4.8613% +6.4987%]
Found 7 outliers among 100 measurements (7.00%)

5 (5.00%) high mild

2 (2.00%) high severe

decode 4096 bytes, mask ff: Change within noise threshold.

       time:   [12.091 µs 12.135 µs 12.186 µs]
       change: [+0.2052% +0.8190% +1.3927%] (p = 0.01 < 0.05)
Found 17 outliers among 100 measurements (17.00%)

1 (1.00%) low severe

3 (3.00%) low mild

1 (1.00%) high mild

12 (12.00%) high severe

decode 1048576 bytes, mask ff: 💔 Performance has regressed.

       time:   [3.1285 ms 3.1381 ms 3.1494 ms]
       change: [+5.9457% +6.3947% +6.7803%] (p = 0.00 < 0.05)
Found 10 outliers among 100 measurements (10.00%)

1 (1.00%) low mild

9 (9.00%) high severe

decode 4096 bytes, mask 7f: Change within noise threshold.

       time:   [20.176 µs 20.234 µs 20.297 µs]
       change: [+0.0833% +0.6802% +1.2624%] (p = 0.03 < 0.05)
Found 23 outliers among 100 measurements (23.00%)

3 (3.00%) low severe

2 (2.00%) low mild

4 (4.00%) high mild

14 (14.00%) high severe

decode 1048576 bytes, mask 7f: 💔 Performance has regressed.

       time:   [5.2493 ms 5.2620 ms 5.2753 ms]
       change: [+9.1723% +9.5764% +9.9807%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)

15 (15.00%) high severe

decode 4096 bytes, mask 3f: 💔 Performance has regressed.

       time:   [7.0189 µs 7.0476 µs 7.0836 µs]
       change: [+10.951% +11.742% +12.875%] (p = 0.00 < 0.05)
Found 16 outliers among 100 measurements (16.00%)

3 (3.00%) low severe

2 (2.00%) low mild

2 (2.00%) high mild

9 (9.00%) high severe

decode 1048576 bytes, mask 3f: 💚 Performance has improved.

       time:   [1.7915 ms 1.7983 ms 1.8055 ms]
       change: [-16.985% -16.575% -16.150%] (p = 0.00 < 0.05)
Found 11 outliers among 100 measurements (11.00%)

1 (1.00%) low mild

2 (2.00%) high mild

8 (8.00%) high severe

1 streams of 1 bytes/multistream: 💔 Performance has regressed.

       time:   [73.609 µs 74.271 µs 75.368 µs]
       change: [+2.8166% +3.8819% +5.5308%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)

1 (1.00%) high mild

1 (1.00%) high severe

1000 streams of 1 bytes/multistream: 💔 Performance has regressed.

       time:   [26.014 ms 26.051 ms 26.087 ms]
       change: [+2.5204% +2.7118% +2.9083%] (p = 0.00 < 0.05)

10000 streams of 1 bytes/multistream: Change within noise threshold.

       time:   [1.7156 s 1.7172 s 1.7188 s]
       change: [+0.8176% +0.9578% +1.0934%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)

1 (1.00%) low severe

7 (7.00%) low mild

5 (5.00%) high mild

2 (2.00%) high severe

1 streams of 1000 bytes/multistream: 💔 Performance has regressed.

       time:   [76.209 µs 77.434 µs 79.001 µs]
       change: [+4.9857% +6.5620% +8.4877%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high severe

100 streams of 1000 bytes/multistream: 💔 Performance has regressed.

       time:   [3.5038 ms 3.5105 ms 3.5177 ms]
       change: [+4.2404% +4.5382% +4.8690%] (p = 0.00 < 0.05)
Found 23 outliers among 100 measurements (23.00%)

23 (23.00%) high severe

1000 streams of 1000 bytes/multistream: Change within noise threshold.

       time:   [144.22 ms 144.30 ms 144.38 ms]
       change: [+0.6443% +0.7165% +0.7991%] (p = 0.00 < 0.05)
Found 4 outliers among 100 measurements (4.00%)

4 (4.00%) high mild

coalesce_acked_from_zero 1+1 entries: No change in performance detected.

       time:   [94.836 ns 95.147 ns 95.472 ns]
       change: [-0.7577% -0.1797% +0.4022%] (p = 0.55 > 0.05)
Found 11 outliers among 100 measurements (11.00%)

7 (7.00%) high mild

4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.

       time:   [112.59 ns 112.92 ns 113.29 ns]
       change: [-0.7826% -0.3833% -0.0220%] (p = 0.06 > 0.05)
Found 11 outliers among 100 measurements (11.00%)

1 (1.00%) low mild

10 (10.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.

       time:   [112.30 ns 112.78 ns 113.35 ns]
       change: [-0.7880% -0.2955% +0.2215%] (p = 0.25 > 0.05)
Found 11 outliers among 100 measurements (11.00%)

2 (2.00%) low severe

1 (1.00%) low mild

1 (1.00%) high mild

7 (7.00%) high severe

coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.

       time:   [93.152 ns 93.639 ns 94.148 ns]
       change: [-2.6795% -1.5546% -0.4511%] (p = 0.01 < 0.05)
Found 5 outliers among 100 measurements (5.00%)

3 (3.00%) high mild

2 (2.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.

       time:   [116.20 ms 116.26 ms 116.32 ms]
       change: [-0.3799% -0.3049% -0.2332%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) low mild

SentPackets::take_ranges: No change in performance detected.

       time:   [8.2624 µs 8.5294 µs 8.7698 µs]
       change: [-2.9035% +0.9469% +4.9587%] (p = 0.64 > 0.05)
Found 19 outliers among 100 measurements (19.00%)

9 (9.00%) low severe

8 (8.00%) low mild

1 (1.00%) high mild

1 (1.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.

       time:   [35.769 ms 35.829 ms 35.889 ms]
       change: [+0.2099% +0.4616% +0.7330%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.

       time:   [36.114 ms 36.168 ms 36.220 ms]
       change: [-0.5396% -0.3204% -0.1051%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) low mild

transfer/pacing-false/same-seed: Change within noise threshold.

       time:   [35.715 ms 35.783 ms 35.850 ms]
       change: [-0.6779% -0.4005% -0.1595%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) low mild

transfer/pacing-true/same-seed: Change within noise threshold.

       time:   [36.235 ms 36.284 ms 36.332 ms]
       change: [-0.6700% -0.4931% -0.3147%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) low mild

Client/server transfer results

Performance differences relative to 9354a53.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

Client	Server	CC	Pacing	Mean ± σ	Min	Max	MiB/s ± σ	Δ `main` (ms)	Δ `main` (%)
neqo	neqo	reno	on	421.1 ± 44.9	389.6	629.0	76.0 ± 0.7	7.1	1.7%
neqo	neqo	reno		449.2 ± 100.6	389.0	911.0	71.2 ± 0.3	-26.3	-5.5%
neqo	neqo	cubic	on	418.4 ± 36.5	388.7	550.0	76.5 ± 0.9	8.8	2.1%
neqo	neqo	cubic		413.8 ± 38.7	387.4	595.2	77.3 ± 0.8	9.3	2.3%
google	neqo	reno	on	759.0 ± 92.2	559.0	944.2	42.2 ± 0.3	-6.6	-0.9%
google	neqo	reno		771.9 ± 91.7	559.0	977.3	41.5 ± 0.3	2.3	0.3%
google	neqo	cubic	on	762.6 ± 91.1	546.3	970.5	42.0 ± 0.4	3.1	0.4%
google	neqo	cubic		759.4 ± 91.6	541.7	996.7	42.1 ± 0.3	0.5	0.1%
google	google			572.8 ± 41.4	548.4	774.8	55.9 ± 0.8	-0.1	-0.0%
neqo	msquic	reno	on	272.9 ± 33.2	243.9	409.6	117.2 ± 1.0	6.9	2.6%
neqo	msquic	reno		267.3 ± 24.0	246.4	366.3	119.7 ± 1.3	-0.4	-0.2%
neqo	msquic	cubic	on	264.7 ± 13.0	246.1	305.7	120.9 ± 2.5	-2.3	-0.8%
neqo	msquic	cubic		267.9 ± 29.1	241.8	410.8	119.4 ± 1.1	3.7	1.4%
msquic	msquic			172.6 ± 20.9	152.1	252.8	185.4 ± 1.5	-8.1	-4.5%
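
The derived columns in this table can be reproduced from the mean times. This is a sketch under two assumptions that match the rows above but are not confirmed by the report itself: the MiB/s figure is the 33554432-byte transfer divided by the mean time, and the percentage delta is taken relative to the `main` baseline (mean minus the millisecond delta). The names are illustrative.

```rust
/// Size of the transfer stated in the report: 33554432 bytes = 32 MiB.
const TRANSFER_BYTES: f64 = 33_554_432.0;

/// Throughput in MiB/s for one transfer that took `mean_ms` milliseconds.
fn mib_per_sec_from_ms(mean_ms: f64) -> f64 {
    TRANSFER_BYTES / (1024.0 * 1024.0) / (mean_ms / 1000.0)
}

/// Percentage change relative to the baseline, where the baseline is
/// assumed to be `mean_ms - delta_ms` (the mean measured on `main`).
fn relative_delta_percent(mean_ms: f64, delta_ms: f64) -> f64 {
    let baseline_ms = mean_ms - delta_ms;
    100.0 * delta_ms / baseline_ms
}
```

For the first row, `mib_per_sec_from_ms(421.1)` is about 76.0 and `relative_delta_percent(421.1, 7.1)` is about 1.7, matching the table after rounding.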

mxinden · 2025-03-06T14:37:59Z

> Also, @mxinden, I was wondering why we went with a multi-threaded tokio client and server.

I chose multi-threaded because it is the de facto default. No other reason.

> I'm wondering if the thread-management overheads are worth it compared to using just the rt scheduler?

👍 Worth experimenting. Intuitively, since there is only a single future, there is no cross-thread communication and thus no significant overhead.
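
To make the "single future" point concrete, here is a std-only sketch of driving one future to completion on the calling thread. It is not how `tokio` is implemented and not what the neqo binaries do. It only illustrates that a single top-level future can be polled on one thread with no work handed to other threads. `ThreadWaker` and `block_on` are hypothetical names for this illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drive a single future to completion on the calling thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the stack so it can be polled.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}
```

Calling `block_on(async { std::thread::current().id() })` returns the caller's own thread ID, since the future never leaves the calling thread. In `tokio` itself, the comparable experiment would be a current-thread runtime instead of the multi-threaded one.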

Signed-off-by: Lars Eggert <[email protected]>

fix: Add some crate features for performance

c532c76

larseggert mentioned this pull request Mar 12, 2025

chore: Pin deps via Cargo.lock #2461

Merged

larseggert added 5 commits March 12, 2025 15:36

Add perf

d4e3f83

Not needed

911f129

Merge branch 'main' into fix-features

9d87f9e

Merge branch 'main' into fix-features

316c4d1

Signed-off-by: Lars Eggert <[email protected]>

Merge branch 'main' into fix-features

365bac6

Signed-off-by: Lars Eggert <[email protected]>
