fix: Add some crate features for performance #2477
base: main
Conversation
Let's see if they do. Also, @mxinden, I was wondering why we went with a multi-threaded `tokio` client and server. Are the thread-management overheads worth it compared to using just the single-threaded `rt` scheduler?
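For context, a minimal sketch of the two `tokio` flavors in question, assuming a stand-in `run_client` future (the actual neqo-bin entry point differs):

```rust
use std::io;

// Stand-in for the single future that drives the client; hypothetical.
async fn run_client() -> io::Result<()> {
    Ok(())
}

fn main() -> io::Result<()> {
    // Status quo: the multi-threaded scheduler, which spins up a pool of
    // worker threads and steals work between them.
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()?
        .block_on(run_client())?;

    // Alternative: the current-thread scheduler (`rt` only), which polls
    // the future on the calling thread with no pool to manage.
    tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()?
        .block_on(run_client())
}
```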
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
+ Coverage   95.41%   95.43%   +0.01%
==========================================
  Files         115      115
  Lines       36996    36996
  Branches    36996    36996
==========================================
+ Hits        35301    35306       +5
+ Misses       1689     1686       -3
+ Partials        6        4       -2
```
Failed Interop Tests: QUIC Interop Runner, client vs. server, differences relative to 9354a53 (neqo-latest as client, neqo-latest as server).

Succeeded Interop Tests: QUIC Interop Runner, client vs. server (neqo-latest as client, neqo-latest as server).

Unsupported Interop Tests: QUIC Interop Runner, client vs. server (neqo-latest as client, neqo-latest as server).
Benchmark results: performance differences relative to 9354a53.

| Benchmark | Verdict | time [low est. high] | thrpt [low est. high] | change: time [low est. high] (p) | change: thrpt [low est. high] |
| --- | --- | --- | --- | --- | --- |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 💔 regressed | [726.77 ms 730.91 ms 735.12 ms] | [136.03 MiB/s 136.82 MiB/s 137.59 MiB/s] | [+1.3001% +2.1349% +2.9678%] (p = 0.00 < 0.05) | [-2.8823% -2.0903% -1.2835%] |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | no change detected | [347.60 ms 349.24 ms 350.83 ms] | [28.503 Kelem/s 28.634 Kelem/s 28.769 Kelem/s] | [-0.5483% +0.1348% +0.8050%] (p = 0.69 > 0.05) | [-0.7986% -0.1346% +0.5514%] |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | no change detected | [24.918 ms 25.093 ms 25.276 ms] | [39.564 elem/s 39.852 elem/s 40.132 elem/s] | [-0.9261% +0.0203% +0.9563%] (p = 0.97 > 0.05) | [-0.9473% -0.0203% +0.9348%] |
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 💚 improved | [1.8193 s 1.8420 s 1.8669 s] | [53.565 MiB/s 54.287 MiB/s 54.967 MiB/s] | [-6.1021% -4.6359% -3.1475%] (p = 0.00 < 0.05) | [+3.2498% +4.8613% +6.4987%] |
| decode 4096 bytes, mask ff | within noise threshold | [12.091 µs 12.135 µs 12.186 µs] | — | [+0.2052% +0.8190% +1.3927%] (p = 0.01 < 0.05) | — |
| decode 1048576 bytes, mask ff | 💔 regressed | [3.1285 ms 3.1381 ms 3.1494 ms] | — | [+5.9457% +6.3947% +6.7803%] (p = 0.00 < 0.05) | — |
| decode 4096 bytes, mask 7f | within noise threshold | [20.176 µs 20.234 µs 20.297 µs] | — | [+0.0833% +0.6802% +1.2624%] (p = 0.03 < 0.05) | — |
| decode 1048576 bytes, mask 7f | 💔 regressed | [5.2493 ms 5.2620 ms 5.2753 ms] | — | [+9.1723% +9.5764% +9.9807%] (p = 0.00 < 0.05) | — |
| decode 4096 bytes, mask 3f | 💔 regressed | [7.0189 µs 7.0476 µs 7.0836 µs] | — | [+10.951% +11.742% +12.875%] (p = 0.00 < 0.05) | — |
| decode 1048576 bytes, mask 3f | 💚 improved | [1.7915 ms 1.7983 ms 1.8055 ms] | — | [-16.985% -16.575% -16.150%] (p = 0.00 < 0.05) | — |
| 1 streams of 1 bytes/multistream | 💔 regressed | [73.609 µs 74.271 µs 75.368 µs] | — | [+2.8166% +3.8819% +5.5308%] (p = 0.00 < 0.05) | — |
| 1000 streams of 1 bytes/multistream | 💔 regressed | [26.014 ms 26.051 ms 26.087 ms] | — | [+2.5204% +2.7118% +2.9083%] (p = 0.00 < 0.05) | — |
| 10000 streams of 1 bytes/multistream | within noise threshold | [1.7156 s 1.7172 s 1.7188 s] | — | [+0.8176% +0.9578% +1.0934%] (p = 0.00 < 0.05) | — |
| 1 streams of 1000 bytes/multistream | 💔 regressed | [76.209 µs 77.434 µs 79.001 µs] | — | [+4.9857% +6.5620% +8.4877%] (p = 0.00 < 0.05) | — |
| 100 streams of 1000 bytes/multistream | 💔 regressed | [3.5038 ms 3.5105 ms 3.5177 ms] | — | [+4.2404% +4.5382% +4.8690%] (p = 0.00 < 0.05) | — |
| 1000 streams of 1000 bytes/multistream | within noise threshold | [144.22 ms 144.30 ms 144.38 ms] | — | [+0.6443% +0.7165% +0.7991%] (p = 0.00 < 0.05) | — |
| coalesce_acked_from_zero 1+1 entries | no change detected | [94.836 ns 95.147 ns 95.472 ns] | — | [-0.7577% -0.1797% +0.4022%] (p = 0.55 > 0.05) | — |
| coalesce_acked_from_zero 3+1 entries | no change detected | [112.59 ns 112.92 ns 113.29 ns] | — | [-0.7826% -0.3833% -0.0220%] (p = 0.06 > 0.05) | — |
| coalesce_acked_from_zero 10+1 entries | no change detected | [112.30 ns 112.78 ns 113.35 ns] | — | [-0.7880% -0.2955% +0.2215%] (p = 0.25 > 0.05) | — |
| coalesce_acked_from_zero 1000+1 entries | within noise threshold | [93.152 ns 93.639 ns 94.148 ns] | — | [-2.6795% -1.5546% -0.4511%] (p = 0.01 < 0.05) | — |
| RxStreamOrderer::inbound_frame() | within noise threshold | [116.20 ms 116.26 ms 116.32 ms] | — | [-0.3799% -0.3049% -0.2332%] (p = 0.00 < 0.05) | — |
| SentPackets::take_ranges | no change detected | [8.2624 µs 8.5294 µs 8.7698 µs] | — | [-2.9035% +0.9469% +4.9587%] (p = 0.64 > 0.05) | — |
| transfer/pacing-false/varying-seeds | within noise threshold | [35.769 ms 35.829 ms 35.889 ms] | — | [+0.2099% +0.4616% +0.7330%] (p = 0.00 < 0.05) | — |
| transfer/pacing-true/varying-seeds | within noise threshold | [36.114 ms 36.168 ms 36.220 ms] | — | [-0.5396% -0.3204% -0.1051%] (p = 0.00 < 0.05) | — |
| transfer/pacing-false/same-seed | within noise threshold | [35.715 ms 35.783 ms 35.850 ms] | — | [-0.6779% -0.4005% -0.1595%] (p = 0.00 < 0.05) | — |
| transfer/pacing-true/same-seed | within noise threshold | [36.235 ms 36.284 ms 36.332 ms] | — | [-0.6700% -0.4931% -0.3147%] (p = 0.00 < 0.05) | — |

Client/server transfer results: performance differences relative to 9354a53. Transfer of 33554432 bytes over loopback, 30 runs; all unit-less numbers are in milliseconds.
I chose multi-threaded as it is the de facto default. No other reason.

👍 Worth experimenting with. Intuitively, since it is only a single future, there is no cross-thread communication and thus no significant overhead.
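As a starting point for that experiment, a minimal sketch of how a crate feature could select the scheduler; the `multi-thread` feature name and `runtime()` helper are hypothetical, not part of this PR:

```rust
use tokio::runtime::{Builder, Runtime};

// Hypothetical crate feature "multi-thread": when enabled, build the
// work-stealing runtime with its worker-thread pool.
#[cfg(feature = "multi-thread")]
fn runtime() -> std::io::Result<Runtime> {
    Builder::new_multi_thread().enable_all().build()
}

// Without the feature, poll everything on the calling thread via the
// current-thread scheduler, avoiding thread-management overhead.
#[cfg(not(feature = "multi-thread"))]
fn runtime() -> std::io::Result<Runtime> {
    Builder::new_current_thread().enable_all().build()
}
```

Either flavor would then drive the single client or server future with `runtime()?.block_on(...)`, so callers would not need to change.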
Signed-off-by: Lars Eggert <[email protected]>