Directly Build UstrMap For Properties #525

krakow10 · 2025-05-03T19:17:30Z

If there's no reason why InstanceBuilder.properties can't be directly assigned to Instance.properties, then let's do that. The same logic is applied to building the properties in rbx_xml. The "Miner's Haven/Deserialize" benchmark says throughput is increased by 30%.

…e.properties

Dekkonot · 2025-05-03T23:22:04Z

For what it's worth, this was a performance decision because a lot of time is spent inserting into that hashmap: #467

On my machine the change in this PR actually degrades performance by quite a lot:

Are you able to replicate the boost in performance? What are your system specs like?

krakow10 · 2025-05-04T01:09:14Z

Excellent, I assumed it must be that way for performance reasons, but I still don't understand how it's faster to build a vec and then build a hashmap instead of just directly building a hashmap. Is is because of rehashing while allocating more capacity? I'm running the benchmarks on Linux using a 5950X like I mentioned on #529.

I reran the benchmarks first on master and then on properties branch, here is the full output for the properties branch:

     Running benches/suite/main.rs (target/release/deps/suite-4220ebfc15de915e)
Gnuplot not found, using plotters backend
100 Folders/Serialize   time:   [20.673 µs 20.685 µs 20.699 µs]
                        thrpt:  [36.445 MiB/s 36.469 MiB/s 36.490 MiB/s]
                 change:
                        time:   [+0.2859% +0.3785% +0.4745%] (p = 0.00 < 0.05)
                        thrpt:  [-0.4723% -0.3771% -0.2850%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
100 Folders/Deserialize time:   [34.843 µs 34.905 µs 34.985 µs]
                        thrpt:  [20.771 MiB/s 20.819 MiB/s 20.857 MiB/s]
                 change:
                        time:   [-6.5392% -6.4015% -6.2569%] (p = 0.00 < 0.05)
                        thrpt:  [+6.6745% +6.8393% +6.9967%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

100 Deep Folders/Serialize
                        time:   [22.916 µs 22.935 µs 22.952 µs]
                        thrpt:  [32.866 MiB/s 32.891 MiB/s 32.918 MiB/s]
                 change:
                        time:   [-2.3817% -2.0812% -1.8326%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8668% +2.1254% +2.4398%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
100 Deep Folders/Deserialize
                        time:   [40.367 µs 40.395 µs 40.423 µs]
                        thrpt:  [18.001 MiB/s 18.013 MiB/s 18.026 MiB/s]
                 change:
                        time:   [-5.9860% -5.8948% -5.8109%] (p = 0.00 < 0.05)
                        thrpt:  [+6.1694% +6.2640% +6.3671%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

100 100-line ModuleScripts/Serialize
                        time:   [87.191 µs 87.308 µs 87.453 µs]
                        thrpt:  [42.181 MiB/s 42.251 MiB/s 42.307 MiB/s]
                 change:
                        time:   [-0.9348% -0.0077% +1.3494%] (p = 0.98 > 0.05)
                        thrpt:  [-1.3314% +0.0077% +0.9436%]
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe
100 100-line ModuleScripts/Deserialize
                        time:   [148.15 µs 148.28 µs 148.43 µs]
                        thrpt:  [24.576 MiB/s 24.601 MiB/s 24.622 MiB/s]
                 change:
                        time:   [-4.6385% -4.1208% -3.6891%] (p = 0.00 < 0.05)
                        thrpt:  [+3.8305% +4.2979% +4.8642%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

1,000 Parts/Serialize   time:   [1.7785 ms 1.7821 ms 1.7863 ms]
                        thrpt:  [2.3432 MiB/s 2.3487 MiB/s 2.3535 MiB/s]
                 change:
                        time:   [+0.8976% +1.1602% +1.4339%] (p = 0.00 < 0.05)
                        thrpt:  [-1.4136% -1.1469% -0.8896%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
1,000 Parts/Deserialize time:   [1.9735 ms 1.9768 ms 1.9807 ms]
                        thrpt:  [2.0516 MiB/s 2.0556 MiB/s 2.0591 MiB/s]
                 change:
                        time:   [+6.9658% +7.2007% +7.4589%] (p = 0.00 < 0.05)
                        thrpt:  [-6.9412% -6.7170% -6.5122%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Benchmarking Miner's Haven/Serialize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.3s, or reduce sample count to 30.
Miner's Haven/Serialize time:   [142.46 ms 142.93 ms 143.44 ms]
                        thrpt:  [52.796 MiB/s 52.986 MiB/s 53.159 MiB/s]
                 change:
                        time:   [-9.0355% -8.5930% -8.1495%] (p = 0.00 < 0.05)
                        thrpt:  [+8.8726% +9.4008% +9.9330%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
Benchmarking Miner's Haven/Deserialize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 18.0s, or reduce sample count to 20.
Miner's Haven/Deserialize
                        time:   [178.69 ms 179.64 ms 180.70 ms]
                        thrpt:  [41.957 MiB/s 42.205 MiB/s 42.428 MiB/s]
                 change:
                        time:   [-26.671% -25.882% -25.149%] (p = 0.00 < 0.05)
                        thrpt:  [+33.600% +34.920% +36.371%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

Seems positive and reproducible.

kennethloeffler · 2025-05-05T23:53:45Z

This change overall degrades performance on my M1 MacBook Air. Here is the deserialize Miner's Haven benchmark plotted (n = 1000, red is latest master, blue is this branch):

I'm not entirely sure why there is such a significant difference, but I did some cursory profiling and noticed that these changes cause ~10% more L1 misses and ~12% more L2 misses when deserializing miners-haven.rbxl on my machine. So, one reason could be that using a scratch Vec for instance builder properties somehow improves cache locality over UstrMap (at least, on my machine and on this workload).

I highly recommend resolving #529 before moving on with any performance work! The benchmarks are only as good as the data they can collect - if the data is polluted with e.g. CPU boosts then it's much, much harder to get a good signal

krakow10 · 2025-05-06T01:11:20Z

I looked up how to set the cpu performance governor, and it can be done using cpupower like so:

sudo cpupower frequency-set -g performance

Which supposedly just sets it to always run at the maximum frequency. The default governer was called "powersave".

Running benchmarks repeatedly on the properties branch has been consistent within 5% across 10 runs, but running 5 benchmarks on the master or entries branch both elicit the 70% variance issue, going by feeling I'd say it's fast 50% of the time and slow 50% of the time. The properties branch's consistent benchmark is 10% slower than master for the "fast" variant of the "1,000 Parts/Deserialize" benchmark.

Ustr uses precomputed hashes which I assume are the same for every hashmap. Could it be accidental quadratic behaviour when doing new_map = map.iter().collect();?

krakow10 added 3 commits May 3, 2025 12:07

rbx_dom_weak: use UstrMap for InstanceBuilder.properties

845a78b

rbx_dom_weak: directly insert InstanceBuilder.properties into Instanc…

c2f7b7f

…e.properties

rbx_xml: directly build UstrMap for Instance.properties

f1e9ccd

krakow10 mentioned this pull request May 13, 2025

Unreliable Benchmarks #529

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Directly Build UstrMap For Properties #525

Directly Build UstrMap For Properties #525

Uh oh!

krakow10 commented May 3, 2025 •

edited

Loading

Uh oh!

Dekkonot commented May 3, 2025

Uh oh!

krakow10 commented May 4, 2025 •

edited

Loading

Uh oh!

kennethloeffler commented May 5, 2025 •

edited

Loading

Uh oh!

krakow10 commented May 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Directly Build UstrMap For Properties #525

Are you sure you want to change the base?

Directly Build UstrMap For Properties #525

Uh oh!

Conversation

krakow10 commented May 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dekkonot commented May 3, 2025

Uh oh!

krakow10 commented May 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kennethloeffler commented May 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

krakow10 commented May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

krakow10 commented May 3, 2025 •

edited

Loading

krakow10 commented May 4, 2025 •

edited

Loading

kennethloeffler commented May 5, 2025 •

edited

Loading

krakow10 commented May 6, 2025 •

edited

Loading