-
Notifications
You must be signed in to change notification settings - Fork 52
Directly Build UstrMap For Properties #525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
For what it's worth, this was a performance decision because a lot of time is spent inserting into that hashmap: #467 On my machine the change in this PR actually degrades performance by quite a lot: Are you able to replicate the boost in performance? What are your system specs like? |
Excellent, I assumed it must be that way for performance reasons, but I still don't understand how it's faster to build a vec and then build a hashmap instead of just directly building a hashmap. Is is because of rehashing while allocating more capacity? I'm running the benchmarks on Linux using a 5950X like I mentioned on #529. I reran the benchmarks first on
Seems positive and reproducible. |
This change overall degrades performance on my M1 MacBook Air. Here is the deserialize Miner's Haven benchmark plotted (n = 1000, red is latest master, blue is this branch): I'm not entirely sure why there is such a significant difference, but I did some cursory profiling and noticed that these changes cause ~10% more L1 misses and ~12% more L2 misses when deserializing miners-haven.rbxl on my machine. So, one reason could be that using a scratch Vec for instance builder properties somehow improves cache locality over UstrMap (at least, on my machine and on this workload). I highly recommend resolving #529 before moving on with any performance work! The benchmarks are only as good as the data they can collect - if the data is polluted with e.g. CPU boosts then it's much, much harder to get a good signal |
I looked up how to set the cpu performance governor, and it can be done using
Which supposedly just sets it to always run at the maximum frequency. The default governer was called "powersave". Running benchmarks repeatedly on the Ustr uses precomputed hashes which I assume are the same for every hashmap. Could it be accidental quadratic behaviour when doing |
If there's no reason why InstanceBuilder.properties can't be directly assigned to Instance.properties, then let's do that. The same logic is applied to building the properties in rbx_xml. The "Miner's Haven/Deserialize" benchmark says throughput is increased by 30%.