cpu: AMD Ryzen 9 7950X 16-Core Processor
clang: 20
libc++: 19
options: O3, AVX2 (Haswell-like; AVX-512 is available for the avx512_crc32c target and is checked at runtime for folly sizes >= 4097)
abseil and folly versions are close to current master
For some sizes, approximately (129..255] and [2048..4096], the folly function is a little faster.
For small sizes abseil is significantly faster (because it has better branches for small sizes).
For large sizes abseil is also significantly faster (I don't know why, probably because of a CPU-specific number of pclmul streams).
The folly implementation changes starting at size 4097 (inclusive); the AVX2 folly path is slower than the AVX-512 one.
Such a patch fixes it for this CPU; maybe even 4097 (i.e. 4096 with >= instead of >) would be better.
In general the difference isn't large, so this is mostly curiosity about how these values were chosen, and why the crc32c microbenchmark doesn't have more sizes around these branch points.
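To make the point about branch-point coverage concrete, here is a minimal sketch of how benchmark sizes around such thresholds could be generated. The helper name `sizesAroundBranchPoints` is hypothetical and not part of folly or abseil, and the branch points used in the usage note (128, 256, 2048, 4096) are only assumptions read off the approximate ranges above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of folly or abseil): returns the sorted,
// de-duplicated list of sizes within `radius` bytes of each branch point.
std::vector<std::size_t> sizesAroundBranchPoints(
    const std::vector<std::size_t>& branchPoints, std::size_t radius) {
  std::vector<std::size_t> sizes;
  for (std::size_t point : branchPoints) {
    // Guard against unsigned wrap-around of point + radius.
    assert(point + radius >= point);
    // Clamp the lower bound at zero instead of wrapping.
    const std::size_t lo = point > radius ? point - radius : 0;
    for (std::size_t s = lo; s <= point + radius; ++s) {
      sizes.push_back(s);
    }
  }
  // Overlapping neighbourhoods can produce duplicates; remove them.
  std::sort(sizes.begin(), sizes.end());
  sizes.erase(std::unique(sizes.begin(), sizes.end()), sizes.end());
  return sizes;
}
```

For example, `sizesAroundBranchPoints({128, 256, 2048, 4096}, 1)` yields the twelve sizes from 127 through 4097 that sit directly on and next to each assumed threshold; each of them could then be passed to the crc32c benchmark as a separate case.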
Also, abseil already has CPU-specific implementations, so maybe it makes sense to do: