compare with jetscii aarch64 simd #178

Draft pull request: wants to merge 1 commit into base: master

Dr-Emann

The new jetscii aarch64 algorithm supports an arbitrary set of bytes (though currently limited to 16 to match the existing limitation of the x86 implementation).

It seems to be pretty competitive with memchr3, being a bit faster for smaller haystacks or when iterating over more common bytes. I think this is largely because iteration keeps a 64-bit bitset of positions already identified as matches, rather than restarting the search after every match, and because it can process 64 bytes at a time without any fixups when matches occur.
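To illustrate the bitset idea, here is a minimal scalar sketch. It is not the jetscii NEON code: `match_mask` and `Matches` are hypothetical names, and the per-chunk mask is built with a plain loop standing in for the SIMD comparison. The point is only that each 64-byte chunk is scanned once, and later matches are read out of the saved mask.

```rust
/// Hypothetical scalar stand-in for the SIMD step: returns a 64-bit mask
/// with bit `i` set when `chunk[i]` is one of `needles`.
fn match_mask(chunk: &[u8], needles: &[u8]) -> u64 {
    debug_assert!(chunk.len() <= 64);
    let mut mask = 0u64;
    for (i, b) in chunk.iter().enumerate() {
        if needles.contains(b) {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Iterator over every position in `haystack` holding a byte from `needles`.
struct Matches<'a> {
    haystack: &'a [u8],
    needles: &'a [u8],
    base: usize,       // offset of the chunk that `mask` describes
    next_chunk: usize, // offset of the next chunk still to be scanned
    mask: u64,         // matches in the current chunk not yet yielded
}

impl<'a> Matches<'a> {
    fn new(haystack: &'a [u8], needles: &'a [u8]) -> Self {
        Matches { haystack, needles, base: 0, next_chunk: 0, mask: 0 }
    }
}

impl<'a> Iterator for Matches<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Only scan a new chunk once the saved mask is exhausted.
        while self.mask == 0 {
            if self.next_chunk >= self.haystack.len() {
                return None;
            }
            let end = (self.next_chunk + 64).min(self.haystack.len());
            self.base = self.next_chunk;
            self.mask = match_mask(&self.haystack[self.base..end], self.needles);
            self.next_chunk = end;
        }
        let bit = self.mask.trailing_zeros() as usize;
        self.mask &= self.mask - 1; // clear the lowest set bit
        Some(self.base + bit)
    }
}
```

With this shape, `Matches::new(haystack, needles).collect::<Vec<_>>()` visits each chunk exactly once no matter how many matches it contains, which is the property the paragraph above credits for the advantage on common bytes.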

I'd like to improve it by using aligned loads like memchr does (with a possibly unaligned load at the start and at the end).

benchmark                        rust/jetscii/memchr3  rust/memchr/memchr3  rust/memchr/memchr3/fallback  rust/memchr/memchr3/naive
---------                        --------------------  -------------------  ----------------------------  -------------------------
memchr/sherlock/common/huge3     2.1 GB/s (1.00x)      659.4 MB/s (3.23x)   346.0 MB/s (6.16x)            567.4 MB/s (3.76x)
memchr/sherlock/common/small3    7.5 GB/s (1.00x)      759.3 MB/s (10.05x)  1518.6 MB/s (5.02x)           1688.6 MB/s (4.52x)
memchr/sherlock/never/huge3      21.7 GB/s (1.36x)     29.4 GB/s (1.00x)    8.2 GB/s (3.58x)              1795.7 MB/s (16.78x)
memchr/sherlock/never/small3     14.7 GB/s (1.02x)     15.1 GB/s (1.00x)    7.5 GB/s (2.02x)              1688.6 MB/s (9.15x)
memchr/sherlock/never/tiny3      64.3 GB/s (1.00x)     64.3 GB/s (1.00x)    64.3 GB/s (1.00x)             1566.8 MB/s (42.00x)
memchr/sherlock/never/empty3     1.00ns (1.00x)        1.00ns (1.00x)       1.00ns (1.00x)                1.00ns (1.00x)
memchr/sherlock/rare/huge3       20.6 GB/s (1.08x)     22.2 GB/s (1.00x)    7.5 GB/s (2.95x)              1770.3 MB/s (12.86x)
memchr/sherlock/rare/small3      14.7 GB/s (1.02x)     15.1 GB/s (1.00x)    7.5 GB/s (2.02x)              1522.2 MB/s (10.15x)
memchr/sherlock/rare/tiny3       64.3 GB/s (1.00x)     64.3 GB/s (1.00x)    64.3 GB/s (1.00x)             1566.8 MB/s (42.00x)
memchr/sherlock/uncommon/huge3   5.9 GB/s (1.00x)      1812.7 MB/s (3.34x)  1593.0 MB/s (3.80x)           1291.7 MB/s (4.69x)
memchr/sherlock/uncommon/small3  14.7 GB/s (1.00x)     3.7 GB/s (3.98x)     3.7 GB/s (3.98x)              1688.6 MB/s (8.93x)
memchr/sherlock/uncommon/tiny3   64.3 GB/s (1.00x)     792.8 MB/s (83.00x)  1566.8 MB/s (42.00x)          1566.8 MB/s (42.00x)
