forked from BurntSushi/memchr
Commit 65e51a8
all: improve perf of memchr fallback (v2)
Resubmit of PR BurntSushi#151.
That PR was reverted because it broke the big-endian implementation and
CI did not catch it (see the revert PR BurntSushi#153 for details).
Andrew, thank you for the new test cases, which made it easier to fix
the issue.
The fix is:
```
--- a/src/arch/all/memchr.rs
+++ b/src/arch/all/memchr.rs
@@ -1019,7 +1019,7 @@ fn find_zero_in_chunk(x: usize) -> Option<usize> {
     if cfg!(target_endian = "little") {
         lowest_zero_byte(x)
     } else {
-        highest_zero_byte(x)
+        Some(USIZE_BYTES - 1 - highest_zero_byte(x)?)
     }
 }
@@ -1028,7 +1028,7 @@ fn rfind_zero_in_chunk(x: usize) -> Option<usize> {
     if cfg!(target_endian = "little") {
         highest_zero_byte(x)
     } else {
-        lowest_zero_byte(x)
+        Some(USIZE_BYTES - 1 - lowest_zero_byte(x)?)
     }
 }
```
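To illustrate why the index has to be flipped, here is a minimal sketch that simulates a big-endian load on any host with `usize::from_be_bytes`. The loop-based `highest_zero_byte` and the name `find_zero_in_be_chunk` are stand-ins written for this example (assumptions, not the code from the patch); only the `USIZE_BYTES - 1 - ...` conversion mirrors the diff above.

```rust
const USIZE_BYTES: usize = core::mem::size_of::<usize>();

/// Illustrative stand-in: index of the most significant zero byte of `x`,
/// where index 0 is the least significant byte. Uses a plain loop for
/// clarity; it is not the implementation from the patch.
fn highest_zero_byte(x: usize) -> Option<usize> {
    (0..USIZE_BYTES)
        .rev()
        .find(|&i| ((x >> (8 * i)) & 0xff) == 0)
}

/// Hypothetical helper: memory-order index of the first zero byte in a
/// chunk that was loaded as a big-endian integer. In a big-endian load the
/// first byte in memory is the most significant one, so the significance
/// index has to be mirrored to get the memory offset.
fn find_zero_in_be_chunk(x: usize) -> Option<usize> {
    Some(USIZE_BYTES - 1 - highest_zero_byte(x)?)
}
```

With zero bytes at memory offsets 1 and `USIZE_BYTES - 1`, `highest_zero_byte` alone reports `USIZE_BYTES - 2`, which is a significance index, while the mirrored value is the expected memory offset 1. The reverse search is symmetric, with `lowest_zero_byte` in place of `highest_zero_byte`.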
Original description:
The current generic ("all") implementation checks whether a chunk
(`usize`) contains a zero byte and, if it does, iterates over the bytes
of that chunk to find the index of the zero byte. Instead, we can use a
few more bit operations to find the index without a loop.
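As a rough sketch of the loop-free approach, the functions below use a well-known exact zero-byte mask followed by `trailing_zeros` / `leading_zeros`. The names `lowest_zero_byte` and `highest_zero_byte` match those in the diff, but the bodies, the `zero_byte_mask` helper and the `LO7` constant are assumptions made for illustration and may differ from what the patch actually does. Indices are counted by significance (0 is the least significant byte).

```rust
const USIZE_BYTES: usize = core::mem::size_of::<usize>();
/// 0x7f repeated in every byte of a `usize` (assumed helper constant).
const LO7: usize = usize::MAX / 255 * 0x7f;

/// Returns a word with bit 7 set in exactly those bytes of `x` that are
/// zero, and every other bit clear.
fn zero_byte_mask(x: usize) -> usize {
    // `(x & LO7) + LO7` sets bit 7 of a byte iff its low 7 bits are
    // nonzero; each per-byte sum is at most 0xfe, so no carry crosses a
    // byte boundary. OR-ing in `x` covers bytes whose own bit 7 is set,
    // and OR-ing in `LO7` fills the low 7 bits, so after negation only
    // bit 7 of zero bytes survives.
    !(((x & LO7) + LO7) | x | LO7)
}

/// Index of the least significant zero byte of `x`, if any.
fn lowest_zero_byte(x: usize) -> Option<usize> {
    let mask = zero_byte_mask(x);
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros() as usize / 8)
    }
}

/// Index of the most significant zero byte of `x`, if any.
fn highest_zero_byte(x: usize) -> Option<usize> {
    let mask = zero_byte_mask(x);
    if mask == 0 {
        None
    } else {
        Some(USIZE_BYTES - 1 - mask.leading_zeros() as usize / 8)
    }
}
```

The exact mask is used here rather than the shorter `(x - 0x01..01) & !x & 0x80..80` test because the latter can flag a 0x01 byte sitting above a zero byte, which would give a wrong answer for the highest zero byte.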
Context: we use `memchr`, but many of our strings are short.
Currently, the SIMD-optimized `memchr` processes bytes one by one when
the string is shorter than a SIMD register. I suspect it can be made
faster if we read the bytes that do not fill a SIMD register in
`usize`-sized chunks and process each chunk with such a utility,
similar to how the AVX2 implementation falls back to SSE2. So I looked
for a generic implementation to reuse in the SIMD-optimized version,
but there was none. So here it is.

1 parent 7fccf70 commit 65e51a8
1 file changed (src/arch/all), +291 -56 lines changed