-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Missing CSE caused by for loop on slice #119573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Maybe this is due to #111603 (comment) and llvm/llvm-project#74228 ? |
Hi! I've investigated a bit and I am sharing what I gathered so far. Disclaimer: I have mostly investigated he fact that the iterator based version is not marked The main difference between the index based version and the iterator one is that the iterator ones increment pointers directly. It is basically equivalent to the following C code (which LLVM doesn't manage to mark as size_t __attribute__ ((noinline)) add_iter(const bool *begin, const bool *end) {
size_t count = 0;
for (const bool *ptr = begin; ptr < end; ++ptr) {
if (*ptr) {
count += 1;
}
}
return count;
} The add_index
add_iter
|
I have investigated this further.
So it turns out that the CSE happens before functions are marked I have not figured why the CSE does not happen. However, I came up with a slightly different MRE that includes the CSE-able function being memchr. #[inline(never)]
fn has_zero_index(xs: &[u8]) -> bool {
for i in 0..xs.len() {
if xs[i] == 0 {
return true
}
}
false
}
#[inline(never)]
pub fn has_zero_memchr(xs: &[u8]) -> bool {
xs.contains(&0)
}
#[inline(never)]
pub fn has_zero_iter(xs: &[u8]) -> bool {
xs.iter().any(|&x| x == 0)
}
#[inline(never)]
fn has_zero_ptr(xs: &[u8]) -> bool {
let range = xs.as_ptr_range();
let mut start = range.start;
let end = range.end;
while start < end {
unsafe {
if *start == 0 {
return true
}
start = start.add(1);
}
}
false
}
pub fn foo_index(xs: &[u8]) {
// CSE
println!("a: {}", has_zero_index(xs));
println!("b: {}", has_zero_index(xs));
}
pub fn foo_memchr(xs: &[u8]) {
// No CSE
println!("a: {}", has_zero_memchr(xs));
println!("b: {}", has_zero_memchr(xs));
}
pub fn foo_iter(xs: &[u8]) {
// No CSE
println!("a: {}", has_zero_iter(xs));
println!("b: {}", has_zero_iter(xs));
}
pub fn foo_ptr(xs: &[u8]) {
// No CSE
println!("a: {}", has_zero_ptr(xs));
println!("b: {}", has_zero_ptr(xs));
} |
The reason should be the lack of |
It's a similar issue. Based on llvm/llvm-project#74228, the following changes can solve this issue. --- a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
@@ -118,7 +118,7 @@ static void addLocAccess(MemoryEffects &ME, const MemoryLocation &Loc,
if (isNoModRef(MR))
return;
- const Value *UO = getUnderlyingObject(Loc.Ptr);
+ const Value *UO = getUnderlyingObjectLookThrough(Loc.Ptr);
assert(!isa<AllocaInst>(UO) &&
"Should have been handled by getModRefInfoMask()");
if (isa<Argument>(UO)) { Maybe @caojoshua will submit a new PR after this one. cc @caojoshua. |
…gressive` (#100102) Thanks to #99509, we can fix rust-lang/rust#119573 too.
Fixed by llvm/llvm-project#100102 at LLVM 20. |
…gressive` (#100102) Summary: Thanks to #99509, we can fix rust-lang/rust#119573 too. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251024
While examining a weird performance regression on Zulip, we noticed a peculiar thing caused by a simple for loop.
Consider this code:
These two functions do the same thing, but the first one uses slice indexing, while the other one uses a more natural for loop. One would expect that the second version will optimize better (or same).
However, when these functions are called multiple times, only the first function is eligible for CSE (common subexpression elimination):
Link to Godbolt.
When examining these two functions, I noticed that the indexed version uses
nocapture
for its argument, while the for loop version doesn't:which might be the reason of why CSE isn't applied.
It seems weird that the canonical way of iterating over a slice produces worse code (or at least function attributes) than manual indexing.
The text was updated successfully, but these errors were encountered: