Missed optimization opportunity for bounds checks

tczajka · January 4, 2022, 11:11am

This code

fn sum(a: &[i32]) -> i32 {
    a[0] + a[1] + a[2]
}

is compiled to something with 3 bounds checks, like this:

fn sum(a: &[i32]) -> i32 {
    if a.len() == 0 {
        panic!("0 out of range");
    } else if a.len() == 1 {
        panic!("1 out of range");
    } else if a.len() == 2 {
        panic!("2 out of range");
    } else {
        unsafe { a.get_unchecked(0) + a.get_unchecked(1) + a.get_unchecked(2) }
    }
}

However, if LLVM can see that the first 3 cases are unlikely because they go to panic code, why doesn't it optimize it to something with only 1 bounds check on the hot path, such as:

fn sum(a: &[i32]) -> i32 {
    if a.len() < 3 {
        if a.len() == 0 {
            panic!("0 out of range");
        } else if a.len() == 1 {
            panic!("1 out of range");
        } else {
            panic!("2 out of range");
        }
    } else {
        unsafe { a.get_unchecked(0) + a.get_unchecked(1) + a.get_unchecked(2) }
    }
}

Neutron3529 · January 4, 2022, 11:46am

that would do the thing you want to do

tczajka · January 4, 2022, 11:50am

I understand it's possible for programmers to optimize the code manually, but that's not the point. The point is that the compiler could optimize the code as written, by itself, and for some reason it doesn't. The question is why not, and whether that can be fixed.
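For reference, here is a rough sketch of what a manual version can look like. This is not necessarily what was suggested above; the function names `sum_reslice` and `sum_pattern` are made up for illustration. Both variants do a single length test up front, which usually (but not guaranteed) lets the optimizer drop the later per-index checks.

```rust
/// Variant 1 (hypothetical name): re-slice once, so one length check
/// covers all three reads. Panics if `a.len() < 3`, but with a different
/// message than the per-index panics in the original function.
fn sum_reslice(a: &[i32]) -> i32 {
    let a = &a[..3];
    a[0] + a[1] + a[2]
}

/// Variant 2 (hypothetical name): a slice pattern. There is no panic
/// path at all; short slices yield `None` instead.
fn sum_pattern(a: &[i32]) -> Option<i32> {
    match a {
        [x, y, z, ..] => Some(x + y + z),
        _ => None,
    }
}
```

Note that neither variant preserves the exact panic messages of the original, which is part of why this is a different thing from the compiler doing the transformation itself.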

mathstuf · January 4, 2022, 12:31pm

The "as-if" rule. The compiler sees this:

(a[0] + a[1]) + a[2]

If a[0] + a[1] overflows, there's a panic in there before it gets to the [2] indexing. Your proposed optimization changes the panic that arises in that case (for non-release builds at least). A release build could probably "see" more.
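To make the ordering concrete, here is a small model of what happens with overflow checks enabled. It is only a sketch: the `Failure` enum and both function names are invented for illustration, and panics are modelled as `Err` values so the first failure can be compared directly.

```rust
#[derive(Debug, PartialEq)]
enum Failure {
    IndexOutOfRange(usize),
    AddOverflow,
}

/// Models `(a[0] + a[1]) + a[2]` evaluated in source order with
/// overflow checks on: the first addition happens before `a[2]` is read.
fn sum_in_source_order(a: &[i32]) -> Result<i32, Failure> {
    let x = *a.get(0).ok_or(Failure::IndexOutOfRange(0))?;
    let y = *a.get(1).ok_or(Failure::IndexOutOfRange(1))?;
    let xy = x.checked_add(y).ok_or(Failure::AddOverflow)?;
    let z = *a.get(2).ok_or(Failure::IndexOutOfRange(2))?;
    xy.checked_add(z).ok_or(Failure::AddOverflow)
}

/// Models the proposed shape: one length test hoisted to the top.
/// When `len < 3`, the first index that would fail is `len` itself.
fn sum_hoisted_check(a: &[i32]) -> Result<i32, Failure> {
    if a.len() < 3 {
        return Err(Failure::IndexOutOfRange(a.len()));
    }
    let xy = a[0].checked_add(a[1]).ok_or(Failure::AddOverflow)?;
    xy.checked_add(a[2]).ok_or(Failure::AddOverflow)
}
```

For the input `[i32::MAX, 1]` the first function reports `AddOverflow` and the second reports `IndexOutOfRange(2)`, so the two are observably different whenever overflow checks are compiled in.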

tczajka · January 4, 2022, 12:34pm

In my original post I used the release mode, see the linked compiler output. There is no addition overflow check in the assembly.

SadiinsoSnowfall · January 4, 2022, 1:16pm

AFAIK this is only a problem if "touching" the a[2] memory address results in an invalid access or has side effects (and this might be why LLVM doesn't completely remove all but one of the bounds checks), but, correct me if I'm wrong, this should not be the case in (safe) Rust. Never mind, I misunderstood the problem.

elidupree · January 4, 2022, 2:54pm

Previous discussion on this topic: More musings about slices and arrays

kornel · January 4, 2022, 5:12pm

I think the existing compiler rules for side effects don't cover this case properly.

A panic is still a side effect, so it can't simply be removed. It can't be freely reordered either, because you can't end up executing the read before the panic. It can't be freely moved/hoisted earlier either, because you don't want a panic from a[x] to escape something like if x < len { a[x] }.

So it's a tricky case where the compiler would probably have to explicitly understand the concept of bounds checks, and track relationships between them to properly reorder them to remove redundancy.
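A minimal illustration of the hoisting hazard, with a made-up function name (`get_or_zero`): the bounds check for `a[x]` is only reachable when the guard holds, so moving that check above the `if` would introduce a panic the source program never has.

```rust
/// Hypothetical helper: returns `a[x]` when `x` is in range, else 0.
fn get_or_zero(a: &[i32], x: usize) -> i32 {
    if x < a.len() {
        // The implicit bounds check here can never fail. If it were
        // hoisted above the `if`, out-of-range `x` would panic
        // instead of taking the `else` branch.
        a[x]
    } else {
        0
    }
}
```

Calling `get_or_zero(&[10, 20], 5)` has to return 0, not panic, which is why a check may only be moved to a point where it was already going to run.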

tczajka · January 4, 2022, 5:24pm

I don't think this is it. What the code actually compiles to and what I proposed it should compile to do all the same reads and panics in the same order, and neither ever attempts to read anything out of bounds.

SlightlyOutOfPhase · January 4, 2022, 5:44pm

I think it's certainly important to keep in mind that the "standalone" codegen for functions is nearly always entirely unlike the codegen seen in contexts where they're actually being called.

See here, for example. rustc optimizes both the valid and invalid calls to sum out entirely, instead just immediately printing the literal number 3 and then immediately panicking.

tczajka · January 4, 2022, 5:48pm

Obviously when a.len() is a compile time constant then all the ifs will get constant-folded. The issue is when the length is not known at compile time.
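One way to look at the non-constant case in a small test program is sketched below. It assumes `std::hint::black_box` (which is only a best-effort hint to the optimizer) and `#[inline(never)]`; the wrapper name `sum_with_opaque_input` is made up.

```rust
use std::hint::black_box;

/// The function from the original post, kept out of line so its body
/// is not merged into the caller and constant-folded there.
#[inline(never)]
fn sum(a: &[i32]) -> i32 {
    a[0] + a[1] + a[2]
}

/// Hypothetical wrapper: `black_box` asks the optimizer to treat the
/// vector (and therefore its length) as unknown. This is best-effort,
/// not a guarantee.
fn sum_with_opaque_input(v: Vec<i32>) -> i32 {
    let v = black_box(v);
    sum(&v)
}
```

With the length hidden like this, the generated code for `sum` has to handle every possible length, which is the situation the question is about.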

SlightlyOutOfPhase · January 4, 2022, 5:51pm

It's still likely not to look exactly as it does in your original link. You'd have to examine various realistic uses of it to get an idea of the "average" contextual codegen quality.

scottmcm · January 5, 2022, 5:15pm

I would suggest opening an issue on LLVM -- their issues are on GitHub now, so it's much easier than it was before.

Rust's panicking functions are marked cold, so this transformation should probably happen in general in LLVM for all things that lead to cold paths with only side-effect-free things between them -- not just indexing and panics.

(The panicking things being marked cold is why LLVM puts the panicking paths at the end of the assembly for the function, for example.)
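As a rough sketch of the shape being asked for, written by hand: one test on the hot path, with the failure routed to a function marked `#[cold]`. The names `index_out_of_range` and `sum_one_check` are hypothetical; the helper only mimics how a cold panicking function looks and is not the standard library's own.

```rust
/// Hypothetical stand-in for a panicking helper. `#[cold]` tells the
/// optimizer that calls to this function are unlikely.
#[cold]
#[inline(never)]
fn index_out_of_range(index: usize, len: usize) -> ! {
    panic!("index {} out of range for slice of length {}", index, len);
}

/// One length test on the hot path. When `len < 3`, the first index
/// that would have failed in `a[0] + a[1] + a[2]` is `len` itself.
fn sum_one_check(a: &[i32]) -> i32 {
    let len = a.len();
    if len < 3 {
        index_out_of_range(len, len);
    }
    // After the test above, the optimizer can typically prove these
    // three reads are in bounds.
    a[0] + a[1] + a[2]
}
```

For example, `sum_one_check(&[1, 2, 3])` returns 6, while a two-element slice panics reporting index 2.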

tczajka · January 5, 2022, 6:45pm

Opened an issue on LLVM.

system · April 5, 2022, 6:46pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
