Indexing into a slice with a known length

mgeisler · January 11, 2023, 9:45am

Hi all,

While reviewing Adds speaker notes to Compound Types section by rastringer · Pull Request #133 · google/comprehensive-rust · GitHub, I wanted to reproduce an optimization I believe Rust normally supports.

That is, I wanted to show that adding an assert!(n + 20 < a.len()) will remove the bounds checks in a function like this:

fn foo(n: usize, a: &[i32]) -> i32 {
    let mut sum = 0;
    for i in 0..20 {
        sum += a[n + i];
    }
    sum
}

However, when trying to actually reproduce this on the Compiler Explorer, I don't see the diff I expect. I was hoping to see no comparisons in the function body after the assembly.

Now, I'm not very well-versed in assembly, but I believe I see a per-indexing comparison here:

example::foo:
        push    rax
        mov     rax, rdi
        add     rax, 20  ; <-- the assertion
        jb      .LBB0_3
        cmp     rax, rdx
        jae     .LBB0_2
        cmp     rdi, rdx
        jae     .LBB0_6
        lea     rcx, [rdi + 1]
        cmp     rcx, rdx  ; <-- what is this?
        jae     .LBB0_7
        mov     eax, dword ptr [rsi + 4*rdi]
        add     eax, dword ptr [rsi + 4*rdi + 4]
        jo      .LBB0_47
        lea     rcx, [rdi + 2]
        cmp     rcx, rdx ; <-- what is this?
        jae     .LBB0_7
        add     eax, dword ptr [rsi + 4*rdi + 8]
        jo      .LBB0_47
        lea     rcx, [rdi + 3]
        cmp     rcx, rdx ; <-- what is this?
        jae     .LBB0_7
        add     eax, dword ptr [rsi + 4*rdi + 12]
        jo      .LBB0_47
        lea     rcx, [rdi + 4]

Am I reading the assembly wrong? I turned on -C opt-level=3 and -C overflow-checks, but I still see the comparisons. What am I missing here?

SkiFire13 · January 11, 2023, 10:20am

AFAIK this is a problem in LLVM. I don't know what kind of abstract domain it uses to do optimizations, but it probably can't express relations between 3 variables (which it needs to see that n + i < a.len()`).

A way to "fix" this is by removing the need for the third variable (n), resulting in a relation between 2 variables which LLVM seems to be able to handle:

pub fn foo(n: usize, a: &[i32]) -> i32 {
    let a = &a[n..];
    assert!(20 < a.len());
    let mut sum = 0;
    for i in 0..20 {
        sum += a[i];
    }
    sum
}

Here I do this by considering the slice after the first n elements of a. This updates its length and thus the two variables n and a.len() are "condensed" into one, the new a.len(). See how it optimizes on Compiler Explorer

steffahn · January 11, 2023, 10:20am

Soo… the situation is quite subtle and complex to analyze for the compiler AFAICT.

My best guess is that LLVM doesn't get the information or can't sufficiently exploit the information that a.len() has to be quite small. Since a is a valid slice, it can at most be usize::MAX / 4 long… maybe even / 8 if we consider that allocations can be at most isize::MAX size IIRC.

If it could reason this way, it could reason that

either n cannot overflow because it’s small enough
or the first access, a[n+0], will fail

thus allowing to eliminate all checks for subsequent accesses, because if n is small enough then a + 20 < a.len() proves that all accesses succeed.

Let’s focus in the case without -C overflow-checks, I haven’t thought too deeply about the case with them enabled yet. If we consider a to be as long as usize::MAX to be a possibility, then the compiler behavior would be simply the only option. The value n could be arbitrarily large, and n+20 would overflow to a small number, the assert!(n+20 < a.len()) is worthless. Then any step in the loop could be the first out-of-bounds access.

Instead of trying to analyze further cases, I can present two asserts that do turn out to be sufficient.

First option: assert!(n < n + 20) which asserts that none of the n+i can overflow n; second option: something like assert!(n < (isize::MAX as usize)), more explicitly calling out the fact that a can only be of limited size, and too large values of n will thus lead to failure anyways. Here’s a third spelling of the first assertion in the style of the second: assert!(n < usize::MAX - 20), and this one is sufficient, too.

By the way, if your goal is to have the asserts not to be any further restictions, you have an off-by-one error. n + 19 < a.len() is sufficient (and similarly, in the context of my above suggested solution,n < n+19 or n < usize::MAX - 19 would be sufficient).

As a side-note: As far as I understand, optimizing code like this better / more easily is one of the rationales why making integer overflow UB can sometimes be useful. If information like n < n + 20 wouldn’t need to be asserted, the optimizer has an easier job.

the8472 · January 11, 2023, 10:35am

It can also be turned into type-level information by using split_array_ref. Compiler Explorer

afetisov · January 11, 2023, 11:09am

The subslicing trick shown by @SkiFire13 is indeed the recommended way to avoid bounds checks. I'd go even further and avoid manual indexing altogether. Godbolt:

pub fn foo(n: usize, a: &[i32]) -> i32 {
    let xs = &a[n..n + 20];
    let mut sum = 0;
    for x in xs {
        sum += x;
    }
    sum
}

As you can see, the compiler does slice bound checks at the start of the function, and then unrolls the loop. The only jumps left are addition overflow checks.

P.S.: Since this is not a question about changes to Rust, this is more suited on URLO.

mgeisler · January 11, 2023, 2:17pm

Thank you very much, that is a very nice way to do it! I'll add that to the speaker notes for my students.

Nice, thank you for finding those! I had not thought of adding extra constraints like this.

That's my understanding too: the optimizer can assume that it never happens.

In this case, my mental model is that -C overflow-checks=y, inserts implicit asserts around every arithmetic operation. So when I write assert!(n + 20 < a.len());, I was indirectly hoping that the compiler would reason that n + 20 doesn't overflow (your assert!(n < n + 20)).

Does the optimization perhaps kick in at an unfortunate stage of the compiler? Or are the overflow checks perhaps implemented via a completely different technique so that they don't flow into optimizations in this way?

Sorry about that — I considered posting it there first, but then I figured that it was okay here since it's a question about the compiler and its optimizations I see this forum for being about "internal" questions like that.

steffahn · January 11, 2023, 4:03pm

Sounds reasonable. Made me play around with various versions of using .checked_add … .unwrap() (with -C overflow-checks off).

Apparently

pub fn foo(n: usize, a: &[i32]) -> i32 {
    n.checked_add(20).unwrap();
    assert!(n + 20 < a.len());
    // assert!(n.checked_add(20).unwrap() < a.len());
    let mut sum = 0_i32;
    for i in 0..20 {
        sum = sum + a[n + i];
    }
    sum
}

optimizes well while

pub fn foo(n: usize, a: &[i32]) -> i32 {
    n.checked_add(20).unwrap();
    // assert!(n + 20 < a.len());
    assert!(n.checked_add(20).unwrap() < a.len());
    let mut sum = 0_i32;
    for i in 0..20 {
        sum = sum + a[n + i];
    }
    sum
}

doesn’t… ? At this point, I can only imagine that LLVM is simply overwhelmed by too much branching with all the checks; or something like that; really IDK

I, too, feel like it’s not totally out of place here. IMO, either place seems acceptable.

newpavlov · January 11, 2023, 4:36pm

IIUC when LLVM gets &[u32] it only sees *const u32 and usize, without any additional semantic information about the length. I think that modifying the len method to something like:

#[inline]
pub const fn len(&self) -> usize {
    let len = ptr::metadata(self);
    let size = core::mem::size_of::<T>();
    // on most arches we can use `isize::MAX as usize`
    // instead of `usize::MAX`, but IIRC not everywhere
    if size != 0 && len > usize::MAX / size {
        unsafe { core::hint::unreachable_unchecked() }
    }
    len
}

Should give LLVM the necessary information to help with optimizations in some cases, but not in this one. As you correctly wrote, the main issue here is with the n + 20 part and its potential overflow.

UPD: I've opened an issue for the proposed code change.

scottmcm · January 11, 2023, 5:06pm

I suggest phrasing this in terms of what I call "reslicing", like a[n..][..20].

Some other posts about this:

Rust auto-vectorisation difference? - #2 by scottmcm - The Rust Programming Language Forum
Understanding Rusts Auto-Vectorization and Methods for speed increase - #5 by scottmcm - help - The Rust Programming Language Forum

system · April 11, 2023, 5:06pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Indexing syntax borrow checking should work as with functions language design	8	949	July 8, 2021
Array bound tests 2 language design	4	863	September 6, 2020
Add bound check to get_unchecked for enabled debug assertions? language design	9	1309	March 25, 2019
Bounds checking at initialization ideas (deprecated)	10	3720	March 25, 2019
Is std::ops::Index just a historical artefact? language design	14	1946	March 25, 2019

Indexing into a slice with a known length

Related topics