Optimization comparison: Vec vs array and for vs while

"Reslicing" like this is basically always the right thing to do before a bunch of indexing. See this recent post for another example of it: Mir optimization pass that implements auto-vectorization - #6 by scottmcm

And I don't know if it's still true, but at least 1½ years ago it was the case that indexing through Vec optimized worse than through a slice (We all know `iter` is faster than `loop`, but why? - #3 by scottmcm - The Rust Programming Language Forum), which just gives another reason that you should reslice it.

Note that you can opt into more focused optimization for them by changing your for p in STUFF { to STUFF.for_each(|p| {.

This loop is kinda weird, BTW.

Anything | 1 will be > 0, and you break at the bottom before i can get down to 0.

So I think it should be a loop?

But more importantly, I think you can reduce this report down to something more specific, by asking whether LLVM more directly whether it knows certain things.

For example, I'm extra confident about the while condition because if you ask LLVM to compile

pub fn demo_lower(n: usize, p: usize) -> bool {
    assert!(p >= 3);

    let i = (n / p - 1) | 1;
    i > 0
}

the output is clearly "oh yeah, that either asserts or returns true".

Whereas if you ask it about

pub fn demo_upper(n: usize, p: usize) -> bool {
    assert!(p >= 3);

    let i = (n / p - 1) | 1;
    i < (n + 1)
}

Then it obviously has no idea

define noundef zeroext i1 @_ZN7example10demo_upper17hed0063195a0d922bE(i64 %n, i64 %p) unnamed_addr #0 !dbg !13 {
  %_4 = icmp ult i64 %p, 3, !dbg !14
  br i1 %_4, label %bb1, label %bb3, !dbg !15

bb1:                                              ; preds = %start
  tail call void @_ZN4core9panicking5panic17hef60c19188bfa7ddE([0 x i8]* noalias noundef nonnull readonly align 1 bitcast (<{ [24 x i8] }>* @alloc14 to [0 x i8]*), i64 24, %"core::panic::location::Location"* noalias noundef readonly align 8 dereferenceable(24) bitcast (<{ i8*, [16 x i8] }>* @alloc16 to %"core::panic::location::Location"*)) #2, !dbg !15
  unreachable, !dbg !15

bb3:                                              ; preds = %start
  %_8 = udiv i64 %n, %p, !dbg !16
  %_7 = add nsw i64 %_8, -1, !dbg !17
  %i = or i64 %_7, 1, !dbg !17
  %_13 = add i64 %n, 1, !dbg !18
  %0 = icmp ult i64 %i, %_13, !dbg !20
  ret i1 %0, !dbg !21
}

And thus the problem isn't that it doesn't know the bounds, it's that regardless of whether it knows the vec/slice bound or not, the indexing logic you're using is too complicated for it to figure out.

So you might consider other ways to write that logic that it might understand better, or consider filing an LLVM issue about it if you think that LLVM should understand what you're doing.

(Repro for those two snippits: https://rust.godbolt.org/z/qYcnM1E5G)

4 Likes