MIR: Optimize for-loop on integer ranges

https://rust.godbolt.org/z/195ee687T

pub fn test_v1(n: usize) -> usize {
    let mut s = 0;
    for i in 1..n {
        s += i;
    }
    s
}

pub fn test_v2(n: usize) -> usize {
    let mut s = 0;
    let mut i = 1;
    while i < n {
        s += i;
        i += 1;
    }
    s
}

test_v1 and test_v2 are equivalent, but test_v1 generates a ton of MIR once fully inlined.

What if rustc could transform test_v1 into test_v2 while optimizing MIR? I believe it would give some compile-time perf wins and may generate better machine code in more complicated cases.

The unoptimized for-loop has already caused an issue for SIMD code: LLVM fails to optimize it.

https://github.com/rust-lang/rust/issues/102220


Excellent idea.

But it seems hard for rustc to achieve.

IIRC, `for` logically calls the iterator's `next` function, and rustc has little special knowledge of iterators.

For example, if you write:

let mut a: i32;
let mut b: i32 = 0;
for i in 0..10 {
    b += i * i;
    a = b;
}
dbg!(a);

the program won't compile, since Rust cannot infer whether `a` is initialized:

4 | for i in 0..10 {
  |          ---- if the `for` loop runs 0 times, `a` is not initialized

It might be hard to do such an optimization.

I think it would be easier because:

  • Rust has built-in range syntax.
  • rustc knows whether the range is Range<{integer}>.
  • rustc has MIR optimizations which can transform special cases like slice::len.

Maybe we could talk about another trait, FixedSizeIterator<T, Idx>,

where FixedSizeIterator::next(&mut self) returns T directly, and a const fn returns the element count as Idx, where Idx is one of u8, u16, u32, u64 or u128.

After that, when we write Range::into_iter, we would get a FixedSizeIterator.

But that would actually be an API break:

for i in 0u8.. {
    // triggers an overflow panic when we try to yield the 256th element
}
for i in FixedSizeIterator(0u8..) {
    // should not panic, since the element count is a u8, thus at most 255
}

Maybe a discussion thread is needed, since we would be changing something.

Wow, there is a lot of garbage in the MIR of test_v1! Variables created and instantly moved out of, references created and instantly reborrowed.

_8 = &mut _5;                    // scope 2 at /app/example.rs:3:14: 3:18
// _7 is not used anywhere else
_7 = &mut (*_8);                 // scope 2 at /app/example.rs:3:14: 3:18
StorageLive(_12);                // scope 5 at /rustc/57f097ea25f2c05f424fc9b9dc50dbd6d399845c/library/core/src/iter/range.rs:711:9: 711:25
_12 = &mut (*_7);                // scope 5 at /rustc/57f097ea25f2c05f424fc9b9dc50dbd6d399845c/library/core/src/iter/range.rs:711:9: 711:25
StorageLive(_11);                // scope 3 at /app/example.rs:4:14: 4:15
// _11 is otherwise unused
_11 = _10;                       // scope 3 at /app/example.rs:4:14: 4:15
_0 = Add(_0, move _11);          // scope 3 at /app/example.rs:4:9: 4:15
StorageDead(_11);                // scope 3 at /app/example.rs:4:14: 4:15

I don't know whether the MIR for test_v1 can be made as succinct as for test_v2, but there is a ton of low-hanging fruit here. Simply eliminating redundant variables should cut MIR size in half, if not more. There is no reason to do lots of reborrows, or to store a range as a tuple and keep destructuring it. I wonder if there is some extra flag to enable those optimizations on current Rust?


While in this case it is pretty pointless, in the case of calls, copies can be necessary for correctness. rustc_mir_build doesn't know when they are necessary and when not (doing so requires a non-local analysis), so it unconditionally inserts the copies just in case. We also don't yet know the exact rules. In any case, you might be interested in following the ongoing work.

Optimizing this is done using SROA (scalar replacement of aggregates), which an in-progress MIR pass will implement.


This seems completely different from what the OP is talking about. The OP just wants the resulting codegen to be the same, while you want different semantics for `for`. Moreover, the current semantics of the `while` in the OP's post still don't let the compiler infer that the loop body will run at least once.


Can you explain more about this? LLVM optimizes test_v1 down to the closed-form solution (https://rust.godbolt.org/z/88obnj1jK), so the extra noise in the MIR clearly doesn't impact it.

LLVM fails to optimize the loop when the function is annotated with target_feature(enable = "neon"). I'm not sure whether it is an LLVM bug. https://rust.godbolt.org/z/1M6WvWqjG

That's probably due to ABI differences preventing inlining. If you specify -C target-feature=+neon as a compiler option instead, it works.


Interesting that a hand-written range iterator works: https://rust.godbolt.org/z/E4Y8Tv1o5

It seems that the std Range implementation is too complex for the optimizer.