```rust
pub fn test_v1(n: usize) -> usize {
    let mut s = 0;
    for i in 1..n {
        s += i;
    }
    s
}

pub fn test_v2(n: usize) -> usize {
    let mut s = 0;
    let mut i = 1;
    while i < n {
        s += i;
        i += 1;
    }
    s
}
```
test_v1 and test_v2 are equivalent, but test_v1 generates far more MIR when fully inlined.
What if rustc could transform test_v1 into test_v2 when optimizing MIR? I believe it would give some compile-time wins and might generate better machine code in more complicated cases.
The unoptimized for-loop has already caused an issue for SIMD code: LLVM fails to optimize it.
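As a quick standalone sanity check (not part of the proposal itself), both functions compute the sum 1 + 2 + … + (n - 1), which has the closed form n*(n-1)/2:

```rust
pub fn test_v1(n: usize) -> usize {
    let mut s = 0;
    for i in 1..n {
        s += i;
    }
    s
}

pub fn test_v2(n: usize) -> usize {
    let mut s = 0;
    let mut i = 1;
    while i < n {
        s += i;
        i += 1;
    }
    s
}

fn main() {
    // Both loops sum 1..n, equal to the closed form n*(n-1)/2.
    for n in [0usize, 1, 2, 10, 1000] {
        assert_eq!(test_v1(n), test_v2(n));
        assert_eq!(test_v1(n), n.saturating_sub(1) * n / 2);
    }
    println!("ok");
}
```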
Maybe we could talk about another trait, FixedSizeIterator<T, Idx>,
where FixedSizeIterator::next(&mut self) returns T, and a const fn FixedSizeIterator::len(&self) returns Idx, where Idx is one of u8, u16, u32, u64 or u128.
Then, when we write Range::into_iter, we would get a FixedSizeIterator.
But that would actually be an API break:
```rust
for i in 0u8.. {
    // triggers an overflow panic when we try to yield the 256th element
}
for i in FixedSizeIterator(0u8..) {
    // should not panic, since the element count is known to fit in u8, i.e. up to 255
}
```
Maybe a discussion thread is needed, since we would be changing something.
Wow, there is a lot of garbage in the MIR of test_v1! Variables created and instantly moved out of, references created and instantly reborrowed.
```
_8 = &mut _5;           // scope 2 at /app/example.rs:3:14: 3:18
// _7 is not used anywhere else
_7 = &mut (*_8);        // scope 2 at /app/example.rs:3:14: 3:18
StorageLive(_12);       // scope 5 at /rustc/57f097ea25f2c05f424fc9b9dc50dbd6d399845c/library/core/src/iter/range.rs:711:9: 711:25
_12 = &mut (*_7);       // scope 5 at /rustc/57f097ea25f2c05f424fc9b9dc50dbd6d399845c/library/core/src/iter/range.rs:711:9: 711:25
StorageLive(_11);       // scope 3 at /app/example.rs:4:14: 4:15
// _11 is otherwise unused
_11 = _10;              // scope 3 at /app/example.rs:4:14: 4:15
_0 = Add(_0, move _11); // scope 3 at /app/example.rs:4:9: 4:15
StorageDead(_11);       // scope 3 at /app/example.rs:4:14: 4:15
```
I don't know whether the MIR for test_v1 can be made as succinct as that for test_v2, but there is a lot of low-hanging fruit here. Simple elimination of redundant variables should cut MIR size in half, if not more. There is no reason to do lots of reborrows, or to store a range as a tuple and keep destructuring it. I wonder if there is some extra flag to enable those optimizations on current Rust?
While in this case it is pretty pointless, in the case of calls, copies can be necessary for correctness. rustc_mir_build doesn't know when they are necessary and when not (determining that requires a non-local analysis), so it unconditionally inserts the copies just in case. We also don't yet know the exact rules. In any case you might be interested in following
Optimizing this is done using SROA (scalar replacement of aggregates), which
This seems completely different from what the OP is talking about. The OP just wants the resulting codegen to be the same, while you want different semantics for `for`. Moreover, the current semantics of the `while` in the OP's post still don't allow the compiler to infer that the loop body will run at least once.
Can you explain more about this? LLVM optimizes test_v1 down to the closed-form solution (https://rust.godbolt.org/z/88obnj1jK), so the extra noise in the MIR clearly doesn't impact it.
LLVM fails to optimize the loop when the function is annotated with target_feature(enable = "neon"). I'm not sure whether it is an LLVM bug.
https://rust.godbolt.org/z/1M6WvWqjG
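To make the failing case concrete, here is a sketch of the kind of annotation involved (the function name is made up for illustration; `#[target_feature(enable = "neon")]` is only accepted when compiling for an ARM target, so it is cfg-gated here so the snippet builds anywhere):

```rust
// Same loop as test_v1, but with NEON explicitly enabled. Calling a
// #[target_feature] function requires `unsafe` unless the caller also
// enables the feature.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
pub unsafe fn test_v1_neon(n: usize) -> usize {
    let mut s = 0;
    for i in 1..n {
        s += i;
    }
    s
}

fn main() {
    #[cfg(target_arch = "aarch64")]
    {
        // Sound here: NEON is a mandatory feature on aarch64.
        let s = unsafe { test_v1_neon(10) };
        assert_eq!(s, 45);
    }
    println!("done");
}
```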