I'm not 100% sure what the process for making this change would be, or exactly which files in the std crate would need to change, but there is an optimization for sums of consecutive unsigned integers that I would like to implement, if it's welcome. (I would like to special-case Iterator::sum for ops::Range<u*> and ops::RangeInclusive<u*>.)
Basically, any consecutive sequence of integers from 1 to n inclusive has a sum equal to n*(n+1)/2. This also works for ranges from 0 to n inclusive, because sum(0..=n) == 0 + sum(1..=n). We can generalize this to any range by simply subtracting the lower sum from the upper sum. For example, to compute sum(5..=10) we would do sum(1..=10) - sum(1..=4) (10*11/2 - 4*5/2 == 55 - 10 == 45 == 5+6+7+8+9+10 == sum(5..=10)).
Thus for any RangeInclusive<u*> of the form range.start..=range.end, we can replace the current implementation of Iterator::sum (which iterates over every element) with range.end*(range.end+1)/2 - range.start*(range.start-1)/2.
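As a rough sketch of what that closed form looks like (the helper name is hypothetical, and this ignores the real specialization machinery std would need; it widens to u128 to sidestep overflow in the multiplications, which the real implementation would have to handle on its own terms):

```rust
// Hypothetical helper: closed-form sum of start..=end for unsigned integers.
// Computes sum(1..=end) - sum(1..=start-1) using the triangular-number formula.
fn range_inclusive_sum(start: u64, end: u64) -> u128 {
    if start > end {
        return 0; // empty range
    }
    let (s, e) = (start as u128, end as u128);
    // Guard start == 0 so s - 1 never underflows.
    let lower = if s == 0 { 0 } else { s * (s - 1) / 2 };
    e * (e + 1) / 2 - lower
}

fn main() {
    assert_eq!(range_inclusive_sum(5, 10), 45); // 5+6+7+8+9+10
    assert_eq!(range_inclusive_sum(0, 10), 55);
    assert_eq!(range_inclusive_sum(1, 100), 5050);
    // Agrees with the iterative sum:
    assert_eq!(range_inclusive_sum(3, 9), (3u64..=9).sum::<u64>() as u128);
}
```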
The same solution can be extended to ops::Range<u*> except when range.end == u*::max (and Iterator::sum overflows in that case anyway).
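The half-open Range version is the same formula with end - 1 as the last term; a sketch, again with a hypothetical helper name and u128 intermediates as an assumed overflow strategy:

```rust
// Hypothetical helper: closed-form sum of start..end (half-open),
// i.e. the sum of start..=(end - 1). Empty when start >= end.
fn range_sum(start: u64, end: u64) -> u128 {
    if start >= end {
        return 0; // empty range; also avoids end - 1 underflowing at end == 0
    }
    let (s, last) = (start as u128, (end - 1) as u128);
    let lower = if s == 0 { 0 } else { s * (s - 1) / 2 };
    last * (last + 1) / 2 - lower
}

fn main() {
    assert_eq!(range_sum(5, 11), 45); // same terms as 5..=10
    assert_eq!(range_sum(0, 4), 6);   // 0+1+2+3
    assert_eq!(range_sum(3, 3), 0);   // empty
}
```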
I think this will yield a significant performance increase in some (perhaps uncommon) cases, with no drawbacks, but I'm curious for others' thoughts.
This also works, happy to implement it this way. Casting to a larger type may be faster, since it avoids the branching logic of the if statement, but the branching version is still a good solution for something like u128, which has no wider type to cast into.
Consider computing the sum of the range directly. One easy way to derive the formula is that it is the number of terms times the average of terms, and with a consecutive range like this, those are just last-first+1 and (last+first)/2.
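One wrinkle when doing count * average in integer arithmetic: exactly one of last - first + 1 and last + first is even (their sum is 2*last + 1, which is odd), so the division by 2 must be applied to whichever factor is even to stay exact. A sketch of that approach, with a hypothetical helper name and u128 widening assumed:

```rust
// Sum of first..=last as (number of terms) * (average term),
// dividing the even factor first so integer division is exact.
fn sum_by_average(first: u64, last: u64) -> u128 {
    if first > last {
        return 0; // empty range
    }
    let (f, l) = (first as u128, last as u128);
    let count = l - f + 1; // number of terms
    let total = l + f;     // first + last; exactly one of count/total is even
    if count % 2 == 0 {
        (count / 2) * total
    } else {
        count * (total / 2)
    }
}

fn main() {
    assert_eq!(sum_by_average(5, 10), 45);  // 6 terms, average 7.5 -> 3 * 15
    assert_eq!(sum_by_average(0, 10), 55);  // 11 terms, average 5
    assert_eq!(sum_by_average(1, 100), 5050);
}
```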
So I don't know that it's worth bothering to do it in the Rust code -- it's not "no drawbacks" because it's more code to maintain and more metadata to load in libcore (which slows down compilation for everyone, if perhaps not by that much).
Also, I'm curious what real code you have where this actually happens? Summing something that's just a range seems rare.
And the same goes for the version with a branch. Whoops. I'm curious if there's any benefit to guaranteeing this optimization from the Rust side; maybe the only benefit would be to debug builds. That actually does seem marginally beneficial, as it's an algorithmic difference between the two compiled versions, not just an "optimized is faster".