Let's consider three ways of calculating the sum of a Vec<i32>.
simple
#[no_mangle]
pub fn simple_sum_vec(v: &Vec<i32>) -> i32 {
    v.iter().fold(0i32, |a, b| simple_sum(a, *b))
}

fn simple_sum(a: i32, b: i32) -> i32 {
    a + b
}
curried
Using a sum function implemented as a curried function. (Trust me, this isn't just me complicating things for the sake of complication.)
#[no_mangle]
pub fn curried_sum_vec(v: &Vec<i32>) -> i32 {
    v.iter().fold(0i32, |a, b| curried_sum(a, *b))
}

fn curried_sum(a: i32, b: i32) -> i32 {
    fn_curry(a)(b)
}

// This could return `impl Fn` and everything would be fine, but `impl Fn` works only for one argument.
// It uses Box<dyn Fn> to demonstrate the problem that any curried function with 3+ arguments would have.
fn fn_curry(a: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |b| a + b)
}
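To make the comment above concrete, here is a minimal sketch of what a three-argument curried sum might look like. The names `Fn1`, `fn_curry3` and `curried_sum3` are hypothetical and not part of the code I measured. The point is only that every level of currying returns a closure, so every level ends up behind its own Box.

```rust
// Hypothetical alias for the innermost closure type (one remaining argument).
type Fn1 = Box<dyn Fn(i32) -> i32>;

// Hypothetical three-argument analogue of `fn_curry`.
// The outer closure returns another closure, so both levels are boxed.
fn fn_curry3(a: i32) -> Box<dyn Fn(i32) -> Fn1> {
    Box::new(move |b| -> Fn1 { Box::new(move |c| a + b + c) })
}

// Applies the curried function one argument at a time.
fn curried_sum3(a: i32, b: i32, c: i32) -> i32 {
    fn_curry3(a)(b)(c)
}
```

Calling `curried_sum3(1, 2, 3)` goes through two boxed closures, compared to one in the two-argument `fn_curry`.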
curried with continuations
Using the curried function rewritten as small-step operational semantics, thus avoiding Box but still using dyn Fn:
#[no_mangle]
pub fn continuation_sum_vec(v: &Vec<i32>) -> i32 {
    v.iter().fold(0i32, |a, b| continuation_sum(a, *b))
}

fn continuation_sum(a: i32, b: i32) -> i32 {
    fn_continuation(a, &|after_a| /*_*/ {
        after_a(b)
    })
}

fn fn_continuation(
    a: i32,
    after_a: &dyn Fn(
        &dyn Fn(/*b*/ i32) -> i32, //
    ) -> i32,
) -> i32 {
    after_a(&move |b| /*-> i32 */ {
        a + b
    })
}
The fold helper functions simple_sum, curried_sum, and continuation_sum compile to the same assembly, modulo __rust_no_alloc_shim_is_unstable in curried_sum (see "Missed optimization/perf oddity with allocations", Issue #128854 in rust-lang/rust). No dyn calls are left.
simple_sum_vec and continuation_sum_vec compile to the same assembly code under the 1.84.0 compiler.
curried_sum_vec compiles differently and runs much slower (~8x) on my machine.
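For anyone who wants to reproduce the comparison without a benchmark harness, a rough timing sketch could look like the following. It is not the setup I used for the numbers above: `time_sum` and `compare` are hypothetical helpers, the two sum functions are condensed copies of the ones in this post, and calling through a function pointer may itself affect inlining, so treat the output as indicative only.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Condensed copies of two variants from the post, minus #[no_mangle].
fn simple_sum_vec(v: &Vec<i32>) -> i32 {
    v.iter().fold(0i32, |a, b| a + *b)
}

fn fn_curry(a: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |b| a + b)
}

fn curried_sum_vec(v: &Vec<i32>) -> i32 {
    v.iter().fold(0i32, |a, b| fn_curry(a)(*b))
}

// Hypothetical helper: calls `f` on `v` `iters` times and returns the last
// result together with the total elapsed time. `black_box` keeps the
// optimizer from hoisting or discarding the calls.
fn time_sum(f: fn(&Vec<i32>) -> i32, v: &Vec<i32>, iters: u32) -> (i32, Duration) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        last = black_box(f(black_box(v)));
    }
    (last, start.elapsed())
}

// Hypothetical driver: times both variants on the vector 0..len and checks
// that they agree on the result before returning the two durations.
fn compare(len: i32, iters: u32) -> (Duration, Duration) {
    let v: Vec<i32> = (0..len).collect();
    let (s, t_simple) = time_sum(simple_sum_vec, &v, iters);
    let (c, t_curried) = time_sum(curried_sum_vec, &v, iters);
    assert_eq!(s, c);
    (t_simple, t_curried)
}
```

Something like `compare(1000, 100_000)` in a --release build should be enough to see whether the two variants differ; the absolute numbers will depend on the machine and compiler version.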
Question 1
Can this be considered something a compiler would want to optimize?
I would very much prefer the "curried" way to the "continuations" way if I can't use the "simple" way. The "continuations" way is much harder to write, especially for functions with more arguments (4-arg example).
Question 2
I wanted to see if "Missed optimization/perf oddity with allocations" (Issue #128854 in rust-lang/rust) would help, so I built rustc locally (off commit 8c61cd4d): once unmodified and once patched.
I am probably doing something wrong, but simple_sum_vec and continuation_sum_vec perform the same with my local rustc build (both unmodified and patched).
But curried_sum_vec becomes 500x slower than the baselines (both unmodified and patched). Also, I see the same results with cargo +nightly bench.