I’ve always really liked this site! Thanks for your effort.
One thing I’ve noticed, coming from the C++ world, is that the Rust code generation seems a little substandard. Take, for example, the one and only example I’ve gotten working.
I tweaked the code a tiny bit and got “better” codegen (not sure if it’s actually faster…):
#[allow(unstable)]
pub fn max_array(x: &mut [f64; 65536], y: &[f64; 65536]) {
    unsafe {
        std::intrinsics::assume(x.as_ptr() as usize % 64 == 0);
        std::intrinsics::assume(y.as_ptr() as usize % 64 == 0);
    }
    for i in 0..65536 {
        x[i] = if y[i] > x[i] { y[i] } else { x[i] };
    }
}
(The assumes are a hacked-up version of __builtin_assume_aligned. Without them, the only change in the generated code is that movapd becomes movupd, as one might expect.)
This gives
.LBB0_3:
    movapd (%rsi,%rax,8), %xmm0
    movapd 16(%rsi,%rax,8), %xmm1
    maxpd (%rdi,%rax,8), %xmm0
    maxpd 16(%rdi,%rax,8), %xmm1
    movapd %xmm0, (%rdi,%rax,8)
    movapd %xmm1, 16(%rdi,%rax,8)
.Ltmp3:
    addq $4, %rax
    cmpq $65536, %rax
    jne .LBB0_3
for the loop, vs.
.LBB0_1: # %vector.body
    movupd (%rsi,%rax,8), %xmm0
    movupd (%rdi,%rax,8), %xmm1
    maxpd %xmm0, %xmm1
    movupd %xmm1, (%rdi,%rax,8)
    addq $2, %rax
    cmpq $65536, %rax # imm = 0x10000
    jne .LBB0_1
for clang.
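A side note on the alignment hints used in max_array: the assume calls rely on an unstable intrinsic, so they need a nightly toolchain. A rough sketch of a stable alternative is to put the alignment into the type instead of assuming it. The names Aligned64 and max_array_aligned below are made up for illustration, the array length is a const parameter rather than the fixed 65536, and whether the compiler actually picks movapd for this version is an assumption that would need checking in the generated assembly.

```rust
/// Hypothetical wrapper (not from the original post): the type itself
/// guarantees that the array it holds starts on a 64-byte boundary.
#[repr(C, align(64))]
pub struct Aligned64<const N: usize>(pub [f64; N]);

/// Element-wise maximum of `x` and `y`, stored back into `x`.
/// Iterating with `zip` avoids explicit indexing and its bounds checks.
pub fn max_array_aligned<const N: usize>(x: &mut Aligned64<N>, y: &Aligned64<N>) {
    for (xi, yi) in x.0.iter_mut().zip(y.0.iter()) {
        // Same comparison as the original loop: take y[i] only if it is larger.
        *xi = if *yi > *xi { *yi } else { *xi };
    }
}
```

No unsafe is needed here, because nothing is being promised to the compiler that the type does not already enforce. For the full 65536-element case the two arrays would take 512 KiB each, so they are probably better kept in a static or on the heap than on the stack.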
the one and only example
I ported the “sum array” one:
#[allow(unstable)]
pub fn sum_array(x: &[i32]) -> i32 {
    unsafe {
        std::intrinsics::assume(x.as_ptr() as usize % 64 == 0);
    }
    x.iter().fold(0, |sum, next| sum + *next)
}
(The unsafe block and the assume are not necessary, but they mirror the “opt” versions of the C code and give slightly nicer code.)
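Since the unsafe block is optional, the same fold can be sketched in entirely safe code. The name sum_array_wrapping is made up for illustration, and the use of wrapping_add is an assumption rather than part of the port above: it makes overflow wrap in every build mode, whereas a plain + panics on overflow in debug builds.

```rust
/// Sums a slice of i32 values, wrapping around on overflow.
/// Hypothetical safe variant of `sum_array`, with no alignment hint.
pub fn sum_array_wrapping(x: &[i32]) -> i32 {
    // `&next` destructures the reference, so no explicit `*next` is needed.
    x.iter().fold(0i32, |sum, &next| sum.wrapping_add(next))
}
```

Calling sum_array_wrapping(&[1, 2, 3]) returns 6, and an empty slice returns 0.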