Announcing support for Rust 1.0 in "Rust Explorer"


#1

Hi there,

In light of the recent Rust 1.0 release, I’ve added support for Rust 1.0 to my compiler assembly explorer. It lets you write code and see the emitted assembly interactively. It’s online at http://rust.godbolt.org/ - you’ll need to define a “pub fn testFunc” or similar for any code to actually be visible.
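As a minimal sketch of the kind of input that shows up in the explorer, assuming the only requirement is that the function is public (the name `square` is hypothetical and not taken from the site's examples):

```rust
// Hypothetical example function. It is `pub`, so it is part of the crate's
// exported interface and the compiler emits code for it; a private function
// that nothing calls may be dropped and so would show no assembly.
pub fn square(x: i32) -> i32 {
    x * x
}
```

Calling `square(7)` returns 49; in the explorer the interesting part is the assembly emitted for the function body.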

I’m a complete Rust newbie, so thoughts are welcome: the source is on GitHub if you care to look.

One thing I’ve noticed - coming from the C++ world - is that the Rust code generation seems a little substandard. Take for example the one and only example I’ve gotten working. Visiting http://gcc.godbolt.org/ and trying the same example in clang with appropriate options (http://goo.gl/4Zexu0 for example) yields beautiful vectorized code. Is there a command-line switch I’m missing?

Hope you guys find the tool helpful, and thanks in advance for any ideas on improvements and/or how to better showcase the blazing fast speed of Rust.

Thanks! Matt


#2

It should be noted that we just released 1.0 alpha which is largely a symbolic release with weak back-compat guarantees. Big changes are still coming!

That said, rustc is well-known to have substandard codegen and compile times. As I understand it, this is due to us largely doing the minimum amount of work to generate a valid representation of the program for LLVM. LLVM then has to do a lot of work to rip through all the sloppy IR we generate to produce something reasonable, and we often aren’t giving it the optimal metadata/flags to do what you would want.

There are some long-term plans to improve this situation after 1.0-stable lands (12+ weeks away), but for now the focus is on semantics and correctness – with the caveat that our abstractions “should” optimize well with a sufficiently good compiler implementation.


#3

I’ve always really liked this site! Thanks for your effort.

Matt wrote: “One thing I’ve noticed - coming from the C++ world - the rust code generation seems a little substandard. Take for example the one and only example I’ve gotten working”

I tweaked the code a tiny bit and got “better” codegen (not sure if it’s actually faster…):

#[allow(unstable)]
pub fn max_array(x: &mut[f64; 65536], y: &[f64; 65536]) {
  unsafe {
    std::intrinsics::assume(x.as_ptr() as usize % 64 == 0);
    std::intrinsics::assume(y.as_ptr() as usize % 64 == 0);
  }
  for i in 0..65536 {
    x[i] = if y[i] > x[i] { y[i] } else { x[i] };
  }
}

(The assumes are a hacked-up version of __builtin_assume_aligned; without them, the only change in the generated code is movapd becoming movupd, as one might expect.)
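For comparison, here is a rough sketch of a different way to express the same alignment fact, one that does not rely on the unstable assume intrinsic: putting the alignment into the type with `#[repr(align(64))]`. This is a swapped-in technique, not what the code above does. The names `Aligned` and `max_array_aligned` are hypothetical, and whether this produces the same movapd loop has not been checked here.

```rust
// Hypothetical wrapper type: every value of `Aligned<N>` is guaranteed by the
// type system to start on a 64-byte boundary, so no `unsafe` is needed.
#[repr(C, align(64))]
pub struct Aligned<const N: usize>(pub [f64; N]);

// Same element-wise maximum as `max_array`, written against the wrapper type.
pub fn max_array_aligned<const N: usize>(x: &mut Aligned<N>, y: &Aligned<N>) {
    for i in 0..N {
        x.0[i] = if y.0[i] > x.0[i] { y.0[i] } else { x.0[i] };
    }
}
```

The array length is a const generic here so the sketch can be tried with small arrays; using `Aligned<65536>` would match the sizes in the original example.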

Gives

.LBB0_3:
    movapd    (%rsi,%rax,8), %xmm0
    movapd    16(%rsi,%rax,8), %xmm1
    maxpd    (%rdi,%rax,8), %xmm0
    maxpd    16(%rdi,%rax,8), %xmm1
    movapd    %xmm0, (%rdi,%rax,8)
    movapd    %xmm1, 16(%rdi,%rax,8)
.Ltmp3:
    addq    $4, %rax
    cmpq    $65536, %rax
    jne    .LBB0_3

for the loop, vs.

.LBB0_1:                                # %vector.body
    movupd    (%rsi,%rax,8), %xmm0
    movupd    (%rdi,%rax,8), %xmm1
    maxpd    %xmm0, %xmm1
    movupd    %xmm1, (%rdi,%rax,8)
    addq    $2, %rax
    cmpq    $65536, %rax            # imm = 0x10000
    jne    .LBB0_1

for clang.

Matt wrote: “the one and only example”

I ported the “sum array” one:

#[allow(unstable)]
pub fn sum_array(x: &[i32]) -> i32 {
  unsafe {
    std::intrinsics::assume(x.as_ptr() as usize % 64 == 0);   
  }
  x.iter().fold(0, |sum, next| sum + *next)
}

(The unsafe block and the assume are not necessary, but they mirror the “opt” versions of the C code and give slightly nicer code.)
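Since the unsafe block is optional, a minimal sketch without it might look like the following. The function names are hypothetical; the second variant uses the standard `Iterator::sum` as an alternative spelling of the same fold. No claim is made here about what assembly either one produces.

```rust
// The same fold as above, minus the alignment hint.
pub fn sum_array_fold(x: &[i32]) -> i32 {
    x.iter().fold(0, |sum, next| sum + *next)
}

// Equivalent result using the standard library's `sum` adaptor.
pub fn sum_array_sum(x: &[i32]) -> i32 {
    x.iter().sum()
}
```

Both functions return 0 for an empty slice and the same total for any input that does not overflow an i32.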


#4

Thanks huon! If you don’t mind, I’ll put those in as the “opt” versions for Rust.


#5

I can see two differences between the Rust and C++ versions, aside from the aligned assumption: the C++ one is on doubles rather than int32s, and the Rust one does a conditional rather than unconditional store. If those are equalized…

void maxArray(int* __restrict x, int* __restrict y) {
    for (int i = 0; i < 65536; i++) {
        x[i] = ((y[i] > x[i]) ? y[i] : x[i]);
    }
}

pub fn max_array(x: &mut[i32; 65536], y: &[i32; 65536]) {
  for i in 0..65536 {
    x[i] = if y[i] > x[i] { y[i] } else { x[i] };
  }
}

the code generated by clang and rustc looks pretty similar, except that the Rust one goes 8 elements at a time rather than 4 for some reason. Different tuning options?

(Same goes if you change Rust to f64 instead, and compare against the non-‘opt’ C++ version which does the conditional store: similar code, but Rust goes two at a time.)
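To make the conditional versus unconditional store distinction concrete, here is a small f64 sketch of both shapes. It is an illustration only: it uses slices and `zip` instead of the fixed-size arrays above, the function names are hypothetical, and the code each one compiles to has not been verified here.

```rust
// Unconditional store: every element of `x` is written on every iteration,
// with the value chosen by the comparison.
pub fn max_array_select(x: &mut [f64], y: &[f64]) {
    for (xi, yi) in x.iter_mut().zip(y.iter()) {
        *xi = if *yi > *xi { *yi } else { *xi };
    }
}

// Conditional store: `x` is written only when the element of `y` is larger.
pub fn max_array_branch(x: &mut [f64], y: &[f64]) {
    for (xi, yi) in x.iter_mut().zip(y.iter()) {
        if *yi > *xi {
            *xi = *yi;
        }
    }
}
```

Both functions leave `x` holding the same values; they differ only in how the store is expressed, which is the property the comparison above turns on.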


#6

Wow; I hadn’t even spotted that I’d switched doubles for i32s - thanks for noticing! I’ll update the code as you suggest, and also add the “opt” tweaks that huon indicated! Thanks!