What's the next step towards the stabilization of SIMD?


#1

Now that the target_feature RFC has some consensus it is time to discuss the next step towards SIMD on stable Rust.

From the discussions in the target_feature RFC it seems clear that the next step is having an RFC about SIMD types. This summarizes the current state of affairs:

  • Background:

  • Implementation status: The feature repr(simd) is currently implemented, its semantics are not sufficiently specified in any RFC, and some ABI issues must be resolved, but it can be used on nightly today to write SIMD types “just fine”.

  • Consensus from previous discussions: @burntsushi achieved consensus on trying to stabilize concrete SIMD types like f32x4 without stabilizing repr(simd). However, during the target_feature RFC some ABI issues were discovered, and resolving them is going to require a more detailed specification of repr(simd). Also, @nrc raised the question of whether we shouldn’t be using #[repr(simd)] [T; N] on arrays instead of tuple structs. Since the type-level const RFC is (almost) merged we might want to reconsider this.

Anyhow, the purpose of this post is not to discuss the details of SIMD types, but to gather consensus on what the next step should be. I think we have two main choices:

  • have an RFC for the concrete SIMD types that we want to stabilize and their semantics, or
  • have an RFC to specify repr(simd) first, and then decide whether we want to stabilize repr(simd) or only some SIMD types after that discussion is over.

I think that any discussion on the semantics of SIMD types is going to implicitly touch repr(simd), so I find the second alternative, specifying repr(simd), less messy. After that is done, an RFC for concrete SIMD types will be trivial to write anyway.

Thoughts?


#2

Could you please summarize these issues, so that those who have not followed the target_feature RFC PR (like me) can get an idea of what sorts of changes to repr(simd) would be needed?


#3

@rkruppe in a nutshell (live example):

// Given a SIMD vector type:
#[repr(simd)]
struct f32x8(f32, f32, f32, f32, 
             f32, f32, f32, f32);

// and the following two functions:

#[target_feature(enable = "avx")]
fn foo() -> f32x8;  // f32x8 will be a 256bit vector

#[target_feature(enable = "sse3")]
fn bar(arg: f32x8);  // f32x8 will be 2x128bit vectors

// what are the semantics of the following when
// executing on a machine that supports AVX?
fn main() { bar(foo()); }

The compiler will compile bar for SSE3 and foo for AVX. Even if you run this on a machine that supports AVX, the calling conventions / ABIs of these two functions are different. On nightly, this produces no warning, and bar will get passed garbage. Some C++ compilers warn about this in some cases.

First, this problem is solvable: we just haven’t chosen a solution yet. Second, independently of whether we stabilize repr(simd) or not, we need to solve this anyway to make SIMD types sound. Third, this is a pure repr(simd) issue. I used target_feature for simplicity, but if you put the functions in different crates, compile them for different --target-cpu values (AVX, SSE3), and then link them and run them on a CPU that supports a superset (e.g. AVX2), you reproduce this without using target_feature.

This problem is basically that portable vector types (which is what repr(simd) introduces) have a different layout depending on the features enabled for a particular piece of code, and this layout is not part of the type.
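As a rough illustration of why the layout differs, the number of hardware registers a portable vector occupies follows from simple arithmetic on the register width. The helper below is hypothetical (registers_needed is not part of any compiler or library API) and only models the counting, not the actual calling convention:

```rust
/// Hypothetical helper: how many vector registers of `register_bits` width
/// are needed to hold a vector of `lanes` elements of `lane_bits` each.
pub fn registers_needed(lanes: u32, lane_bits: u32, register_bits: u32) -> u32 {
    let total_bits = lanes * lane_bits;
    // Round up: a partially filled register still counts as one register.
    (total_bits + register_bits - 1) / register_bits
}
```

With 128-bit SSE registers, registers_needed(8, 32, 128) gives 2 for the f32x8 above, while with 256-bit AVX registers registers_needed(8, 32, 256) gives 1. The same type therefore travels in a different number of registers depending on the features of the function being compiled, which is how caller and callee can end up disagreeing.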

So what I am wondering is whether we should nail down the semantics of repr(simd) first, and then discuss what we actually want to stabilize (e.g. repr(simd) or some subset of std::f32x8 types), or whether we should pursue the fixed subset of vector types directly, and resolve these issues in that RFC.


EDIT: this post is only about how to follow the roadmap. If there is interest in discussing possible solutions to the repr(simd) issues, we should open a different thread for that.


EDIT2: I’ve filed an issue so that those interested can discuss this there.


#4

@gnzlbg Thanks for writing this, and thanks for all your work on target_feature. It’s immensely helpful. :)

I think what I’d personally like to do as a next step is polish up stdsimd and write a few guides for folks to start contributing to it, and maybe lobby to have it move to rust-lang-nursery. My hope is that we can get it into shape during the impl period.

I don’t really know what to do about repr(simd) and the various ABI issues. I think the issues have exceeded my own mental capacity at this point in time. :-/ I still think that the ultimate goal is to get something like stdsimd, including the platform independent types and the platform dependent vendor intrinsics, exported via core (and consequently std as well).


#5

I think the reason we have repr(simd) is to be able to do this in the first place, so we should definitely work on stdsimd in parallel with the implementation of target_feature and repr(simd), to make sure that these features solve the problem they are intended to solve.

“I still think that the ultimate goal is to get something like stdsimd, including the platform independent types and the platform dependent vendor intrinsics, exported via core (and consequently std as well).”

I don’t know about this yet, but I think having stdsimd working on stable Rust would already be a huge success. By then we will probably have a better idea of whether it belongs in std, core, or neither (it might well be that it must be part of std for “reasons”, but currently it’s too soon to tell).


#6

Thank you for the summary. It seems like this issue is relatively independent of how vector types are presented at the source level? That is, whatever solution we adopt would probably work uniformly for repr(simd) types and f32x8 types? In that case I see no reason for a push to spec out repr(simd). Even just an RFC for f32x8 and friends would have to solve the problem, and if that implies changes to repr(simd) as an implementation detail, so be it.


#7

Our choices are severely constrained. It needs to either be blessed or shipped as part of core/std because it needs to access compiler intrinsics:

#![feature(
    const_fn, link_llvm_intrinsics, platform_intrinsics, repr_simd, simd_ffi,
    target_feature,
)]

There are lots of examples here: https://github.com/BurntSushi/stdsimd/blob/master/src/x86/sse2.rs


#8

I see. Time is then probably better spent fixing the ABI “bugs” with repr(simd) and finishing stdsimd, with the aim of stabilizing that.


#9

What do we need to safely use stdsimd on stable if it were part of std?

We would, at least, need:

  • #[target_feature] for unconditional compilation
  • cfg(target_feature) for conditional compilation
  • run-time feature detection (x86 at first, probably followed by ARM) for those who want to safely generate binaries targeting multiple feature sets
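A minimal sketch of how these three pieces could fit together, written against the std::arch API as it was later stabilized. That API postdates this thread, so treat it as an assumption about the eventual shape rather than something proposed here. The names sum_scalar, sum_avx, sum, and compiled_with_avx are made up for illustration:

```rust
/// Portable fallback, used when AVX is not available at run time.
pub fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Compiled with AVX enabled regardless of the crate-wide target features.
/// Callers must first make sure the CPU really supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(xs: &[f32]) -> f32 {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_setzero_ps, _mm256_storeu_ps};
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    let mut lanes = [0.0f32; 8];
    // SAFETY: each chunk has exactly 8 f32s and `lanes` has room for 8.
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for chunk in chunks {
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(chunk.as_ptr()));
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Run-time dispatch: pick the AVX version only if the CPU reports AVX.
pub fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just checked at run time.
            return unsafe { sum_avx(xs) };
        }
    }
    sum_scalar(xs)
}

/// Compile-time query: true only if the whole crate was built with AVX on.
pub fn compiled_with_avx() -> bool {
    cfg!(target_feature = "avx")
}
```

Calling sum(&data) is safe for the caller: the unsafe call is guarded by the run-time check, and the binary still runs on CPUs without AVX through sum_scalar.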

So, in more actionable terms:

  • [ ] merge and implement “something like” RFC 2045: target_feature
  • [ ] fix bug 42515 (propagate #[target_feature] to cfg(target_feature))
  • [ ] fix bug 44367 (ABI issues with repr(simd))

Anything else?


#10

@gnzlbg the checklist of things you’ve got at the end here sounds great to me! I can’t think of much else that’d block the stabilization of simd-in-std on a technical level.


#11

@alexcrichton Shall we have a SIMD roadmap written down somewhere, so that maybe we can split the work? For example, I am working on a stdfeatureid crate to complement @burntsushi’s stdsimd crate.


#12

Certainly! My best guess would be a SIMD tracking issue in the repo? (I forget if we already have one)


#13

We have this one: https://github.com/rust-lang/rust/issues/27731


#14

Can ABI issues and checks be dodged by stabilizing only SSE2 on x86_64? (i.e. it works there unconditionally)

It’s not to say that other platforms or higher levels aren’t important, but even just having SSE2 is a major improvement over not having any SIMD anywhere.
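To illustrate the “works unconditionally” point: on the standard x86_64 targets SSE2 is part of the baseline, so no run-time detection is needed before using it. The sketch below uses the std::arch intrinsics as they were later stabilized (an assumption relative to this thread), and add4 is a made-up name:

```rust
/// Lane-wise wrapping addition of four i32s using SSE2.
#[cfg(target_arch = "x86_64")]
pub fn add4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};
    let mut out = [0i32; 4];
    // SAFETY: SSE2 is assumed to be in the target's baseline (true for the
    // standard x86_64 targets), and all pointers cover exactly 16 bytes.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_add_epi32(va, vb));
    }
    out
}

/// Scalar version with the same wrapping semantics for other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Because both arguments here are plain arrays rather than vector types, this particular function also sidesteps the question of how a vector crosses a function boundary.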


#15

Sounds like a great location to me! If you want to write something up I can copy it to the OP


#16

I think that without resolving ABI issues we could stabilize up to SSE4.2, and also bit manipulation intrinsics BMI/BMI2/TBM/ABM.