Pre-RFC: SIMD groundwork

Rereading the original post now, I think I misunderstood what it meant by checking (overflow checking?), whereas I meant something along the lines of checking cast validity (e.g. a cast from u16x4 to u8x4 is valid, but u16x4 to u8x6 is not).

I got that, but misunderstood the rest. Those debug_asserts definitely don't make sense at all. I think I now understand.

Macros could be the solution until we get value generics:


macro_rules! do_something_with_const {
    ($a: expr) => ({
        const A: usize = $a;
        call_only_with_const_arg(A);
    })
}

fn call_only_with_const_arg(_: usize) {}

fn main() {
    let a = 42;
    do_something_with_const!(a);
}

yields the following:

<anon>:13:30: 13:31 error: attempt to use a non-constant value in a constant
<anon>:13     do_something_with_const!(a);
                                       ^
<anon>:2:1: 7:2 note: in expansion of do_something_with_const!
<anon>:13:5: 13:33 note: expansion site
<anon>:13:30: 13:31 error: unresolved name `a`
<anon>:13     do_something_with_const!(a);
                                       ^
<anon>:2:1: 7:2 note: in expansion of do_something_with_const!
<anon>:13:5: 13:33 note: expansion site

Together with static_assert, it should even be possible to do bounds checks.

It's not pretty, but it does the job.

This is exactly what I see as "quirky structural typing" - "these types are equivalent if their representations are equivalent."

Treating this as some special-case of intrinsics (and only some intrinsics at that!) strikes me as strange and problematic; if structural typing is valuable, I think it needs to be handled carefully. Until then, I honestly think that requiring exact types is Good Enough.

Which exact types? If we required specific types, we'd need some way to inform the compiler of them, and the specificity requirement would imply that there's exactly one type that can be used. That would impose the requirement that there's only one definition of SIMD types in a given tree of dependencies, and I really don't want that requirement. (For one, I personally don't expect to get everything right the first time, so it would be very good if people are free to experiment themselves without "accidental"/arbitrary restrictions.)

I'm moderately concerned that introducing a vein of "relaxed" typing into the compiler will leave the door open for abuse/crazy tricks, but I'm unsure: it seems quite restricted, so it's not clear to me that one can do anything even slightly useful with it. Note that, in practice, people using SIMD won't need to worry about this at all: libraries will define things to ensure type safety (even at the intrinsic level).

NB. for the platform-specific intrinsics we can/will require that they aren't generic, so we can type-check them properly in the type checker ("properly" meaning: answering "is this a SIMD type of the appropriate length with the appropriate element type?"). Hence, this discussion is basically about the difference between being able to write fn simd_shuffle2<T, U>(v: T, w: T, ...) -> Simd2<U> with the option to impose type safety on top (which is what will happen in practice), and being forced to have a scheme by which the compiler can be totally assured that things will work. The latter would either require writing separate shuffle/comparison intrinsics for every concrete SIMD type, or would require compiler-known traits (like #[simd_primitive_trait], but also at least one more, I think) with some compulsory associated types and so on.

#[simd_primitive_trait]
trait SimdPrim {
    type Bool: SimdPrim;
}
#[simd_vector_trait]
trait SimdVector {
    type Elem: SimdPrim;
    type Bool: SimdVector<Elem = <Self::Elem as SimdPrim>::Bool>;
}

#[repr(simd)]
struct Simd2<T: SimdPrim>(T, T);

impl<T: SimdPrim> SimdVector for Simd2<T> {
    type Elem = T;
    type Bool = Simd2<T::Bool>;
}

extern {
    fn simd_shuffle2<T: SimdVector>(v: T, w: T, i0: u32, i1: u32) -> Simd2<T::Elem>;
    // ...

    fn simd_lt<T: SimdVector>(v: T, w: T) -> T::Bool;
    // ...
}

We'd need to have careful restrictions about how the implementations of SimdPrim and SimdVector can work, and especially around generic types. It seems very complicated, and I'm not sure it's worth it.

Seems like a good work around for prototyping/while we wait, yeah. Thanks!

Considering the requirements of #[repr(simd)] are exactly "has [T; n] layout where T: SimdPrimitive (modulo some potential alignment voodoo and constraints on n)", and the intrinsics are defined as "frobnicates a vector of four 32-bit integers", I'd consider that a relatively easy decision.

From there, I'd say that creating something like

trait Structural {
    type Layout: From<Self> + Into<Self>;
    // note: the conversions really ought to just be "safe transmutes";
    // might be worth making them methods on Structural
}

fn some_simd_thing<T>(a: T, b: T) -> T where T: Structural<Layout = [u32; 4]> {
    // ...
}

along with support for #[derive(Structural)] would do the job quite nicely. The #[derive] would define Layout as an array if possible, and a tuple if not. It would structuralize the members as well - so a newtyped array of structs would become an array of tuples, say.

That way, structs opt in to being capable of structural typing, and consumers of those structs (generically) opt in to using that functionality. No spooky magic intrinsics, callers don't need to futz about too much with their types, ponies for everyone.

So you’re suggesting that we require that the platform-specific intrinsics are defined either like

extern {
    fn x86_mm_abs_epi16(a: [i16; 8]) -> [i16; 8];
}

Or like

extern {
    fn x86_mm_abs_epi16<T: Structural<Layout=[i16; 8]>>(a: T) -> T;
}

?

(It’s not clear to me which one.)

I don’t really see how this has much benefit over just allowing any repr(simd) type with the right length/type. I agree that Structural seems to be one way to solve that, but there seem to be simpler ways that work fine.

Also, could you expand on how you see this solving the shuffle/comparison intrinsics? I suspect we’ll need a few more type system features for it to work for that case. (That’s not saying we can’t get them, but it’s not necessarily trivial.)

Lastly, there’s more to this SIMD structural typing than “layout looks like an array”: e.g. the alignment of SIMD types is higher (which I forgot to mention in the pre-RFC…), and it doesn’t make sense to take a reference to the internals of a SIMD register, while it makes perfect sense for any old array/struct. (Sure, there are very good reasons to write these values to memory, but one usually wants them to stay literally in registers, especially when calling intrinsics.)

(BTW, this SIMD work is likely to remain unstable for a while, so we’ve got scope for making changes to whatever design lands. :smile:)

Oh, I missed replying to this: I think it makes sense, but it can be done in future.

Opened SIMD groundwork by huonw · Pull Request #1199 · rust-lang/rfcs · GitHub. Thanks for the initial feedback everyone, I think it improved!

(Now the only new attribute is repr(simd).)

What I was saying was a bit more than that.

  1. From my earlier post, these may not need to be intrinsics at all: one-instruction asm!() functions with #[inline(always)] and proper register specifiers on the asm!() can do the job, except for the quirky magic structural typing.

  2. The ergonomics of strict types, whether [u32; 4] or Simd4<u32>, really aren’t that bad for low-level building blocks that will mostly live behind prettier interfaces.

  3. What problems there are with the ergonomics can be largely resolved with T: Structural<Layout=[u32; 4]> + SimdSafe, where SimdSafe is a marker trait denoting the same things as #[repr(simd)], and possibly added by it.

And yeah, I edited “alignment voodoo” into my post before you mentioned that :stuck_out_tongue:

Anyway, the result of the above is that one only really needs two changes to the compiler:

  1. #[repr(simd)]
  2. #[lang_item="simd_repr_marker"] (added by #[repr(simd)])

Structural can be done without any help from the compiler, but would benefit a lot from a #[derive].

But as far as benefits go, this avoids a large mass of worryingly magical (regarding parameter types) intrinsics being added to the compiler.

Could you use Rust namespaces instead of C-style prefix-namespacing for functions?

i.e. instead of x86_mm_abs_epi16() I’d prefer simd::x86::sse::mm_abs_epi16(), so I could use:

use simd::x86::sse::*;
mm_abs_epi16();

Namespacing by SSE/AVX generation would be useful too, so e.g. without importing simd::x86::avx I wouldn’t be able to use these functions accidentally (similar to how C has different headers). That's important, since using them requires appropriate cpuid checks, etc.

I don’t like cfg() for this. It’s useful only when you can set a lowest common denominator for the whole program; to use newer instructions you must have a runtime check:

fn foo_plain() { /* … */ }
fn foo_sse() { /* … */ }
fn foo_avx() { /* … */ }
fn foo() {
    match cpu_runtime_check() {
        avx => foo_avx(),
        sse => foo_sse(),
        _ => foo_plain(),
    }
}

or a variant of this using function pointers:

(The discussion has now moved to rust-lang/rfcs#1199 on GitHub.)

Both of those points are discussed above in Pre-RFC: SIMD groundwork - #20 by huon

In summary, the intrinsics are prefix-namespaced because the compiler needs them to be namespaced internally, and cfg is used because that's the only scheme I can possibly think of that works without some extensive compiler (and possibly LLVM) changes. It should be possible to build better systems backwards-compatibly later. (NB. this is essentially the same thing that C/C++ do: they have a separate compilation unit for each version of a function and link them together so they can set the right target features. This is something cargo could assist with.)

This would require duplicating the semantics of a very large number of intrinsics. Having shims might be nice, but it's a non-trivial amount of grunt work. I don't think it's a valuable use of time right now. :smile:

I clearly like the idea of having data types and/or methods explicitly supporting SIMD instructions without relying on the compiler to deduce such semantics.

But I wonder if your proposal isn't too heavily specialized on existing instruction sets. I'd prefer a more generic solution resulting in a superset of existing functions, which might be backed by more hardware support over time, thus preventing "vendor extensions" as seen in OpenGL/CL.

What about supporting matrices? Can your approach be generalized somehow (maybe in the future)? Specialized compile targets (like GPGPUs) might already benefit. (I'm still waiting for the Vulkan API to see where this is heading.)

I vote for "yes". It could be used for multidimensional wrap-arounds, e.g. in texture coordinates, or for splitting a vector into nested coordinate systems. Otherwise the developer has to decompose the vector.

The goal is to provide the tools needed for more abstract libraries to be built on top of them. It may turn out that more compiler magic is needed/desired for certain things, but this is the first step to exploring the space and working out what magic we want.

Matrices are just vectors interpreted differently, i.e. a 2 by 2 matrix of f32s can be represented as a 4-vector of f32s, along with some extra operations (but, say, matrix addition is the same as vector addition). This is again something I'd like to see experimented with in external libraries before we dive into hard-coding it into the language.

There's absolutely no debate about whether division/modulo is useful; the question is whether we should implicitly accept the performance penalty of (in the general case) doing a decomposition and 2 (or 4, or 8, ...) successive integer divisions, or instead make it more explicit.

This is true for all component-wise operations. But when it comes to determinants, transposition, matrix-vector multiplication, etc., representing a matrix as a flattened vector becomes an unwieldy workaround.

I fully agree. I just wanted to ensure that, whatever syntax and method signatures are chosen, there's enough room for a later extension to matrices (like f64x4x3 or f64_4x3; I personally prefer the latter).

Maybe I should have made this more explicit: I'd like to see those operations handled implicitly. Explicit decomposition by the developer would result in less maintainable code.

The fact that those functions might not be backed by an appropriate CPU instruction should be noted in the documentation.

These operations fundamentally need to be written in terms of the machine instructions, which (in general) operate on flattened vectors. Of course, there are instructions useful for matrix operations (e.g. shuffles) but still: this whole design is to allow building whatever sort of fancy functionality one might want without requiring it to be at all hardcoded. I.e. my intention is for it to be literally possible for anyone to dive in and define their own SIMD library with their own types/method signatures.

It's not clear to me if these operations occur enough to be worth the performance cliff of using the operator itself.

No, as mentioned in the RFC, that destroys any SIMD-related algebraic optimizations LLVM would want to do. I think using inline assembly would be a big mistake.

Hardcoding matrices into the language? Are there any SIMD instruction for working with matrices?

I have a general question. Supposing we had type-level integers, would this RFC look similar, or completely different to what it looks now?

Not that I know of, which is exactly my point: there doesn't seem to be much point in hardcoding them.

This was touched on a little above:

(BTW, for people coming here new from TWiR, discussion has moved to 1199.)