I wanted to jot down some thoughts as well that we discussed in libs triage yesterday, but this is also primarily my own personal opinion about how we should proceed here.
-
We should strive to not implement or stabilize a “high level” or “type safe” wrapper around simd in the standard library. That is, there will be no ergonomic interface to simd in the standard library. My thinking is that we may not quite have all the language features necessary to even do this, and it’s also not clear what this should look like. This of course means that this sort of wrapper should not exist (e.g. the simd
crate), I’m just thinking it should be explicilty built on crates.io rather than the standard library. In that sense, I’d personally like to table all discussion (just for now) about a high-level, type-safe, generic, etc, wrapper around simd.
Note that this sentiment is echoing @emoon’s thoughts which I’ve also heard from many others who desire SIMD, which is that intrinsics are crucial and a type-safe interface isn’t. Also note that the simd subset @stoklund’s working on would be a great candidate for basing the simd
crate off (I believe).
-
We should not stabilize intrinsics as literally intrinsics. Consider, for example, the transmute
function. This is an intrinsic but it is not stable to define it as such, but rather it’s reexported as mem::transmute
. This is essentially my preferred strategy with simd intrinsics as well. That is, I think we should table all discussion of how these intrinsics and functions are defined, and rather focus on where they’re defined, the APIs, the semantics, etc. Note that this would also table discussion about whether these are lang items or not as it wouldn’t be part of the stable API surface area.
-
Given that information, I’d like to propose that we follow these guidelines for stabilizing access to SIMD:
- A new module of the standard library is created,
std::simd
- All new intrinsics are placed directly inside this
simd
module. (or perhaps inside an inner intrinsics
modoule)
- Method names match exactly what’s found in official architecture documentation. That is, these aren’t tied to LLVM at all but rather the well-known and accepted documentation for these intrinsics. Note that this helps porting code, reading official docs and translating to Rust, searching Rust docs, adding new intrinsics, etc. Recall that our goal is not to have an ergonomic interface to simd in the standard library yet, so that’s not necessarily a requirement here.
To me, this is at least a pretty plausible route to stabilization of the APIs involved with SIMD. Unfortunately, SIMD has lots of weird restrictions as well. For example @huon’s shuffle intrinsics, a critical operation with simd, has the requirement that the indices passed are a constant array-of-constants. I suspect that there are other APIs like this in SIMD which have requirements we can’t quite express in the type system per se.
@eddyb assures me, however, that we can at least encode these restrictions in the compiler. That is, we could make it an error to call a shuffle intrinsic without a constant-array-of-constants, and this error could also be generated pre-monomorphization to ensure that it doesn’t turn up as some weird downstream bug.
@burntsushi and I will be looking into the intrinsic space here to ensure we have a comprehensive view of what all the intrinsics look like. Basically we want to classify them into categories to ensure that we have a semi-reasonable route to expressing them all in Rust.
Ok, so with that in mind, we’re in theory in a position to add all intrinsics to the standard library, give a stable interface, and remain flexible to change implementation details in the future as we see fit. I’d imagine that this set of intrinsics can cover anything that standard simd features enable. That is, @burntsushi’s other desired intrinsics, which come with SSE 4.2, @nrc’s thoughts about more than just SIMD, and @gnzlbg’s desire for perhaps other intrinsics (e.g. ABM/TBM/AES) could all be defined roughly in this manner. Perhaps not all literally in std::simd
itself, but we could consider a std::arch
module as well (maybe)
The last major problem which needs to be solved for SIMD support (I believe) is the “dynamic dispatch” problem. In other words, compiling multiple versions of a function and then dispatching to the correct one at runtime. @jethrogb pointed out that gcc has support for this with multi versioning which is indeed pretty slick!
I think this problem boils down to two separate problems:
- How do I compiling one function separately from the rest (with different codegen options).
- How do I dispatch at runtime to the right function.
Problem (1) was added to LLVM recently (e.g. support at the lowest layer) and I would propose punting on problem (2). The dispatch problem can be solved with a feature called ifunc
on ELF targets (read, Linux), which is quite efficient but not portable. In that sense I think we should hold off on solving this just yet and perhaps leave that to be discussed another day.
So for the problem of compiling multiple functions, you could imagine something like:
#[target_feature = "avx"]
fn foo() {
// the avx feature is enabled for this and only this function
}
#[target_feature = "ssse3"]
#[target_feature = "avx2"]
fn bar() {
// avx2/ssse3 are both defined for this function
}
fn dispatch() {
if cpuid().supports_avx2() {
bar();
} else {
foo();
}
}
That is, we could add an attribute to just enable features like __attribute__((target("avx2"))
in C. My main question and hesitation around this feature is what it implies on the LLVM side of things. That is, what happens if we call an avx2 intrinsic in a function which hasn’t enabled the avx2 feature set? If LLVM silently works then it’s perhaps quite flexible to basically be done at that point, but if LLVM hits a random codegen error then we’ll have to ensure stricter guarantees about what functions you can call and where you can call them. Put another way, our SIMD support should literally never show you an LLVM error, in any possible usage of SIMD.
I believe @aturon is going to coordinate with compiler folks to ensure that the LLVM and codegen side of things is ready for this problem.
Ok, that was a bit long than anticipated, but curious to hear others’ thoughts about this!