[Pre-RFC #2]: Inline assembly

I much prefer a solution which does not require the use of string literals, like the one I proposed here.

Not using string literals allows embedding Rust code directly in the assembly, which can be convenient. Letting inline assembly return a value makes things feel very Rust-like too; here's an example:

let val = asm {
	inw {reg => return}, {const = 0x40 + PORT_OFFSET}
};

I'm not a fan of Unix-style mashed-together words like inlateout; they don't really fit the naming conventions of Rust either.

1 Like

@Zoxc What you propose can be done with a proc macro that wraps around the lower-level string-based asm! macro. I would prefer keeping the low-level interface string-based, since it gives you much more control over the asm code.

2 Likes

That's kind of a cop-out answer, though. Any assembly syntax should be able to lower to any other. I'm also not sure anyone will bother to add a dependency (with the compile-time overhead and other complications that come with it) for a better syntax for a tiny part of their code.

A string-based interface seems ideal in the medium-term to me, because the best assembly syntax is highly controversial, has multiple competing standards as it is, is constantly evolving in ways we don't control, seems largely orthogonal to the compiler integration issues that make inline assembly a meaningful feature different from "outline" assembly files, and imo implies the compiler understands the assembly much more than it actually does (whereas it needs to understand constraints, so "real syntax" for constraints feels right).

At least, I assume no one is proposing that we put every assembly language Rustaceans want to use directly into the Rust grammar, or add std macros for transpiling or stringifying all of them. I could see it happening for a few of the most popular assemblies, but if we need string asm as a fallback anyway, non-string asm seems like a clear post-MVP feature.

9 Likes

I don't think anyone has brought this up: should we make an attempt to specify how inline assembly behaves w.r.t. the memory model? Like, we totally don't have a memory model as far as I know, but we did knock off C++'s atomics, so I think it might be wise to say something along the lines of "such and such loads and stores will never be reordered across inline assembly"; in particular, I want to know if I should expect

let my_ptr: *const usize = ...;
let my_val: usize;
asm!("lw {0}, 0({1})", out(reg) my_val, in(reg) my_ptr);

to have the exact same semantics as read_volatile. (Similarly, we come again to the question of what kind of barrier asm!("") is: is it a full compiler-level mfence (processor OoO notwithstanding)?)

1 Like

This is a frustratingly vague notion, but anything that is like a CPU-level fence in that it affects all memory (or at least, entire components of the memory hierarchy) seems irreconcilable with basic optimizations in the presence of calls to other functions (which may include some inline assembly). Even optimizations that are not particularly Rust-specific. The only tenable solution appears to be what @Amanieu proposed earlier: inline asm can affect memory in the same ways any external function call can, no more.

5 Likes

I agree, whether volatile or non-volatile is chosen as the implicit default for inline asm (with the other alternative needing explicit specification).

Having a "this touches all memory" attribute on ASM is a useful tool to have as an option but is definitely not what you want as default.

is not my understanding of volatile. My background includes a lot of embedded development, where memory-mapped registers are common. My understanding of volatile is that every access to a memory region marked volatile has potential side effects, so it can't be omitted, repeated, or reordered by the compiler (e.g., LLVM) as a form of optimization.
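In Rust terms, that contract is what std::ptr::read_volatile already provides on the language side; a minimal sketch:

```rust
use std::ptr;

fn main() {
    let x: u64 = 42;
    // A volatile access may not be omitted, repeated, or reordered relative
    // to other volatile accesses, even though the compiler can see `x`.
    let v = unsafe { ptr::read_volatile(&x) };
    assert_eq!(v, 42);
}
```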

correct, volatile means "side effect you don't understand, just do it", but separately from that you can also specify an asm block that reads and writes arbitrarily to all of memory.

https://youtu.be/nXaxk27zwlk jump to 40:40 or so

Now of course using this for benchmarking or other purposes is rare, and possibly will fail in some circumstances etc etc, but it should be possible to set up in whatever asm thing Rust goes with.

We should indeed say that explicitly.

By default, asm! acts as a full compiler barrier, and the compiler must never reorder operations from one side to the other.

Specifying the pure, nomem, or readonly flags may allow the compiler to perform some kinds of reordering relative to the asm! block.
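As a sketch of what those flags buy the compiler (written in the `options(...)` spelling that asm! eventually stabilized with, which differs slightly from the draft under discussion; x86_64-only, with a plain-Rust fallback elsewhere):

```rust
fn add_one(x: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let y: u64;
        // `pure` + `nomem`: the output depends only on the input registers,
        // so the compiler may hoist this block out of a loop, merge duplicate
        // invocations, or delete it entirely if `y` is unused.
        std::arch::asm!(
            "lea {y}, [{x} + 1]",
            x = in(reg) x,
            y = out(reg) y,
            options(pure, nomem, nostack),
        );
        return y;
    }
    #[cfg(not(target_arch = "x86_64"))]
    return x + 1;
}

fn main() {
    assert_eq!(add_one(41), 42);
}
```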

2 Likes

I guess in a related thread: a common type of MMIO-like access is mmapped memory shared across two processes. The following is an issue that a friend of mine on Chromium complained about, as I recall it roughly:

You have a shared ring buffer for doing IPC. Obviously, this shared ring buffer is a volatile uint8_t* in C++. Unfortunately, this brings with it all of the reordering barriers of volatile, even though we just want "please don't duplicate or delete loads or stores".

If I decide I don't like read_volatile because I don't like LLVM's load/store volatile instructions, and decide to write my own cool volatile load, is inline assembly the correct avenue, with the correct "yes I have no memory model constraints, do whatever as-if transforms you want" applied?

Maybe this is a "we want a separate language feature" thing, but if so maybe "doing silly stuff with atomic reorders" should be an explicit non-goal (since inline assembly is an otherwise attractive way to get compiler-level atomicity).

Since I just thought about it, here's another "fun" way a proc macro could theoretically implement inline assembly: shell out to clang to do it (i.e. implement an external C function that contains the inline asm and compile it with clang; with aggressive cross-language inlining, this might even achieve zero runtime cost, though that relies on LLVM actually inlining across the language boundary).

Of course, as a rustc feature or a Rust spec item, specifying as-if behavior by calling the system assembler is still probably the better idea than by calling clang.

To stay somewhat on topic, I agree that what the asm block should be allowed to do is the same as what an external library call is allowed to do. You can do whatever you want within the asm and to communicate between asm blocks, but to interact with the outside world, you have to play fair and abide by some rules.

And you have my vote for stripping the compiler feature to the absolute bare minimum. Chasing ergonomics, expressive power, and analytical power is best left to libraries, with the language/compiler just providing the minimal surface to allow the control needed/desired for dropping to the asm level.

I think it's fair to leave most of the optimization hints out of the first draft, so long as there's clearly specified space for how they would be added in the future.

2 Likes

A volatile load is a subset of "anything an external function call can do". So yes, you can implement a volatile load using inline asm and it will work as you expect it.
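A sketch of that idea (again in the `options(...)` spelling asm! later stabilized with, x86_64-only with a read_volatile fallback; the function name is made up for illustration):

```rust
fn volatile_load(p: *const u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let v: u64;
        // No `pure`/`nomem`/`readonly` options: the compiler must treat this
        // block like an opaque external call, so the load can't be elided,
        // duplicated, or merged with neighbouring accesses.
        std::arch::asm!(
            "mov {v}, qword ptr [{p}]",
            p = in(reg) p,
            v = out(reg) v,
            options(nostack),
        );
        return v;
    }
    #[cfg(not(target_arch = "x86_64"))]
    return unsafe { std::ptr::read_volatile(p) };
}

fn main() {
    let x: u64 = 7;
    assert_eq!(volatile_load(&x), 7);
}
```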

I've thought of doing something like that for something else, but how do you handle the proc macro invoking clang with the right flags depending on how rustc is invoked in the first place (whether LTO is enabled, using the right target, etc.)?

@Amanieu

Thank you, this read very nicely, explained everything in a way that I could understand, made instantaneous sense, and feels like a syntax that I'd love to use for inline assembly.

What follows are some of the notes I wrote down while reading it, so interpret this as feedback that might help make the RFC clearer. While reading some sections I had questions that were addressed by a subsequent section later on. For the real RFC, I'd recommend forward-linking sections more aggressively, but Discuss isn't really a good place to do that.

Excess arguments are required in that example for the outputs, but it is unclear to me from the example how they are useful for the inputs. Like, I'd find the following much more readable:

asm!(
    "mul {b}, {a}",
    a = in(reg) a, b = in("eax") b,
    lateout("eax") lo, lateout("edx") hi
);

nomem means that the asm code does not read or write to memory.

I know that this is the introduction, but maybe briefly explain here already how this could happen, e.g., if eax contains a pointer, the assembly block could read or write through it, e.g., into the stack or into the heap.

sym

The user-level explanation would benefit from an example about this using a function or a static or both.
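For instance, such an example might look like this (a sketch in the syntax asm! later stabilized with; the static's name and the RIP-relative addressing are illustrative assumptions, x86_64-only with a plain-Rust fallback):

```rust
static SEED: u64 = 0xDEAD_BEEF;

fn read_seed() -> u64 {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let v: u64;
        // `sym` substitutes the (possibly mangled) symbol name of a `static`
        // or `fn` item into the template; here we do a RIP-relative load.
        std::arch::asm!(
            "mov {v}, qword ptr [rip + {seed}]",
            seed = sym SEED,
            v = out(reg) v,
            options(readonly, nostack),
        );
        return v;
    }
    #[cfg(not(target_arch = "x86_64"))]
    return SEED;
}

fn main() {
    assert_eq!(read_seed(), 0xDEAD_BEEF);
}
```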

xmm[0-7] (32-bit) xmm[0-15] (64-bit)

I suppose that ymm and zmm are also accessible here, right? (These are mentioned in the "template modifiers" section below but not here.) Also, the note section below should explain what "32-bit" means in this context (e.g. consider x86_64-...-gnu-x32, with 32-bit pointers but on a 64-bit architecture, for the distinction).

The stack pointer must be restored to its original value at the end of an asm code block.

I see the danger of allowing sp to be modified, but why should it be impossible to do so?

The placeholders can be augmented by modifiers which are specified after the : in the curly braces.

An example of this would be helpful in the user-level explanation.
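For instance, something along these lines would help (a sketch using the syntax asm! later stabilized with; the `e` modifier and the zero-extension behavior are x86-64 specifics, with a plain-Rust fallback elsewhere):

```rust
fn zero_upper(x: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let mut v = x;
        // `{0:e}` applies the `e` modifier: it names the 32-bit view (e.g.
        // `eax`) of whatever 64-bit register the compiler picked. Writing a
        // 32-bit register on x86-64 zeroes the upper half.
        std::arch::asm!(
            "mov {0:e}, {0:e}",
            inout(reg) v,
            options(pure, nomem, nostack),
        );
        return v;
    }
    #[cfg(not(target_arch = "x86_64"))]
    return x & 0xFFFF_FFFF;
}

fn main() {
    assert_eq!(zero_upper(0x1_2345_6789), 0x2345_6789);
}
```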

pure : The asm block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the asm block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used.

We don't have this restriction for const fn yet, but maybe we end up having it (e.g. such that const functions can offset pointer values, but they can't read through them and return different values for the same input value). It might make sense to call this attribute const instead of pure, or to leave that as an unresolved question. An alternative name would be readnone, to specify that the function does not read or write to memory, and use a different modifier to also specify that it does not have any side effects, such that "pure" becomes readnone+nosideeffect.

Mapping to LLVM IR

The Cranelift issue is mentioned later, but I was wondering while reading this section: "What about cranelift?". Maybe add a hint somewhere around here that this is covered later?

Unfamiliarity

FWIW, every time I have to use inline assembly, I need to re-read GCC docs, since clang docs are horrible, and then re-read LLVM LangRef docs, and enjoy a lot of frustration to get anything done with their syntax. So I'd argue that "Unfamiliarity" is a good thing, and this RFC has been a joy to read and I have the feeling that I would actually be able to get some work done or read some code with it, without having to go through a lot of obscure compiler docs.

Should a pure asm statement with no outputs be an error or just a warning?

If we make it an error we can always relax that to a warning later, once we find convincing use cases. Code generators could "just" not emit the assembly block if it's pure and has no outputs.

Should we keep the same flags for the template modifiers as LLVM/GCC? Or should we use our own?

I did not understand what the issue was here from reading the RFC. I think an example somewhere showing the template modifiers with flags would have helped.

Should we allow passing a value expression (rvalue) as an inout operand? The semantics would be that of an input which is allowed to be clobbered (i.e. the output is simply discarded).

This sounds useful.

Should we disallow the use of these registers on the frontend (rustc) or leave it for the backend (LLVM) to produce a warning if these are used?

The quality of the backend error messages is often not as good as that of rustc error messages (e.g. we could add a note explaining why this isn't allowed, and pointing users to the docs), so I very much prefer to error here in the frontend.

Do we need to add support for tied operands? Most use cases for those should already be covered by inout.

Should we support x86 high byte registers (ah, bh, ch, dh) as inputs/outputs? These are supported by LLVM but not by GCC, so I feel a bit uncomfortable relying on them.

Should we add formatting flags for imm operands (e.g. x to format a number as hex)? This is probably not needed in practice.

I'd wait for a convincing use case.

Should we support memory operands ("m")? This would allow generating more efficient code by taking advantage of addressing modes instead of using an intermediate register to hold the computed address.

I think we should support this as a future extension, but leave this out of the initial version of this RFC.

Should we support some sort of shorthand notation for operand names to avoid needing to write blah = out(reg) blah? For example, if the expression is just a single identifier, we could implicitly allow that operand to be referred to using that identifier.

This sounds very handy. I wonder if we could do this for the format!-style macros as well; I end up writing name = name a lot. This could be done later, as part of a different RFC that does it for all formatting macros.
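A toy illustration of the repetition being complained about (the explicit name = name form; the variable names are made up):

```rust
fn main() {
    let name = "ferris";
    let count = 3;
    // The operand name and the variable name are identical, yet both must be
    // written out; the proposed shorthand would let the identifier be
    // captured directly.
    let s = format!("{name} x{count}", name = name, count = count);
    assert_eq!(s, "ferris x3");
}
```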

What should preserves_flags do on architectures that don't have condition flags (e.g. RISC-V)? Do nothing? Compile-time error?

Compile-time error in the frontend, stating that this architecture does not have condition flags. Otherwise readers of the code might end up worrying whether that actually means something, when it does not.

Future possibilities

These all sound like things worth supporting, but better left to subsequent RFCs. This one is already long enough, and I think we should try to agree on the foundation first.

3 Likes

I'm kind of happy that I got to read the already improved version of this RFC.

I managed to get through the comment thread, and some more brief notes:

@comex:

I'd like asm! to follow the same rules as the rest of the language here, e.g., if you have a MaybeUninit<&mut T> on the stack and you trash it (e.g. set it to uninitialized()) using asm!, that ought to be ok. In the same way, if you move a variable into an asm! block and mem::forget it, that is also ok. You can even mem::forget whole stack frames if you want; some library types like Pin have a safety requirement that says doing so is "library UB", but those invariants are for the writers of unsafe code to maintain.

Trashing of stack variables can already happen in a number of different ways (e.g. using raw pointers), so I don't think this RFC should have to deal with this (that's more of a UCG issue).


@mcy

Is it UB to set up an out register, not write to it, then read it on the Rust side? Is it a valid garbage value, or is it uninitialized? Rustc can't possibly know, at any rate.

That sets the value to uninitialized; whether that's fine or not depends on the type. I don't like the "rustc can't possibly know" argument. Rustc can't possibly know today, but this isn't any different than a C FFI function returning a value that C could set to uninitialized, and claiming that Rust can't possibly know. A lot of people thought that, and that assumption became wrong the moment we started supporting xLTO. In the same way, we could improve asm! to optimize more aggressively by teaching it the semantic meaning of some instructions on some architectures, just like D and MSVC do. This RFC does not guarantee that this will never happen, so unless it does, code written under the assumption that "rustc doesn't know" is only correct as long as that remains the case, and any toolchain upgrade can break it.


On the general issue of:

asm!(arch = x86_64, target_features = "avx", asm = "foo", ...);

I see the usefulness of this, e.g., this could just expand to:

#[cfg(all(target_arch = "x86_64", target_feature = "avx"))] asm!("foo", ...); 
#[cfg(not(all( ...)))] compile_error!("This inline asm! is only valid for x86_64 with target_features = ...")

but most libraries I work on that use inline assembly, e.g., core::arch, guard asm! calls at a higher level, e.g., at the mod core::arch::x86_64 level, and having to write that everywhere would be a pain. So I'm not sure these annotations should be required. This is an issue that should be resolved before stabilization (whether this is a must, or optional).

Having them as optional annotations seems harmless enough to me, but it also means this could be added in a future RFC, since it would be a backward-compatible addition. In the meantime, it shouldn't be hard to write a small macro that allows doing this.


@hanna-kruppe

I guess. I could imagine some zany schemes to catch most errors before monomorphization, but they would likely not catch all errors and possibly have some false positives too.

I can't imagine how they could catch all errors, e.g., given that we have trait associated consts that depend on type parameters, such that something like this can fail:

fn foo<T>() { 
    // Neither the type nor the value of `FOO` are known before monomorphization
    asm!(..., imm T::FOO); 
}

I would prefer if this RFC just banned generic parameters from being used in asm!, and left supporting them to a future RFC. That is, with this RFC, I would be ok if the foo example above failed with a "T::FOO is not known in this context" error.


@Lokathor

Is it wise to pick the opposite default from GCC and LLVM?

When I read a GCC asm snippet that does not have the volatile qualifier and no explanatory comment, I need to re-read the whole snippet and prove that it is indeed side-effect free, because without a comment I can't know whether that happened by "accident" or was intentional. If you forget to add volatile, GCC will "mis-optimize" your code by default. I think this is a bad default.

I'd very much prefer the proposed defaults, where if people "forget", the compiler makes sure their code works for the worst case, which is that it has side-effects. And if someone goes out-of-their way to prove that their code is side-effect free, they can write the annotation, hopefully with a comment explaining why this is the case.

2 Likes

This is exactly what I feared people would want inline asm to mean. It's understandable from a certain perspective, but it's actually fundamentally incompatible with having a compiler that does any amount of optimizations. That's also why it's not even true in C. Please get this "compiler barrier" idea out of your heads so we can focus on more coherent ways to enable people to do things with inline asm.

Certainly inline asm can do many things: it can read many (not all!) memory locations, write many (not all!) memory locations, and have externally visible side effects. This gives rise to dependencies between inline asm and other operations in the program. These dependencies need to be respected as code is optimized so that correct programs are not transformed incorrectly. But that is not inline-asm-specific and there need to be limits to these dependencies.

As a basic example, we certainly want to be able to do optimizations such as the following:

fn foo() -> i32 {
    let x = 1; // optimization: remove this variable
    unknown_function();
    x // optimization: return the constant 1 here
}

But if unknown_function() includes inline asm and that inline asm is a "full compiler barrier", then this transformation would be illegal, as would literally every other optimization in the presence of calls to unknown functions.

That would be a ridiculous state of affairs, and I don't believe anyone here actually intends to do that, but that's the logical consequence of what you are proposing. And once again, it's not even how inline asm works in C compilers. For example, the "memory" clobber that @Lokathor brought up does not literally mean it can read and write to all memory. Clang and GCC (and likewise rustc via LLVM) will not change their minds about whether unknown code could possibly access a memory location just because it's asm volatile instead of an unknown function call. And for Rust, that also means inline asm must not be exempt from the aliasing restrictions that safe references carry (e.g. even if you pass a &i32 to inline asm, it's UB for the asm to overwrite the i32).

If there is a specific use case for inline asm that someone believes requires "full compiler barrier" semantics, then we can discuss that use case and how to enable it (if at all). For benchmarking purposes, the bench_black_box RFC eventually (after much exhausting arguing) worked out just fine without specifying any kind of "guaranteed optimization barrier". I really really hope that other use cases can also be resolved in a better way.

10 Likes

Given that we have not actually defined "full compiler barrier", I don't think you can make useful statements about which optimizations it forbids. I don't think anyone in this thread expects "full compiler barrier" to mean anything beyond what the block of assembly can actually observe; I especially don't expect inline assembly to be able to know anything about parent function calls. You should only get the effect you describe if inlining happens, and even then a simple, non-mutable let does not promise to generate loads or stores at all.

Is this defined anywhere? It's intuitive what it should mostly mean, but if we're going to use it for definitions, it really needs to be written down.

I'd expect the same semantics as a GCC asm block with the "memory" constraint, nothing more, nothing less. Those semantics suffice to implement primitives needed for non-blocking synchronization techniques.

I'm not proposing that we make asm! a stronger compiler barrier than that; I'm just proposing we don't make it a weaker compiler barrier than that.