[Pre-RFC]: Inline assembly

Florob · January 1, 2018, 10:47pm

Hello and a happy new year to everyone,

As some of you may be aware, I gave a summary talk on inline assembly at the Rust Cologne Meetup in June 2017 (recording, slides). One reason for that was to get information to the Rust community and start a proper discussion on this (which I mostly failed to do, due to being preoccupied). The other reason was to motivate myself to actually do the research, so I could come up with an RFC.

So this is a first draft of that RFC. It proposes an inline assembly syntax somewhat similar to what is available in gcc and clang, but in my opinion more readable and easier to remember.

Feedback and suggestions are very welcome.

Summary

Define a stable syntax for inline assembly, meant to be portable among various backends and architectures.

Motivation

In systems programming some tasks require dropping down to the assembly level. The primary reasons are performance, precise timing, and low-level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.

The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.

Guide-level explanation

Rust provides support for inline assembly via the asm! macro. It can be used to embed handwritten assembly in the assembly output generated by the compiler. Generally this should not be necessary, but might be where the required performance or timing cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.

Let us start with the simplest possible example:

unsafe {
    asm!("nop");
}

This will insert a NOP (no operation) instruction into the assembly generated by the compiler. Note that all asm! invocations have to be inside an unsafe block, as they could insert arbitrary instructions and break various invariants. The instructions to be inserted are listed in the first argument of the asm! macro as a string literal.

Now inserting an instruction that does nothing is rather boring. Let us do something that actually acts on data:

let x: u32;

unsafe {
    asm!("movl $5, {}", out(reg) x);
}

This will write the value 5 into the u32 variable x. You can see that the string literal we use to specify instructions is actually a template string. It is governed by the same rules as Rust format strings. The arguments that are inserted into the template, however, look a bit different than you may be familiar with. First, we need to specify whether the variable is an input or an output of the inline assembly. In this case it is an output, which we declared by writing out. We also need to specify in what kind of location the assembly expects the variable. This is called a constraint specification. In this case we put it in an arbitrary general purpose register by specifying reg. We could also have said mem, telling the compiler the assembly expects a memory location for this argument. The compiler will choose an appropriate register or memory location, insert it into the template, and read the variable from there after the inline assembly has executed.

Let's see another example that also uses an input:

let i: u32 = 3;
let o: u32;
unsafe {
    asm!("
        movl {0}, {1};
        addl {number}, {1};
    ", in(reg) i, out(reg) o, number = in(imm) 5);
}

This will add 5 to the input in variable i and write the result to variable o. The particular way this assembly does this is first copying the value from i to the output, and then adding 5 to it.

The example shows a few things:

First, we can see that inputs are declared by writing in instead of out.

Second, one of our input operands has a constraint specification we haven’t seen yet, imm. This tells the compiler to expand this argument to an immediate inside the assembly template. This is only possible for constants and literals.

Third, we can see that we can specify an argument number or name, as in any format string. For inline assembly templates this is particularly useful, as arguments are often used more than once. For more complex inline assembly, using this facility is generally recommended, as it improves readability and allows reordering instructions without changing the argument order.

In some cases we need an argument to be both an input and an output:

let mut bytes: u32 = 0x01_02_03_04;
unsafe {
    asm!("bswap {}", inout(reg) bytes);
}
assert_eq!(bytes, 0x04_03_02_01);

This example uses the bswap instruction to swap the byte order of the bytes variable. We can see that inout is used to specify an argument that is both input and output. This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register or memory location.
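For comparison only (this is not part of the proposal): the byte swap itself is already available portably through `u32::swap_bytes` in the standard library, which optimizing backends usually lower to a single bswap on x86. The sketch below pairs it with a hand-written shift-and-mask version to spell out what the instruction does to each byte; the function names are made up for illustration.

```rust
/// Portable equivalent of the `bswap` example (hypothetical helper name).
fn bswap_portable(bytes: u32) -> u32 {
    bytes.swap_bytes()
}

/// The same operation spelled out byte by byte: the lowest byte moves to the
/// top, the highest byte moves to the bottom, and the middle two trade places.
fn bswap_manual(x: u32) -> u32 {
    ((x & 0x0000_00ff) << 24)
        | ((x & 0x0000_ff00) << 8)
        | ((x & 0x00ff_0000) >> 8)
        | ((x & 0xff00_0000) >> 24)
}
```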

The Rust compiler is conservative with its allocation of operands. It is assumed that an out can be written at any time, and can therefore not share its location with any other argument. However, to achieve optimal performance it is important to use as few registers as possible, so they won’t have to be saved and reloaded around the inline assembly block. To achieve this Rust provides a lateout specifier. This can be used on any output that is guaranteed to be written only after all inputs have been consumed. There is also an inlateout variant of this specifier.

Some instructions require that the operands be in a specific register. Therefore, Rust inline assembly provides some more specific constraint specifiers. While reg, mem, and imm will be available on any architecture, these are highly architecture specific. Usually a specifier for each register class and for each register will be provided. E.g. for x86 the general purpose registers eax, ebx, ecx, edx, esp, ebp, esi, and edi among others can be addressed by their name.

unsafe {
    asm!("out {}, $0x64", in(eax) cmd);
}

In this example we call the out instruction to output the content of the cmd variable to port 0x64. Since the out instruction only accepts eax (and its sub-registers) as its data operand, we had to use the eax constraint specifier.

It is somewhat common that instructions have operands that are not explicitly listed in the assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:

fn mul(a: u32, b: u32) -> u64 {
    let lo: u32;
    let hi: u32;

    unsafe {
        asm!("mul {}", in(reg) a, in(eax) b, lateout(eax) lo, lateout(edx) hi);
    }

    ((hi as u64) << 32) | lo as u64
}

This uses the mul instruction to multiply two 32-bit inputs with a 64-bit result. The only explicit operand is a register that we fill from the variable a. The second, implicit operand is the eax register, which we fill from the variable b. The lower 32 bits of the result are stored in eax, from which we fill the variable lo. The higher 32 bits are stored in edx, from which we fill the variable hi.
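As a cross-check for the mul example, here is a portable sketch (not part of the proposal; the helper names are hypothetical) of the same widening multiply, plus the split into and recombination of the edx:eax halves. The parentheses in `combine` matter: in Rust `+` binds tighter than `<<`, and rustc rejects a bare `hi as u64 << 32` because it reads `<<` after a type as the start of generic arguments.

```rust
/// Portable reference for the inline-assembly `mul`: widen both inputs first,
/// so the 64-bit product of two u32 values cannot overflow.
fn mul_portable(a: u32, b: u32) -> u64 {
    u64::from(a) * u64::from(b)
}

/// Split a 64-bit product into (hi, lo), i.e. what `mul` leaves in edx and eax.
fn split(x: u64) -> (u32, u32) {
    ((x >> 32) as u32, x as u32)
}

/// Recombine the halves; shift hi into the upper 32 bits, then OR in lo.
fn combine(hi: u32, lo: u32) -> u64 {
    (u64::from(hi) << 32) | u64::from(lo)
}
```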

In many cases inline assembly will modify state that is not given as output. Usually this is either because we have to use a scratch register in the assembly, or instructions modify state that we don’t need to further examine. This state is generally referred to as being “clobbered”. We need to tell the compiler about this since it may need to save and restore this state around the inline assembly block.

let ebx: u32;
let ecx: u32;

unsafe {
    asm!("
        movl $4, %eax;
        xorl %ecx, %ecx;
        cpuid;
    ", out(ebx) ebx, out(ecx) ecx, clobber(eax, edx));
}

println!(
    "L1 Cache: {}",
    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)
);

We specify the clobbered state via a clobber argument following all inputs and outputs. In the example above we use the cpuid instruction to get the L1 cache size. This instruction writes to eax, ebx, ecx, and edx, but for the cache size we only care about the contents of ebx and ecx. Hence, we declare those as outputs, while declaring the other registers as clobbers.
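To make the arithmetic in the println! easier to follow, the sketch below (a hypothetical helper, not part of the proposal) decodes the four fields that cpuid leaf 4 packs into ebx and ecx; each field is stored minus one, hence the `+ 1`s. It widens to u64 so the product cannot overflow. The register values in the test are invented (an 8-way cache with one partition, 64-byte lines and 64 sets), not read from real hardware; on x86_64 the standard library's `core::arch::x86_64::__cpuid_count` intrinsic could supply real values.

```rust
/// Cache size in bytes from the ebx/ecx values of cpuid leaf 4.
fn cache_size_bytes(ebx: u32, ecx: u32) -> u64 {
    let ways = u64::from(ebx >> 22) + 1; // ebx bits 31..22
    let partitions = u64::from((ebx >> 12) & 0x3ff) + 1; // ebx bits 21..12
    let line_size = u64::from(ebx & 0xfff) + 1; // ebx bits 11..0
    let sets = u64::from(ecx) + 1; // all of ecx
    ways * partitions * line_size * sets
}
```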

Clobber specifications are generally architecture specific. The only clobber specification that is always available is mem, meaning memory that is not specified as output is being written. Other than that all architecture registers are usually available by name.

When we said earlier that the asm!("nop") statement would insert a nop instruction that was actually not the whole truth. Rust’s asm! macro is designed to allow optimization. This is another reason inputs and outputs need to be known to the compiler. If outputs of the inline assembly block are never read, or there are no outputs, the inline assembly block may be optimized away. Also if inputs don’t change across multiple invocations of an inline assembly block the compiler may assume it always yields the same result, only executing it once.

In some cases this may not be what we want. For example we may want to clear the interrupt flag on an x86 system:

unsafe {
    asm!("cli", flags(volatile));
}

As you can see in the example we do this using the cli instruction. However, this instruction has no output. We only run it for the side-effect. To avoid deletion of this inline assembly block by the optimizer we specify the volatile flag.

Flags can be provided as an optional final argument to the asm! macro. For now the only generally available flag is volatile, which enforces that the inline assembly block is always executed. However, there may be other architecture specific flags. E.g. on x86 the intelsyntax flag is provided to switch from AT&T to Intel assembly syntax.

Reference-level explanation

Inline assembler is implemented as a macro asm!(). The first argument to this macro is a template used to build the final assembly. The following arguments specify input and output operands. When required, clobbers and flags are specified as the final two arguments.

The assembler template uses the same syntax as format strings. I.e. placeholders are specified by curly braces. The corresponding arguments are accessed in order, by index, or by name. Future revisions may also use the format_spec to specify what LLVM calls template argument modifiers. However, this initial proposal elides this, as it is not necessary for inline assembly to be useful.
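The placeholder rules can be sketched as a small expander. This is a hypothetical helper written for illustration, not the compiler's implementation: `{}` takes the next positional operand, `{N}` takes operand N, `{name}` takes a named operand, and `{{` / `}}` produce literal braces. The strings passed in stand for whatever the compiler would substitute after allocation (a register name or an immediate). Since this proposal elides format_spec, anything after a colon is not handled.

```rust
use std::collections::HashMap;

/// Expand an asm! template using format-string placeholder rules.
fn expand(template: &str, positional: &[&str], named: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut next = 0; // counter for implicit `{}`, independent of `{N}`
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Escaped braces become literal braces.
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            '{' => {
                // Collect the placeholder key up to the closing brace.
                let mut key = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(k) => key.push(k),
                        None => return Err("unterminated placeholder".to_string()),
                    }
                }
                let text = if key.is_empty() {
                    let i = next;
                    next += 1;
                    positional.get(i).copied()
                } else if let Ok(i) = key.parse::<usize>() {
                    positional.get(i).copied()
                } else {
                    named.get(key.as_str()).copied()
                };
                out.push_str(text.ok_or_else(|| format!("no operand for {{{}}}", key))?);
            }
            '}' => return Err("unmatched `}`".to_string()),
            _ => out.push(c),
        }
    }
    Ok(out)
}
```

With the operands of the earlier addition example allocated to %eax and %ecx, `"movl {0}, {1}; addl {number}, {1};"` would expand to `"movl %eax, %ecx; addl $5, %ecx;"`.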

The following ABNF specifies the general syntax:

dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
constraint_spec := "reg" / "mem" / "imm" / <arch specific>
operand := [ident "="] dir_spec "(" constraint_spec ")" expr
clobber_spec := "mem" / <arch specific>
clobber := "clobber(" clobber_spec *("," clobber_spec) ")"
flag := "volatile" / <arch specific>
flags := "flags(" flag *("," flag) ")"
asm := "asm!(" format_string *("," operand) ["," clobber] ["," flags] ")"
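A minimal sketch of parsing the operand production, assuming each operand has already been split out of the macro arguments as a string (a real implementation would work on tokens). `parse_operand` and `Operand` are hypothetical names. The sketch checks the direction against the five dir_spec keywords but accepts any constraint text, and it does not handle nested parentheses inside the constraint.

```rust
/// One parsed operand: `[ident "="] dir_spec "(" constraint_spec ")" expr`.
#[derive(Debug, PartialEq)]
struct Operand<'a> {
    name: Option<&'a str>,
    dir: &'a str,
    constraint: &'a str,
    expr: &'a str,
}

const DIR_SPECS: [&str; 5] = ["in", "out", "lateout", "inout", "inlateout"];

fn parse_operand(src: &str) -> Result<Operand<'_>, String> {
    let src = src.trim();
    let open = src.find('(').ok_or("missing `(`")?;
    // An optional `name =` prefix can only appear before the `(`.
    let (name, dir) = match src[..open].split_once('=') {
        Some((n, d)) => (Some(n.trim()), d.trim()),
        None => (None, src[..open].trim()),
    };
    if !DIR_SPECS.contains(&dir) {
        return Err(format!("unknown direction `{}`", dir));
    }
    let close = open + src[open..].find(')').ok_or("missing `)`")?;
    let constraint = src[open + 1..close].trim();
    let expr = src[close + 1..].trim(); // everything after `)` is the expression
    if constraint.is_empty() || expr.is_empty() {
        return Err("empty constraint or expression".into());
    }
    Ok(Operand { name, dir, constraint, expr })
}
```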

Direction specification

The direction specification indicates in what way the operand is being used by the generated assembly.

Five kinds of operands are supported:

in
- input operand
- may be read at any time
- may not be written
out
- output operand
- may not be read
- may be written at any time
lateout
- output operand
- may not be read
- may only be written after all inputs were consumed
inout
- input and output operand
- may be read at any time
- may be written at any time
inlateout
- input and output operand
- may be read at any time
- may only be written after all inputs were consumed
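One way to encode the five directions and their rules is a plain enum. This is a sketch with hypothetical names; `may_share_with_input` captures the reasoning behind lateout from the guide-level explanation: an output written only after all inputs are consumed can reuse an input's location, while an early output cannot.

```rust
/// The five direction specifications (names are illustrative).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    In,
    Out,
    LateOut,
    InOut,
    InLateOut,
}

impl Direction {
    /// The operand's value is available to the assembly (may be read).
    fn is_input(self) -> bool {
        matches!(self, Direction::In | Direction::InOut | Direction::InLateOut)
    }

    /// The assembly produces a value for the operand (may be written).
    fn is_output(self) -> bool {
        self != Direction::In
    }

    /// Writes happen only after all inputs have been consumed.
    fn writes_late(self) -> bool {
        matches!(self, Direction::LateOut | Direction::InLateOut)
    }

    /// Only late outputs may share a location with another input operand;
    /// an early output could overwrite an input that is still needed.
    fn may_share_with_input(self) -> bool {
        self.is_output() && self.writes_late()
    }
}
```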

The expr given with an output must resolve to a mutable or uninitialized location.

Constraint specification

The constraint specification indicates which kind of operand the assembly template requires in the operand's position.

Across platforms three constraint specifications are supported:

reg: the operand is placed in a general purpose register
mem: the operand is placed in a memory location
imm: the operand is an immediate

All other constraint specifications are defined per architecture. It is suggested that one exist for at least each physical register and each register class (e.g. floating point register, 128-bit vector register). Names should be descriptive rather than single-letter abbreviations, e.g. prefer float over f and xmm_vector over x.

Clobber specification

The clobber specification is used to indicate what state is being modified apart from the outputs. The mem clobber specification is always available. It indicates that arbitrary memory is being modified.

All other clobber specifications are defined per architecture. It is suggested that one exist for at least each physical register.

Flags

Flags are used to further influence the behaviour of the inline assembly block. The only flag defined at this point in time is volatile. The volatile flag indicates that the inline assembly block may have side-effects not indicated by inputs, outputs, or clobber (i.e. may not be optimized away).

Other flags can be defined per architecture. An intelsyntax flag for the x86 architecture should be provided.

Mapping to LLVM IR

The direction specification maps to an LLVM constraint specification as follows (using a register operand as an example):

in(reg) => r
out(reg) => =&r (Rust’s outputs are early-clobber outputs in LLVM/GCC terminology)
inout(reg) => =&r,0 (an early-clobber output with an input tied to it, 0 here is a placeholder for the position of the output)
lateout(reg) => =r (Rust’s late outputs are regular outputs in LLVM/GCC terminology)
inlateout(reg) => =r,0 (cf. inout and lateout)

As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. This is in part for better readability on Rust’s side and in part for independence of the backend:

reg is mapped to r
mem is mapped to m
a register name r1 is mapped to {r1}
additionally mappings for register classes are added as appropriate (cf. llvm-constraint)
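Putting the two tables together, here is a sketch of how a backend could assemble the full LLVM constraint string for one asm! invocation. All names are hypothetical; `imm => i` is an assumption since the text above does not list a mapping for imm, and memory operands are lowered naively to `m` exactly as written (real LLVM output memory operands typically use indirect constraints, which this ignores). Outputs come first, then inputs, and an inout or inlateout adds an input that refers to its output by index.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Dir {
    In,
    Out,
    LateOut,
    InOut,
    InLateOut,
}

/// Map a Rust constraint specification to an LLVM constraint code.
fn llvm_code(constraint: &str) -> String {
    match constraint {
        "reg" => "r".to_string(),
        "mem" => "m".to_string(),
        "imm" => "i".to_string(), // assumption, not specified above
        reg_name => format!("{{{}}}", reg_name), // e.g. eax => {eax}
    }
}

/// Build the constraint string: all outputs first, then all inputs.
fn llvm_constraints(operands: &[(Dir, &str)]) -> String {
    let mut outputs = Vec::new();
    let mut inputs = Vec::new();
    for &(dir, constraint) in operands {
        let code = llvm_code(constraint);
        match dir {
            Dir::In => inputs.push(code),
            Dir::Out => outputs.push(format!("=&{}", code)), // early-clobber
            Dir::LateOut => outputs.push(format!("={}", code)),
            Dir::InOut | Dir::InLateOut => {
                let early = if dir == Dir::InOut { "&" } else { "" };
                outputs.push(format!("={}{}", early, code));
                // The tied input names the output's position.
                inputs.push((outputs.len() - 1).to_string());
            }
        }
    }
    outputs.extend(inputs);
    outputs.join(",")
}
```

For the mul example (in(reg) a, in(eax) b, lateout(eax) lo, lateout(edx) hi) this yields `={eax},={edx},r,{eax}`.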

For clobber specifications the following mappings apply:

mem is mapped to ~{memory}
a register name r1 is mapped to ~{r1} (cf. llvm-clobber)

The volatile flag is mapped to adding the sideeffect keyword to the LLVM asm statement. The intelsyntax flag is mapped to adding the inteldialect keyword to the LLVM asm statement.
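The clobber and flag mappings are simple lookups. A sketch with hypothetical function names; architecture-specific flags other than intelsyntax are left unmapped.

```rust
/// `mem` => `~{memory}`, a register `r1` => `~{r1}`.
fn llvm_clobber(spec: &str) -> String {
    match spec {
        "mem" => "~{memory}".to_string(),
        reg => format!("~{{{}}}", reg),
    }
}

/// Comma-separated clobber part of the LLVM constraint string.
fn llvm_clobbers(specs: &[&str]) -> String {
    specs.iter().map(|s| llvm_clobber(s)).collect::<Vec<_>>().join(",")
}

/// Keyword added to the LLVM asm statement for a flag, if any.
fn llvm_flag_keyword(flag: &str) -> Option<&'static str> {
    match flag {
        "volatile" => Some("sideeffect"),
        "intelsyntax" => Some("inteldialect"),
        _ => None, // unknown or not covered by this sketch
    }
}
```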

Drawbacks

Unfamiliarity

This RFC proposes a completely new inline assembly format. It is not possible to just copy examples of gcc-style inline assembly and re-use them. There is however a fairly trivial mapping between the gcc-style and this format that could be documented to alleviate this.

The clobber example above would look like this in gcc-style inline assembly:

int ebx, ecx;
asm (
    "mov $4, %%eax;"
    "xor %%ecx, %%ecx;"
    "cpuid;"
    "mov %%ebx, %0;"
    : "=r"(ebx), "=c"(ecx) // outputs
    : // inputs
    : "eax", "ebx", "edx" // clobbers
);
printf("L1 Cache: %i\n", ((ebx >> 22) + 1)
    * (((ebx >> 12) & 0x3ff) + 1)
    * ((ebx & 0xfff) + 1)
    * (ecx + 1));

Rationale and alternatives

Implement an embedded DSL

Both MSVC and D provide what is best described as an embedded DSL for inline assembly. It is generally close to the system assembler’s syntax, but augmented with the ability to directly access variables that are in scope.

// This is D code
int ebx, ecx;
asm {
    mov EAX, 4;
    xor ECX, ECX;
    cpuid;
    mov ebx, EBX;
    mov ecx, ECX;
}
writefln("L1 Cache: %s",
    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1)
    * ((ebx & 0xfff) + 1) * (ecx + 1));

// This is MSVC C++
int ebx_v, ecx_v;
__asm {
    mov eax, 4
    xor ecx, ecx
    cpuid
    mov ebx_v, ebx
    mov ecx_v, ecx
}
std::cout << "L1 Cache: "
    << ((ebx_v >> 22) + 1) * (((ebx_v >> 12) & 0x3ff) + 1)
        * ((ebx_v & 0xfff) + 1) * (ecx_v + 1)
    << '\n';

While this is very convenient on the user side in that it requires no specification of inputs, outputs, or clobbers, it puts a major burden on the implementation. The DSL needs to be implemented for each supported architecture, and full knowledge of the side effects of every instruction is required.

This huge implementation overhead is likely one of the reasons MSVC only provides this capability for x86, while D at least provides it for x86 and x86_64. It should also be noted that the D reference implementation falls slightly short of supporting arbitrary assembly. E.g. the lack of access to the RIP register makes certain techniques for writing position independent code impossible.

As a stop-gap, the LDC implementation of D provides an llvmasm feature that binds it closely to LLVM IR’s inline assembly.

The author believes it would be unfortunate to put Rust into a similar situation, making certain architectures a second-class citizen with respect to inline assembly.

Provide intrinsics for each instruction

In discussions it is often postulated that providing intrinsics is a better solution to the problems at hand. However, particularly where precise timing and full control over the number of generated instructions are required, intrinsics fall short.

Intrinsics are of course still useful and have their place for inserting specific instructions. E.g. making sure a loop uses vector instructions, rather than relying on auto-vectorization.

However, inline assembly is specifically designed for cases where more control is required. Also providing an intrinsic for every (potentially obscure) instruction that is needed e.g. during early system boot in kernel code is unlikely to scale.

Make the `asm!` macro return outputs

It has been suggested that the asm! macro could return its outputs like the LLVM statement does. The benefit is that it is easier to see that variables are being modified. Particularly in the case of initialization, it becomes more obvious what is happening. On the other hand, this by necessity splits the direction and constraint specification from the variable name, which makes this syntax overall harder to read.

fn mul(a: u32, b: u32) -> u64 {
    let (lo, hi) = unsafe {
        asm!("mul {}", in(reg) a, in(eax) b, lateout(eax), lateout(edx))
    };

    ((hi as u64) << 32) | lo as u64
}

Unresolved questions

Clobbers

What actually can/has to be clobbered is somewhat unclear. The LLVM IR documentation claims that only explicit register constraints and ~{memory} are supported. Yet clang generates IR that has additional constraints. E.g. it will forward a cc (condition code) clobber from C inline assembly.

Flags

Is volatile or sideeffect a better flag name? LLVM internally uses sideeffect, which seems to describe the behaviour more accurately. However, volatile is the more familiar name.

djc · January 2, 2018, 10:42am

I am not very experienced with inline ASM in other languages, so here are just some remarks from an interested RFC reader:

You hint at differences between syntax in GCC and/or clang and the syntax you’re proposing here, stating that there’s a straightforward mapping. It would be nice if the RFC had some actual examples for other syntax (this also goes for the D/MSVC approach) to allow the reader to judge for themselves (without doing a bunch of research) how different they are and how straightforward the translation is.
What’s the value of allowing multiple instructions in a single asm!() call? It seems that that will make the code harder to understand when you have lots of instructions and/or lots of formatting arguments. If there is a trade-off being made here (favoring compactness, or making sure that instructions end up right next to each other in the binary?) it might be good to make that more explicit (and also compare to other implementations).

cuviper · January 2, 2018, 4:50pm

Generally, you can't assume anything between distinct asm blocks -- like which registers are still alive, or even what order they will execute. You'd have to chain inputs and outputs between all of your distinct blocks, which is cumbersome and might not even be possible for advanced registers, flags, etc.

KasMA1990 · January 2, 2018, 5:17pm

Without knowing anything about inline assembly, I actually read the entire RFC thinking you were describing the syntax for the existing macro, to better compare with the suggestion you wanted to make (a comparison that never comes of course).

It would be nice if it was clearer that this syntax is meant as the replacement for the existing one

shepmaster · January 2, 2018, 8:29pm

Is there any rough feeling on the effort to implement this proposal today (maybe a preprocessing step, a compiler plugin, or even a branch)? I’ve written a bit of Rust inline assembly for X86 and AVR, so it would be interesting to “port” the current asm! syntax to the proposed one to get some real-world feedback.

Florob · January 2, 2018, 10:14pm

@djc I’ve added examples for the MSVC, D, and gcc syntaxes in appropriate places. I hope this makes it easier to follow.

@KasMA1990 I’m sorry you went through the trouble of reading the whole thing waiting for something that never came. The point of at least the “Guide-level explanation” is to “explain the proposal as if it was already included”. I’m not sure I can do that in a readable way, while still making sure the reader is fully aware that this is something yet to be implemented.

@shepmaster I’m not too familiar with compiler internals so I have no proper estimate. I’d expect that it would be relatively trivial. The part involving most work should be the per-architecture specification of valid constraints and clobbers. Personally I definitely won’t have time to do it any time soon.

Amanieu · January 2, 2018, 10:18pm

I really like this! Some feedback:

Some architectures use special characters in register names, so it might be better to put register names in quotes: in("eax").
- x86 uses st(1), st(2)
- MIPS uses $0, $1
Figuring out a short name for some constraints is not trivial, it might be easier to just stick with the existing GCC single-letter constraints. In particular some constraints can be very complex:
- (PowerPC) P: An immediate integer constant whose negation is a signed 16-bit constant.
Have you considered simply using a bare volatile instead of flags(volatile)? Additional flags can be added after that.
An asm! with no outputs is meaningless if the volatile flag isn’t specified. It should be a compile-time error for a non-volatile asm! to have no outputs. Previous discussion.
Template argument modifiers are absolutely required in practice. I make heavy use of them in my code (ARM64 assembly). I think that we can just reuse LLVM’s single-letter modifiers here since these are used in the format string: mov {0:w}, {1:x}
The RFC should specify the rules for the clobber list. I think we should follow LLVM here: a register can’t both be an output and clobbered. However it is fine for a register to be an input and clobbered. (This only applies to explicit register inputs/outputs).
I suggest adding an additional direction specification tmp to deal with temporary registers and clobbered inputs:
- An input value may be specified, or _ can be used to indicate that the temporary register has no initial value.
- This is equivalent to an inout (with initial value) or out (without initial value) where the output value is simply discarded after the asm!. This is how it is done in C, but this is very unintuitive.
You might want to specify that, like format strings, you can escape braces with {{ and }}. This is needed for certain ARM instructions.
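A sketch of the checks suggested in this post, using a deliberately simplified, hypothetical model of an asm! invocation: a non-volatile block without outputs is rejected, and an explicit register may be clobbered and used as an input but not as an output. It also includes the refinement from comex's reply further down the thread, that a mem clobber alone keeps the block from being removed.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Dir {
    In,
    Out,
    LateOut,
    InOut,
    InLateOut,
}

/// Hypothetical, simplified view of one asm! invocation.
struct AsmBlock<'a> {
    operands: Vec<(Dir, &'a str)>, // (direction, constraint), e.g. (Dir::Out, "ebx")
    clobbers: Vec<&'a str>,
    volatile: bool,
}

/// Constraint specifications that do not name a specific register.
const GENERIC: [&str; 3] = ["reg", "mem", "imm"];

fn validate(block: &AsmBlock) -> Vec<String> {
    let mut errors = Vec::new();
    let has_output = block.operands.iter().any(|&(d, _)| d != Dir::In);
    let clobbers_mem = block.clobbers.contains(&"mem");
    if !block.volatile && !has_output && !clobbers_mem {
        errors.push("non-volatile asm! without outputs would be optimized away".to_string());
    }
    for &(dir, constraint) in &block.operands {
        let explicit_register = !GENERIC.contains(&constraint);
        // Inputs may be clobbered; outputs may not.
        if dir != Dir::In && explicit_register && block.clobbers.contains(&constraint) {
            errors.push(format!("register `{}` is both an output and clobbered", constraint));
        }
    }
    errors
}
```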

Regarding your questions:

Clobbers: LLVM recognizes memory and cc on all architectures. Other values are architecture-specific, search for GCCRegNames in https://github.com/llvm-mirror/clang/tree/master/lib/Basic/Targets.
Flags: I don’t really have a preference for either volatile or sideeffect.

main · January 3, 2018, 12:52am

First of all, I think I like the format-string-inspired approach. Some thoughts:

Perhaps specifying parameters in terms of borrowing makes sense? in maps to &, inout is &mut and out is &out (not a part of the language but well-known by now, I think).
I don’t like the syntax a(b) c (feels too much like b is optional?) but I know a bikeshed when I see one so whatever.
I would generally separate two kinds of constraints: Those that select one specific register (eax) and those that merely constrain the compiler’s selections to a set of registers (reg). Parameters that are not directly referenced (“excess parameters”) should only be allowed if they belong to the first group.
Please give me something better than having to add an intel flag to every single asm! I ever write. Beware, this claim is mostly guesswork, but AFAIK the ATT syntax is only so widespread because that’s what gcc forces you to use. I have heard plenty of opinions arguing in favor of intel syntax and I personally don’t know anyone who prefers ATT syntax. The only reason I’m not arguing for intel syntax to be the default is that I vaguely remember something about it not being 100% correctly supported in LLVM in some corner cases (though that may have changed by now, it’s been a few years). I’m fine with keeping ATT as default, but that default must be easily configurable (maybe asm-flavor = intel in Cargo.toml). Wrapping every inline asm with .intel_syntax\n/.att_syntax\n in GCC is a non-starter and the same goes for Rust. (Sorry but I actually have a strong opinion on this point)
flags(volatile) is yet another bikeshed but thinking about it, I got one interesting idea: Losing an entire asm!() to the optimizer because you forgot to specify it as volatile is a huge footgun! Shouldn’t the default be to mark it as volatile and then have a flag that allows you to opt out in case you really need the optimization?
Since an inout register is always overwritten and the compiler can’t possibly insert code to restore its value in the middle of an asm!(), how does it differ from inlateout?
The RFC should specify what kinds of things I can actually write into parameters (in = rvalue, out = lvalue ?). Immediates are the interesting part. Literals only would be too restrictive, at least constants should be allowed. Constfn is relevant.
I agree that constraints should have descriptive names, I love it in fact! Mapping arbitrary letters/symbols to actual semantics is insane!

gnzlbg · January 3, 2018, 8:54am

main:

Please give me something better than having to add an intel flag to every single asm! I ever write. Beware, this claim is mostly guesswork, but AFAIK the ATT syntax is only so widespread because that’s what gcc forces you to use. I have heard plenty of opinions arguing in favor of intel syntax and I personally don’t know anyone who prefers ATT syntax. The only reason I’m not arguing for intel syntax to be the default is that I vaguely remember something about it not being 100% correctly supported in LLVM in some corner cases (though that may have changed by now, it’s been a few years). I’m fine with keeping ATT as default, but that default must be easily configurable (maybe asm-flavor = intel in Cargo.toml). Wrapping every inline asm with .intel_syntax\n/.att_syntax\n in GCC is a non-starter and the same goes for Rust. (Sorry but I actually have a strong opinion on this point)

If asm! is a real macro we might just want to have different macros for the different syntaxes, e.g., asm_att!, asm_intel!, and with macros 2.0 you would just write:

use core::asm_intel as asm;

asm!(... uses intel syntax ...);

flags(volatile) is yet another bikeshed but thinking about it, I got one interesting idea: Losing an entire asm!() to the optimizer because you forgot to specify it as volatile is a huge footgun! Shouldn’t the default be to mark it as volatile and then have a flag that allows you to opt out in case you really need the optimization?

Interesting thought.

comex · January 3, 2018, 9:18am

Nitpick: AFAIK, it’s not meaningless (i.e. it won’t be removed) if it’s specified to clobber memory, even without volatile.

vi0 · January 3, 2018, 12:08pm

Can it work for asmjs / WebAssembly?

JavaScript code inside asm! for asmjs would look funny, though.

matthieum · January 3, 2018, 6:19pm

How easy is it to detect on which architecture this assembly code can run when inspecting a source file?

I would generally expect an asm! call to be extremely platform specific; would it make sense to restrict the usage of the asm! macro to functions which are platform specific, for example, so that it is made clear that this piece of code is only valid for x86/x86_64 and cannot be compiled to ARM?

vi0 · January 3, 2018, 8:04pm

What if entire crate/module is platform-specific?

Optionally declaring a broad syntax family for each individual asm!, like asm_x86!("..."), may be useful for readability, though.

There may be a clippy lint that, for each asm!, requires exactly one mentioned architecture name (from a hard-coded list?) in the { function name -> impl type name -> module name -> crate name + cfg feature flag on any of the preceding chain elements } path.

For example:


// OK:
mod x86 {
     struct Qqq;
     impl Qqq {
         fn www() {
             unsafe{ asm!(""); }
         }
     }
}

// OK:
mod mmm {
     struct X86;
     impl X86 {
         fn www() {
             unsafe{ asm!(""); }
         }
     }
}


// OK:
mod mmm {
     struct Qqq;
     impl Qqq {
         fn www_x86_something() {
             unsafe{ asm!(""); }
         }
     }
}


// Fail: no arch in path
mod mmm {
     struct Qqq;
     impl Qqq {
         fn www() {
             unsafe{ asm!(""); }
         }
     }
}


// Fail: more than one arch in path
mod x86 {
     struct Arm;
     impl Arm {
         #[cfg(target_arch="asmjs")]
         fn for_mips() {
             unsafe{ asm!(""); }
         }
     }
}
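The lint idea could be prototyped roughly as follows. This is a hypothetical sketch: it assumes the path segments (function, impl type, module, crate, and cfg values) have already been collected, uses a made-up list of architecture names, and matches case-insensitive substrings, so it would also fire on words like "alarm". It counts distinct architecture names, so mentioning x86 twice still passes.

```rust
/// Hypothetical, hard-coded list of architecture names the lint recognises.
const ARCH_NAMES: [&str; 4] = ["x86", "arm", "mips", "asmjs"];

/// Accept the asm! only if exactly one distinct architecture name appears
/// somewhere in the collected path segments.
fn arch_lint_ok(path_segments: &[&str]) -> bool {
    let mentioned: Vec<&str> = ARCH_NAMES
        .iter()
        .copied()
        .filter(|arch| {
            path_segments
                .iter()
                .any(|seg| seg.to_lowercase().contains(*arch))
        })
        .collect();
    mentioned.len() == 1
}
```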

Amanieu · January 3, 2018, 8:19pm

I think using per-architecture macros may be a good idea (asm_arm!, asm_intel!, etc). I see two main benefits:

This neatly solves the issue of intel vs att syntax for x86.
This avoids breaking existing code which uses asm! during the transition period.

mark-i-m · January 3, 2018, 10:09pm

Personally, I have always hated the traditional inline assembly format and I always find the LLVM/GCC documentation incredibly painful to read and understand.

I really appreciate the way @Florob described the meaning of the syntax, and I think if the feature is ever accepted, this should end up in The Book.

One thing that bothers me about the traditional syntax is how painful it is to get right:

Using positional arguments means that I have to be good at counting (and I’m not :’( ). It also means that as I am editing/optimizing my inline assembly, all of the positions change, and I have to go back and edit everything again.
Putting clobbers/flags/etc not right next to the relevant instructions means that if I later change the code, I have a harder time determining if I should change clobbers/flags.

I would rather have something of the following format (I don’t feel tied to the syntax; I just picked something that seems to work):

let w: u32;
let x: u32;
let y: u32;
asm_x86_att! {
    "mov" : in reg "eax" x : out clobber reg "ebx" y,
    "mov" : in mem "(eab)" : out clobber mem "ecx",
    "nop" : volatile,
    "xor" : inout clobber reg "eax" w : volatile,
}

The key idea is that arguments, clobbers, and flags are specified next to the relevant instruction. I haven’t thought through all the details, but I think it should work. For me at least, this would vastly improve code maintainability and development speed.

Any thoughts?

hanna-kruppe · January 3, 2018, 10:32pm

That seems like it would introduce a lot of redundancy when inputs, outputs, clobbered registers, etc. occur more than once, with all the usual problems of redundancy. Even in your example there’s already two copies of volatile.

I also find it confusing that your proposed syntax does not name the instruction operands in the format strings, apparently instead inferring them solely from the constraints?! Do you mean to propose that as well?

mark-i-m · January 3, 2018, 11:24pm

That seems like it would introduce a lot of redundancy when inputs, outputs, clobbered registers, etc. occur more than once, with all the usual problems of redundancy. Even in your example there’s already two copies of volatile.

Hmm... that is true. My intention is that every place that requires the flag (e.g. volatile) is annotated that way. In my example, if I later decided that I wanted to take out the nop it is trivial to know that volatile is still needed because the xor is also annotated volatile. Likewise, if I wanted to take out the last two instructions, it is trivially clear that volatile is not needed any more.

Would shorter annotations help (e.g. vol instead of volatile)? I'm not sure what to do about this. Frankly, most of the inline assembly I have ever written is pretty short (<30 LOC per function), so I would gladly take the redundancy hit.

I also find it confusing that your proposed syntax does not name the instruction operands in the format strings, apparently instead inferring them solely from the constraints?! Do you mean to propose that as well?

Sorry, I should have made this more clear. No, I don't want to propose this sort of inference. I was trying to propose that the format would be something roughly like this:

(INST (":" ARG_WITH_ANNOTATIONS)* (":" EXTRA_FLAGS)? ",")+

where INST is "mov", ARG_WITH_ANNOTATIONS is in reg "eax" x, etc., and extra flags could be volatile.
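To make the grammar above concrete, here is a minimal Rust sketch of the data it would carry. The type and function names (`Inst`, `Arg`, `block_is_volatile`, `clobbered_regs`) are hypothetical and not part of any proposal. The sketch mainly shows that, because the backend ultimately needs a single interface for the whole block, per-instruction annotations would have to be merged back together somewhere.

```rust
/// Direction of an operand (`in`, `out`, `inout`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dir {
    In,
    Out,
    InOut,
}

/// Where the operand lives (`reg` or `mem`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Loc {
    Reg,
    Mem,
}

/// One ARG_WITH_ANNOTATIONS, e.g. `in reg "eax" x`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
struct Arg {
    dir: Dir,
    clobber: bool,
    loc: Loc,
    constraint: &'static str,      // e.g. "eax"
    binding: Option<&'static str>, // Rust variable bound to the operand, if any
}

/// One INST together with its operands and extra flags.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
struct Inst {
    template: &'static str,
    args: Vec<Arg>,
    volatile: bool,
}

/// The backend sees one block, so the block is volatile if any instruction is.
fn block_is_volatile(block: &[Inst]) -> bool {
    block.iter().any(|inst| inst.volatile)
}

/// Naive union of every register marked `clobber`, in first-seen order.
fn clobbered_regs(block: &[Inst]) -> Vec<&'static str> {
    let mut regs = Vec::new();
    for arg in block.iter().flat_map(|inst| inst.args.iter()) {
        if arg.clobber && arg.loc == Loc::Reg && !regs.contains(&arg.constraint) {
            regs.push(arg.constraint);
        }
    }
    regs
}
```

Note that `clobbered_regs` is a plain union: it would report a register as clobbered even if the block saves and restores it, which is one of the objections raised further down in this thread.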

I see that my example actually is incorrect:

"xor" : inout clobber reg "eax" w : volatile

should be

"xor" : inout clobber reg "eax" w : inout clobber reg "eax" w : volatile

I do see how annoying this is, though... What if we instead had per-instruction positional arguments:

let w: u32;
let x: u32;
let y: u32;
asm_x86_att! {
    "mov {0}, {1}" : in reg "eax" x, out clobber reg "ebx" y;
    "mov {0}, {1}" : in mem "(eab)", out clobber mem "ecx";
    "nop"          : volatile;
    "xor {0}, {0}" : inout clobber reg "eax" w, volatile;
}

@hanna-kruppe Does that seem any better to you?
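Per-instruction positional arguments are mostly a front-end convenience: a macro could mechanically renumber them into the single block-level numbering used by the traditional syntax. The function below (`lower_to_block`, a hypothetical name) is a minimal sketch of that step, assuming each instruction declares how many operands it has and that its template contains only plain `{N}` placeholders (no `{{` escapes or modifiers).

```rust
/// Rewrite per-instruction operand indices into block-global ones.
/// Each entry is (template, number of operands that instruction declares).
fn lower_to_block(insts: &[(&str, usize)]) -> String {
    let mut lines = Vec::new();
    let mut base = 0; // operands declared by all previous instructions
    for &(template, n_operands) in insts {
        let mut line = String::new();
        let mut chars = template.chars();
        while let Some(c) = chars.next() {
            if c != '{' {
                line.push(c);
                continue;
            }
            // Collect the digits of a `{N}` placeholder; take_while also
            // consumes (and drops) the closing `}`.
            let digits: String = chars.by_ref().take_while(|&d| d != '}').collect();
            let local: usize = digits.parse().expect("placeholder must be {N}");
            assert!(local < n_operands, "operand index out of range");
            line.push_str(&format!("{{{}}}", base + local));
        }
        lines.push(line);
        base += n_operands;
    }
    lines.join("\n")
}
```

Going the other way is what makes the block-level form painful to edit by hand: deleting the first `mov` shifts every later index, which is exactly the counting problem mark-i-m describes.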

hanna-kruppe · January 4, 2018, 12:01am

On shorter annotations (e.g. vol): No. They might make it slightly faster to write, but they don't do anything about the duplication of knowledge, and they probably decrease readability.

On the INST/ARG_WITH_ANNOTATIONS format: That doesn't seem to have a place for constraints that don't directly correspond to operands explicitly listed in the asm syntax (e.g., EDX and EAX in x86 mul), or indeed for any slightly irregular instruction.
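For comparison, the `asm!` macro that Rust eventually stabilized (Rust 1.59, well after this thread) handles such implicit operands by naming explicit registers in the block-level operand list rather than in the template. A minimal sketch, assuming an x86_64 target: `mul` reads one factor from RAX and writes the 128-bit product to RDX:RAX, neither of which appears in the instruction text. The function name `widening_mul` is only for illustration.

```rust
use std::arch::asm;

/// 64x64 -> 128-bit multiply; returns the (low, high) halves of the product.
#[cfg(target_arch = "x86_64")]
fn widening_mul(a: u64, b: u64) -> (u64, u64) {
    let lo: u64;
    let hi: u64;
    unsafe {
        asm!(
            // Only the explicit operand appears in the template; RAX and RDX
            // are implicit and are bound to Rust variables below.
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo, // one factor in RAX, low half comes back in RAX
            lateout("rdx") hi,        // high half is written to RDX
            options(pure, nomem, nostack),
        );
    }
    (lo, hi)
}
```

Flags need no extra annotation here, since stable `asm!` assumes they are clobbered unless `preserves_flags` is given.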

On per-instruction positional arguments: Slightly better re: redundancy, but it doesn't address the redundancy across instruction boundaries. It does solve the complaint about constraints that don't occur in the asm syntax.

Honestly I don't think this is a problem that needs solving. My experience with inline asm is admittedly even more limited than yours, but it seems to me that any per-instruction information you might want to leave for future maintainers could just as well be a comment on the instruction. This way you don't have any redundancy you don't want or need. I also don't believe it is responsible to edit any part of an inline asm statement without taking the time to very carefully consider all parts of it. I absolutely see reason to keep notes that help with this, of course, but the enforced solution you propose doesn't seem the best way to do that.
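As a sketch of that comment-based approach, here is how it might look in the `asm!` syntax Rust later stabilized (again assuming x86_64; `masked_xor` is an illustrative name). The interface stays block-level, per-instruction notes live in ordinary Rust comments between the template lines, and named operands (`{a}`, `{res}`) also sidestep the counting problem, since reordering or deleting instructions does not renumber anything.

```rust
use std::arch::asm;

/// Computes (a ^ b) & mask with a three-instruction block.
#[cfg(target_arch = "x86_64")]
fn masked_xor(a: u64, b: u64, mask: u64) -> u64 {
    let res: u64;
    unsafe {
        asm!(
            // res = a; `res` is a plain `out`, so it never shares a register
            // with an input and `a` is not overwritten.
            "mov {res}, {a}",
            // res ^= b; writes the flags, which asm! already assumes are
            // clobbered unless `preserves_flags` is given.
            "xor {res}, {b}",
            // res &= mask; also writes the flags.
            "and {res}, {mask}",
            a = in(reg) a,
            b = in(reg) b,
            mask = in(reg) mask,
            res = out(reg) res,
            options(pure, nomem, nostack),
        );
    }
    res
}
```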

CensoredUsername · January 4, 2018, 12:16am

I’m definitely a fan of work on stabilizing the use of inline asm in rust. A few remarks though:

{}'s are less than ideal for argument substitution. If they are to be used for it it will necessitate that any use of {} in actual assembly syntax is escaped (like ARM register lists, or x64 AVX-512 mask register syntax). I’m not really a fan of that, but unfortunately there just aren’t easy options around it.
As someone who’s written an assembler DSL for rust (see dynasm-rs), I would completely agree on not moving such DSLs into the compiler. Implementing them for even one architecture is a rather huge amount of work, and it suffers heavily from the issue that you’re going to be introducing yet another slightly-different assembly syntax due to how irregular some assembly formats are. DSLs have one big bonus though, which is that they can provide better error reporting to the user.
What I’m mostly missing in this proposal is how errors in assembly will be presented to the user. When the backend compiler spots an error in the generated assembly, how will this be presented to the user?
Mostly as a solution to the last two points: We could get the best of both worlds essentially by ensuring that the final asm! syntax is something that can easily be generated by a procedural macro. That way, the compiler will only have to support a simple asm! format that can, with some trivial changes, be passed on to the backend, while proper DSLs that handle variable substitution and error handling can be implemented in their own crates. Meanwhile the DSLs wouldn’t be baked in the compiler and could therefore be easily modified to fit people’s tastes.
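To illustrate the escaping cost from the first point, and one small thing a proc-macro front end would have to do when emitting templates: if `{` and `}` follow Rust format-string rules (as they ended up doing in the stabilized `asm!`), literal braces must be doubled. A minimal sketch of such a helper; `escape_braces` is a hypothetical name, and the ARM and AVX-512 strings are only sample inputs.

```rust
/// Double every literal brace so assembly text survives format-string
/// style substitution, e.g. an ARM register list `{r4, lr}`.
fn escape_braces(asm_text: &str) -> String {
    // Replace `{` first; the `{{` it produces contains no `}`,
    // so the second pass only touches the original closing braces.
    asm_text.replace('{', "{{").replace('}', "}}")
}
```

A DSL crate could run its literal assembly text through something like this before handing it to `asm!`, so users of the DSL never see the escaping.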

main · January 4, 2018, 1:24am

@mark-i-m I feel like your proposal is missing … a coherent mental model? Like, the way inline asm currently works in both LLVM and GCC is that you have a block of instructions that are inserted into the binary almost verbatim, parametrized only by register allocation. Properties like volatility, clobbering or inputs/outputs never apply to a single instruction but always to the block as a whole. Attaching them to individual instructions just doesn’t make sense:

If I use a scratch register, I need to mark it as clobbered - UNLESS, of course, I save and restore it. In that case the inner instructions do clobber that register, but the block as a whole does not.

The way inline asm works is that the asm is just a black box and you define an interface (in, out, inout, clobbers, etc) for the entire thing, not for parts of it. Because the parts on their own are meaningless - there are no parts.
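The scratch-register point can be seen in a minimal sketch using the later-stabilized `asm!` (x86_64 assumed; `sum_via_rbx` is an illustrative name). The inner instructions overwrite RBX, but because the block pushes and pops it, RBX is not declared as clobbered: only the block's net effect on machine state is part of its interface. Leaving out the `nostack` option is what permits the push/pop.

```rust
use std::arch::asm;

/// Adds two numbers using RBX as a scratch register that is saved and restored.
#[cfg(target_arch = "x86_64")]
fn sum_via_rbx(a: u64, b: u64) -> u64 {
    let res: u64;
    unsafe {
        asm!(
            "push rbx",       // save RBX; the inner code uses it as scratch
            "mov rbx, {a}",
            "add rbx, {b}",
            "mov {res}, rbx",
            "pop rbx",        // restore RBX, so the block as a whole preserves it
            a = in(reg) a,
            b = in(reg) b,
            res = out(reg) res,
            // No `nostack`: the block pushes to the stack.
        );
    }
    res
}
```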
