[Pre-RFC]: Inline assembly


#41

I’m not quite sure what you mean. You don’t need any constraints for intermediate values that only live in the asm; you can just pick a specific register and use it (and list it in the clobbers, obviously), e.g.:

asm!("
    mov rsi, {ptr}
    mov rdi, rsi
    add rdi, {len}
    ; loop, incrementing rsi until rsi==rdi
",
    in(reg) ptr=slice.as_ptr(),
    in(reg) len=slice.len(),
    clobber(rsi, rdi),
    ...);
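For comparison, here is a sketch of the same idea (fixed scratch registers rsi/rdi driving a loop) written against the asm! macro that Rust later stabilized, where a clobber is spelled as a discarded output such as out("rsi") _. It assumes an x86_64 target; the function name sum_bytes and the byte-summing loop body are illustrative additions, not part of the original post.

```rust
use std::arch::asm;

/// Sums the bytes of `slice` with a hand-written loop that uses rsi/rdi
/// as fixed scratch registers (hypothetical example, x86_64 only).
#[cfg(target_arch = "x86_64")]
pub fn sum_bytes(slice: &[u8]) -> u64 {
    let sum: u64;
    // SAFETY: the loop only reads the `slice.len()` bytes starting at
    // `slice.as_ptr()`, and every register it writes is declared below.
    unsafe {
        asm!(
            "mov rsi, {ptr}",
            "mov rdi, rsi",
            "add rdi, {len}",          // rdi = one past the last byte
            "xor {sum:e}, {sum:e}",    // sum = 0 (32-bit write zero-extends)
            "2:",
            "cmp rsi, rdi",
            "je 3f",                   // stop once rsi == rdi
            "movzx {tmp:e}, byte ptr [rsi]",
            "add {sum}, {tmp}",
            "inc rsi",
            "jmp 2b",
            "3:",
            ptr = in(reg) slice.as_ptr(),
            len = in(reg) slice.len(),
            sum = out(reg) sum,
            tmp = out(reg) _,
            // The "clobber list": fixed registers declared as discarded outputs,
            // so no input is allocated to them.
            out("rsi") _,
            out("rdi") _,
            options(nostack, readonly),
        );
    }
    sum
}
```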

#42

Working with inline assembly was a truly awful experience when I had to spend a couple of weeks wrestling with it about a year ago. The problems ranged from ICEs caused by LLVM assertions with unhelpful errors like Assertion `Val && "isa<> used on a null pointer"` (hint: convert your fn items to fn pointers), to bits of asm being thrown away even when marked as volatile, to (if I’m reading my old notes right) having no way to pass an extern fn into an asm block as a pointer, to the bread and butter of just ICEing if something is slightly wrong with your parameters. I’ve written assembly in C, but doing it in Rust is (was?) so much worse that for any project with a non-trivial amount of assembly, I will actually go back to using C.

I don’t see many similar complaints in this thread, though, so maybe I’m an outlier? I’m very much in favour of anything that moves things past “perma-unstable, hand it all down to LLVM”, if only because more people will then hit these bugs, so extra validation can be added to Rust or LLVM can be fixed. I’d even tolerate stabilising it as is and just improving error reporting (and adding docs with a cheat sheet of constraints, etc.).


On the RFC itself: Apologies for the minor derail, and thanks for the RFC, it’s nice to see more movement in this area :slight_smile: I do like the proposal: I like using words for in/out/clobber constraints, and I like that we have to do work in Rust-land before passing things down to LLVM. I’d like a bit more detail on what exactly you mean by “additionally mappings for register classes are added as appropriate”. Do you mean they’ll be added on an as-needed basis, as LLVM says they will be? If so, I would like to request “i” (immediate integer operand) right off the bat.
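As a rough illustration of what an “i”-style operand buys you: the asm! macro that was eventually stabilized has a const operand (stable since Rust 1.82, as far as I know) that substitutes a compile-time integer directly into the template as an immediate. This sketch assumes an x86_64 target; times_eight and SHIFT are hypothetical names.

```rust
use std::arch::asm;

// A compile-time constant, emitted into the template as an immediate.
const SHIFT: u32 = 3;

/// Multiplies by 8 using `shl` with an immediate shift count (x86_64 only).
#[cfg(target_arch = "x86_64")]
pub fn times_eight(x: u64) -> u64 {
    let mut v = x;
    // SAFETY: pure register arithmetic, no memory or stack access.
    unsafe {
        asm!(
            "shl {v}, {n}",
            v = inout(reg) v,
            // `const` operands are pasted into the template as literal
            // values, roughly the job GCC's "i" constraint does.
            n = const SHIFT,
            options(pure, nomem, nostack),
        );
    }
    v
}
```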


#43

Please do not elide them! Certain common things are completely impossible without argument modifiers, e.g. it’s not possible to write this call in libfringe in any other way (or at least, I looked through the LLVM sources and did not find one).


#44

This is extremely hard to discover (I had to read the LLVM sources to figure it out), but it’s doable with argument modifiers, as I’ve just described.
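To make “argument modifiers” concrete: they change how an operand’s register is printed in the template. The sketch below uses the syntax of the later-stabilized asm! macro on x86_64, where {name:e} prints the 32-bit name of a 64-bit register (analogous to ${0:w} in the ARM64 example in #46). It is a generic illustration, not the libfringe call referred to above; low32 is a hypothetical name.

```rust
use std::arch::asm;

/// Keeps the low 32 bits of `x` by moving through 32-bit register names
/// (x86_64 only).
#[cfg(target_arch = "x86_64")]
pub fn low32(x: u64) -> u64 {
    let out: u64;
    // SAFETY: a single register-to-register move.
    unsafe {
        asm!(
            // `:e` prints e.g. `eax` instead of `rax`; a 32-bit write
            // zero-extends into the full 64-bit register.
            "mov {o:e}, {i:e}",
            i = in(reg) x,
            o = lateout(reg) out,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    out
}
```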


#45

So I went back and rewrote some of my old assembly to use the proposed syntax. I didn’t really have any of the problems I expected (i.e. @rkruppe was right :slight_smile:), so I don’t really have any strong opposition to this RFC anymore…

One minor nit:

It would be nice to be able to look at a piece of inline assembly and trivially tell that there are no inputs/outputs… I guess we could use comments, which is what I did, but syntax would be nicer. The old syntax solved this problem by enforcing a section (separated by “:”) for ins, outs, clobbers, and flags. We could do the same in the new macro:

    let x: u64;
    let y: u64;

    asm_x86!{
        "
        push %rax
        movq $0, {x}
        movq $0, {y}
        "

        in { }

        out {
            x = rax x,
            y = late rcx y, // Rather than lateout; can additionally have `in`
        }

        clobber {
            "rax", "rsp"
        }

        flags { }
    };

#46

For a real-world example, here is some assembly code (ARM64) which I am using in my current project:

// Common code for interruptible syscalls
macro_rules! asm_interruptible_syscall {
    () => {
        r#"
            # If a signal interrupts us between 0 and 1, the signal handler
            # will rewind the PC back to 0 so that the interrupt flag check is
            # atomic.
            0:
                ldrb ${0:w}, $2
                cbnz ${0:w}, 2f
            1:
               svc #0
            2:

            # Record the range of instructions which should be atomic.
            .section interrupt_restart_list, "aw"
            .quad 0b
            .quad 1b
            .previous
        "#
    };
}

// There are other versions of this function with different numbers of
// arguments, however they all share the same asm code above.
#[inline]
pub unsafe fn interruptible_syscall3(
    interrupt_flag: &AtomicBool,
    nr: usize,
    arg0: usize,
    arg1: usize,
    arg2: usize,
) -> Interruptible<usize> {
    let result;
    let interrupted: u64;
    asm!(
        asm_interruptible_syscall!()
        : "=&r" (interrupted)
          "={x0}" (result)
        : "*m" (interrupt_flag)
          "{x8}" (nr as u64)
          "{x0}" (arg0 as u64)
          "{x1}" (arg1 as u64)
          "{x2}" (arg2 as u64)
        : "x8", "memory"
        : "volatile"
    );
    if interrupted == 0 {
        Ok(result)
    } else {
        Err(Interrupted)
    }
}

This is what it would look like under the originally proposed syntax (I made one minor change: I inverted the volatile flag and renamed it to pure).

// Common code for interruptible syscalls
macro_rules! asm_interruptible_syscall {
    () => {
        r#"
            # If a signal interrupts us between 0 and 1, the signal handler
            # will rewind the PC back to 0 so that the interrupt flag check is
            # atomic.
            0:
                ldrb {interrupted:w}, {interrupt_flag}
                cbnz {interrupted:w}, 2f
            1:
               svc #0
            2:

            # Record the range of instructions which should be atomic.
            .section interrupt_restart_list, "aw"
            .quad 0b
            .quad 1b
            .previous
        "#
    };
}

// There are other versions of this function with different numbers of
// arguments, however they all share the same asm code above.
#[inline]
pub unsafe fn interruptible_syscall3(
    interrupt_flag: &AtomicBool,
    nr: usize,
    arg0: usize,
    arg1: usize,
    arg2: usize,
) -> Interruptible<usize> {
    let result;
    let interrupted: u64;
    asm!(
        asm_interruptible_syscall!(),
        interrupted = out (reg) interrupted,
        interrupt_flag = in (mem) interrupt_flag, // Does (mem) take an address or an lvalue?
        lateout ("x0") result,
        in ("x8") nr as u64,
        in ("x0") arg0 as u64,
        in ("x1") arg1 as u64,
        in ("x2") arg2 as u64,
        clobber("x8", "memory"),
        // volatile is implied by the absence of the "pure" flag
    );
    if interrupted == 0 {
        Ok(result)
    } else {
        Err(Interrupted)
    }
}

I feel that the new syntax is a lot nicer to use, especially since it supports named parameters and doesn’t require outputs to come before inputs. To make it easier to use in macros, I would suggest making the ordering fully flexible: clobbers and flags could occur at any position, and multiple clobber lists could be allowed (with their union used as the actual clobber list).

Having to explicitly say lateout instead of out is great since it makes you double-check that you aren’t reading any inputs after writing to the output. Making out a safe default will greatly help people who are new to inline asm.
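The key pattern in this example (an argument and the result sharing one register, plus registers the instruction itself clobbers) maps onto the asm! macro that Rust later stabilized roughly as follows. This sketch assumes x86_64 Linux rather than ARM64, omits the interrupt-flag check, and the helper name syscall3 is hypothetical. On x86_64 the syscall number and the return value both live in rax, and the syscall instruction overwrites rcx and r11.

```rust
use std::arch::asm;

/// Raw three-argument Linux syscall (hypothetical helper, x86_64 Linux only).
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
pub unsafe fn syscall3(nr: usize, arg0: usize, arg1: usize, arg2: usize) -> usize {
    let ret: usize;
    unsafe {
        asm!(
            "syscall",
            // Syscall number in, result out, both in rax.
            inlateout("rax") nr => ret,
            in("rdi") arg0,
            in("rsi") arg1,
            in("rdx") arg2,
            // Clobbered by the `syscall` instruction itself.
            lateout("rcx") _,
            lateout("r11") _,
            // No `nomem`: the kernel may read or write memory.
            options(nostack),
        );
    }
    ret
}
```

For example, unsafe { syscall3(39, 0, 0, 0) } invokes getpid, which is syscall number 39 on x86_64 Linux.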


#47

One thing which suddenly strikes me when reading your example is that these two in/out specifiers feel somewhat redundant in the new syntax:

interrupted = out (reg) interrupted,
interrupt_flag = in (mem) interrupt_flag,

If mapping a Rust variable into an inline asm identifier of the same name is expected to be a frequent use case, perhaps a shorthand that avoids repeating the identifier could be introduced?


#48

Perhaps we could just use the variable name as the named argument:

interrupted = out (reg),
interrupt_flag = in (mem),

#49

This is actually trickier than it seems, since the value attached to a register can be an expression (e.g. arg0 as u64) and is not necessarily just an identifier.


#50

It could take a cue from struct construction, where you can write either Foo { bar: some_expr }, or just Foo { bar } if there’s a bar in scope.
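For reference, this is the struct-literal shorthand being alluded to (the in/out analogue is only a suggestion in this thread; Foo and build are placeholder names):

```rust
#[derive(Debug, PartialEq)]
pub struct Foo {
    pub bar: u32,
}

pub fn build(bar: u32) -> (Foo, Foo) {
    // Long form: field name, colon, arbitrary expression.
    let explicit = Foo { bar: bar * 2 };
    // Shorthand: `Foo { bar }` means `Foo { bar: bar }`.
    let shorthand = Foo { bar };
    (explicit, shorthand)
}
```

The shorthand only applies when the field name and a variable in scope coincide; any other expression still needs the long form, which is the same constraint raised for asm operands in #49.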


#51

At the risk of starting the dreaded bikeshed discussion, I feel like this kind of elision would be easier to introduce in a syntax where the asm binding name comes after the in()/out() specifier, such as the one which @rkruppe hinted at previously:

in(reg) ptr=slice.as_ptr(),
in(reg) len=slice.len(),

In this case, if there is already a “ptr” and a “len” in scope, one would just reword it as:

in(reg) ptr,
in(reg) len,

The main limitation is that this introduces a parsing ambiguity if both expressions (for anonymous arguments) and bindings (for named arguments) are allowed. As far as I can see, such a syntax would only be viable if naming arguments becomes mandatory (which, as @mark-i-m pointed out earlier, can be an unnecessary annoyance for small snippets).


#52

I think that this isn’t a huge issue and that we should just try to stick close to the existing format string syntax.


#53

I generally agree; I was just wondering if there was an easy way to sugar away this kind of repetition. But there does not seem to be one, and this minor issue is not worth solving the hard way :slight_smile:


#54

I have a proposal for this problem:

  • Force crates to specify supported architectures in their Cargo.toml (default is ‘all’). The list is passed to the compiler.
  • Asm statements are given the architecture as a parameter (or perhaps have an asm_<arch> macro per architecture).
  • Asm statements can be placed in 2 locations:
    • An arch_match! procedural macro
    • A function marked with a special per-architecture attribute (#[architecture(<arch>)])
      • These functions can call each other, like unsafe functions can

arch_match!

A macro that opens an architecture-specific scope for each architecture. It must be exhaustive across all architectures the crate supports.

Example usage:

arch_match! {
    x86 => asm_x86!(...),
    armv7 => asm_armv7!(...),
    all => unoptimized_rust_function(),
}

If, for some reason, an asm statement is only relevant for a single architecture, one can just match on that architecture and ‘all’, and do nothing in ‘all’:

arch_match! {
    x86 => asm_x86!("important_instruction"),
    all => { },
}

Asm subtyping

I don’t know exactly how this should work, but, for example, newer amd64 CPUs subtype older amd64 CPUs. Everything subtypes ‘all’. Maybe it should be more trait-like? (e.g. specify amd64 + sse2 as the architecture)

The ‘all’ architecture

Only Rust code can run in this context.


#55

You may be interested in the portability lint RFC, which already handles that sort of problem via the existing #[cfg] attribute.


#56

I am not sure arch_match is the best idea, because of the subtyping that you mention.

It’s pretty normal to write a different assembly version for x86 depending on the presence of sse2, sse4, avx, avx512.

Since we already have a way to specialize a function per architecture (cfg) and a way for a function to enable CPU features (#[target_feature(enable = "sse4.1")]), maybe it would be possible to simply define the macros on a per-architecture basis and then place an assert_cfg!(x86) inside the asm_x86 macro, for example.

A crude way would be to declare an empty function only on x86 via cfg, then reference that function in the asm_x86 macro, causing a compile-time failure if the macro is invoked on a non-x86 platform.
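A minimal sketch of that crude approach, assuming the hypothetical names __asm_x86_guard and asm_x86! and forwarding to the stabilized core::arch::asm!. On a non-x86 target the guard function does not exist, so any use of the macro fails to compile.

```rust
// Defined only when compiling for x86/x86_64. Any reference to it on
// another target fails to resolve, which turns into a compile error.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[doc(hidden)]
pub fn __asm_x86_guard() {}

/// Hypothetical `asm_x86!`: references the x86-only guard function,
/// then forwards all of its tokens to `core::arch::asm!`.
macro_rules! asm_x86 {
    ($($args:tt)*) => {{
        __asm_x86_guard();
        ::core::arch::asm!($($args)*)
    }};
}

pub fn load_42() -> u32 {
    let x: u32;
    // SAFETY: writes only the declared output register.
    unsafe {
        asm_x86!("mov {0:e}, 42", out(reg) x, options(nomem, nostack, preserves_flags));
    }
    x
}
```

The resulting error (“cannot find function”) is not very descriptive; defining a second, #[cfg(not(...))] version of the macro that expands to compile_error! would give a clearer message.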


#57

The target feature RFC covers those use cases: https://github.com/rust-lang/rfcs/blob/master/text/2045-target-feature.md.
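As a sketch of how target features handle the sse2/sse4/avx case from #56, runtime dispatch with #[target_feature] and is_x86_feature_detected! looks roughly like this. Here popcnt stands in for sse4/avx, and the function names are hypothetical. The fast path uses plain Rust (count_ones) rather than asm, but an asm! block could go in the same place.

```rust
// Compiled with popcnt enabled, so `count_ones` can lower to one instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "popcnt")]
unsafe fn popcount_popcnt(x: u64) -> u32 {
    x.count_ones()
}

// Portable fallback: clear the lowest set bit until none remain.
fn popcount_portable(mut x: u64) -> u32 {
    let mut n = 0;
    while x != 0 {
        x &= x - 1;
        n += 1;
    }
    n
}

pub fn popcount(x: u64) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("popcnt") {
            // SAFETY: we just checked that the CPU supports popcnt.
            return unsafe { popcount_popcnt(x) };
        }
    }
    popcount_portable(x)
}
```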


#58

Personally, I feel like this would be really complicated to implement well and would involve a lot of arch-specific knowledge in the compiler. The current approach of using a function with #[cfg(...)] seems both more idiomatic and easier to implement.
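For comparison with arch_match!, the same per-architecture choice with an ‘all’-style fallback can be sketched with #[cfg] on two definitions of one function. bswap32 is a hypothetical example, and the asm path assumes x86_64.

```rust
// x86_64: use the `bswap` instruction directly.
#[cfg(target_arch = "x86_64")]
pub fn bswap32(x: u32) -> u32 {
    let mut v = x;
    // SAFETY: register-only instruction; `bswap` does not touch flags.
    unsafe {
        std::arch::asm!(
            "bswap {0:e}",
            inout(reg) v,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    v
}

// Every other target: the plain Rust version plays the role of ‘all’.
#[cfg(not(target_arch = "x86_64"))]
pub fn bswap32(x: u32) -> u32 {
    x.swap_bytes()
}
```

Exhaustiveness comes from the cfg(not(...)) arm rather than from the macro, which is one reason #[cfg] needs no extra compiler support here.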


#59

@Florob

I think it is time to submit this as an eRFC so that we can agree on whether this is the direction in which we want inline assembly to go, without committing to all implementation details (e.g. which exact constraints we want to expose in stable Rust). It would help if the eRFC were worded in a way that conveys that.

Once we have agreed on the general direction (and people working on alternative Rust backends have given feedback), we can start the implementation, and some time later, submit an RFC for an MVP with a subset of all of this that could be useful for stabilization and has been made to work.

Would you be interested in submitting such an eRFC based on this pre-RFC?


#60

I love this proposal, and I’d love to see it as the stable asm! syntax in Rust.

A few additional thoughts:

  • Based on this syntax, it seems reasonably straightforward to build a gcc_asm! or similar that translates from GCC assembly syntax for ease of porting, or tools to mechanically translate from one syntax to the other.

  • I’d love to see shorthand for a {name} that matches the variable name passed in, to avoid having to write name=in(reg) name. (This assumes that the same name doesn’t get passed multiple times, and of course expressions other than identifiers would need explicit names.)

  • For the same reason that this proposal made earlyclobber the default and required an explicit lateout to optimize if you know you won’t clobber early, I find myself really tempted to make volatile the default and add some explicit annotation for non-volatile assembly. On the other hand, the explicit requirement to have either outputs or volatile and making it an error to not have at least one of those would help avoid common issues.

  • When naming constraints, please note that names like eax make an unwarranted assumption; for instance, GCC’s a constraint on x86 can turn into al, ax, eax, or rax, depending on the size of what you pass into it. Also note the constraint modifiers that let you (for instance) access the sub-registers of a register; those should probably become {:modifiers} or similar. These translations are both 1) why I find this proposal so awesome and 2) the largest amount of work in this proposal.

  • Regarding the idea of having asm! return outputs, while that won’t work in the general case, I like the idea of writing out(reg) return for an output and having the result used as the return value of the asm!. That would significantly shorten common cases that would otherwise have to declare and return a temporary. (It would require handling types carefully, though.)

  • If LLVM supports them, please support flag-output constraints as well, turning them into bool outputs (or return as above); this makes it possible to use the result directly in a conditional and have the compiler turn that into the appropriate conditional jump instruction.

Overall, I’d love to see this proposal turn into an RFC, and I’d be happy to help with it as well. I don’t think you need a complete specification of all the constraint translations as part of the RFC, just a representative sample of all the types of constraints and guidelines for adding new ones.
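To illustrate the flag-output point: as far as I know, the asm! that Rust eventually stabilized has no flag-output operands, so the carry flag has to be materialized into a register (here with setc) and then turned into a bool. This sketch assumes x86_64 and the hypothetical name add_carry; a flag-output constraint would let the compiler branch on the flag directly and skip the setc.

```rust
use std::arch::asm;

/// Adds two u64 values and reports the carry-out (x86_64 only).
#[cfg(target_arch = "x86_64")]
pub fn add_carry(a: u64, b: u64) -> (u64, bool) {
    let sum: u64;
    let carry: u8;
    // SAFETY: register-only arithmetic.
    unsafe {
        asm!(
            "add {s}, {b}",
            // Copy CF into a byte register, since there is no flag output.
            "setc {c}",
            s = inlateout(reg) a => sum,
            b = in(reg) b,
            c = lateout(reg_byte) carry,
            options(pure, nomem, nostack),
        );
    }
    (sum, carry != 0)
}
```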