[Pre-RFC]: Inline assembly


#16

That seems like it would introduce a lot of redundancy when inputs, outputs, clobbered registers, etc. occur more than once, with all the usual problems of redundancy. Even in your example there’s already two copies of volatile.

I also find it confusing that your proposed syntax does not name the instruction operands in the format strings, apparently instead inferring them solely from the constraints?! Do you mean to propose that as well?


#17

That seems like it would introduce a lot of redundancy when inputs, outputs, clobbered registers, etc. occur more than once, with all the usual problems of redundancy. Even in your example there’s already two copies of volatile.

Hmm… that is true. My intention is that every place that requires the flag (e.g. volatile) is annotated that way. In my example, if I later decided that I wanted to take out the nop it is trivial to know that volatile is still needed because the xor is also annotated volatile. Likewise, if I wanted to take out the last two instructions, it is trivially clear that volatile is not needed any more.

Would shorter annotations help (e.g. vol instead of volatile)? I’m not sure what to do about this. Frankly, most of the inline assembly I have ever written is pretty short (<30 LOC per function), so I would gladly take the redundancy hit.

I also find it confusing that your proposed syntax does not name the instruction operands in the format strings, apparently instead inferring them solely from the constraints?! Do you mean to propose that as well?

Sorry, I should have made this more clear. No, I don’t want to propose this sort of inference. I was trying to propose that the format would be something roughly like this:

(INST (":" ARG_WITH_ANNOTATIONS)* (":" EXTRA_FLAGS)? ",")+

where INST is "mov", ARG_WITH_ANNOTATIONS is in reg "eax" x, etc., and extra flags could be volatile.

I see that my example actually is incorrect:

"xor" : inout clobber reg "eax" w : volatile

should be

"xor" : inout clobber reg "eax" w : inout clobber reg "eax" w : volatile

I do see the annoying-ness of this, though… What if we instead had per-instruction positional arguments:

let w: u32;
let x: u32;
let y: u32;
asm_x86_att! {
    "mov {0}, {1}" : in reg "eax" x, out clobber reg "ebx" y;
    "mov {0}, {1}" : in mem "(eab)", out clobber mem "ecx";
    "nop"          : volatile;
    "xor {0}, {0}" : inout clobber reg "eax" w, volatile;
}

@rkruppe Does that seem any better to you?


#18

No. They might make it slightly faster to write, but they don’t do anything about the duplication of knowledge, and probably decrease readability.

That doesn’t seem to have a place for constraints that don’t directly correspond to operands explicitly listed in the asm syntax (e.g., EDX and EAX in x86 mul), or indeed any slightly irregular instruction.

Slightly better re: redundancy, but doesn’t address the redundancy across instruction boundaries. Does solve the complaint about constraints that don’t occur in the asm syntax.


Honestly I don’t think this is a problem that needs solving. My experience with inline asm is admittedly even more limited than yours, but it seems to me that any per-instruction information you might want to leave for future maintainers could just as well be a comment on the instruction. This way you don’t have any redundancy you don’t want or need. I also don’t believe it is responsible to edit any part of an inline asm statement without taking the time to very carefully consider all parts of it. I absolutely see reason to keep notes that help with this, of course, but the enforced solution you propose doesn’t seem the best way to do that.


#19

I’m definitely a fan of work on stabilizing the use of inline asm in rust. A few remarks though:

  • {}'s are less than ideal for argument substitution. If they are to be used for it, any use of {} in actual assembly syntax will have to be escaped (like ARM register lists, or x64 AVX-512 mask register syntax). I’m not really a fan of that, but unfortunately there just aren’t easy options around it.
  • As someone who’s written an assembler DSL for rust (see dynasm-rs), I would completely agree on not moving such DSLs into the compiler. Implementing them for even one architecture is a rather huge amount of work, and it suffers heavily from the issue that you’re going to be introducing yet another slightly-different assembly syntax due to how irregular some assembly formats are. DSLs have one big bonus though, which is that they can provide better error reporting to the user.
  • What I’m mostly missing in this proposal is how errors in assembly will be presented to the user. When the backend compiler spots an error in the generated assembly, how will this be presented to the user?
  • Mostly as a solution to the last two points: We could get the best of both worlds essentially by ensuring that the final asm! syntax is something that can easily be generated by a procedural macro. That way, the compiler will only have to support a simple asm! format that can, with some trivial changes, be passed on to the backend, while proper DSLs that handle variable substitution and error handling can be implemented in their own crates. Meanwhile the DSLs wouldn’t be baked in the compiler and could therefore be easily modified to fit people’s tastes.
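
To make the escaping cost from the first bullet concrete, here is a minimal sketch of brace substitution. It assumes Rust-format-string rules ({N} is a positional operand, {{ and }} are literal braces); the function name expand_template is hypothetical and is not part of any proposal in this thread.

```rust
/// Expands `{N}` placeholders with `args[N]`.
/// `{{` and `}}` are escapes that produce a literal `{` or `}`.
pub fn expand_template(template: &str, args: &[&str]) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Escaped braces: consume the second brace and emit one.
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            // A placeholder: collect digits up to the closing brace.
            '{' => {
                let mut index = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(d) if d.is_ascii_digit() => index.push(d),
                        other => return Err(format!("bad placeholder near {:?}", other)),
                    }
                }
                let i: usize = index
                    .parse()
                    .map_err(|_| "empty placeholder".to_string())?;
                let arg = args.get(i).ok_or_else(|| format!("no argument {}", i))?;
                out.push_str(arg);
            }
            '}' => return Err("unescaped `}`".to_string()),
            other => out.push(other),
        }
    }
    Ok(out)
}
```

With these rules an ARM register list has to be written as "push {{r4, {0}}}" to come out as push {r4, lr}, while the natural "push {r4}" is rejected as a malformed placeholder.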

#20

@mark-i-m I feel like your proposal is missing … a coherent mental model? Like, the way inline asm currently works in both LLVM and GCC is that you have a block of instructions that are inserted into the binary almost verbatim, parametrized only by register allocation. Properties like volatility, clobbering or inputs/outputs never apply to a single instruction but always to the block as a whole. Annotating individual instructions just doesn’t make sense:

If I use a scratch register, I need to mark it as clobbered - UNLESS of course, I save and restore it. In this example, the inner instructions do clobber that register, but the entire block does not.
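
A sketch of that save-and-restore case, written in the asm! syntax that Rust later stabilized (which postdates this thread, so treat the exact spelling as an assumption relative to the proposals discussed here). The function name is hypothetical, and a plain-Rust fallback is included so the snippet builds on other targets.

```rust
/// Adds 1 to `x`, using rbx as a scratch register inside the block.
#[cfg(target_arch = "x86_64")]
pub fn add_one_with_scratch(x: u64) -> u64 {
    let mut v = x;
    unsafe {
        std::arch::asm!(
            "push rbx",     // save the scratch register
            "mov rbx, 1",   // taken alone, this instruction clobbers rbx...
            "add {v}, rbx",
            "pop rbx",      // ...but the block restores it before it ends
            v = inout(reg) v,
            // No clobber is declared for rbx: seen from outside,
            // the block as a whole leaves it unchanged.
        );
    }
    v
}

/// Portable fallback with the same result.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_one_with_scratch(x: u64) -> u64 {
    x.wrapping_add(1)
}
```

A per-instruction annotation on the mov would claim a clobber that the surrounding code never observes, which is the point being made above.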

The way inline asm works is that the asm is just a black box and you define an interface (in, out, inout, clobbers, etc) for the entire thing, not for parts of it. Because the parts on their own are meaningless - there are no parts.


#21

I’ll just refer readers to an earlier post on the matter.

I also suggest we drop AT&T x86 syntax support. I do not want alternative back ends to require support for multiple syntaxes. That is also helpful for readers of Rust code, since they would only have to deal with one x86 syntax.


#22

That seems like it would introduce a lot of redundancy when inputs, outputs, clobbered registers, etc. occur more than once, with all the usual problems of redundancy.

Just specify it on the first usage, not on every instruction.

It can be flexible - some may want to specify all of them at the end of the block, some may want to interleave inputs/outputs/clobbers with the asm code itself.

But each input/output/clobber should be specified only once per asm block.


#23

This looks pretty good, I’m happy that someone is picking this up.

  • Others have mentioned this too: it should be added to the drawbacks section that assembly code using {, } is harder/confusing to write. Why do I say confusing? I definitely always mess up the number of $ in my immediates when using inline assembly.

  • I like the explicit specification of lateout/inlateout, but I don’t like the syntax. I think the “late” part should be specified separately from the inout part.

  • GCC doesn’t allow you to specify an input and a clobber of the same register. We need to either just support this in a logical way, or error on it and have another way to specify what’s needed. For example an inclobber direction. inout is not sufficient because inout will presumably require the binding be mutable. inclobber would also allow an immutable binding which becomes invalid (as if moved) after the assembly block.
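
For comparison, a sketch of how a clobbered input can be written in the asm! syntax that was eventually stabilized: inout(reg) x => _ takes x as an input and discards whatever the register holds afterwards, so the binding does not have to be mutable. This is not the inclobber spelling proposed above, and the function name is hypothetical.

```rust
/// Computes 2 * x + 1, overwriting the register that held `x`.
#[cfg(target_arch = "x86_64")]
pub fn double_plus_one(x: u64) -> u64 {
    let out: u64;
    unsafe {
        std::arch::asm!(
            "add {tmp}, {tmp}",        // destroys the input value in place
            "lea {out}, [{tmp} + 1]",
            tmp = inout(reg) x => _,   // input whose final value is discarded
            out = out(reg) out,
        );
    }
    out
}

/// Portable fallback with the same wrapping behaviour.
#[cfg(not(target_arch = "x86_64"))]
pub fn double_plus_one(x: u64) -> u64 {
    x.wrapping_mul(2).wrapping_add(1)
}
```

Note that x is an immutable parameter here; only the register copy is modified.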

Nits:

  • First example of flags(volatile) missing closing parens.
  • “Names should be speaking” do you mean “Names should be words”?

@Amanieu

Some architectures use special characters in register names, so it might be better to put register names in quotes: in("eax").

This is also a nice way to distinguish between general and specific constraints, i.e. reg vs. "r10".

@Amanieu

Figuring out a short name for some constraints is not trivial, it might be easier to just stick with the existing GCC single-letter constraints. In particular some constraints can be very complex:

I think the idea is to not allow these kinds of constraints and force the programmer to just use explicit register names if they want anything more advanced than just any GPR.

@Amanieu

An asm! with no outputs is meaningless if the volatile flag isn’t specified. It should be a compile-time error for a non-volatile asm! to have no outputs. Previous discussion

+1. It would also be nice if the unused lint applied to asm statements as well, because specified but unused outputs will get optimized away too.

@Amanieu

Template argument modifiers are absolutely required in practice. I make heavy use of them in my code (ARM64 assembly). I think that we can just reuse LLVM’s single-letter modifiers here since these are used in the format string: mov {0:w}, {1:x}

I’ve never used this, could you please explain what this does?

@Amanieu

I suggest adding an additional direction specification tmp to deal with temporary registers and clobbered inputs:

+1

@main

I would generally separate two kinds of constraints: Those that select one specific register (eax) and those that merely constrain the compiler’s selections to a set of registers (reg). Parameters that are not directly referenced (“excess parameters”) should only be allowed if they belong to the first group.

+1. I’d also like to propose that explicit registers constraints can’t be used for replacement in the template. After all, the programmer already knows what to write in the template.
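
The x86 mul case mentioned earlier in the thread shows both rules at once. The sketch below uses the asm! syntax that was later stabilized, where explicit registers are quoted, may appear as operands without being referenced in the template, and cannot be substituted into it. The function name is hypothetical.

```rust
/// 64x64 -> 128 bit multiply using the x86-64 `mul` instruction.
#[cfg(target_arch = "x86_64")]
pub fn mul_wide(a: u64, b: u64) -> u128 {
    let lo: u64;
    let hi: u64;
    unsafe {
        std::arch::asm!(
            // `mul` implicitly reads rax and writes rdx:rax,
            // so only the explicit operand appears in the template.
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi,
        );
    }
    ((hi as u128) << 64) | (lo as u128)
}

/// Portable fallback.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_wide(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

Only the in(reg) operand is a class constraint; the two "excess" operands both name one specific register.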

@matthieum

I would generally expect an asm! call to be extremely platform specific; would it make sense to restrict the usage of the asm! macro to functions which are platform specific, for example, so that it is made clear that this piece of code is only valid for x86/x86_64 and cannot be compiled to ARM?

This meshes well with the portability lint proposal

@Zoxc

I also suggest we drop AT&T x86 syntax support.

Please no.


#24

I think this is fine since, even in ARM assembly, use of { and } is quite rare. The main benefit is that it matches the Rust format string syntax and you will probably need braces anyways to specify template modifiers.

LLVM/Clang actually allows an input & clobber of the same register. What it doesn’t allow is an output and clobber of the same register.

Also your proposed inclobber constraint is the same as the tmp constraint that I suggested in my previous post.

It causes the general-purpose register names to be printed with a w or x prefix, which indicates the register width to use (w = 32, x = 64): mov w4, x9. This is similar to eax vs rax in x86.

The mov example I showed will effectively truncate the second argument to 32 bits while moving it.
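
An x86-64 analogue, as a sketch in the asm! syntax that was later stabilized: the e modifier prints the 32-bit name of the allocated register (eax rather than rax), much like w on ARM64. The function name is hypothetical, and non-x86-64 targets get a plain-Rust fallback.

```rust
/// Keeps only the low 32 bits of `x`.
#[cfg(target_arch = "x86_64")]
pub fn truncate_to_32(x: u64) -> u64 {
    let out: u64;
    unsafe {
        std::arch::asm!(
            // Both operands are printed with their 32-bit names.
            // Writing a 32-bit register zeroes the upper half.
            "mov {dst:e}, {src:e}",
            src = in(reg) x,
            dst = out(reg) out,
            options(pure, nomem, nostack),
        );
    }
    out
}

/// Portable fallback.
#[cfg(not(target_arch = "x86_64"))]
pub fn truncate_to_32(x: u64) -> u64 {
    x & 0xFFFF_FFFF
}
```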


#25

I see a lot of potential problems with trying to abstract over the back-end’s native minilanguage for specifying what registers, memory locations, etc. can be used for each operand. The most basic problem is that it may be impossible to express “use this specific register in this instruction.” LLVM’s documentation is inadequate for me to tell whether it has this problem, but I know for a fact that GCC’s inline assembly can only target inputs or outputs to specific registers on x86, and even then only for the original 8 integer registers (not the r8-r15 extension that comes with x86-64); this is a fundamental limitation in the low-level “RTL” IR that GCC uses (the short version is that the constraint codes are defined by each individual architecture’s ultimate back end, the “machine description”, so if that has no need to express “this specific register” prior to register allocation, neither can an inline assembly operation).

Allowing arbitrary immediates to come in from the surrounding code is also risky; I see someone already pointed out how wacky the rules can get for which immediates are allowed by which instructions. The worst this can do is give you a weird error message, but it might be such a weird error message that someone thinks the compiler is buggy.

Relatedly, input/output/clobber semantics as specified by the language can’t deviate even a little tiny bit from the actual semantics understood by the back end, whatever it is, or we’ll have subtly incorrect code generation under high register pressure.

p.s. @main @zoxc I really do prefer ATT syntax to Intel syntax for x86 assembly language and I would object to dropping support for ATT syntax. This is not because of my past working on GCC, but because I learned MC68000 and SPARC assembly languages (both of which are quite similar syntactically to x86-ATT) first.


#26

LLVM allows specifying all registers by name. Backends can change. If GCC wants to support Rust and this RFC ends up being how Rust does inline assembly, GCC will need to improve their inline assembly backend to have all the features needed by Rust.


#27

You can target arbitrary registers with GCC, it just requires a completely different construct (register variables):

https://godbolt.org/g/3qmgzX


#28

I would generally expect an asm! call to be extremely platform specific; would it make sense to restrict the usage of the asm! macro to functions which are platform specific, for example, so that it is made clear that this piece of code is only valid for x86/x86_64 and cannot be compiled to ARM?

The language already has the tools to do that (e.g. conditional compilation #[cfg(target_arch = "powerpc64")]). The toolchain also detects many errors (e.g. embedding ARM inline assembly into an x86 binary fails to compile) but these errors are reported late (during LLVM translation, while invoking the assembler, etc.).

The language “run-time” will be able to give you more precise information about which assembly instructions you can actually execute. For example, the coresimd library already provides run-time feature detection on x86 in core::, so you can use that to detect AVX2 support at run-time and execute some assembly instructions only if the host supports them.
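
A minimal sketch of that run-time dispatch, using the is_x86_feature_detected! macro from the standard library. The function names are hypothetical, and the AVX2 path is ordinary Rust compiled with the feature enabled rather than hand-written assembly.

```rust
/// Compiled with AVX2 enabled; must only be called after a successful check.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Works on every target.
fn sum_portable(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Picks the AVX2 version only if the host CPU reports support for it.
pub fn sum_dispatch(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because the feature was just detected at run time.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_portable(xs)
}
```

The cfg attribute handles the compile-time question (is this an x86 target at all?), and the macro handles the run-time question (does this particular CPU have AVX2?).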

What rustc doesn’t have is a way to verify that some assembly code is going to compile successfully. For example, that you generate x86 assembly only in a part of the code that is protected by a #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] attribute. What rustc also doesn’t have is a way to verify that some assembly code is safe.

There has been some progress towards eliminating common uses of unsafe with respect to using intrinsics, e.g., see RFC 2122, but extending the approach being pursued there to inline assembly would be a very long shot.

I agree that the issue you raise is a problem. But I don’t think it is a problem worth solving “right now”. Solving it requires a lot of work (rustc uses LLVM for inline assembly, so we would at least need to teach it to use the appropriate assembler to verify inline assembly “early”). Also, in all cases I can think of, if you screw up the code won’t compile. That’s bad, but it’s not horrible, in particular given that the asm! macro will be unsafe and that everyone should be assuming that any code that uses the asm! macro is not portable.

We could always solve this later in a backwards compatible way by adding a checker for inline assembly code. This checker could warn on portability issues (e.g. asm! macro invocations not protected by a cfg(target_arch)) and produce errors on broken assembly (if the code was already broken it is not backwards incompatible to error on it).


#29

I think that the proposal must mention what is an error, what is a warning, and the current state of affairs, and maybe discuss what can be done to improve them. But “better error messages” is a quality-of-implementation issue, and I wouldn’t want to block the stabilization of inline assembly on that.

To offer better error messages we need to verify the inline assembly in rustc somehow, e.g., directly using the system’s assembler, parsing its errors, and trying to expose them in a good way (with line information, etc.). Even if we do that, there might be cases in which the LLVM backend’s call to the system assembler still fails. All of this is doable, but it is a lot of work, and nobody has volunteered to do it. This doesn’t mean that error messages must suck forever either; maybe by stabilizing the syntax we’ll give inline assembly more exposure and the current error messages will trigger somebody motivated enough to improve them.


#30

Rust already reports errors from inline assembly: https://play.rust-lang.org/?gist=35b145b367eafe528018f4237cdb9e44&version=nightly

The LLVM backend passes the error messages back to rustc so that rustc can display them.

The only (minor) downside is that this checking is not done for cargo check, only a full cargo build. I think this is fine as it is.


#31

This is my fault for not filing the bugs, but I believe that anybody writing any amount of inline assembly was either extremely lucky or must have hit a lot of rough edges. These are some examples I was able to come up with in 2 minutes:


#32

Ah sorry I misunderstood. You are talking about error-checking on the constraint placeholders rather than the assembly code itself.

I agree, the current state of affairs is pretty poor in that regard. Resolving it is pretty simple however: all we need to do is port the constraint validation code that Clang uses to rustc, which should resolve this issue.


#33

I was more concerned about portability than errors.

I was thinking that guaranteeing an error if the assembly is not used in a cfg(target_arch) enabled function (somehow) would in turn make it much easier to then have an analysis pass on crates which could automatically identify which architectures are supported by the crate.


#34

@rkruppe

I guess this is a classic “how much shall we lean on the language vs programmer?” question… I don’t have strong feelings, but I do prefer to lean on the language a bit more…

To be sure, I agree that commenting well is important and that the developer should read carefully, but assembly is tricky, and it’s easy to forget/miss stuff. For me personally, I would find it very helpful to have some help from the language.


#35

@main

Hmm… I see your point. I guess my complaint splits into 2 parts:

  • Not all of an inline assembly block is “homogeneous”, in a sense. What I mean is that while a block of inline asm may have some property as a whole, I think it is also possible for a block to have a property specifically because of a single instruction. For example, the whole block may clobber memory because of a single instruction that clobbers memory. If that instruction were removed, the block would not need the clobber any more. In such a case, I think it is useful for maintenance and documentation purposes to put the clobber on only that instruction.

  • Using positional arguments to define the interface is a huge pain (IMHO). There should be a more structured way of defining the interface.
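
To illustrate the first point with the block-level model that Rust eventually stabilized: there, an asm! block is assumed to touch memory unless options(nomem) or options(readonly) is given, so the per-instruction reason can only live in a comment, as suggested earlier in the thread. The sketch below is an assumption-laden illustration with a hypothetical function name, not a proposal.

```rust
/// Stores `value` into `slot` from inside an asm block.
#[cfg(target_arch = "x86_64")]
pub fn store_with_asm(slot: &mut u64, value: u64) {
    let p: *mut u64 = slot;
    unsafe {
        std::arch::asm!(
            // This store is the only reason the block writes memory.
            // If it is ever removed, `nomem` could be added below.
            "mov qword ptr [{p}], {v}",
            "nop",
            p = in(reg) p,
            v = in(reg) value,
            options(nostack, preserves_flags),
        );
    }
}

/// Portable fallback.
#[cfg(not(target_arch = "x86_64"))]
pub fn store_with_asm(slot: &mut u64, value: u64) {
    *slot = value;
}
```

The compiler does not check the comment, which is exactly the trade-off between leaning on the language and leaning on the programmer discussed above.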