I would specifically propose that support for AT&T syntax, or for that matter, GCC-compatible syntax, should be a proc macro that translates to `asm!` under the hood, and shouldn't necessarily require built-in support in the language.
Quick question: would all `asm!()` blocks need to have the same set of parameters? If so, you could set this up once somewhere and reuse the constant over and over again. So, while I agree with you that the various architectures can be weird, at least you could set it up once and then not think about it again.
BUT, weirdly enough, this discussion kind of proves a different point: that creating a struct like I showed before could be useful. Let's leave out the `architecture` key for the moment, and assume that `asm!()` is stabilized. Once it is stable, if you discover that you want to add new functionality, you either need to come up with a new macro or find a way of shoehorning it into `asm!()`. What I'm proposing makes it possible to continue to expand `asm!()` in the future without breaking old code. Does that make sense?
Issues that I haven't seen brought up yet:
People have been talking about which registers can be used with this or that instruction, and yeah, that's a mess, but what's often even more of a mess is which immediate operand values can be used with this or that instruction. x86 is actually one of the simpler ISAs in this respect. To get a taste of how messy it can get, look at the definition of a "modified immediate operand" in 32-bit ARM machine language (ARM ARM, page F2-3924): any number that can be created by zero-extending an 8-bit constant to 32 bits and then rotating the 32-bit word right by an even number of bits. (Whether or not this is considered a signed number depends on the instruction. Some instructions take the bitwise NOT of the value before using it, and some take the two's-complement negation before using it. If you're generating Thumb-2 machine instructions instead of traditional ARM instructions, the set of possibilities changes.) Something has to know which constants are allowed; if there's a separate assembler you can punt to it, but it sounds like people want to be able to generate machine code directly.
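To make the encoding rule concrete, here is a minimal sketch (my own illustration, not from the draft) of the check an encoder would have to perform for the plain, non-negated case:

```rust
/// Is `x` encodable as a 32-bit ARM "modified immediate"? That is, can it
/// be produced by zero-extending an 8-bit constant to 32 bits and rotating
/// the result right by an even amount? (Plain case only; the NOT/negate
/// variants and the Thumb-2 rules mentioned above are separate checks.)
fn is_arm_modified_immediate(x: u32) -> bool {
    // Undo each candidate even right-rotation and see if an 8-bit value remains.
    (0u32..32).step_by(2).any(|rot| x.rotate_left(rot) <= 0xFF)
}
```

For example, `0xFF00_0000` passes (0xFF rotated right by 8), while `0x0000_0101` does not: its two set bits are 8 positions apart, which can never fit in one 8-bit window.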
Symbolic operands are even worse, because the limitations of the linker and the object file format come into play as well:
```rust
let mut val = 1u32;
asm!("add {0}, {1}", inout(reg) val, sym GLOBAL_DATA);
```
Depending on the architecture and the ABI, this probably translates to some kind of fused load-register-indirect-and-add operation:

```asm
add eax, dword ptr [rip+0x0] !R_X86_64_PC32(GLOBAL_DATA)
```
which may or may not be representable depending on how big that immediate displacement field is, what relocation types can be used with it, etc. etc. It may not be possible to know whether the operation is representable until link time. Users probably expect that the toolchain will "fix up" the instruction and make it work regardless, which may involve rewriting the original instruction from load-reg-indirect-and-add to add-reg-reg (and finding a scratch register), inserting specially tagged no-op instructions before this one to give the linker space to rewrite in, etc. etc. etc.
In the previous thread on inline assembly, use cases involving switching between the text section and special data sections (and referring to addresses in the text section from the special data section) came up, e.g. https://github.com/cuviper/rust-libprobe/blob/431ac2999eb88e3a8ba5ee15df13557e234d9775/src/platform/systemtap.rs#L164 I don't see anything in here about that. I'd be fine with an initial implementation that doesn't support those kinds of use cases, but I'd hate to see them get forgotten about.
Probably the strongest objective use case is that there's tons of C-with-embedded-AT&T-assembly out there in the wild that someone might like to do oxidation on. OpenSSL comes to mind. Anyone working on that would probably rather not mess with known-working inline assembly at the same time.
I can also attest to the existence of a nonempty set of programmers (containing, at least, me) who only know AT&T syntax for x86, feel that they have many better things to do with their time than re-learn how to write assembly language, and would therefore be annoyed with Rust if it only supported Intel syntax.
In the case of immediates, you are responsible for ensuring that the constants you pass into an `asm!` are suitable for the instructions you are using them with. The assembler will emit an error if an immediate constant is not usable with a particular instruction.

For `sym`, you will get only the raw symbol name inserted into your asm code. We use the LLVM `c` modifier to ensure this. You are responsible for writing all the necessary boilerplate (e.g. `@got`) to obtain the correct address at runtime.
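As a minimal sketch in today's `asm!` terms (my example, not from the post; x86-64, non-PIC case), the rip-relative addressing around the pasted name is the kind of boilerplate meant here; with PIC and an external symbol you'd need `@GOTPCREL`-style syntax instead:

```rust
use core::arch::asm;

static GLOBAL_DATA: u32 = 42;

// `sym` pastes only the (mangled) symbol name into the template; the
// rip-relative addressing around it is entirely our responsibility.
fn global_addr() -> *const u32 {
    let p: *const u32;
    unsafe { asm!("lea {0}, [rip + {1}]", out(reg) p, sym GLOBAL_DATA) };
    p
}
```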
That's probably fine for a first pass, but people may not be able to do that in general, e.g. when working with symbolic constants defined by external code.
> The assembler will emit an error if an immediate constant is not usable with a particular instruction.
That's fine if there is an assembler, but I thought people wanted to be able to generate machine code directly?
This is probably too strong of a limitation, even for a first pass. In particular, the "necessary boilerplate" may depend not only on the ABI and the architecture, but on the exact compilation mode (e.g. PIC vs PIE vs fixed-load-location), and people may well want to compile the same crate in multiple modes. It's also mostly undocumented.
I agree that the developer needs to handle this, and that we should never rewrite assembly to handle constant loads. If you want to load an immediate into a register, you could tell the compiler to do so with `in(reg)`, and then the compiler is responsible for performing the load. Otherwise, you would need to handle the details of immediates yourself, including if only a subset of possible immediates can be used directly in an instruction.

Along the same lines, it's the developer's responsibility to handle other limited-range immediates. For instance, on x86, you can only use a byte immediate for the port in an `in` or `out` instruction, and if you want a full 16-bit port number you have to load it into a register first. I don't think the compiler needs to help with that.
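Concretely (my sketch, x86-64 only), the standard idiom routes any 16-bit port number through `dx` rather than trying to use an immediate:

```rust
use core::arch::asm;

// `out imm8, al` only exists for ports 0..=255; for the general case the
// port number must be loaded into DX first.
unsafe fn outb(port: u16, value: u8) {
    asm!("out dx, al", in("dx") port, in("al") value);
}
```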
That said, if LLVM or other backends can help us check these constraints and give better error messages (such as if we use `i`, or if the ARM backend could provide a special constraint for "an immediate that meets the constraints to be used directly"), we should arrange to provide that additional support if we can do so reasonably easily. Perhaps we should allow the syntax `imm("arch-specific-string")`, for future compatibility?
There is always an assembler, whether it's an external one or LLVM's integrated assembler. We (as in rustc) never actually interpret the asm string ourselves.
You would have to create variants of `imm` for every immediate type supported by an architecture. Honestly it's not worth the trouble when assembler error messages are good enough.
What about non-LLVM backends?
We can always support using an external assembler as a fallback path.
I updated the draft to add RISC-V (turned out to be fairly straightforward) and a `noreturn` flag.
Please see my edit; I think it would make sense to define the syntax `imm("arch-specific string")` for future use, even if we don't define any arch-specific strings in the initial spec.
We should document that `sym` does handle name mangling, though.
We should also say something like "if you want a valid pointer rather than a name, you should use ... instead". There should be some mode in which the compiler does the work of resolving the symbol for you.
In this list, can you add `%rip` on x86? The instruction pointer doesn't make sense to use as an input or output.
This should explicitly state that this also has the effect of `hint::unreachable_unchecked` after the block, and has the same caveats. In particular, falling off the end of an `asm!` block marked as `noreturn` is undefined behavior.
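A minimal sketch of what that contract looks like in practice (my example, x86-64 Linux, using today's `options(noreturn)` spelling for the draft's `noreturn` flag):

```rust
use core::arch::asm;

// `noreturn` lets the asm block be the function's final, diverging
// expression; control must never fall off the end of the block.
fn exit_now(code: i32) -> ! {
    unsafe {
        asm!(
            "mov eax, 60", // SYS_exit on x86-64 Linux
            "syscall",     // never returns
            in("edi") code,
            options(noreturn),
        )
    }
}
```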
Also, a minor nit: could you please change `X86` to `x86` everywhere?
This is quite inconvenient. I already mess up `$` vs `$$` plenty of times when writing inline assembly now.
This isn't constructive without some sort of reasoning.
I personally believe that any exception from "system assembler flavored assembly" needs justification (rather than the current status, where Intel syntax seems to be the de facto default even though it isn't actually), but I don't work with asm, so I'll acquiesce to those who do.
There's a bias to specify inline asm in terms of translation to LLVM inline asm (and/or GCC inline asm), but I think we really should specify it in terms of linking system-assembler-assembled asm into the codegen backend, and using LLVM inline asm as an optimization.
It's basically required for a string interpolation syntax. Whatever syntax we choose is going to clash with something in some asm dialects, so the easiest and most practical solution is just to use `format!`-style string interpolation.
Or do you have an alternative that works generically for all asm flavors while still being dead-simple (read: generic and know-nothing) to specify?
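Concretely (my sketch, x86-64), the template is an ordinary `format!`-style string: `{0}` is replaced with whatever register gets allocated, and a literal brace would be escaped as `{{`:

```rust
use core::arch::asm;

fn add_five(mut x: u64) -> u64 {
    // `{0}` refers to the first operand; the compiler picks the register.
    unsafe { asm!("add {0}, 5", inout(reg) x) };
    x
}
```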
IMO whatever syntax we choose, it needs to be transformable from existing inline asm backend syntaxes, preferably by a proc macro. It also needs to map well conceptually to many existing asm syntaxes, as many Rustaceans will focus on a specific architecture for the majority of their development work, most likely either Intel or ARM or RISC-V, and thus know that one syntax better than others.
I personally am not predisposed to favor any specific assembly syntax, whether LLVM-compatible x86/x86-64 syntax, or GCC's AT&T x86/x86-64 syntax, or ARM/Aarch64 syntax, or RISC-V syntax. To do so – to favor some existing syntax – is to some extent to predict which architecture will see the most use of inline assembly. Even so, I expect that embedded IoT devices increasingly will use SoCs based on RISC-V and ARM/Aarch64, leading to heavy use of inline assembly with those architectures, so they need to be well-supported in all their variety of optional features.
Apologies for the long list of unrelated bullet points but I don't think ten separate replies would be better. I wish I could leave inline comments on a diff, but here we are.
- `preserves_flags` seems highly arch-specific (and somewhat coarse on some platforms); perhaps there should be a way to opt out of default clobbers instead. Straw syntax: `noclobber("fpsr")`.
- The interaction with Rust's memory model has been discussed a bit in the past (e.g. on the `black_box` RFC) and it falls under the "document what is UB" heading @mcy brought up, but in particular I am curious about the interaction with `readonly` and `nomem`. The motivation appears to be globals (which are fair game for all code in the program in most circumstances), but for most other memory there are very subtle interactions with Rust's rules on when what memory may be written through which pointers.
- If `sym` operands are interpolated as a string into the template instead of being proper operands throughout the entire compilation pipeline, I foresee linkage problems:
  - Symbol names are aggressively internalized based on privacy and whole-program analysis (LTO), but if rustc just puts the symbol name into the asm string, that's no longer a "visible use".
  - Conversely, if a function referencing an internal symbol is inlined into another codegen unit, the symbol is made global to make that work, but again a symbol name hard-coded into an asm template won't trigger that.
  - This happens both at the rustc level and the LLVM level. My gut feeling is that the rustc side can be fixed with enough plumbing (keeping track of which symbols are referenced in inline asm) and the LLVM side can be worked around by making symbols maximally visible and `#[used]`, but it's subtle and we already have a long tail of awful symbol visibility bugs.
- Given all the headaches `sym` is causing, perhaps we can punt on it?
- Since output operands allow place expressions in full generality, we need to settle the order of evaluation, e.g. when does `out(reg) x[i]` do its bounds check, and what does `out(reg) x, out(reg) *x` mean? Relatedly, in what order are outputs written (e.g. `out(reg) *p, out(reg) *q` where `p == q`)?
- This was mentioned before in passing, but sometimes immediates can't be written in natural int/float style. For example, RISC-V F/D instructions have an optional rounding mode which is a small immediate but in assembly has to be written as a mnemonic like `rne`, `rtz`, etc.
- All the speculation (in the pre-RFC and comments) about how Cranelift and other non-LLVM backends could implement this flavor of inline asm is just that: speculation. I won't regurgitate points raised in the last thread, but the two paragraphs in this pre-RFC are entirely insufficient to clear up those concerns. Someone really needs to sit down and prove that this works by building it.
- As noted in an open question, some registers are reserved sometimes in obscure circumstances, and this plus flexibility in how registers are allocated leads to uncertainty about how many operands an inline asm can have before registers run out and a hard error must be issued. I see nothing in this draft to address those problems, and punting on it seems bad for stability.
- It's only a "future possibility" so I don't want to lose too many words about it, but asm goto requires such massive integration into the entire compilation pipeline that I honestly think even if other inline asm support can be bludgeoned into e.g. Cranelift, asm goto will never be portable across all reasonable backends. Even LLVM only gained support for it in LLVM 9, because a lot of effort was put into being able to build the Linux kernel with Clang and the Linux people didn't budge on their usage of asm goto.
Do we even want asm goto? I have never really encountered it in the wild, and I have to wonder what situations it's necessary in, where the performance hit of doing a branch once the asm block is exited is actually a Big Deal (to the point that, if it really matters, then maybe the whole function should be written in a `.S` file).
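For reference, the branch-after-the-block pattern being weighed against asm goto looks something like this (my sketch, x86-64):

```rust
use core::arch::asm;

// Instead of jumping out of the asm (asm goto), materialize a flag in a
// register and let ordinary Rust control flow branch on it afterwards.
fn is_nonzero(x: u64) -> bool {
    let flag: u8;
    unsafe {
        asm!(
            "test {0}, {0}",
            "setnz {1}",
            in(reg) x,
            out(reg_byte) flag,
        );
    }
    flag != 0
}
```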
This is actually fairly straightforward: the rules on what memory an asm block can access are exactly the same as what a C FFI call is allowed to access. This basically means any globals (subject to visibility?) and locals whose address is leaked out to external code (I realize this is a bit vague, but this thread isn't the place to discuss the FFI memory model). `readonly` and `nomem` simply restrict that subset even further.
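To illustrate those two restrictions with today's spelling (my sketch, x86-64; the option names match the draft):

```rust
use core::arch::asm;

fn load_via_asm(x: &u64) -> u64 {
    let y: u64;
    let p: *const u64 = x;
    unsafe {
        // `readonly` promises the asm reads but never writes memory, so
        // the compiler may assume `*x` is unchanged afterwards.
        asm!("mov {0}, qword ptr [{1}]", out(reg) y, in(reg) p,
             options(readonly, nostack));
    }
    y
}

fn pure_register_math(mut x: u64) -> u64 {
    unsafe {
        // `nomem` is stronger: no memory is touched at all, so values can
        // stay cached in registers across the block.
        asm!("add {0}, {0}", inout(reg) x, options(nomem, nostack));
    }
    x
}
```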
I only proposed interpolating `sym` operands directly for `global_asm!` as a future extension. For normal `asm!` this would be done through an LLVM operand.
Given that we want to keep the option of `mem` operands open for the future, the only evaluation order that makes sense is to evaluate (left-to-right) all operand expressions into a value (for inputs) or a place before the asm runs. After the asm runs, outputs are copied from registers to the resolved places (again, left-to-right).
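A small illustration of that rule (my example; the timing described in the comment is the proposed semantics, not anything currently specified):

```rust
use core::arch::asm;

fn store_one(x: &mut [u64], i: usize) {
    unsafe {
        // Under the rule above, the place `x[i]` (including its bounds
        // check) is evaluated before the asm runs; the register is copied
        // back into the resolved place only after the asm finishes.
        asm!("mov {0}, 1", out(reg) x[i]);
    }
}
```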
I don't understand the concern here? I imagine rounding modes would be written directly in the asm string, rather than passed as an argument.
I don't really see what we can do about that? For example, the Linux kernel is known to fail to compile with -O0 because it runs out of registers. My question was about whether we should always ban potentially reserved registers (e.g. the frame pointer, even in functions that don't use it) or if we should leave it to LLVM to decide whether to error or not.
Oh, I see. Then I can bring back what I wrote and scrapped about the restrictions (only intra-crate references) attached to this strategy. IIUC this restriction is motivated by:
- LLVM not fully supporting references to external symbols, and
- it being especially difficult to implement if there's a dynamic linking boundary between definition and use
However, rustc will often spread code coming from one crate across multiple LLVM modules and shared objects (/DLLs/dylibs). This is not something that can easily be changed -- even if you set `-Ccodegen-units=1` or we tweak the CGU partitioning logic to make sure inline asm is in the same CGU as items it references, monomorphizations will still need to go in downstream crates and thus in different LLVM modules and (with the dylib crate type) different shared objects. So the "only reference things from the same crate" rule seems insufficient.
I'm thinking about implementations of intrinsics, for example. Just as we have intrinsics wrapping e.g. x86 `shufps` that take the shuffle mask as an enforced-to-be-constant argument, it might be useful to expose an intrinsic that does floating point arithmetic with a given rounding mode. Not being able to include it as an inline asm operand is not a deal breaker (you could e.g. switch over the immediate value instead and expect that it gets constant folded) but I wanted to note that there are cases where printing immediates as decimal integers doesn't work.
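The switch-over-the-immediate workaround would look roughly like this (my sketch; RISC-V only, and `Rounding` is a made-up enum for illustration):

```rust
use core::arch::asm;

#[derive(Clone, Copy)]
enum Rounding { Rne, Rtz } // hypothetical; one variant per mnemonic

// The rounding mode is part of the instruction text, so each mode needs
// its own template; with a constant `mode` the match should fold away.
fn fcvt_w_s(f: f32, mode: Rounding) -> i32 {
    let out: i32;
    unsafe {
        match mode {
            Rounding::Rne => asm!("fcvt.w.s {0}, {1}, rne", out(reg) out, in(freg) f),
            Rounding::Rtz => asm!("fcvt.w.s {0}, {1}, rtz", out(reg) out, in(freg) f),
        }
    }
    out
}
```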
I don't know if we can do anything about that either (probably not without being very conservative and excluding some important use cases), but it needs to be taken into account when deciding if this design for inline asm is "stable enough" to comfortably stabilize it.
In any case, I expect there would be strong opposition to even more compiler errors post-monomorphization, so for the specific question you're asking "always forbid" seems more likely to make progress. I don't know if that is good enough for users of inline asm, though.