Should we ever stabilize inline assembly?

comex · October 31, 2019, 12:57am

This is a reply to Centril's post in the "2020 roadmap post" thread. I'm using "reply as linked topic" to avoid spamming that thread.

Probably, but it's slower due to function call overhead. Kernel use cases often aren't very performance sensitive, but they sometimes are.

For example, the Linux kernel has static keys, which are designed to "allow the inclusion of seldom used features in performance-sensitive fast-path kernel code". How? By runtime patching an inline asm sequence to be either a branch (if the feature is enabled) or a sequence of nops (if the feature is disabled). This way, the overhead when the feature is disabled is essentially zero.

(Arguably this isn't even a "kernel use case" per se; it's a generally applicable technique that happens to be implemented in a kernel. But that doesn't make it any less useful.)
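For contrast, here is a minimal sketch of the conventional alternative that static keys are meant to beat: a runtime flag checked on every call. The names `TRACING_ENABLED`, `fast_path`, and `slow_trace` are hypothetical and not taken from the kernel. The patching itself (rewriting a branch into nops) is not shown, since it cannot be expressed in ordinary safe Rust.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical feature flag, used only for illustration.
static TRACING_ENABLED: AtomicBool = AtomicBool::new(false);

// Hypothetical rarely-used feature.
fn slow_trace(x: u64) {
    eprintln!("trace: {}", x);
}

// Hot path with a conventional flag check: every call pays for a memory
// load and a conditional branch, even when the feature is disabled.
// A static key would instead patch the branch instruction in place, so
// the disabled case executes only nops.
fn fast_path(x: u64) -> u64 {
    if TRACING_ENABLED.load(Ordering::Relaxed) {
        slow_trace(x);
    }
    x.wrapping_mul(3)
}
```

The result of `fast_path` is the same whether the flag is set or not; only the cost of reaching it differs, which is the overhead the static-key technique removes.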

The thing is, assembly files have most of the same stability and specifiability concerns as inline assembly, with the added downside that we have zero control over the system assembler and thus zero ability to help preserve backwards compatibility. As far as those concerns go, I believe that recommending the use of assembly files instead amounts to saying "not our problem" without actually helping users.

That doesn't apply to the Cranelift objection, though. It's valid if you think we'll someday have Cranelift-only builds of rustc without LLVM – otherwise we can just use LLVM to compile functions containing inline asm. For now, I am skeptical that Cranelift will ever get to the point where Cranelift-only builds would make sense from a performance perspective.

In any case, the comment you linked is inaccurate. If Cranelift doesn't want to implement assembly parsing itself, it "just" has to add a mode that writes out its own code as assembly, intersperses the user's inline assembly fragments, and compiles the whole thing using the system assembler. (That's what GCC does for everything it compiles, which helps explain why inline assembly is designed that way.) Implementing that would be a fair amount of work, since it would be a different path from Cranelift's existing code generation pipeline. But it would be both less difficult and less risky than actually parsing assembly.

And if it turns out Cranelift is unwilling to do that work... why should that disqualify inline assembly from Rust, as opposed to disqualifying Cranelift?

Centril · October 31, 2019, 1:42am

Seems like something that can be addressed by having more of the hot code in assembly (although I suspect some would say that this isn't really addressing the problem); meanwhile, we can focus on providing more intrinsics.

I think there's no shame in saying "not our problem". As a general-purpose PL we don't have to solve every problem if we believe that supporting some niche feature has unacceptable consequences (which is up for debate). From the Cranelift comment's description of the interactions with the pipeline, it does not feel like we can reasonably offer stability in the way we do for other things.

I do think that Cranelift-only builds should be something we aim for, yes. I'm going to withhold judgement on the performance aspect.

I'm not going to pretend like I'm an expert in the field we are discussing, but it seems like @sunfishcode is. As such, if you believe what @sunfishcode is stating is factually incorrect, then I would prefer for that debate to take place on the Cranelift issue.

This is a question of priorities. You might disagree with mine, but I think that decreased compile times and backend independence from LLVM are more valuable than inline assembly, which to me seems like a niche feature (e.g. because some rustc compiler engineers who will go nameless here are frustrated over being stuck with LLVM in 2019, and over soundness holes that are due to LLVM).

At any rate, even if the answer to "ever?" is not "never", I don't think it is "in 2020", or even, say, 2021, given the huge backlog we have.

glandium · October 31, 2019, 1:49am

I wonder how that can reconcile with LLVM-specific rustc flags being stable. (e.g. -C llvm-args, -C passes, etc.)

CAD97 · October 31, 2019, 1:52am

Just only accept them when using LLVM. If rustc gets --use-cranelift, then the LLVM-specific args are not valid.
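A rough sketch of what such a check might look like. The `Backend` enum and `check_codegen_flag` function are hypothetical and do not reflect rustc's actual option handling. The only assumption taken from the thread is that `llvm-args` and `passes` are LLVM-specific.

```rust
// Hypothetical model of the selected codegen backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Llvm,
    Cranelift,
}

// Rejects LLVM-specific `-C` flags unless the LLVM backend is selected.
// `flag` is the text after `-C`, for example "llvm-args=-some-option".
fn check_codegen_flag(backend: Backend, flag: &str) -> Result<(), String> {
    // Flags named in the thread as LLVM-specific.
    const LLVM_ONLY: &[&str] = &["llvm-args", "passes"];

    // Keep only the flag name, dropping any "=value" suffix.
    let name = flag.split('=').next().unwrap_or(flag);

    if backend != Backend::Llvm && LLVM_ONLY.contains(&name) {
        return Err(format!("-C {} is only valid with the LLVM backend", name));
    }
    Ok(())
}
```

Under this sketch, `check_codegen_flag(Backend::Cranelift, "passes=foo")` returns an error, while the same flag is accepted for `Backend::Llvm`.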

atagunov · October 31, 2019, 1:57am

Very limited unique inline-ASM dialect?
If C compilers shaped existing ASM-s, why shouldn't Rust brew its own?

josh · October 31, 2019, 3:34am

Inline assembly is a high priority for me, for FFI and C parity. I think the syntax proposed in the pre-RFC thread will help keep inline assembly more backend-independent and less tied to LLVM.

CAD97 · October 31, 2019, 4:12am

How low overhead is it possible to make "external assembly", especially once we have LTO (or equivalent)?

Is it possible that we could make "external assembly" usable like it was "inline assembly"? i.e. have a macro inline that contains arbitrary text passed to the assembler that is then linked into the middle of the function. This would be like implementing "minimal inline assembly" (i.e. know-nothing inline assembly) in rustc completely independent of the backend.

I know practically nothing about this domain, but to me as an outsider looking in, it seems like a potential "third option" is to make external assembly as easy to use as inline assembly.

comex · October 31, 2019, 5:47am

With LLVM, you could use cross-language LTO to inline a C function containing an inline asm block, which would be zero overhead compared to embedding the inline asm block directly. That probably works today, and it would gracefully degrade to a non-inline call for backends that don't support LTO (i.e. Cranelift). But it doesn't work for all use cases (particularly asm goto), and it's odd to have to depend on C for this functionality.

It doesn't really make sense to do LTO with an actual assembly file, because the assembly code is meant to be opaque to the compiler, so the compiler can't do things like remove the prolog/epilog code that's needed for an external function definition, or change which registers are used as operands.

Splicing the user's code into a generated assembly file is how GCC works and is a fine approach. But it only works if the backend passes its code through an assembler in the first place. AFAIK, Cranelift currently goes from IR directly to machine code; I'm not sure if it even supports outputting assembly. But I suggested that it could add a mode that does output assembly and passes it through the system assembler, as an easier path to inline assembly support.

comex · October 31, 2019, 6:00am

I'll point out that adding intrinsics also creates additional work for Cranelift to reach parity. That said, it's probably less work than implementing inline assembly, and in some cases the intrinsics would be needed anyway since inline assembly doesn't provide an adequate substitute.

Centril · October 31, 2019, 6:17am

This is true. My take-away from that is that we should be deliberate in adding them, require that the benefits be demonstrated and motivated, and so on. We shouldn't be adding intrinsics on a whim, as if they were without cost. Beyond costs for cranelift, they will also have costs for Miri and any eventual specification we produce, as they will need to be included there. Fortunately, however, intrinsics are usually constrained problems for which an operational definition can be given. If not, perhaps that particular intrinsic should not be added.
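As an illustration of what an "operational definition" of an intrinsic can look like, here is a sketch for population count. The choice of intrinsic is an assumption made for this example, since the thread does not name one. `popcount_reference` is a hypothetical reference definition, compared against the standard library's `u32::count_ones`.

```rust
// Reference ("operational") definition: count the set bits one at a time.
// A backend, Miri, or a specification could use a definition like this as
// the meaning of the intrinsic, whatever instruction is actually emitted.
fn popcount_reference(mut x: u32) -> u32 {
    let mut count = 0;
    while x != 0 {
        count += x & 1; // add the lowest bit
        x >>= 1; // move on to the next bit
    }
    count
}

// Checks that the standard library operation agrees with the reference.
fn popcount_matches(x: u32) -> bool {
    popcount_reference(x) == x.count_ones()
}
```

The contrast with inline assembly is that the whole behavior fits in a few lines of ordinary code, with no reference to a particular chip.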

jethrogb · October 31, 2019, 4:01pm

ckaran · October 31, 2019, 4:11pm

These might be dumb questions, but if asm! is stabilized, will the assembly languages that Rust supports also become part of the language? That is, if I decided to write my own Rust compiler, would I also be required to support some set of chips to meet a future Rust specification? What happens as Intel and other device manufacturers expand x86-64 in new ways? Are those instructions also part of Rust automatically? What happens if my compiler is highly specialized and only intended to target some esoteric embedded chip; am I out of spec if I don't support the standard set of chips as well? If all of this is true, how does it affect formal methods? I mean, it's hard enough formally describing Rust; if you tell me I now need to formally deal with multiple chips in some way, then that makes life even harder. At least with intrinsics, they can be treated as keywords and given formal semantics on what they are supposed to accomplish, which will allow formal methods to be applied.

anp · October 31, 2019, 4:17pm

The precedent set by gcc and clang is referenced often here, but IIRC their implementations of inline assembly are not ISO-specified; they are language extensions. Worth considering that their technical approach has required a hands-off approach to stability hazards.

Amanieu · October 31, 2019, 4:26pm

A minimal implementation of asm! is actually not that complicated since 90% of the work can be offloaded by invoking the system assembler. Here's a rough outline of how it could work:

1. Run register allocation on the asm fragment, using the constraints specified in the asm!. You may need to extend your compiler's register allocator to support this.
2. Perform string substitution to replace the placeholders with actual register names.
3. Generate an external asm file with the following contents (replace ${ID} with some unique identifier):

    .section .text.inline_asm_${ID},"ax",@progbits
    .globl inline_asm_${ID}
    .type inline_asm_${ID}, @function
    inline_asm_${ID}:
        /* <insert asm string here with registers filled in> */
        jmp inline_asm_${ID}_return
    .size inline_asm_${ID}, . - inline_asm_${ID}

4. For the actual code generation of the asm! in your compiler, just emit a jump to the external asm block, and a label for the external asm to return to:

    // ...
       jmp inline_asm_${ID}
    .globl inline_asm_${ID}_return
    inline_asm_${ID}_return:
    // ...

5. Assemble and link in the generated external asm files.
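The string-substitution and file-generation steps above could be sketched roughly as follows. This is only an illustration: the `{name}` placeholder syntax and the function names `substitute_registers` and `render_asm_file` are assumptions. Register allocation is skipped entirely, so the register map is simply passed in.

```rust
use std::collections::HashMap;

// Replaces each `{name}` placeholder with the register chosen for it.
// The placeholder syntax is hypothetical.
fn substitute_registers(template: &str, regs: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (placeholder, reg) in regs {
        out = out.replace(&format!("{{{}}}", placeholder), reg);
    }
    out
}

// Wraps the substituted asm body in the section/label boilerplate from the
// outline above, using `id` as the unique identifier.
fn render_asm_file(id: u32, body: &str) -> String {
    let sym = format!("inline_asm_{}", id);
    [
        format!(".section .text.{},\"ax\",@progbits", sym),
        format!(".globl {}", sym),
        format!(".type {}, @function", sym),
        format!("{}:", sym),
        format!("    {}", body),
        format!("    jmp {}_return", sym),
        format!(".size {}, . - {}", sym, sym),
    ]
    .join("\n")
}
```

The resulting text would then be handed to the system assembler. That step, and the jump emitted at the asm! site, are left out of this sketch.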

jethrogb · October 31, 2019, 4:32pm

How do you envision this? Depending on your system, a wildly different dialect of assembler may be used. The way inline assembly works with LLVM today, you can use the same dialect regardless of the compilation host and target, and this is a feature that ought to be preserved.

josh · October 31, 2019, 4:47pm

No, definitely not.

If you wanted to support your implementation of Rust on x86-64 you'd need to support asm! for x86-64. x86-64 assembly is only expected to work on an x86-64 target. If you only support your own custom target, you don't need to support x86-64. (Also, we don't have to require that every platform support inline assembly even for its own assembly language.)

CAD97 · October 31, 2019, 4:53pm

Honestly, when writing asm, I do not see a problem of putting the effort of providing the correct assembler on the asm author. Ideally, the compiler would be as hands-off as possible on the handwritten assembly.

Asm is intrinsically unportable to the maximum extent possible. It's perfectly fine for the compiler to admit that and ask the author to provide the exact tool they're expecting to assemble the asm.

HeroicKatora · October 31, 2019, 5:02pm

Interesting. I don't have any concrete ideas, but might it be more useful to have asm! be a particular kind of typed (const) interface into the compiler? Would that make it possible to write such a correct assembler as a crate instead of a compiler plugin, not unlike proc macros?

ckaran · October 31, 2019, 5:07pm

OK, then I see a formal specification of rust going in one of two ways. Either it is incomplete, or it is hopelessly complicated.

If we create a formal specification for asm! that only includes the parameter signatures (basically, the assembly is just an arbitrary-length string), then formally asm!("asdfasdfasdf") is correct, but the assembler will choke on it. Moreover, since assembly is ignored in this model, and since it can do arbitrary things, any proofs are only valid between pairs of asm! blocks.

If we require that the assembly be valid in all ways, then we need to include all of the assembly languages that are supported, as well as their behavior in the formal proofs. Even if you state that you only support x86-64, that is really, really, really complicated. Not something I'd want to get into...
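To make the first option concrete, here is a sketch of a check that looks only at the "signature" of an asm block and never at the assembly text itself. Everything in it is hypothetical: the `AsmBlock` and `Operand` types, the `{0}`-style placeholders, and the function name are assumptions made for illustration.

```rust
// Hypothetical operand description: a direction plus a Rust type name.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    In(&'static str),
    Out(&'static str),
    InOut(&'static str),
}

// Hypothetical model of an asm! invocation.
struct AsmBlock {
    template: String,
    operands: Vec<Operand>,
}

// Signature-only check: every `{N}` placeholder must refer to an existing
// operand. The rest of the template is treated as an opaque string.
fn signature_is_well_formed(block: &AsmBlock) -> bool {
    let mut rest = block.template.as_str();
    while let Some(open) = rest.find('{') {
        let after = &rest[open + 1..];
        let close = match after.find('}') {
            Some(c) => c,
            None => return false, // unterminated placeholder
        };
        match after[..close].parse::<usize>() {
            Ok(i) if i < block.operands.len() => {}
            _ => return false, // not a number, or no such operand
        }
        rest = &after[close + 1..];
    }
    true
}
```

Under this model a block whose template is "asdfasdfasdf" with no operands passes the check, even though no assembler would accept it. That is exactly the incompleteness described above.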

josh · October 31, 2019, 5:21pm

We don't have to go that far (though the complexity is part of the reason it isn't ready yet).

We don't have to specify what (for instance) the pclmullqhqdq instruction does. We need to establish its inputs, input/outputs, outputs, and their types.

And even then, we need ways to tell the compiler "no, I really know what I'm doing, let me write instructions or directives you don't know about".
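For illustration, here is what declaring inputs, input/outputs, and their types can look like. This sketch uses the `std::arch::asm!` syntax available in current Rust toolchains, which postdates this thread and is not the syntax under discussion at the time. The function name `add_with_asm` is hypothetical, and the non-x86-64 fallback is there only so the example builds on any target.

```rust
// x86-64 only: the asm string is meaningful for this target alone.
#[cfg(target_arch = "x86_64")]
fn add_with_asm(a: u64, b: u64) -> u64 {
    let mut acc = a;
    // SAFETY: the instruction reads `rhs`, updates `acc`, and touches no memory.
    unsafe {
        std::arch::asm!(
            "add {acc}, {rhs}",
            acc = inout(reg) acc, // input/output operand, a u64 in a register
            rhs = in(reg) b,      // input operand, a u64 in a register
            options(nomem, nostack),
        );
    }
    acc
}

// Other targets: plain Rust with the same wrapping behavior.
#[cfg(not(target_arch = "x86_64"))]
fn add_with_asm(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

The compiler does not need to know what `add` does. It only needs the operand directions, register classes, and types, plus the stated options.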
