Should we ever stabilize inline assembly?

It's fine for the probe nop to be inlined into different functions -- the SDT note describes the original semantic meaning of that probe point.
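Roughly, the idea looks like this (heavily simplified sketch; the real sdt.h emits a properly formatted .note.stapsdt ELF note with more fields, and the section name below is made up):

// Heavily simplified sketch of the SDT probe idea; x86-64 only, and the
// section name/layout are invented (the real sdt.h uses .note.stapsdt with
// a proper ELF note header). The probe site is a lone `nop` whose address
// is recorded in a non-loaded section that an external tracer can read and
// patch. Each inlined copy emits its own nop and its own record, which is
// why inlining is fine: every recorded address still marks that semantic
// probe point.
#[inline(always)]
pub fn my_probe() {
    unsafe {
        core::arch::asm!(
            "990: nop",
            ".pushsection .my_probes, \"\", @progbits",
            ".quad 990b",   // address of the nop above
            ".popsection",
            options(nomem, nostack, preserves_flags),
        );
    }
}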

And yes, you could just punt this externally, even use C and the original sdt.h, but that's true of almost any C-parity feature that Rust lacks.

I'm not insisting that the probe crate needs to be a priority. I just wanted to show an interesting use case that's more than just a simple instruction stream.

As another example of some inline asm that I am actively using in production, see my comment in the pre-RFC thread.

Still work in progress (just two examples atm)

7 Likes

I added some more examples. If you know about an (important) example, please let me know, or open a PR.

1 Like

Yeah. My purpose in explaining how it could be external wasn't to say it should be external (that kills the zero-overhead aspect!), but to show that it's largely equivalent from a formal specification perspective.

I find this request unreasonable given that we support, e.g., core::arch and PowerPC in stable Rust, but Cranelift does not support either and might never do so. People who want to write code that's portable to Cranelift backends already have to use #[cfg(cranelift)]/#[cfg(not(cranelift))] to avoid doing things that Cranelift does not support, and inline assembly is just another one of those.

It also paints the problem as having only two solutions: either Cranelift implements inline assembly or Rust never supports inline assembly, but the solution space is much bigger. For example, the Rust frontend could implement inline assembly for those backends that do not implement it, e.g., by mapping asm! to global_asm! in a dumb way. I don't think we promise anywhere that inline assembly won't result in an unknown function call - only that the inputs and outputs will be handled in certain ways.
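To make the "dumb mapping" concrete, here is a hand-written sketch of what such a lowering could produce (x86-64 SysV; all names are invented and a real implementation would generate this automatically):

// Hypothetical lowering sketch: the user's inline block, say
//     asm!("lea {out}, [{inp} + 42]", inp = in(reg) x, out = lateout(reg) y);
// is hoisted out of line via global_asm!, with inputs/outputs pinned to the
// C calling convention, and the asm! site becomes an ordinary call.
core::arch::global_asm!(
    ".globl __asm_blob_0",
    "__asm_blob_0:",
    "lea rax, [rdi + 42]",  // input arrives in rdi, result returned in rax
    "ret",
);

extern "C" {
    fn __asm_blob_0(x: u64) -> u64;
}

pub fn forty_two_more(x: u64) -> u64 {
    // Zero overhead is lost (this is a real call), but the input/output
    // contract of the original asm! block is preserved.
    unsafe { __asm_blob_0(x) }
}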

8 Likes
  1. There is no such thing as #[cfg(cranelift)]
  2. Most of the things not yet supported are nightly-only, so only libcore/liballoc/libstd use them, which are easy for me to patch.

For arch feature detection on x86, I solved it by patching core_arch to pretend that cpuid doesn't exist on the current CPU, so std_detect doesn't call it and simply reports that no features are available. The rest of the inline asm in core_arch is not related to SIMD. Luckily I haven't seen any application yet which requires a non-SIMD platform intrinsic, so those don't cause any problems yet.

That is an interesting solution. However, I see two problems with it: it requires a native assembler, and it requires the Rust frontend to parse the register constraints to actually be able to assign valid registers to the inline asm code. For example, it shouldn't put =ra in rbx. The first problem is not that big of a deal, and the latter should be solvable, I think.

3 Likes

What about #[naked] functions? I can imagine that there is code (for example an OS) which doesn't want the stack to be touched, which is what would happen when calling the global_asm! function.
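For context, a minimal nightly-only sketch of what such code looks like (this assumes the naked_functions feature as it existed around this time; the exact rules for naked functions have changed over the years):

// Minimal sketch: the compiler emits no prologue/epilogue for a #[naked]
// function, so the stack is only touched by the instructions written here.
// Requires #![feature(naked_functions)] on nightly.
#[naked]
pub unsafe extern "C" fn park_cpu() -> ! {
    core::arch::asm!(
        "cli",          // disable interrupts
        "2:",
        "hlt",          // halt until the next (non-maskable) interrupt
        "jmp 2b",
        options(noreturn),
    );
}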

1 Like

(I'm going to put aside the "should we ever" question, because I think others have made the point well on both a technical capability front (i.e. "I cannot do X") and the related positioning front (i.e. "it makes sense for Rust to allow me to do X"))

I like doing prioritisation, since it encourages discipline in the consideration of features and their impact in the context of an overall vision. As a strawman, a perspective could be "async/await will be a huge win for networked applications, but we've neglected a) the low-level space, b) ..., c)..., so how do we move the needle?", which may then lead to inline assembly (or not!). It looks like you've stated your vision as:

This is fine, but it's important to consider perceptions - for the set of people in this thread, the implication is "you need to make do with your bug-ridden asm! feature on nightly". This is disappointing, especially if it's followed up with "but we will provide more intrinsics" - it may be that the people writing inline asm are actually better served by that approach, but there's a large body of evidence of inline asm use both inside and outside of Rust that is not easy to overlook (the evidence inside Rust may be particularly strong, because people are using it despite it being such a terrible experience).

IMO: I don't write any inline asm these days and I actually agree that a faster compiler and backend independence are more valuable overall, but this is such an ugly corner of Rust that a step forward here is long overdue. Consider - global allocators happened, but it was a small step in the allocator story. Rather than pitting inline asm against compiler speed, I propose trying to consider what a tiny step forward would look like, to happen alongside the higher priority work (which, no matter what, is not going to pause everything else for a year).

9 Likes

@bjorn3

What about #[naked] functions? I can imagine that there is code (for example an OS) which doesn't want the stack to be touched, which is what would happen when calling the global_asm! function.

Good question. Maybe instead of an unknown function call to a symbol declared using global_asm!, we could just put a label there and "jump" to it and back in that case?

it requires the rust frontend to parse the register constraints to actually be able to assign valid registers to the inline asm code.

Yes. I think that when stabilizing inline assembly we should be careful about precisely which constraints we allow here. IIRC @Florob's RFC did not allow everything that LLVM or GCC allows, but only a small subset of it.


I would kind of prefer to be cautious about trying to homogenize the inline assembly feature into some minimum common denominator; IMO it is a pretty much backend-specific feature.

While we might be able to come up with some useful subset that works for LLVM and Cranelift, I have no idea whether that subset will work in future backends, or even in some existing backends like miri.

2 Likes

That would require the codegen backend to support jumping to the middle of a function. It would also cause problems when inlining a function, as the global asm wouldn't get duplicated for each new target it could jump to.

I agree. Should we add that restriction right now? Or should we wait for the outcome of this discussion?

This part confused me, and seems important to clarify. Did you mean to say target-specific feature? (a.k.a. machine-specific, architecture-specific, hardware-specific, whatever we want to call it). Or do you think there is a reason why even the platonic ideal final form of Rust would have multiple inline assembly dialects for each backend? (LLVM, Cranelift, I guess hypothetically GCC, etc)

The only reason I know of is the complex register allocation constraints that are only reliably useful when tied to a specific, known register allocator algorithm. But that's exactly the sort of thing everyone in this thread seems happy to leave out of a stable inline assembly feature. Or are you imagining a future where we have a "lowest common denominator" inline assembly in stable Rust, plus some backend-specific extensions?

1 Like

There's an ongoing effort to specify a new (and less LLVM-specific) version of asm!, and I would expect that to take place before stabilization.
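For a taste of the direction that effort is heading (illustrative only; the syntax may well change before stabilization), the template becomes a format string with named, typed operands instead of raw LLVM/GCC constraint codes:

// Illustrative sketch of the newer asm! style: operands are declared in the
// operand list ("in", "lateout", ...) and referenced by name in the template
// with {} placeholders, rather than via constraint strings like "=r"/"0".
use core::arch::asm;

pub fn add_five(x: u64) -> u64 {
    let y: u64;
    unsafe {
        asm!(
            "lea {out}, [{inp} + 5]",
            inp = in(reg) x,
            out = lateout(reg) y,
        );
    }
    y
}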

3 Likes

I have to argue strongly against this. One main use of inline assembly is also speed, for things like SIMD (looking at you, hashbrown, now part of libstd) - things I'd expect to work in a systems language. And of course there's also the ability to program an OS, which apparently requires assembly. As of now Rust is something in between: it's neither low-level enough that you can just drop it in everywhere nor high-level enough that you can use it as easily and as boilerplate-free as other high-level languages.

I might be missing something crucial, but at this point I'm not even sure if separating from LLVM is that good an idea, if it means relying on external compilers for assembly. Because if I wanted assembly support in stable Rust on a new target or OS, I'd need to port over both gcc and rustc. And why should I bother to port both, just so I can use assembly in Rust? Then I can just stick with the C/assembly compiler I have.

Nevertheless I can totally see the problems of supporting inline assembly in Rust or even adopting a subset of assembly (i.e. a "Rust" assembly). But as it stands this is required for many "nice to have" things.

In a way asm! is the escape hatch for platform specific things you can't do without adding intrinsics to the compiler. And I'd say this is required for something called a systems programming language.

2 Likes

I have to +1 this, because it's just too true.

1 Like

Relatedly, is there any interest in ignoring register constraints completely, and implementing register pinning, instead? In modern GCC and Clang, you can write

register size_t my_pinned_int asm("%rax") = x;

When my_pinned_int has automatic storage, this causes the register allocator to reserve %rax for my_pinned_int, which can then be used in a succeeding asm block. How far would something like

#[pin_reg("%rax")]
let mut my_pinned_int = x;

get us to all the things we want for inline asm? I've seen a lot of code at work that copies arguments into register-pinned variables and then executes some assembly, and it's certainly easier to read than register constraints. You could even imagine something like

fn foo(#[pin_reg("%rax")] foo: usize) { .. }

copying foo into %rax as part of the function prologue.

This doesn't solve the "just give me a free register" problem, but I'm sure there's some dumb extension to this that makes that easy... but then again, I've never written inline assembly that doesn't have a really specific calling convention.

4 Likes

The register pinning used in GCC/Clang is really a massive hack and should be avoided. Instead, you should just specify directly what register you want in the asm constraint. And if you look at how Clang generates LLVM IR for register pinning, you will see that this is exactly what it does.
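As a concrete illustration of "specify directly what register you want" (sketch only; x86-64 Linux, written in the style of the newer asm! design where operands can name a specific register):

// Sketch only: each operand names its register directly, so no separate
// register-pinned local variable is needed. Hypothetical raw write(2) call.
use core::arch::asm;

pub unsafe fn raw_write(fd: u64, buf: *const u8, len: u64) -> u64 {
    let ret: u64;
    asm!(
        "syscall",
        inout("rax") 1u64 => ret,  // SYS_write number in, return value out
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") len,
        out("rcx") _,              // clobbered by the syscall instruction
        out("r11") _,              // clobbered by the syscall instruction
        options(nostack),
    );
    ret
}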

That doesn't seem to follow. Inline asm constraints themselves are pretty horrible for humans to read. A rustc with an LLVM backend is still going to need to generate LLVM asm constraints, even if we do something on the frontend to make them more palatable. Register-pinned local variables are, ultimately, just syntax sugar.

1 Like

That "extensibility" however still assumes that all relevant code is written in Rust and, in particular, uses the Rust memory model.

Inline Assembly is much harder than that, and certainly our RustBelt proofs so far do not support it.

True. FFI with languages that have a different memory model is almost as hard as inline assembly in Rust. ("Almost", because inline assembly has a much fuzzier boundary with Rust than FFI does.)

But that doesn't help at all to specify inline assembly in Rust. It's not like it has a proper spec in C. :wink:

Yes, Amal's work is probably the best we have there currently. But AFAIK they consider a typed source and a typed target language and do not consider unsafe (aka untyped) code. Also that is using a specific assembly language they control, as opposed to specifying "here's how our highly optimized low-level language interacts with any untyped assembly language you might compile it to".

(Just responding to the pings here, I am afraid I don't have the time right now to follow this discussion. My personal thinking is that we could try to build a useful and precise spec for volatile memory accesses as a stepping stone towards inline assembly -- volatile accesses have some similar problems in terms of interacting with the target language memory model but are much more limited, thus making them an easier target.)
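To make that comparison concrete, a hedged illustration of the kind of volatile access meant here (the address is made up): the compiler must neither elide nor reorder these accesses, which already raises "interaction with the outside world" questions similar to inline asm, but the interface is just a load or store of a known size at a known address.

// Illustration only; UART_DATA is a made-up MMIO address. Volatile accesses
// must not be elided, merged, or reordered relative to other volatile
// accesses - a small, well-delimited version of the "talks to the outside
// world" problem that inline asm poses in full generality.
use core::ptr;

const UART_DATA: *mut u8 = 0x1000_0000 as *mut u8;

pub unsafe fn mmio_write_byte(b: u8) {
    ptr::write_volatile(UART_DATA, b);
}

pub unsafe fn mmio_read_byte() -> u8 {
    ptr::read_volatile(UART_DATA)
}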

11 Likes

This is what I meant. Rust already has 3 backends: LLVM, Cranelift and miri, two of which have no plans to support any kind of inline assembly. The LLVM "backend" supports inline assembly, but each target within LLVM itself has widely different support for inline assembly, offering completely different sets of constraints. These also often subtly differ from what GCC provides. The minimum common denominator is "no inline assembly whatsoever"; the moment we support any form of inline assembly, we have already ruled out miri as a valid Rust "backend".

2 Likes