@comex, I think @RalfJung's comments really make the point better; assembly is hard to verify.
As for inline vs. external assembly, I'm comfortable with having multiple assemblers that rust is able to call into. In my wild dream world, all the tools would have the ability to pass specifications between the assembler and rustc such that each could verify to the other that the spec was met. This would require a language independent specification language that was sufficiently powerful to meet all needs, which (AFAIK) doesn't currently exist.
That said, I don't think that making assembly a part of rust is the right way to go. It just makes things too complicated.
Alas, we need a good integration story for assembly, maybe not inline, but something along the lines of "you can do this here, provided you give us a compiler / use exactly this toolchain". And I think that's fine because it's already a "niche", but a required niche.
I think that @proc is correct about needing a good integration story. Rather than talking about inline vs. external, how about creating a list of things that we want to see in any assembly integration, and then see if we can design something that will fit? My own wishlist:
Amenable to formal methods/proofs
Relatively easy to extend to other assembly languages that rust may support in the future
Easy to make part of the build chain.
I'm sure others will have their own opinions on what is needed; once we have a complete list, we can start hashing out a design to make it happen.
I think miri is different: it also doesn't support any kind of FFI (and cannot with its current design), so it can never be a drop-in replacement for LLVM. Cranelift can theoretically become a drop-in replacement, and I think it makes sense to expect LLVM and Cranelift to support the same form of inline assembly. (But I also think it's feasible for Cranelift to support it, eventually.)
Indeed, assembly is hard to verify. But, as he wrote, so is FFI:
And as I said before, it's not just that specifying inline assembly and specifying FFI have similar levels of difficulty: if you want to specify FFI in full generality*, then they are largely the same task!
Yet FFI is a core feature of Rust, so we have already made that task a prerequisite for fully specifying the Rust language, should it ever be attempted. Inline assembly doesn't make things much harder.
* In particular, you need to specify assembly if you want to specify any of the following:
What it means to conform to the C ABI on a particular platform (the ABI is defined in terms of assembly)
FFI to assembly functions (this is common in practice, e.g. memcpy is usually implemented in assembly).
FFI to C functions that use inline assembly (somewhat less common, I think, but it happens).
In practice, it may be easier to just make assumptions about the behavior of, say, memcpy, rather than proving the assembly-level implementation correct. If there is a valid specifiability concern about inline assembly, it's that supporting it might encourage people to use it as a speedup for things that could otherwise be written in pure Rust – which would be easier to specify.
Well-written crates that do this should include a pure-Rust alternative anyway, if only as a fallback for unsupported architectures, and that fallback could also be used when using (so far purely hypothetical) formal verification tools to verify the codebase as a whole. (Also for miri.) That still means you're not proving the assembly correct; intrinsics may be better from that perspective. But, well, intrinsics and inline assembly are not mutually exclusive. We definitely want to keep adding intrinsics, since they typically have better codegen.
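To make the fallback idea concrete, here is a minimal sketch, assuming an x86_64 target and the asm! macro from std::arch in current stable Rust. The function name mul_hi and the choice of instruction are made up for illustration; the pure-Rust body is what other architectures (and miri) would compile.

```rust
/// High 64 bits of the 128-bit product `a * b` (hypothetical example function).
/// Assembly path: only compiled on x86_64, and never under miri.
#[cfg(all(target_arch = "x86_64", not(miri)))]
pub fn mul_hi(a: u64, b: u64) -> u64 {
    use std::arch::asm;
    let hi: u64;
    // SAFETY: `mul` only touches rax, rdx and the flags; it does not
    // access memory or the stack.
    unsafe {
        asm!(
            "mul {b}",            // rdx:rax = rax * b
            b = in(reg) b,
            inout("rax") a => _,  // low half of the product is discarded
            out("rdx") hi,        // high half is the result
            options(pure, nomem, nostack),
        );
    }
    hi
}

/// Pure-Rust fallback: used on every other architecture and under miri.
#[cfg(not(all(target_arch = "x86_64", not(miri))))]
pub fn mul_hi(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) >> 64) as u64
}
```

Callers only ever see mul_hi, so a verification tool could be pointed at the fallback body without touching the call sites.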
Intrinsics are simply predefined functions built into the language that have implementations in the underlying assembly/machine language of the target platform. Is that not the case? If so, I agree with you and I don't see how defining intrinsics and having some form of in-line or convenient assembly should ever be mutually exclusive. In fact, it would seem that it should work like this:
1. I have a need for something that is performant that can't be done performantly enough in Rust or Unsafe Rust alone.
2. I can make that thing in assembly for the targets I care about.
3. I can link those assembly blobs to my Rust Code.
4. I want it to be more convenient, so, I'd like this to be "Inline" assembly.
5. The "Inline Assembly" rules define how I need to specify which registers my inline assembly will require use of and how they will use memory and/or registers for input/output/mutation.
6. I implement my assembly routine inline and specify those constraints.
7. Everything is great!
8. A pattern begins to emerge where a number of use-cases for a number of people converge to the same or similar assembly solution to the problem.
9. A new intrinsic is proposed for addition to the language to cover that use case.
10. The intrinsic is created, accepted, and becomes part of the "Compiler"
11. All in-line assembly that was created for that need is replaced by uses of the intrinsic.
12. Everything is GREATER!
13. Someone realizes that the intrinsic can be expressed as some Rust Language-Level construct.
14. That is created and accepted into the language.
15. All direct uses of that intrinsic are replaced by the new Rust Language-Level construct.
16. Now everything is SUPER-PEACHY AWESOME GREATNESS!
17. Start over at 1 for the next thing that needs performance/special handling.
Isn't that basically how it should roll? If not, why?
EDIT: To be clear, this list of steps doesn't preclude jumping ahead, skipping steps, or only doing steps "notionally". For example, you could conceivably imagine the necessary assembly, imagine the intrinsic, recognize that it can be added simply as a language-level thing, and do that directly via RFC etc., as long as that process is quick enough for your implementation needs. The full set of steps allows for more "Agile" development, though, and lets people move forward with their needs while still allowing Rust itself to evolve more slowly to accommodate the newly recognized needs.
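As a rough illustration of the three stages in that list (inline assembly, then an intrinsic, then a language-level construct), here is the same byte swap written three ways. This is only a sketch assuming an x86_64 target; the function names are hypothetical, and it is not a claim about how these particular APIs actually came into existence.

```rust
/// Stage 1: hand-written inline assembly (x86_64 only, not under miri).
#[cfg(all(target_arch = "x86_64", not(miri)))]
pub fn swap_asm(x: u32) -> u32 {
    use std::arch::asm;
    let mut v = x;
    // SAFETY: `bswap` only modifies the named register.
    unsafe {
        asm!("bswap {0:e}", inout(reg) v, options(pure, nomem, nostack));
    }
    v
}

/// Stage 1 fallback for other targets, so callers compile everywhere.
#[cfg(not(all(target_arch = "x86_64", not(miri))))]
pub fn swap_asm(x: u32) -> u32 {
    x.swap_bytes()
}

/// Stage 2: the same instruction exposed as a vendor intrinsic.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // the intrinsic may be a safe fn on newer toolchains
pub fn swap_intrinsic(x: u32) -> u32 {
    unsafe { std::arch::x86_64::_bswap(x as i32) as u32 }
}

/// Stage 2 fallback for other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn swap_intrinsic(x: u32) -> u32 {
    x.swap_bytes()
}

/// Stage 3: a portable language/library-level construct.
pub fn swap_lang(x: u32) -> u32 {
    x.swap_bytes()
}
```

All three should agree on every input; only the last one needs no cfg at all.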
Please don't equate "intentionally not portable" with "not well written". There's nothing wrong with, for instance, a crate that implements a platform-specific interrupt handler, or a crate that implements a specific non-standard calling convention, or a crate implementing platform-specific context switching.
I do think it's reasonable that such crates may just not run on miri.
I don't see why these are necessarily true. Compilers need to have some support for the ISA of each architecture they target, since calling conventions are defined in these terms. That is not equivalent to 'assembly', which is itself a higher-level language; also, the necessary instructions for calls are just a small subset of the ISA. Furthermore, FFI to externally linked functions such as from C is fine as long as you have some notion of observable effects of such a call and the function causes a subset of them.
OK, so is it possible to have a formal specification of what the blob is doing, and assume that the blob meets the spec? This is less than ideal, but it allows the compiler to 'jump over' the blob, instead of choking on it. The other advantage is that the spec could be fed into the asm! macro (or FFI, or whatever), and if the assembler/compiler/whatever on the other end is able to ingest the spec, it will be able to verify that blob meets the spec for you. If it can't ingest the spec, then it can silently ignore it.
The only question then becomes 'which specification language should we use?' I found a Wikipedia page that lists a bunch of them, but I have no experience with any of them. And honestly, I'd prefer it if we could formally specify some core base of rust itself, and then use that as an unambiguous specification language, so that we don't have to learn yet another language to be able to master rust.
It's quite simple really, just treat the blob as an external function call, except that it uses a custom calling convention for passing arguments in/out and clobbered registers.
With some unsafe requirements, that course of action would also work if the blob is not produced by the compiler itself but user provided. I am currently trying this out for function-like asm blocks with the C calling convention, using a proc macro, nasm, and #[no_mangle] hacks. Work in progress: https://github.com/HeroicKatora/direct-asm
(To clarify, by "do this" I was referring to what I wrote in the previous sentence, "use [inline assembly] as a speedup for things that could otherwise be written in pure Rust". Poor wording on my part since there's a paragraph break in between.)
Ah, I see! Thanks for clarifying; I indeed did not catch the narrower use case, and thought you meant inline assembly in general. My apologies for reacting strongly to my own misunderstanding.
Instead of providing a GCC-like inline assembly interface, one could provide intrinsics for all CPU instructions (all forms, including privileged instructions).
This would essentially be equivalent to the GCC-like interface, but without having to parse the assembly. The differences are that you would have no control over register allocation or instruction order, and no way to keep seemingly dead instructions from being eliminated. That can be fixed as well, by providing "identity function" intrinsics that return their argument and force it into one of a set of registers during code generation, "identity function" intrinsics that cannot be reordered between themselves, and a "black_box" to prevent code elimination.
The intrinsics could be implemented using the current asm! facility internally for the LLVM backend, while other backends would need to implement them directly themselves.
Kind of, miri does intercept many FFI calls already, and emulates them. But in general you are right, and code uses #[cfg(miri)] to add workarounds in these cases.
I only know x86 well enough to comment on this proposal, but for this architecture at least the amount of effort that is required to provide an equivalent compiler intrinsic to every variant of every instruction should not be underestimated. The Intel instruction set reference is ~2000 pages long in spite of being written in a concise style where each instruction description typically fits in two pages, so we're talking about around a thousand instructions, some of which have a dozen variants.
Furthermore, some x86 features are probably not expressible within Rust's abstract machine (far jumps and segmentation immediately come to mind, for example). These are not features which you deal with everyday, in code targeting modern hardware at least, but they are occasionally needed in the sort of bare-metal environments where people typically go for inline assembly today.
Note that this approach does not work. An asm!("foo ...\n bar ...") block is often not equivalent to the intrinsic calls foo(...); bar(...);. For example, foo may set some bits in the flags register, and between the foo(...); and bar(...); calls Rust or LLVM may do something that is a no-op in the abstract machine (such as spilling a register to the stack and back) but that modifies those flags.
We have had to deprecate Rust features like core::arch::x86_64::__readeflags/__writeeflags because they were impossible to use correctly due to this.
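A small sketch of why the two instructions have to live in one block, assuming x86_64 and the asm! macro in current stable Rust (the function name add_with_carry is made up for illustration). The setc only sees the carry produced by the add because the compiler cannot insert anything between them:

```rust
/// Adds two u64 values and reports the carry-out.
/// `add` and `setc` sit in the same asm! block, so nothing the compiler
/// emits can clobber the carry flag between them.
#[cfg(all(target_arch = "x86_64", not(miri)))]
pub fn add_with_carry(a: u64, b: u64) -> (u64, bool) {
    use std::arch::asm;
    let mut sum = a;
    let carry: u8;
    // SAFETY: register-only arithmetic; no memory or stack access.
    unsafe {
        asm!(
            "add {sum}, {b}",   // sets CF on unsigned overflow
            "setc {carry}",     // reads CF immediately afterwards
            sum = inout(reg) sum,
            b = in(reg) b,
            carry = out(reg_byte) carry,
            options(pure, nomem, nostack),
        );
    }
    (sum, carry != 0)
}

/// Pure-Rust fallback for other targets and for miri.
#[cfg(not(all(target_arch = "x86_64", not(miri))))]
pub fn add_with_carry(a: u64, b: u64) -> (u64, bool) {
    a.overflowing_add(b)
}
```

Splitting this into a hypothetical add(...) intrinsic followed by a separate "read carry flag" intrinsic would give the compiler room to schedule flag-modifying code in between, which is the problem described above.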
Maybe just a silly idea, but why not have only one generic intrinsic, fn run_machine_instruction([u8]), so that inline assembly could be handled by an external crate providing its own asm!{} macro that creates the sequence of machine instructions?
The difficulty is that the biggest win from inline assembly is taking part in register allocation. In other words, I want to do something like (simplified enormously from what GCC can handle):
let a: u32 = 42;
let b: u32;
asm!("ADDS {1}, {0}, {0};"; read: a; write: b; clobber: CPSR);
assert_eq!(a + a, b);
and have the compiler work out which registers to use and ensure that a and b are loaded into registers before the asm!, while not also reloading other locals after the asm!, and not needing to store a or b to memory if they're immediately consumed. In this case, I've also indicated that if the compiler depends on the contents of the CPSR, it'll need to recompute/save it over the asm! block.
The raw bytes themselves of the inline assembly are not the hard part; it's the "this is the resulting change to machine state from this block, and this is how to map machine state to Rust-level constructs like variables" that's hard.
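For comparison, here is roughly how the snippet above might look with the asm! macro in current stable Rust, which uses a different syntax from the hypothetical one I wrote. This is a sketch assuming x86_64 rather than ARM; the function name double is made up. The compiler picks the registers for a and b, and because the block does not promise to preserve flags, they are treated as clobbered (the counterpart of "clobber: CPSR").

```rust
/// Computes `a + a` (wrapping) and lets the compiler choose the registers.
#[cfg(all(target_arch = "x86_64", not(miri)))]
pub fn double(a: u32) -> u32 {
    use std::arch::asm;
    let b: u32;
    // SAFETY: register-only arithmetic; no memory or stack access.
    unsafe {
        asm!(
            "mov {b:e}, {a:e}",
            "add {b:e}, {a:e}",  // modifies flags; no preserves_flags option given
            a = in(reg) a,       // read: a
            b = out(reg) b,      // write: b
            options(pure, nomem, nostack),
        );
    }
    b
}

/// Pure-Rust fallback for other targets and for miri.
#[cfg(not(all(target_arch = "x86_64", not(miri))))]
pub fn double(a: u32) -> u32 {
    a.wrapping_add(a)
}
```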
I think an important question is what machine state we care about. The only constraints I've ever encountered are (in LLVM's language):
=&r and =&{reg} (which is how most people semantically treat =r).
=*m.
r and {reg}.
~{reg} and ~{memory}.
I can imagine wanting to provide a handful of non-general-purpose register categories, like f for an FPU register (whatever that means on a given platform), or SIMD categories.
We probably want to draw a line past which "if you know you need something more specific, either hard code your register choices or write this in a separate .S file or #[naked] function." Of course, I mostly think in terms of x86_64 or RISC-V, so maybe this line is much harder to draw if you care a lot about armv7.
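As a rough guide to how the constraint kinds listed above might map onto the asm! macro in current stable Rust, here is a sketch assuming x86_64. The correspondence in the comments is my own approximate reading, not an official table, and store_doubled is a made-up function. There is no direct memory operand here, so the "=*m" case is approximated by passing a pointer in a register and not claiming nomem.

```rust
/// Writes `value + value` (wrapping) through `slot`, using ecx as scratch.
#[cfg(all(target_arch = "x86_64", not(miri)))]
pub fn store_doubled(slot: &mut u32, value: u32) {
    use std::arch::asm;
    let p: *mut u32 = slot;
    // SAFETY: `p` comes from a valid `&mut u32`, and the block writes
    // exactly four bytes through it.
    unsafe {
        asm!(
            "mov ecx, {v:e}",
            "add ecx, ecx",
            "mov dword ptr [{p}], ecx",
            p = in(reg) p,      // roughly `r`: any general-purpose register
            v = in(reg) value,  // roughly `r`
            out("rcx") _,       // roughly `~{rcx}`: declared as clobbered
            options(nostack),   // no `nomem`, so memory may be written (`~{memory}`)
        );
    }
}

/// Pure-Rust fallback for other targets and for miri.
#[cfg(not(all(target_arch = "x86_64", not(miri))))]
pub fn store_doubled(slot: &mut u32, value: u32) {
    *slot = value.wrapping_add(value);
}
```

Anything needing more than this (segment registers, unusual register classes) would fall on the "write a separate .S file" side of the line.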