How do inline assembly and the physical machine fit into the abstract machine model, or exist outside of it?

The optimizer is allowed to perform optimizations which don't perfectly maintain the lowered AM state, so long as the logical AM state is maintained. This requires knowing the set of possible ways that the state can be observed. As an example, turning

let mut place = 0i32;
let r1 = &mut place;
ffi(r1);
let r2 = &mut *r1;
*r2 = 1i32;
asm!(…);
*r2 = 2i32;
…

into

let mut place = 0i32;
let r1 = &mut place;
ffi(r1);
let r2 = &mut *r1;
asm!(…);
*r2 = 2i32;
…

is valid, as there is no valid way for the asm! line to read or write place without making the write to *r2 invalid and UB[1]. If we allowed any observation, then the asm! could peek through the r1 reference that was stashed somewhere by the ffi call.
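Concretely (names illustrative), the stashing could look like this:

// Hypothetical sketch: `ffi` stashes the pointer in a global, and the asm!
// block later loads through it. Before the optimization it would observe 1,
// after the optimization it would observe 0 -- which is why the asm! block
// must not be allowed to observe this part of the AM state.
static mut STASH: *mut i32 = core::ptr::null_mut();

unsafe extern "C" fn ffi(p: &mut i32) {
    unsafe { STASH = p; }
}

// ...and inside the asm! block, something along the lines of (Intel syntax,
// purely illustrative):
//     mov rax, qword ptr [rip + STASH]   ; load the stashed pointer
//     mov eax, dword ptr [rax]           ; observe the pointee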

But the whole point of optimizations is that they're required to be behavior preserving. You're basically arguing that you should be allowed to ignore that and look "behind the curtain" as long as you pinky promise that you don't actually care about what you see. Which isn't observing AM state, it's observing arbitrary garbage.


  1. If we're given a guarantee that the asm! terminates, anyway. Diverging makes simple examples very difficult to construct soundly.

I follow on the write, but why can't it read? If you are using asm you are implicitly using the machine model, and on all major architectures I know of you will get a value when reading an address in memory as long as it is mapped. Sure, it might not be the value you want or expect, but it is OK to read it.

As long as you then don't assume it contains something valid, it should be fine. An example of this is the highly optimised strlen implementations that read 8 or 16 aligned bytes at a time and check whether they contain a nul byte, and if so figure out which one it was. Since the read is aligned it can never cross a page boundary, so it is fine. But we throw away the invalid data (the safety invariant being that the data is initialised up until the nul byte).
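A rough sketch of that technique (little-endian assumed; written as plain Rust for readability even though the aligned 8-byte read past the terminator is exactly the access whose AM-level status is being debated in this thread, so real implementations keep it in asm or behind some compiler-blessed form):

unsafe fn strlen_wordwise(s: *const u8) -> usize {
    const LO: u64 = 0x0101_0101_0101_0101;
    const HI: u64 = 0x8080_8080_8080_8080;

    // Round down to the previous 8-byte boundary: an aligned 8-byte load can
    // never cross a page boundary, which is the machine-level safety argument.
    let mut p = (s as usize & !7) as *const u64;
    // Bytes before `s` in the first word must not look like a terminator, so
    // force them to 0xFF (on little-endian they are the low-order bytes).
    let skip = s as usize & 7;
    let mut word = p.read() | ((1u64 << (8 * skip)) - 1);
    loop {
        // Classic bit trick: nonzero iff `word` contains a zero byte; the
        // lowest set bit marks the first zero byte.
        let zeros = word.wrapping_sub(LO) & !word & HI;
        if zeros != 0 {
            let byte = (zeros.trailing_zeros() / 8) as usize;
            return p as usize + byte - s as usize;
        }
        p = p.add(1);
        word = p.read();
    }
}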

Another use case for reading memory without respecting the AM is debugging/dumping. Ideally this should be done out of process, but that is rarely an option in a kernel or in embedded. Unless you have a JTAG debugger hooked up, you might have to do it in process.
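A minimal sketch of such in-process dumping (emit_byte is a placeholder for whatever sink the kernel/embedded environment provides):

// Hypothetical in-process dump helper: read a raw address range byte by byte
// with volatile loads and hand each byte to some output sink. The values are
// only forwarded, never interpreted.
unsafe fn dump_region(start: *const u8, len: usize, emit_byte: &mut dyn FnMut(u8)) {
    for i in 0..len {
        // volatile keeps the read from being optimized away or widened
        emit_byte(core::ptr::read_volatile(start.add(i)));
    }
}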

And why exactly is it a problem? I explicitly said that compiler optimizations may result in unexpected results if you are not careful while writing the asm! block. We did not break any AM-level optimizations here; we only have a mistake made by the asm! block author.

We routinely look much farther than "behind the curtain" while writing asm!. We observe and poke at a world which the AM has absolutely no idea about. Yes, if we are not careful we may observe garbage instead of the expected AM state, but that is our area of responsibility as programmers, not the AM's.

Again, my point is that the mistake is not reading the garbage, the mistake is incorrectly assigning some meaning to it.

Except that case is so much easier. It's trivially acceptable for asm! or FFI to manipulate state which isn't AM state. Nothing about the AM cares what is done outside the AM.

So, terms matter here. It is not acceptable to read the AM state without using a logical AM operation to do so. However, you can argue that it is acceptable to read the machine state, so long as the only thing you learn is the machine state, and not the AM state. Unless you interact with state in a way blessed by the AM implementation to correspond to AM state, you cannot learn anything about the AM state from it.

Secondly, the defined semantics of an asm! block do not need to correlate to the implementation in any way; they only need to capture the AM-relevant state transitions. So your optimized strlen could have the defined semantics of the trivial strlen but read machine state in machine-word chunks just fine, as long as you show, to a degree sufficient for your needs, that the observed behavior is fully equivalent.

For this point, it may be helpful to note that while the AM has a concept of load alignment, all loads are functionally a loop of "read one byte, requiring alignment". It's a bit awkward in surface Rust, but you can read a byte with a high alignment requirement without requiring that a load of that full size would be valid.
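So a word-wise asm! strlen could declare the trivial byte-at-a-time loop as its AM-visible semantics, something like:

// The declared AM-level semantics a word-wise asm! strlen could be measured
// against: a trivial byte-at-a-time loop. The asm! block is acceptable exactly
// when its observable behavior is equivalent to this.
unsafe fn strlen_reference(s: *const u8) -> usize {
    let mut len = 0;
    while *s.add(len) != 0 {
        len += 1;
    }
    len
}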

Making the mapping between asm! implementation and the AM semantics does not need to be simple nor straightforward, but it does need to be possible.

But we deal only with machine state while writing asm!. Now, there is a correspondence between the abstract machine state and the real machine state which we get after lowering. And as we discussed above, this correspondence can be quite non-trivial due to the presence of compiler optimizations. To properly prove the semantics of an asm! block we have to know the low-level details of this lowering, which goes far beyond just the AM semantics. Yes, being able to describe an asm! block's semantics in terms of AM operations makes this reasoning much easier, but in my opinion it's not a strict requirement. The freeze(&Foo) example works correctly in practice and interacts with the AM without any issues. AFAIK none of the potential compiler optimizations would break it.

For example, let's say I wrote an asm! block for zeroizing the stack (e.g. for a cryptographic library). The block takes the stack pointer and zeroizes memory below it. Seems fine because we do not interact with AM state, right? But on a system where the stack grows up, the block would no longer be correct! Even worse, stack handling may in theory depend on compiler flags.
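A rough x86-64 sketch of such a block (illustrative only; note how many machine-level assumptions it already bakes in: a downward-growing stack, len staying within the already-used stack region, the calling convention):

use core::arch::asm;

#[inline(never)]
pub fn zeroize_stack_below_sp(len: usize) {
    unsafe {
        asm!(
            "cld",               // rep stosb must count upwards
            "mov rdi, rsp",
            "sub rdi, {len}",    // start `len` bytes below the stack pointer
            "xor eax, eax",
            "mov rcx, {len}",
            "rep stosb",         // store `len` zero bytes
            len = in(reg) len,
            out("rdi") _,
            out("rcx") _,
            out("rax") _,
            // no `nostack`: the block writes below rsp, so it must not promise
            // to leave the stack red zone alone
        );
    }
}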

Another example:

// Let's assume for simplicity that this code is Linux-only
use core::mem::MaybeUninit;

#[repr(align(4096))]
#[repr(C)]
pub struct Page([u8; 4096]);

pub fn freeze_page(page: &mut MaybeUninit<Page>) -> &mut Page {
    unsafe {
        core::arch::asm!(
            "mov qword ptr [{}], 0",
            in(reg) page.as_mut_ptr(),
            options(nostack, preserves_flags),
        );
        page.assume_init_mut()
    }
}

Here we work around the possibility of the page being marked with MADV_FREE. Yes, I know that we currently consider that this function contains UB, but please provide a concrete example of how it could break in practice (without an unreasonably "smart" compiler which can peek inside asm!).

From the AM point of view, asm! is a black box: the compiler cannot peek inside it and know whether it zeroized just one u64, or zeroized one u64 and filled everything else with randomly generated bytes. The AM-level uninit bit gets effectively erased by this external code. So the AM only cares that the "frozen" memory does not break its assumptions about memory (i.e. that without writing we always read the same value).

On a bare-metal system we can even skip the u64 write completely and use asm! just as a black box against undesired compiler optimizations, since we know that we work with "physical" memory without any virtual memory shenanigans.
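For example, something like the following (whether such a block may legitimately be given freeze-like semantics is, of course, exactly what is disputed in this thread):

// Sketch of "asm! as a pure optimization barrier" on bare metal: no
// instructions are emitted, but because the block takes a pointer and does not
// claim `nomem`/`readonly`, the compiler must assume the pointee may have been
// read and written by it. Whether this justifies `assume_init` afterwards is
// the open question here.
pub unsafe fn external_init_barrier<T>(ptr: *mut T) {
    // empty template: nothing executes, only the compiler's assumptions change
    unsafe { core::arch::asm!("", in(reg) ptr, options(nostack, preserves_flags)) };
}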

That's literally what @CAD97's example shows. Here's another one:

fn my_fun(x: &mut i32, f: impl FnOnce()) {
  *x = 0;
  *x = 1;
  f();
  *x = 2;
}

When building with panic=abort, the compiler is allowed to remove the *x = 1. (SB would also let it remove the *x = 0 but TB might not permit that... let's not have that discussion here okay? :wink: )
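That is, the function may effectively be compiled as if it were written:

fn my_fun_optimized(x: &mut i32, f: impl FnOnce()) {
  *x = 0;
  f();
  *x = 2;
}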

But if an inline asm block inside f were allowed to even just read from the part of the AM state that x points to, then obviously such an optimization would be unsound.

So, to ensure that inline assembly is compatible with optimizations on the rest of the code, and to ensure that code which does not want to use inline asm can be optimized as much as it could if inline asm did not exist, it is crucial that the AM state transition implemented by the inline asm block cannot even observe parts of the AM state that Rust code could not observe here.

To be clear, since that may explain some of the confusion above: It is totally legal to write an inline asm block that stores a MaybeUninit<i32> in some register, loads it again, and returns the result as i32. However, to justify the legality of this, you have to declare that the semantics of your block is that it returns some arbitrary non-deterministically-chosen number, and then the rest of the code has to be correct under those semantics. That is obviously not the same semantics as freeze, and that is the entire point.
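A minimal sketch of a block with such declared semantics (no MaybeUninit input operand here, since in(reg) operands must be initialized integer or pointer values; the declared contract is simply "returns some arbitrary i32"):

// The only semantics this block can legitimately be given is "return some
// arbitrary, non-deterministically chosen i32" -- not a freeze of any
// particular AM location.
pub fn arbitrary_i32() -> i32 {
    let out: i32;
    unsafe {
        // empty template: `out` is whatever happened to be in the register the
        // compiler picked; the caller may only rely on "some i32 came back".
        core::arch::asm!("", out(reg) out, options(nomem, nostack, preserves_flags));
    }
    out
}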

But you cannot at the same time say "this behaves like freeze" and "this is allowed to always return garbage". Those two statements just cannot be true at the same time.

Exactly. And by saying that this behaves like freeze, you are assigning meaning to it, since freeze is defined to preserve the original contents if they were already initialized (i.e., it does not always return garbage).

As I just sketched, it is possible to consider this function well-defined with the semantics of "reset *page to contain arbitrary, non-deterministically chosen contents (without provenance)". But it is not possible to consider this function well-defined with freeze semantics, where existing contents are preserved if they have already been initialized.

We provided concrete examples of how other inline asm blocks that observe machine state which Rust code cannot observe can break in practice. I think that validates my approach in principle, so now the burden of proof is on whoever wants to introduce exceptions from this principle for particular cases.

I already acknowledged upthread that specifically for freeze, it is possible that all optimizations remain sound, due to specifically how that state works. But our correctness standards are generally higher than "there exists the possibility of a correctness argument for this code, but nobody has actually been able to come up with a solid argument".

This is wrong, and that's what we are trying to tell you. :slight_smile: As long as you are writing Rust code, you are dealing with Rust AM states. If you use asm!, you additionally deal with concrete machine states, but never ever can you ignore the AM state when writing anything that passes through the Rust compiler.

The urban myth that inline assembly lets you go down to the concrete machine level and entirely ignore the Abstract Machine is just that, an urban myth. It is fundamentally incompatible with how optimizing compilers work. You can curse the fact that we have optimizing compilers, you can demand a language that is less optimized and thus makes writing inline assembly much easier -- but you cannot have both powerful optimizations and easy inline assembly, that's just a logical impossibility.

However, to justify the legality of this, you have to declare that the semantics of your block is that it returns some arbitrary non-deterministically-chosen number, and then the rest of the code has to be correct under those semantics.

It's worth noting that these are precisely the semantics of arbitrary() which is what I was proposing in the first place.

Fair. :wink: The discussion quickly moved to freeze as that is usually the primitive people ask for here.

arbitrary() is compatible with what I laid out for inline asm blocks so I am confident it is compatible with the rest of the compiler. It does have the problem of leaking the contents of uninit memory into the program; that is a separate problem to the inline asm concerns. (Let's not derail this thread by talking about such leakage, we already have Rust needs a safe abstraction over uninitialized memory for that.)

Are you talking about the asm!-based freeze(&Foo) example or about reading random memory in general?

If it's the latter, then I never said that it should behave like freeze. My argument was that the read itself is not the problem. In other words, this function should be sound in my opinion:

fn read_random_mem() {
    let val: u64;
    unsafe {
        core::arch::asm!(
            // construct some random non-trapping ptr to proper memory
            // (i.e. not to a hardware register) and read it to `val`
            "...",
            out(reg) val,
            options(nostack, readonly, preserves_flags),
        );
    };
    println!("{val}");
}

Since we do not make any assumptions about the read value, it does not matter whether we read something which corresponds to AM state or not. As a more practical example, I could write a function which prints all process memory to stdout, and it should be sound despite the fact that it reads "AM state" (let's consider only a single-threaded process for simplicity).

My argument was far more subtle than that and I tried to explain it after the part you quoted out of context.