How does inline assembly and the physical machine fit into the abstract machine model, or exist outside of it?

The optimizer is allowed to do optimizations which don't perfectly maintain the lowered AM state, so long as the logical AM state is maintained. This requires knowing the set of possible ways that the state can be observed. As an example, turning

let mut place = 0i32;
let r1 = &mut place;
ffi(r1);
let r2 = &mut *r1;
*r2 = 1i32;
asm!(…);
*r2 = 2i32;
…

into

let mut place = 0i32;
let r1 = &mut place;
ffi(r1);
let r2 = &mut *r1;
asm!(…);
*r2 = 2i32;
…

is valid, as there is no valid way for the asm! line to read or write place without making the write to *r2 invalid and UB[1]. If we allow any observation, then it could peek through the r1 reference that was stashed somewhere by the ffi call.

But the whole point of optimizations is that they're required to be behavior preserving. You're basically arguing that you should be allowed to ignore that and look "behind the curtain" as long as you pinky promise that you don't actually care about what you see. Which isn't observing AM state, it's observing arbitrary garbage.


  1. If we're given a guarantee that the asm! terminates, anyway. Diverging makes simple examples very difficult to construct soundly. ↩︎

I follow on the write, but why can't it read? If using asm you are implicitly using the machine model, and on all major architectures I know of, you will get a value when reading an address in memory as long as it is mapped. Sure, it might not be the value you want or expect, but it is ok to read it.

As long as you then don't assume it contains something valid, it should be fine. An example of this is highly optimised strlen implementations, which read 8 or 16 aligned bytes and check whether they contain a nul byte, and if so figure out which one it is. Since the read is aligned it can never cross a page boundary, so it is fine. But we throw away the invalid data (the safety invariant being that the data is initialised up until the nul byte).
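The word-at-a-time trick described here can be sketched in plain Rust. This sketch stays within a fully readable buffer, so it side-steps the over-read that the real libc versions justify via alignment at the machine level; the bit trick itself is the classic "haszero" test:

```rust
// Word-at-a-time nul-byte scan (the core of optimised strlen).
// Operates on a buffer we know is fully readable, so it stays within
// Rust's rules; real libc versions rely on alignment to justify
// reading a few bytes past the nul at the machine level.
fn strlen_wordwise(buf: &[u8]) -> Option<usize> {
    const LO: u64 = 0x0101_0101_0101_0101;
    const HI: u64 = 0x8080_8080_8080_8080;
    let mut i = 0;
    while i + 8 <= buf.len() {
        let word = u64::from_le_bytes(buf[i..i + 8].try_into().unwrap());
        // (word - LO) & !word & HI is nonzero iff some byte is zero.
        if word.wrapping_sub(LO) & !word & HI != 0 {
            // A nul byte is somewhere in this word; locate it bytewise.
            return buf[i..i + 8].iter().position(|&b| b == 0).map(|p| i + p);
        }
        i += 8;
    }
    // Byte-by-byte tail for buffers not a multiple of 8 long.
    buf[i..].iter().position(|&b| b == 0).map(|p| i + p)
}

fn main() {
    assert_eq!(strlen_wordwise(b"hello, world\0<padding>"), Some(12));
}
```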

Another use case for reading memory without respecting the AM is debugging/dumping. Ideally this should be done out of process, but that is rarely an option in a kernel or on embedded systems. Unless you have a JTAG debugger hooked up, you might have to do it in process.

2 Likes

And why exactly is it a problem? I explicitly said that compiler optimizations may result in unexpected results if you are not careful while writing the asm! block. We did not break any AM-level optimizations here; we only have a mistake made by the asm! block author.

We routinely look much farther than "behind the curtain" while writing asm!. We observe and poke a world which the AM has absolutely no idea about. Yes, if we are not careful we may observe garbage instead of the expected AM state, but that is our area of responsibility as programmers, not the AM's.

Again, my point is that the mistake is not reading the garbage, the mistake is incorrectly assigning some meaning to it.

2 Likes

Except that case is so much easier. It's trivially acceptable for asm! or FFI to manipulate state which isn't AM state. Nothing about the AM cares what is done outside the AM.

So, terms matter here. It is not acceptable to read the AM state without using a logical AM operation to do so. However, you can argue that it is acceptable to read the machine state, so long as the only thing you learn is the machine state, and not the AM state: unless you interact with state in a way blessed by the AM implementation to correspond to an AM operation, you must not learn anything about the AM state.

Secondly, the defined semantics of an asm! block do not need to correlate to the implementation in any way, only the AM-relevant state transitions. So your optimized strlen could have the defined semantics of the trivial strlen but read machine state in machine-word chunks just fine, as long as you show, to a degree sufficient for your needs, that the observed behavior is fully equivalent.

For this point, it may be helpful to note that while the AM has a concept of load alignment, all loads are functionally a loop of "read byte, requiring alignment." It's a bit difficult in surface Rust, but you can read a byte with a high alignment requirement without requiring that a load of that size would be valid.

Making the mapping between asm! implementation and the AM semantics does not need to be simple nor straightforward, but it does need to be possible.

3 Likes

But we deal only with machine state while writing asm!. Now, there is a correspondence between the abstract machine state and the real machine state which we get after lowering. And as we discussed above, this correspondence can be quite non-trivial due to the presence of compiler optimizations. To properly prove the semantics of an asm! block we have to know low-level details of this lowering, but that goes far beyond just the AM semantics. Yes, being able to describe an asm! block's semantics in terms of AM operations makes this reasoning much easier, but in my opinion it's not a strict requirement. The freeze(&Foo) example works correctly in practice and interacts with the AM without any issues. AFAIK none of the potential compiler optimizations would break it.

For example, let's say I wrote an asm! block for zeroizing the stack (e.g. for a cryptographic library). The block takes the stack pointer and zeroizes memory below it. Seems fine because we do not interact with AM state, right? But on a system where the stack grows up, the block would no longer be correct! Even worse, stack handling may in theory depend on compiler flags.
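The stack-direction pitfall can even be probed from plain Rust, though only as an observation of the concrete machine: the language gives no guarantee about the relative addresses of locals, so this is a sketch for illustration, not something to rely on:

```rust
// Probe which way the stack grows by comparing addresses of locals
// in two different frames. This observes machine state, not AM state:
// Rust makes no promises about where locals live, so treat this as
// an illustration only. #[inline(never)] keeps the frames separate.
#[inline(never)]
fn callee(caller_local: usize) -> bool {
    let x = 0u8;
    // On a downward-growing stack the callee's local sits at a
    // lower address than the caller's.
    (&x as *const u8 as usize) < caller_local
}

fn stack_grows_down() -> bool {
    let x = 0u8;
    callee(&x as *const u8 as usize)
}

fn main() {
    println!("stack grows down: {}", stack_grows_down());
}
```

On mainstream targets (x86-64, AArch64) this reports a downward-growing stack.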

Another example:

// Let's assume for simplicity that this code is Linux-only
use core::mem::MaybeUninit;

#[repr(align(4096))]
#[repr(C)]
pub struct Page([u8; 4096]);

pub fn freeze_page(page: &mut MaybeUninit<Page>) -> &mut Page {
    unsafe {
        core::arch::asm!(
            "mov qword ptr [{}], 0",
            in(reg) page.as_mut_ptr(),
            options(nostack, preserves_flags),
        );
        page.assume_init_mut()
    }
}

Here we work around the possibility of a page marked with MADV_FREE. Yes, I know that we currently consider that this function contains UB, but provide a concrete example of how it could break in practice (without an unreasonably "smart" compiler which can peek inside asm!).

From the AM point of view, asm! is a black box: the compiler cannot peek inside of it and know whether it zeroized just one u64, or zeroized one u64 and filled everything else with randomly generated bytes. The AM-level uninit bit gets effectively erased by this external code. So the AM only cares that the "frozen" memory does not break its assumptions about memory (i.e. that without writing we always read the same value).

On a bare-metal system we can even skip the u64 write completely and use asm! just as a black box against undesired compiler optimizations, since we know that we work with "physical" memory without any virtual-memory shenanigans.

1 Like

That's literally what @CAD97's example shows. Here's another one:

fn my_fun(x: &mut i32, f: impl FnOnce()) {
  *x = 0;
  *x = 1;
  f();
  *x = 2;
}

When building with panic=abort, the compiler is allowed to remove the *x = 1. (SB would also let it remove the *x = 0 but TB might not permit that... let's not have that discussion here okay? :wink: )

But if an inline asm block inside f were allowed to even just read from the part of the AM state that x points to, then obviously such an optimization would be unsound.

So, to ensure that inline assembly is compatible with optimizations on the rest of the code, and to ensure that code which does not want to use inline asm can be optimized as much as it could if inline asm did not exist, it is crucial that the AM state transition implemented by the inline asm block cannot even observe parts of the AM state that Rust code could not observe here.

To be clear, since that may explain some of the confusion above: It is totally legal to write an inline asm block that stores a MaybeUninit<i32> in some register, loads it again, and returns the result as i32. However, to justify the legality of this, you have to declare that the semantics of your block is that it returns some arbitrary non-deterministically-chosen number, and then the rest of the code has to be correct under those semantics. That is obviously not the same semantics as freeze, and that is the entire point.

But you cannot at the same time say "this behaves like freeze" and "this is allowed to always return garbage". Those two statements just cannot be true at the same time.

Exactly. And by saying that this behaves like freeze, you are assigning meaning to it, since freeze is defined to preserve the original contents if they were already initialized (i.e., it does not always return garbage).

As I just sketched, it is possible to consider this function well-defined with the semantics of "reset *page to contain arbitrarily non-deterministically-chosen contents (without provenance)". But it is not possible to consider this function well-defined with freeze semantics, where existing contents are preserved if they have been already initialized.

We provided concrete examples of how other inline asm blocks that observe machine state which Rust code cannot observe can break in practice. I think that validates my approach in principle, so now the burden of proof is on whoever wants to introduce exceptions from this principle for particular cases.

I already acknowledged upthread that specifically for freeze, it is possible that all optimizations remain sound, due to specifically how that state works. But our correctness standards are generally higher than "there exists the possibility of a correctness argument for this code, but nobody has actually been able to come up with a solid argument".

This is wrong, and that's what we are trying to tell you. :slight_smile: As long as you are writing Rust code, you are dealing with Rust AM states. If you use asm!, you additionally deal with concrete machine states, but never ever can you ignore the AM state when writing anything that passes through the Rust compiler.

The urban myth that inline assembly lets you go down to the concrete machine level and entirely ignore the Abstract Machine is just that, an urban myth. It is fundamentally incompatible with how optimizing compilers work. You can curse the fact that we have optimizing compilers, you can demand a language that is less optimized and thus makes writing inline assembly much easier -- but you cannot have both powerful optimizations and easy inline assembly, that's just a logical impossibility.

3 Likes

However, to justify the legality of this, you have to declare that the semantics of your block is that it returns some arbitrary non-deterministically-chosen number, and then the rest of the code has to be correct under those semantics.

It's worth noting that these are precisely the semantics of arbitrary() which is what I was proposing in the first place.

Fair. :wink: The discussion quickly moved to freeze as that is usually the primitive people ask for here.

arbitrary() is compatible with what I laid out for inline asm blocks so I am confident it is compatible with the rest of the compiler. It does have the problem of leaking the contents of uninit memory into the program; that is a separate problem to the inline asm concerns. (Let's not derail this thread by talking about such leakage, we already have Rust needs a safe abstraction over uninitialized memory for that.)

1 Like

Are you talking about the asm!-based freeze(&Foo) example or about reading random memory in general?

If it's the latter, then I never said that it should behave like freeze. My argument was that the read itself is not the problem. In other words, this function should be sound in my opinion:

fn read_random_mem() {
    let val: u64;
    unsafe {
        core::arch::asm!(
            // construct some random non-trapping ptr to proper memory
            // (i.e. not to a hardware register) and read it to `val`
            "...",
            out(reg) val,
            options(nostack, readonly, preserves_flags),
        );
    };
    println!("{val}");
}

Since we do not make any assumptions about the read value, it does not matter whether we read something which corresponds to AM state or not. As a more practical example, I could write a function which prints all process memory to stdout, and it should be sound despite the fact that it reads "AM state" (let's consider only a single-threaded process for simplicity).
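A minimal version of such a memory-dumping read, reduced to a single byte: the load happens inside the kernel via /proc/self/mem rather than as a Rust-level load (Linux-only sketch; the local variable and value are just for demonstration):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

// Read one byte of our own address space via /proc/self/mem.
// The actual load is performed by the kernel, not by any Rust AM
// operation in this program (Linux-only).
fn peek_byte(addr: usize) -> std::io::Result<u8> {
    let mut mem = File::open("/proc/self/mem")?;
    mem.seek(SeekFrom::Start(addr as u64))?;
    let mut buf = [0u8; 1];
    mem.read_exact(&mut buf)?;
    Ok(buf[0])
}

fn main() -> std::io::Result<()> {
    let x: u8 = 0xAB;
    // Taking x's address forces it to live in memory at that address.
    let observed = peek_byte(&x as *const u8 as usize)?;
    println!("observed {observed:#04x}");
    Ok(())
}
```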

My argument was far more subtle than that and I tried to explain it after the part you quoted out of context.

1 Like

Okay, it seems I misunderstood then. Sorry for that. I agree this function is sound under the model I proposed above. I was arguing why freeze cannot be implemented in that model.

Your function does violate the property regarding leaking the contents of uninit memory that we talked about, but that is not itself a soundness property -- it's more of a (implicit) social contract, since there is no "official" operation that will behave that way.

2 Likes

To the extent the user program is predictable (e.g. from its interactions with the environment), the "justification" for inline assembly can be "if the original program before optimizations would always have this byte initialized, return the byte. Otherwise return a nondeterministic byte".

I suspect that this forces the compiler to never miscompile freeze from a user perspective (the user never cares about the state of the program in mid-optimization, only about the source Rust program and the generated assembly program), while being an imperfect definition.

Of course, that interpretation means that this program can panic:

fn main() {
    let mut v = MaybeUninit::<u8>::uninit();
    let x = 0;
    let y = 0;
    let nondet = (&x as *const i32 as usize) + 4 == (&y as *const i32 as usize);
    if nondet {
        v.write(0);
    }
    let data = freeze(v);
    assert!(!(nondet && data != 0));
}

But I believe that normal users of freeze don't care (e.g. in SIMD, the length of the read is normally predictable, and the bytes beyond the length don't matter).

At least it means that if you do "superstitious leaking" of your data to assembly code [i.e. at the start of the SIMD algorithm, do an inline asm statement that takes the data pointer + length as arguments] -- which generally won't impact performance for typical SIMD algorithms, and would allow fairly easy prediction of the code's behavior -- your code is guaranteed not to be miscompiled, since it is possible to predict which bytes are important. But I'd rather not tell people to do "superstitious leaking".

1 Like

The concept of "this program has UB but it is impossible to miscompile" can be wider than that e.g.

fn ub_if_odd_power(p: BigPrime, g: BigNum, g_x: BigNum) {
    let x = nondet();
    if x < p && x.is_odd() && pow(g, x, p) == g_x {
        unsafe { intrinsics::unreachable(); }
    }
    println!("OK!");
}

In theory, the compiler can generate a function that will invoke UB for (p, g, g_x) values where log(g_x, g) is odd. In practice (assuming DLP is hard, and the compiler generates code that finishes for reasonable inputs in reasonable time), the compiler must generate assembly code that prints "OK!" for almost all values of (p, g, g_x) (otherwise you could use the program to break DLP!).

[Of course, an "attacker" that knows x and can influence the non-determinism could "guide the program to UB", but assume that there is no such attacker].

And of course, we'll like the programs that we write to actually have a correct semantic interpretation, rather than merely being impossible to "miscompile", but I think this is indeed a useful concept.
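For concreteness, the `pow(g, x, p)` used abstractly above is ordinary modular exponentiation; a `u64` stand-in for the `BigNum` version (square-and-multiply, widening to `u128` to avoid overflow) might look like:

```rust
// Square-and-multiply modular exponentiation: a u64 stand-in for the
// BigNum `pow(g, x, p)` in the example above.
fn pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1u64 % modulus;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            // Widen to u128 so the multiply cannot overflow.
            acc = (acc as u128 * base as u128 % modulus as u128) as u64;
        }
        base = (base as u128 * base as u128 % modulus as u128) as u64;
        exp >>= 1;
    }
    acc
}

fn main() {
    assert_eq!(pow(2, 10, 1_000), 24); // 1024 mod 1000
    assert_eq!(pow(3, 5, 7), 5);       // 243 mod 7
}
```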

1 Like

With PGO the compiler may realize that p and g are often a certain value and decide to add a specialization for this specific combination of values. Once it does, if this specialized value is one for which there is UB, it becomes trivial for the compiler to notice this.

3 Likes

I don’t think this argument follows the correct level of abstraction. It assumes that the compiler takes the ub_if_odd_power as a whole, then turns it into assembly in isolation, and then the meaning of ub_if_odd_power is defined entirely in terms of that generated assembly. This is a common instinct: Write program with UB, inspect the assembly of an individual function, determine that that assembly is the desired assembly, conclude that thus the UB is “fine”. Your example here would avoid the common counterpoint to such reasoning that “the assembly may change for different compilation settings or compiler versions” because it argues why such assembly, generated in isolation, can reasonably never be undesirable.

But nobody ever promised that there's any "one function at a time" compilation model. In fact, it very commonly is very much incorrect, isn’t it? (E.g. with inlining.) So the argument cannot apply to single functions at all, but perhaps only to the whole executable. (As mentioned in the previous reply, perhaps not even at that level, e.g. considering profile-guided optimization, or other ways to make compilation explicitly not a process of compiling each whole executable in isolation.)[1]

Back to optimizations/compilation within the same executable, e.g. with inlining: if inlining is possible, then the caller of ub_if_odd_power might simply be doing a call of the shape ub_if_odd_power(p, g, pow(g, some_x, p)) or perhaps ub_if_odd_power(p, g, some_extra_calculation_step(pow(g, some_x, p)))[2], in which case it seems a very reasonable possible choice that within the ub_if_odd_power call, x at run-time gets assigned the same value as some_x.[3] Besides literal inlining itself, the same effect could also be achieved by merely generating specialized versions of ub_if_odd_power for a known context where some set of argument values are known to never occur; or where arguments are provided in a different form that’s somehow preprocessed or has additional context information added (which can then “influence the non-determinism”, after all, so to speak). And even in a context without inlining and without specialization, it may still be allowed to have the caller have extra knowledge[4] about the abstract behavior of the called function, so that e.g. for cases that can be statically proven to cause UB, or cause UB under certain conditions[5], the UB can propagate anyway.


  1. Also, with a more general notion of “compilation” not only to assembly but to other formats, e.g. compilation into another programming language can mean a “miscompilation” is trivially possible if the target format itself has ways to express nondeterminism and undefined behavior.

    (By the way, I’m not entirely sure what exact notion of “miscompilation” you have in mind here in the first place. The phrase “program with UB but impossible miscompilation” sounds like you do not mean “miscompilation” to mean “actual behavior differs from defined behavior” because there was never a defined behavior in the first place.) ↩︎

  2. especially if some_extra_calculation_step has some chance of returning pow(g, some_x, p) itself ↩︎

  3. reasonable because this has the potential to make the pow(g, x, p) call inside look identical to the pow(g, some_x, p) outside, and maybe let’s say the compiler already determined that pow is a function that has no side-effects if it terminates and returns without panic, so subsequent identical calls are allowed to be deduplicated. ↩︎

  4. more precisely: the compiler gets to have extra knowledge (beyond the plain assembly code previously generated for that function) during the compilation of the caller ↩︎

  5. which may become computationally much easier to check in the caller’s context where the arguments always have a particular shape ↩︎

4 Likes

All UB is impossible to miscompile. That's what UB means.

1 Like

No it's not. Given (p, g, g_x) it's going to be hard for the compiler to find the x, and therefore it can't know whether there is UB or not (of course, the compiler could introduce non-OK! behavior for a small subset of triples, for example for (p, g, g), but not for a large subset of them).

I mean "miscompile" in the sense of "non-printing OK!" behavior rather than in the "bug in the compiler" sense.

You seem to be contradicting yourself. You said the program might not print "OK!" sometimes ("for a small subset of triples"), and so clearly it's not "impossible" for it to not print "OK!". So I don't understand what point you're making.

1 Like

No. If the compiler noticed that the function was definitely going to be called with arguments that triggered UB in even a single place, then it could replace the behavior for every input value with something that didn't print "OK".

This situation nearly never occurs. Most cases of UB are UB only for a subset of the inputs. The presence of UB in the program for one subset of all possible inputs cannot change the execution of a program that didn't and will not trigger any UB for another subset of inputs.

No. If the compiler noticed that the function was definitely going to be called with arguments that triggered UB in even a single place, then it could replace the behavior for every input value with something that didn't print "OK".

That’s exactly the same situation as for *some_pointer_that_may_be_null, which cannot be unconditionally replaced by garbage because only in some cases is some_pointer_that_may_be_null null.

The complexity of detecting that some assumptions are broken, i.e. that UB is triggered, is irrelevant.
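The *some_pointer_that_may_be_null situation in miniature: the dereference is UB only for the bad inputs, and executions on good inputs must keep their defined behavior regardless of how hard the bad inputs are to detect:

```rust
// UB for a subset of inputs: dereferencing `p` is UB when `p` is
// null or dangling, but for every valid pointer the read must still
// yield the pointed-to value -- the compiler may not replace the
// whole function with garbage just because some callers could pass null.
unsafe fn read_i32(p: *const i32) -> i32 {
    *p
}

fn main() {
    let x = 42;
    // Valid input: defined behavior, must print 42.
    println!("{}", unsafe { read_i32(&x) });
}
```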

/proc/self/mem is outside the scope of rust's safety guarantees. otherwise this would open a can of worms that could not be closed except by forbidding debuggers from poking memory or marking every impure function as unsafe.

like, what if you invoke gdb to poke memory? what if you modify .bash_profile to poke memory? what if you send an http request to a hypervisor to poke memory? what if you run a spin loop, and the noise of the fans whirring up activates a beam of ionizing radiation pointed directly at the CPU? what if opening the disk drive makes the computer fall off a cliff?

4 Likes