How does inline assembly and the physical machine fit into the abstract machine model, or exist outside of it?

So this example is better?

#[repr(C)]
#[repr(align(8))]
pub struct Foo {
    a: u32,
    b: u8,
}

pub fn freeze(x: &Foo) -> u64 {
    unsafe {
        let r: u64;
        // Load the 8 bytes at *x (a, b, and 3 bytes of padding) into r.
        core::arch::asm!(
            "mov {1}, qword ptr [{0}]",
            in(reg) x,
            out(reg) r,
            options(nomem, nostack, preserves_flags),
        );
        r
    }
}

Most programmers expect that asm! blocks exist outside of the AM. The only connection between the two is the input/output registers and the options; everything else is just a black box (well, there are also some extremely annoying technicalities, like needing to enable target features if you use certain instructions, but they are not important here). Yes, this black box may wreak havoc, but (informally) proving correctness of the asm! block is the responsibility of the author and relies on reasoning outside of the AM model.

Unfortunately, stuff like creating &mut [u8] references to uninitialized memory by freezing it is tied more closely to the AM: because of MADV_FREE, such memory may observably break assumptions used during optimizations. Doing so would probably, at the very least, require "touching" the memory pages before returning the reference.
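
For illustration, here is a rough sketch of my own of what "touching" could look like at the machine level, assuming x86-64 and a 4 KiB page size (both assumptions on my part). The byte is read and written back entirely inside the asm, so the AM never observes the possibly-uninitialized value; whether this is actually sufficient for MADV_FREE-backed memory, and how to justify it at the AM level, is exactly the open question.

const PAGE_SIZE: usize = 4096;

#[cfg(target_arch = "x86_64")]
unsafe fn touch_pages(ptr: *mut u8, len: usize) {
    let mut offset = 0;
    while offset < len {
        core::arch::asm!(
            // Read one byte and write it back: this dirties the page without
            // changing its contents.
            "mov {tmp}, byte ptr [{addr}]",
            "mov byte ptr [{addr}], {tmp}",
            addr = in(reg) ptr.add(offset),
            tmp = out(reg_byte) _,
            options(nostack, preserves_flags),
        );
        offset += PAGE_SIZE;
    }
}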

There was also a somewhat similar example I encountered recently:

It's unclear how best to formalize this, but these tools exist and some programmers rely heavily on them in practice.

Note that I say it's not freeze not because of the implementation (that's fine), but because angelic nondeterminism is the incorrect spec for what we want freeze to be. freeze wants demonic nondeterminism, but there's no established way to specify that an initialized byte remains as-is and an uninitialized byte is replaced with a demonically selected nondeterministic value.

(Choose a random value, and it's always the one that you want it to be: that's angelic. Choose a random value, and it's always the one that causes the most issues: that's demonic. Demonic nondeterminism is fairly easy to optimize around. Angelic nondeterminism can cause optimization issues very easily.)
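
A tiny sketch of the difference, using a hypothetical freeze (the stub below just returns 0 so the snippet compiles; no such operation exists in Rust today):

use core::mem::MaybeUninit;

// Hypothetical stand-in so the sketch compiles; not a real freeze.
fn freeze(_x: MaybeUninit<u8>) -> u8 { 0 }

fn main() {
    let x = freeze(MaybeUninit::uninit());
    // Demonic spec: x is some arbitrary but fixed initialized byte chosen
    // "against" us, so the compiler must keep this branch and can assume
    // nothing about which way it goes.
    // Angelic spec: x would be whatever value makes the rest of the program
    // work out, which entangles the choice with all later code and is much
    // harder to optimize around.
    if x == 0 {
        println!("got zero");
    }
}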

It may be the case that dropping down to x86 asm! implicitly freezes any value that passes through a register, as the physical machine cannot represent an abstract byte that is uninitialized. Or it may be that putting an uninitialized value into a physical machine register is UB. I don't think the latter is really useful in any way other than maintaining "freeze is UB" as a property, and that's a contributing factor to why I think we will end up exposing freeze in some form.

You cannot force the AM to do something it cannot do just by dropping down to asm!. But specifying freeze properly in the AM is not the hard part, at all. It's justifying the implications of introducing a new primitive nondeterministic operation to the AM.

Unfortunately, while convenient, this is a misleading partial truth. You still need to be able to define the transition function that the asm! block implements, and if that definition relies on implementation details of how the AM is lowered to the target machine, you're opening yourself up to conflicts with optimization passes that (rightly) assume that those details cannot be observed from within conformant (UB-free) code.

1 Like

By the same logic, the AM cannot interact with the external world at all. Any volatile read from a hardware register, or the execution of a syscall, is "nondeterministic" in the same fashion as reading "frozen" stuff. My point is that we inevitably have a world outside of the AM, and it's impossible to model it in terms of the AM; at some point we have to say "here is a black box, it does some stuff, dunno what exactly", while modeling the whole system together with the black box has to be done at a completely different level, one which includes low-level details about the target platform.

4 Likes

But the guarantee that freeze would be violating is not an AM guarantee.

The guarantee is, to quote Ralf: "every UB-free Rust program is guaranteed to never depend on the contents of uninitialized memory."

But in the AM, uninitialized memory does not have any secret contents that you can "depend" on. To the contrary, the contents are well-known: every byte has the value 'uninit'!

(…And if you passed those bytes through a hypothetical freeze operation, you would get some nondeterministic values that technically have nothing to do with the original memory.)

Only at the assembly level can we speak of the concept of secrets from past allocations being left behind in memory. Therefore, using asm! to retrieve those secrets does already break the guarantee. Which means it's not a valid guarantee.

I can only distinguish the situations in one way. Hypothetically, we might someday define an intermediate abstraction layer, some kind of Semi-Concrete Machine that is low-level enough to model the allocator as potentially reusing allocations, but still higher-level than assembly. I'm fairly skeptical of whether such a machine could be coherently defined or would be useful. But if we did define one, then freeze might be the only operation guaranteed to reveal Semi-Concrete bytes.

But even then, it wouldn't matter! We don't care whether secrets are guaranteed to be disclosed; we care whether they're at risk of being disclosed. For any secret byte in the Semi-Concrete Machine, which might be at risk of being disclosed through freeze, there's a high chance a corresponding byte would exist in the assembly-level machine, which would be at risk of being disclosed through asm!. And a chance of a risk is the same thing as a risk. It's not like we would ever make guarantees that some byte in the Semi-Concrete Machine doesn't exist in the assembly-level machine.

4 Likes

It's about operations, not guarantees. In the end, to rationalize an inline asm block, a user has to do the following:

  • Pick a specification for what the block does, as a function transforming the AM state before the block into the AM state after the block (or more precisely, the output is a set of AM states to capture non-determinism, and can also include UB, and things get even more complicated if you need to account for concurrency). The extra flags set on the asm blocks can constrain this state transformation, e.g. readonly means that most of the AM state must be left unchanged.
  • Ensure that the asm code in the block indeed acts on the AM state in the way described by the spec. The asm block can have arbitrary other side-effects not visible to the AM (unless it is marked pure or readonly), those don't matter. (This answers @newpavlov's question.) However, every part of the asm state that the compiler relies on to implement the AM semantics must be left intact. Note that this step might have to rely on how exactly the Rust compiler maps AM states to asm states; this mapping is in general considered unstable, so in principle you can only rely on what is documented about it. This mapping can also be implementation-specific. (The formal name for this mapping is "simulation relation", as it describes how the asm execution simulates the AM execution.)
  • Demonstrate that the exact same state transformation could be written in regular Rust code.

From then on, reasoning about the program can proceed on the AM level by treating the asm block as a black box that performs the AM transition picked in the first step.

For freeze, the last step fails: there is no Rust code that can achieve the AM state transition that the inline asm block is intended to provide. freeze is a somewhat special case in that I am not aware of optimizations that it actually breaks, but I would be extremely hesitant to permit any kind of deviation from the recipe laid out above. That recipe ensures that program analysis can assume the entire program is written in Rust, which is what MIR-level compiler optimizations inevitably will have to assume once they want to do more interesting transformations (e.g. based on the aliasing model).
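
To make the recipe concrete, here is a sketch of my own (assuming x86-64 and the default Intel syntax) of an asm block that passes all three steps; the freeze block, as noted above, fails at the third one.

// Step 1 (spec): leave all AM state unchanged and return a.wrapping_add(b).
// Step 2: `lea` computes the sum without reading or writing memory and
// without modifying flags, so the declared options are accurate.
#[cfg(target_arch = "x86_64")]
fn add_via_asm(a: u64, b: u64) -> u64 {
    let sum: u64;
    unsafe {
        core::arch::asm!(
            "lea {sum}, [{a} + {b}]",
            a = in(reg) a,
            b = in(reg) b,
            sum = out(reg) sum,
            options(nomem, nostack, preserves_flags),
        );
    }
    sum
}

// Step 3: the exact same AM transition, written in regular Rust.
fn add_in_rust(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}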

1 Like

Since when has the external world become part of the "AM state"? If I write to a file and then read from it, the file system state is, in my understanding, not part of the AM state. Same with calling the getrandom syscall. It's impossible to model it in terms of the AM and "demonstrate that the exact same state transformation could be written in regular Rust code".

You could specify that getrandom returns "random" data, but then I could hijack the syscall and return deterministic data or even all zeros. The exact behavior of syscalls cannot be properly modeled inside the AM (outside of special operations like malloc), and code transformations done using the AM should not rely on a "specification for what the block does" beyond what was specified in the options and input/output arguments.

A better question, in my opinion, is: what properties should asm! blocks and syscalls satisfy so as not to break assumptions used during code transformations done in AM terms? At first glance, non-trapping reads of some random memory in asm! blocks are "correct" in this sense and should be no different, from the compiler's point of view, from a getrandom syscall.
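
For concreteness, here is a sketch of mine of such a syscall, assuming x86-64 Linux (the syscall number 318 and the rcx/r11 clobbers are properties of that target, not of the AM). Its only AM-visible effect is that some prefix of buf now holds bytes which, as far as the AM is concerned, are demonically chosen.

// Sketch: raw getrandom(2) via inline asm on x86-64 Linux.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
unsafe fn raw_getrandom(buf: *mut u8, len: usize) -> isize {
    let ret: usize;
    core::arch::asm!(
        "syscall",
        // __NR_getrandom is 318 on x86-64; rax also carries the return value.
        inlateout("rax") 318usize => ret,
        in("rdi") buf,
        in("rsi") len,
        in("rdx") 0usize,   // flags = 0
        out("rcx") _,       // clobbered by the syscall instruction
        out("r11") _,       // clobbered by the syscall instruction
        options(nostack),
    );
    ret as isize
}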

Why can't I use this "demonstration" for the freeze function in my snippet?

fn freeze(x: &Foo) -> u64 {
    let mut res = get_random_u64();
    res &= 0xFFF;
    res |= (x.a as u64) << 32;
    res |= (x.b as u64) << 24;
    res
}

You could say that get_random_u64 cannot be "written in regular Rust code", but then we circle back to how the AM interacts with the external world.

You may also say that programmers could expect freeze calls for the same pointer with unchanged underlying data to return the same frozen data, but this guarantee is less important and can be discussed separately.

I don't think it answers my question. These side effects do matter, since they may influence how other asm blocks act on the AM state.

1 Like

I think Ralf is arguing that your freeze is different because it always modifies the data, whereas a real freeze only modifies the data if it was uninit.

Ralf, is that right?

But…

You're arguing that @newpavlov's asm! example is currently impossible to "rationalize".

But what does it mean to be impossible to rationalize? Does that mean UB?

[Well, actually the example is definitely UB, because it loads from memory while being marked nomem. But I assume that's just a mistake. Let's pretend nomem is removed. Is it still UB?]

I think you are arguing yes, it is UB, because it would violate what future "MIR-level compiler optimizations" may "assume"?

That seems pretty alarming to me. The asm block can be assigned a well-defined AM state transition function. If you do a case analysis, it's easy to show why the asm block would obey that function for the init case, for the uninit case, and for every possible partially-uninit case. And of course, for each case, the state transition function for that case can be simulated in pure Rust.
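
For example, the fully-initialized case is simulated by a plain 8-byte load (a sketch, reusing Foo from the start of the thread); each partially-uninit case, once you fix the concrete bytes the block happens to produce, is likewise just a pure function of the initialized bytes.

// The fully-initialized case of the case analysis: if all 8 bytes behind x
// (padding included) are initialized, the asm block's transition is just an
// ordinary 8-byte load, which plain Rust can express.
fn freeze_fully_init_case(x: &Foo) -> u64 {
    unsafe { (x as *const Foo as *const u64).read() }
}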

But because the overall state transition function can't be simulated in pure Rust, the asm block is UB anyway?

If so, that seems like an argument for adding freeze to the language, to avoid adding UB to what seems to be perfectly well-behaved asm.

I don't know if there are any other situations besides freeze where an AM state transition function can be broken down into individually-simulatable cases but is not simulatable as a whole.

Not to put you on the spot, but is this really what you meant before by "every UB-free Rust program is guaranteed to never depend on the contents of uninitialized memory"?

Surely you were at least alluding to the security issue of disclosing the contents of asm-level uninitialized memory, even if you had the simulation issue in mind as well.

Or else I badly misinterpreted you.

2 Likes

In general, that would make it impossible to write a kernel in Rust, I think. On some architectures there are instructions you need to use as a kernel that no Rust code will ever generate. For example:

  • Flushing TLB mappings after modifying the memory map.
  • Writing to or reading from MSRs (model specific registers, these generally configure or probe the CPU in various ways). This might be needed to enable extended instruction and register sets (e.g. you can't do AVX in user space unless the kernel writes a bit to a register saying "I know how to handle AVX when context switching") as well as many other things.
  • Enabling secondary CPU cores at boot up
  • Interacting with the IO bus on x86 (x86 not only has memory-mapped IO, it also has an entirely separate IO address space, a legacy from the original 8088)
  • Saving and restoring registers at context switching.

None of these can be done in pure Rust (see the rdmsr sketch below for a concrete example). Without them, you wouldn't have an OS to run on. Are you saying they are all (or at least some) UB?
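
To make one of these concrete, here is a sketch of mine of an MSR read (assuming x86-64 and ring 0); there is simply no pure-Rust equivalent of the rdmsr instruction.

// Sketch: reading a model-specific register. Only valid in ring 0; rdmsr
// takes the MSR index in ecx and returns the value in edx:eax.
#[cfg(target_arch = "x86_64")]
unsafe fn read_msr(msr: u32) -> u64 {
    let lo: u32;
    let hi: u32;
    core::arch::asm!(
        "rdmsr",
        in("ecx") msr,
        out("eax") lo,
        out("edx") hi,
        options(nomem, nostack, preserves_flags),
    );
    ((hi as u64) << 32) | (lo as u64)
}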

1 Like

Maybe I've misunderstood, but doesn't reading from process memory using the system API qualify as "the exact same state transformation"?

use std::fs::File;
use std::io::{Seek, Read, SeekFrom};
use std::mem;
use std::slice;

fn main() {
    let uninit_frozen = {
        let mut buf = mem::MaybeUninit::<[u8; 100]>::uninit();
        let addr = &raw const buf as u64;

        // /proc/self/mem exposes the process's own address space through the
        // file API, bypassing the compiler's view of that memory.
        let mut process_memory = File::open("/proc/self/mem").unwrap();
        process_memory.seek(SeekFrom::Start(addr)).unwrap();

        unsafe {
            // Read the bytes back into the buffer, then claim it is initialized.
            let slice = slice::from_raw_parts_mut(&raw mut buf as *mut u8, mem::size_of_val(&buf));
            process_memory.read(slice).unwrap();
            buf.assume_init()
        }
    };

    println!("{:?}", uninit_frozen);
}

Unfortunately, Miri can't confirm whether this is UB.

I am not sure I understand the distinction. Are you talking about mutation of the temporary variables? I don't think it should matter? Similarly to how it does not matter whether we write let x = true; or let x = !false;

The function takes a reference and returns a u64. From the AM point of view, the AM state after execution of both functions is the same (modulo the exact value of the u64). The only difference is that the "pure Rust" function is more transparent to the compiler, which can track the value of the highest 40 bits and apply optimizations based on that.

Yes, it's a mistake (I modified the snippet from the earlier comment without checking options).

I believe the concession that you're looking for is that one of the AM capabilities is to do a foreign function call. Anything which touches the AM state (i.e. a read or write of memory reachable by the AM) must be describable in terms of AM operations, but anything which is only visible to the foreign machine can work by the foreign machine's rules.

Given that, I think it's actually possible to define extern "C" fn freeze(MaybeUninit<u8>) -> u8. Specifically, we guarantee that the ABI of MaybeUninit<scalar> matches that of scalar, so all you need is a foreign machine capable of defining the extern "C" interface which can both accept an uninitialized value across said ABI and promote an uninitialized value to an initialized one.
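
A sketch of that shape (freeze_u8 is hypothetical; the foreign side could be as trivial as a C function that returns its argument unchanged, relying on the MaybeUninit<u8>/u8 ABI guarantee):

use core::mem::MaybeUninit;

extern "C" {
    // Hypothetical foreign function, e.g. on the C side:
    //     unsigned char freeze_u8(unsigned char x) { return x; }
    // MaybeUninit<u8> is repr(transparent) over u8, so the ABI matches.
    fn freeze_u8(x: MaybeUninit<u8>) -> u8;
}

fn demo() -> u8 {
    // Whether passing an uninit value across this boundary is allowed is
    // exactly the question discussed here.
    unsafe { freeze_u8(MaybeUninit::uninit()) }
}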

Is x86 asm! such a foreign language? You could argue either way, honestly; register uninitialization is not a thing in the x86 language, so it's a valid choice to say that the lowering of uninit into x86 picks an arbitrary byte value demonically. But it's also a valid choice to say that asm! is an extended form of x86 assembly that can preserve uninit in registers, or to say that it's UB to call a "C" ABI with uninit values to a foreign language that disagrees on how to model an uninit parameter.

The safe position would be that this call performs a transmute by ABI, and it's simply UB to transmute MaybeUninit::<scalar>::uninit() to scalar. But to be fully clear: I ultimately agree that having freeze as a valid operation, and x86 implicitly freezing any AM values which it reads into a register, is the better model, and the one that matches how people want to use asm! in practice. It's just not an obvious truth.

I also believe this to be the case, but it's a property of how uninit behaves; it's considered UB to use uninit in any way, even just to check whether it's uninit.

If someone wants to search for another case, it's going to necessarily involve AM state that isn't exposed to the concrete machine, e.g. provenance. Reading the first byte that a pointer has provenance to read could maybe be an example, depending on whether you count "guess the correct offset to apply" as individually simulatable.

I obviously can't speak for Ralf, but from my T-opsem framing, the operations are the core, the definition of the AM. Any and all properties or guarantees are not "part of" the AM, but are derived from the set of all possible operations.

The AM does not currently have a way to consume an uninitialized byte other than to overwrite it or to just copy it around. freeze adds a new thing which can be done: replace uninit with a nondeterministic arbitrary value. This new operation does invalidate any derived properties that rely on a value overwritten with uninit being unobservable.
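
In code terms, roughly (the commented-out line is the operation the AM does not currently offer in a defined form):

use core::mem::MaybeUninit;

fn what_the_am_allows_today() {
    let a: MaybeUninit<u8> = MaybeUninit::uninit();
    let _copied = a;   // copying an uninit byte around is fine
    let mut b = a;
    b.write(42);       // overwriting it is fine
    // let v = unsafe { a.assume_init() };
    // ^ consuming it as an initialized value is UB; freeze would be the new,
    //   defined way to get *some* initialized value out of it.
}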

/proc/self/mem is explicitly stepping outside the Rust semantics and querying the concrete implementation in a way that we've defined as being outside of scope. It's fine from the OS standpoint, since the only security boundary is the process, but it's certifiably Cursed when trying to maintain any kind of state invariants on top of the concrete machine's ones.

Yes, that can often work, but a kernel will do things that affect not just user space programs, but also the state of its own AM. That is especially true when it comes to the "shape" of its AM rather than the "contents" of the AM. By that I mean things like changing memory maps, the number of concurrently executing threads, etc. While often this will be about those things for user space, at least during boot it will also be changing those things for itself.

In fact, there have been pushes in recent years in the Linux kernel to only map the memory you need to service a particular syscall or thread. This is in the name of the least-privilege principle, and the mechanisms for doing so have varied over time. I don't think much (apart from Meltdown mitigations) has been merged though. This sounds similar to some discussions over on Zulip about changing memory mappings in user space.

Yes. This is crucial. It is trivial to define state transition functions that violate fundamental invariants of Rust (such as modifying memory that a live shared reference points to). If we want to do any non-trivial optimizations, we must declare it UB to use an asm block whose AM state transition function cannot be expressed in pure Rust. (We could try to do something on a case-by-case basis, but that quickly becomes a nightmare to analyze, and is otherwise almost equivalent to actually adding these operations as primitives. So we should just add them as primitives instead if we want to consider them permitted.)

2 Likes

I don’t think so. From an AM perspective, such values are still unobservable after adding freeze. From a concrete machine perspective, they were already observable before adding freeze.

3 Likes

What about the common usage of asm! blocks to perform things that Rust legitimately can't express, such as novel calling conventions, manipulating specific registers in ways pure Rust code can't, and so on? I'd say that most of the things inline asm does can't be expressed in pure Rust. "AM state transition function" is higher level, so maybe this doesn't matter?

Also, something more important: is there any difference if you write the asm in another file rather than as inline asm? I'm thinking about reading uninitialized memory without exposing this memory to Rust, such that the asm can be viewed as a "black box". Surely this should be allowed, right?

1 Like

Yes, your understanding is correct. Honestly, I am very surprised that this is not considered a very basic AM capability, since we cannot do anything useful without it.

But I think that "anything which touches the AM state" is a tad too strict; we can relax it to "anything which observably (in the AM) mutates the AM state". As I wrote above, I believe that non-trapping reads of random memory are fine, since the read itself is unobservable at the AM level; only the value we return is, and we can consider it "demonically" non-deterministic.

1 Like

Even if said random memory has a well-defined value in the AM, the value the foreign code observes is not guaranteed to match the value it has in the AM, unless it would be possible to replace the foreign code with AM code that reads the memory without UB. The lowering from the AM to the real machine assumes that there is no foreign entity inspecting or mutating its state except by following the rules of the AM.

I did not say it should have this guarantee; this is why I wrote that the value should be considered "demonically" non-deterministic in the base case.

In my opinion, the problem here is not the read per se, but (potentially incorrectly) assuming something about the returned data (e.g. by using unreachable_unchecked). In the freeze(&Foo) example I can only make assumptions about the highest 40 bits, while assuming e.g. that the lower bits will be equal to zero, or even that they will be the same across different freeze calls, is incorrect (it may be possible to add the latter guarantee, but it's a separate problem).
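
In code, reusing the Foo and pure-Rust freeze from my earlier snippet (where a lands in bits 32..63), the distinction looks roughly like this:

fn fine(x: &Foo) -> u32 {
    // OK: bits 32..63 really do come from x.a, so this only relies on data
    // the AM knows about.
    (freeze(x) >> 32) as u32
}

unsafe fn not_fine(x: &Foo) {
    // Incorrect: nothing guarantees the frozen low bits are zero, so treating
    // them as known is exactly the kind of wrong assumption described above.
    if freeze(x) & 0xFF != 0 {
        core::hint::unreachable_unchecked();
    }
}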

I didn't say it is. Please carefully re-read what I wrote. The information density is very high. I'll write this down in a blog post eventually which explains this in 10x as many words, but I don't have the time currently to write that all up so right now I can only offer the highly condensed version, sorry.

The state transition I am describing transitions the AM state. If there's state that the AM cannot see nor observe, not even indirectly, the asm block can do with it whatever it wants. Ofc in reality the AM will eventually be able to observe the state change, but typically one has to insert a sort of fence to ensure that the act of doing the state change is ordered-before anything that may depend on it; in AM terms, that fence then performs some sort of reorganization on the abstract state (e.g. making some memory available), and that has to be expressible in Rust terms.

Reading or writing /proc/self/mem can easily cause UB. This is a well-known soundness hole, and going down that rabbit hole is not helpful for this discussion.

No.

No, we cannot. The compiler can do reasoning based on what other code can or cannot observe. If you use inline asm to extend the observational power of other code, you are potentially breaking compiler assumptions.

So in general it really has to be any kind of interaction with or dependency on the AM state that is so carefully guarded. As I already said, freeze is a somewhat special case in that I am not aware of optimizations that it actually breaks, but I would be extremely hesitant to permit any kind of deviation from the recipe laid out above.

The main reason it cannot have angelic non-determinism is that there is no operation you can write in pure Rust that is guaranteed to be angelically non-deterministic. Angelic non-determinism has global ramifications, e.g. you cannot move a malloc down across an unknown function call if that call may make an angelic choice. This kind of issue is exactly the reason why the inline asm block's action on the AM state must be expressible in Rust code.
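
A sketch of that malloc point (angelic_choice is hypothetical and stands for "a value chosen so that the rest of the execution succeeds"; no such operation exists, which is the point):

// Hypothetical stand-in; unimplemented!() only makes the sketch compile.
fn angelic_choice() -> usize { unimplemented!() }

fn example() {
    let p = Box::new(0u8);          // the allocator picks an address
    let guess = angelic_choice();   // the angel may simply pick that address
    assert_eq!(guess, &*p as *const u8 as usize); // so this always passes
}

// If an optimizer sinks the allocation below the call making the angelic
// choice, the angel has to commit to a guess before the address exists, and
// the assertion can now fail: the reordering is no longer behavior-preserving.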


I invite anyone who disagrees with the inline asm and FFI rules I laid down to come up with their own rules and then convincingly argue that any inline asm block or FFI call following those rules is compatible with every current and future optimization and analysis the compiler may make. I am fairly convinced that it is not possible, in general, to relax the requirements I described. I also think the vast majority of inline asm usages are actually compatible with these rules, much more so than people realize, but it requires a very particular kind of thinking to fit the actions of an inline asm block into this framework.

I also apologize for being so terse, but I am extremely short on time currently so I just don't have the time to spell this out in more detail.

3 Likes

Can you provide a concrete example of how observation of AM state by an external piece of code may break compiler optimizations? Because to me it does not look any different from freeze, which you consider a "special case". In both cases the observed result may contain something totally unexpected because of compiler optimizations, but I don't think such observations break existing optimizations by themselves.

Reading something relative to the stack pointer and assuming that this memory contains a variable defined earlier in the code is incorrect, since the compiler may move this variable around using the unobservability assumption and it may not be present on the stack at all. But the problem is not with the read per se (assuming it does not trap), but with incorrectly assuming what exactly we read.

1 Like