Rust needs a safe abstraction over uninitialized memory

Well, that's... something. Reminds me of how creating raw pointers isn't unsafe per se; it's just dereferencing them that is.

Well what about the previously linked RFC?

(Sorry, actually this one does MaybeUninit<T> -> MaybeUninit<T> too)

Unfortunately, we are not at the point where this can even be reliably done. This needs some cross-cutting changes through LLVM and Rust to give the compiler an understanding of what the "secrets" are, so that it can preserve the property that "if the source code does not leak secrets, then the compiled program does not leak them either".

But anyway that's largely orthogonal to freeze; obviously attackers wouldn't be stopped by what you can or cannot do in UB-free Rust. If your program does have unrelated UB, whether or not Rust has freeze does not matter any more.

1 Like

It's not clear at all that there's a global property being lost here (maybe it was clear to you, but it wasn't stated, and many people reading along will not immediately make this connection), and so I was pushing back on the idea that "just" adding freeze is an easy decision. Yes, my first post was vague, but then I linked to the RFC, so from then on there was no "alluding".

Could we have a dangerous_bikeshed keyword that worked similarly for things that can't cause UB but are still really dangerous? (I feel like this leads towards an effect system, but let's not go there, since it feels like that effort stalled.)

So, is this UB?

pub fn freeze(x: MaybeUninit<u64>) -> u64 {
    unsafe {
        let r: u64;
        // The template is just an assembler comment, so at the machine level
        // this passes the input register through to the output unchanged.
        std::arch::asm!(
            "// {}",
            inout(reg) x => r,
            options(nomem, nostack, preserves_flags),
        );
        r
    }
}
2 Likes

It depends on what you say is the AM specification of the asm! block, which is your responsibility to provide as the author of the asm!. (Additionally, a (normal) function definition cannot be UB by itself; you must call it in order for the AM to encounter the UB (which may then time travel).)

With the current draft specification of the AM, this function could be defined, although it would need to use angelic nondeterminism, the heaviest hammer we have, as there is no existing AM operation which determines whether a byte is uninitialized without also throwing UB if it encounters an uninitialized byte. (This is sufficient for there to be no UB at the (current) LLVM level, as specified.)

This operation is not freeze, though, even if you could use it as freeze. freeze replaces an uninit byte via demonically nondeterministic choice, meaning that if any possible output value could trigger UB, your program has UB. Angelic nondeterminism is the opposite, doing "what the programmer intended[1]."

Additionally, it's still undecided whether asm! semantics are actually allowed to use nondeterministic pick/choose semantics, or if those semantics are limited to be used as part of the definition of the actual usable AM operations.

So the full answer to your question is that it is undecided. By a strict reading of the currently defined rules, it is impossible to define your function's behavior, thus it is undefined. By a loose reading, the building blocks do exist, so if it is so defined, it is defined. The project currently explicitly reserves the right to decide either way, and thus it would be unsound either to rely on your function being defined or to rely on the absence of a freeze operation.

My personal opinion is that we are more likely to expose a freeze than not, but this is not a strongly held opinion. After all, the preservation of write_volatile to zeroize secrets on drop is only maintained by best-effort quality of implementation in the compiler[2].

The main issue is getting ecosystem agreement on what this class of "dangerous" is. Soundness and unsafe are strictly (albeit still incompletely) defined as the reachability of UB given downstream code fulfilling the letter of the safety contract and no more. (This letter may be unfortunately vague handwaving in some cases, such as the dynamic borrowing rules.) You can, although it may be difficult, verify objectively that any unsafe code is (or isn't) sound by looking at it and any code within its safety barrier (and any, hopefully rare, ambient rules and exploits considered out of scope by upstream libraries).

There's no such objective definition possible for "dangerous" functionality. Ecosystem crates already sometimes feel the need to extend the set of "not strictly considered unsafe but could still cause UB" conditions. The correct indication is a longer, expressive name that captures the downside, such as .sort() vs .sort_unstable().

It's definitely cheeky, but if we define it as unsafe fn freeze(MaybeUninit<T>) -> T, it is unsafe, but only because it is really freeze_assume_valid, or whatever name you want to give the compound operation.

I don't think this is a good idea: it's unclear whether freezing owned values is in and of itself unsafe, and clarity of soundness requirements should be the primary objective of unsafe API design. But we could do it.
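For concreteness, a sketch of the two shapes being contrasted; both signatures are hypothetical (neither exists in std), and the bodies are stubs:

use std::mem::MaybeUninit;

// The "compound" operation from the post above: unsafe because it also asserts
// validity, i.e. it is really freeze + assume_init rolled into one.
unsafe fn freeze_assume_valid<T>(_x: MaybeUninit<T>) -> T {
    unimplemented!() // hypothetical
}

// A freeze that stays inside MaybeUninit would not need to assert validity;
// whether it could be a safe fn is exactly what is being debated here.
fn freeze_in_place<T>(_x: MaybeUninit<T>) -> MaybeUninit<T> {
    unimplemented!() // hypothetical
}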


  1. Formally, angelic nondeterminism picks an execution without UB if one exists. Modifying it to cause UB whenever the behavior the programmer "expected" would cause UB, though, shouldn't invalidate any reasoning about it, at least not informal reasoning. ↩︎

  2. The compiler could in theory see that nothing can ever read the write and elide it, despite it being volatile, if the memory is known to ultimately be allocated on the stack or the Global heap, and thus to be "normally" behaved memory. In practice we just don't do this. ↩︎

1 Like

25 posts were split to a new topic: How does inline assembly and the physical machine fit into the abstract machine model, or exist outside of it?


Note: To avoid breaking apart reply chains too much, the first posts in that split-off thread still contain some replies that relate, fully or in part, directly to “freeze” and uninitialized memory, so some of its initial posts and replies can be of interest to readers of this topic as well.



I am just throwing an idea out, but I think it non-rigorously justifies marking freeze unsafe: Give every initialized byte and primitive (scalar) value a new form of provenance "secrecy-provenance" (distinct from [and a component of] pointer-provenance) that is either:

demonic // = freeze(uninit)
general // existing values

Most operations would propagate this new secrecy-provenance (arithmetic, bit-ops, selection, casts, etc.). Some would conditionally propagate it (vector masking, etc.). But some would cause UB when given values with demonic "secrecy-provenance" (I/O primitives, insecure_free operations, conditionals in near-crypto, etc.).
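To make those propagation rules concrete, here is a toy model in ordinary Rust; every name in it is made up for illustration, and the "UB" on demonic values is modeled as a panic:

#[derive(Clone, Copy, PartialEq, Debug)]
enum Secrecy {
    General, // existing, ordinary values
    Demonic, // the result of freeze(uninit)
}

#[derive(Clone, Copy, Debug)]
struct Tagged {
    value: u64,
    secrecy: Secrecy,
}

impl Tagged {
    // Arithmetic propagates the "worst" secrecy of its operands.
    fn add(self, other: Tagged) -> Tagged {
        let secrecy = if self.secrecy == Secrecy::Demonic || other.secrecy == Secrecy::Demonic {
            Secrecy::Demonic
        } else {
            Secrecy::General
        };
        Tagged { value: self.value.wrapping_add(other.value), secrecy }
    }

    // I/O-like primitives would be UB on demonic values; the model just panics.
    fn output(self) -> u64 {
        assert_eq!(self.secrecy, Secrecy::General, "model UB: I/O on a demonic value");
        self.value
    }
}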

If we had a time machine, only the UB-causing primitives would need to be unsafe. However, since some of these UB-causing operations (I/O) are declared safe, because they are safe for all existing values, these new values must be introduced as unsafe-but-valid inhabitants, and thus their creation operation (freeze) must be unsafe.

Of course, adding a new provenance type would require reevaluating all optimizations and make reasoning about generalized code harder, but I think it would be workable in theory.

Some of this was already done in choosing between the PVI and PNVI provenance models. Any provenance breaks seemingly trivial properties, such as that if byte_eq(a, b), it doesn't matter whether you use a or b. Of course, if lowering to LLVM just discards this secrecy provenance, then it's fine; all defined programs behave as defined. But if there's no transformation enabled by the secrecy UB, why is it UB in the first place? At that point, we're just abusing UB freedom to track a distinct (but useful) property, which is the exact dilution of unsafe that we'd like to avoid.

The input is a MaybeUninit<u64>, so if it has any initialized bits, those should be correctly set in the chosen input register. The output is a u64 with an unspecified value. Per the black-box rule, I expect the compiler to not assume anything more than that.

Of course, I mostly care about the executions where the inline assembly just returns the value in its input register as is, and given the no-op assembly, those are constrained to values consistent with the initialized parts of the input.

If the input to the asm-block has any uninitialized bits, the compiler is free to use demonic nondeterminism to set them. The problem for this demonic compiler is that it still cannot assume anything about the output, other than, as usual, that the execution doesn't have UB.

Of course the programmer can't just assume angelic choice: If there is some possible output that would cause UB, then there's nothing to prevent the compiler from passing such a value through the assembly.

A proper intrinsic freeze would be a better implementation, sure. However, as far as the programmer is concerned, this is almost the same thing. It just optimizes less in cases where the compiler would know that the input actually is initialized to something more specific.

1 Like

Adding provenance to integers breaks even basic optimizations such as the equivalence of x * 0 and 0. So this is a completely impractical proposal, I am afraid.
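A minimal sketch of why, with illustrative names only: if the literal 0 carries no provenance while x * 0 inherits x's, the two values are no longer interchangeable, so a rewrite the optimizer performs routinely today would become invalid.

// With provenance on integers, these two functions would no longer be equivalent:
// `x * 0` would inherit x's (secrecy) provenance, while the literal `0` carries
// none, so the usual `x * 0 => 0` rewrite would be lost.
fn with_x(x: u64) -> u64 {
    x * 0
}

fn without_x(_x: u64) -> u64 {
    0
}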

2 Likes

We do, actually. Raw pointers not being Send or Sync, and UnsafeCell not being Sync, are exactly that. (Though I do think those were mistakes)

2 Likes

I would argue that asm!-based freeze can't have angelic nondeterminism because it can produce outputs that later cause UB; e.g. if the output is MaybeUninit<&u8>, it could produce null, which would be UB to assume_init(). Angelic nondeterminism specifically chooses whichever values won't produce UB, but here the asm! and assume_init can easily produce UB.

1 Like

In light of @RalfJung's comment here:

The discussion quickly moved to freeze as that is usually the primitive people ask for here. arbitrary() is compatible with what I laid out for inline asm blocks so I am confident it is compatible with the rest of the compiler.

Maybe we should do an informal poll as to whether people would prefer an API like:

fn arbitrary<T: FromBytes>() -> T;

or

fn freeze<T: FromBytes>(data: MaybeUninit<T>) -> T;

The difference being that freeze() must preserve data that was already initialized.
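A minimal sketch of that difference; the bodies below are stand-ins just so the snippet compiles (the real proposals use a FromBytes bound and would be compiler intrinsics, not these stubs):

use std::mem::MaybeUninit;

// Stand-in: the contract of arbitrary() is that it may return *any* valid value.
fn arbitrary<T: Default>() -> T {
    T::default()
}

// Stand-in: the contract of freeze() is that every already-initialized byte is
// preserved. This stub is only sound because the caller below fully initializes `data`.
fn freeze<T>(data: MaybeUninit<T>) -> T {
    unsafe { data.assume_init() }
}

fn main() {
    let buf = MaybeUninit::new(5u64);
    let _a: u64 = arbitrary::<u64>(); // could be any value at all
    let b: u64 = freeze(buf);         // must observe the 5 that was already written
    assert_eq!(b, 5);
}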

AtomicCell needs freeze so it can get a deterministic copy of a value's bytes that can be compared without ending up in an infinite compare_exchange loop.

1 Like

I'd personally be quite happy with just arbitrary as long as vec.resize_with(1234, arbitrary) worked reasonably. I mostly want to be able to turn uninitialized byte slices into inputs that are valid to pass to Read::read. The docs on that method describe that the contents of the slice shouldn't matter but it must be initialized:

Implementations of this method can make no assumptions about the contents of buf when this function is called. It is recommended that implementations only write data to buf instead of reading its contents.

Correspondingly, however, callers of this method in unsafe code must not assume any guarantees about how the implementation uses buf. The trait is safe to implement, so it is possible that the code that’s supposed to write to the buffer might also read from it. It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

(Reading now, that last sentence seems wrong... I thought turning a MaybeUninit into a &mut [u8] without initializing it first would be insta-UB?)
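For what it's worth, the buffer use case above might look like this, with a stand-in for the hypothetical arbitrary() so the sketch compiles today (a real arbitrary/freeze would let the compiler skip the useless write):

use std::io::Read;

// Stand-in for the hypothetical `arbitrary()`: writing 0 is sound today; the
// whole point of a real `arbitrary`/`freeze` is to avoid this wasted write.
fn arbitrary() -> u8 {
    0
}

fn read_into_fresh_buf(mut r: impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::with_capacity(1234);
    buf.resize_with(1234, arbitrary); // every byte is now initialized (to *some* value)
    let n = r.read(&mut buf)?;        // sound: `buf` contains no uninitialized bytes
    buf.truncate(n);
    Ok(buf)
}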

1 Like

It is documented as UB here, but that is us being careful since we don't know yet what the final rules will be. Miri accepts such code, though it rejects &mut ! as insta-UB. See here and the issues referenced from there for the ongoing discussion.

1 Like

So, my thoughts on safe abstractions for Rust memory:
Introduce an Uninit<T>, for example Uninit<i64>.

An Uninit object is special, in that it does not implement the Destruct trait.

This means that the function
fn null<T>(x: T) {}
is not valid in full generality. The function is only valid if T: Destruct.

In particular,
fn null(x: Uninit<i64>) {}
is not a valid function.

If you have Uninit data, there is only one thing you can do with that data:
call the set function.

fn set<T>(u: Uninit<T>, t: T) -> &mut T

This function fills in the uninitialized memory, and then passes out a mutable reference to the data in case anyone wants it.

Principles

  1. When uninitialized data is created, a lifetime is created.
  2. If, when that lifetime expires, any Uninit variables are still in scope, this is a compile error.
  3. The only way for an Uninit variable to be destroyed is by calling set().
  4. You can only read the data after the lifetime expires, or via the &mut T created by set().

You should be able to break up an Uninit struct or vec into its components, so long as you take all the components out when doing so.

Isn't this exactly https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#method.write?

No. For a start, MaybeUninit::write doesn't consume the MaybeUninit object. Secondly, a MaybeUninit object can be dropped, or left unused. And, because of the previous two differences, using MaybeUninit::write still means using unsafe { x.assume_init() }.

The whole point of my idea is to use lifetimes to keep track of Uninit data, so the compiler can know when it's been filled in.

This means that it is totally safe to make an Uninit array of references. No assume_init() needed. The compiler can track the scopes.

Here is some example code.

let data: [i64; 2] = create_initialize(|x: Uninit<[i64; 2]>| {
    let y: [Uninit<i64>; 2] = split!(x);
    let [z0, z1] = y;
    let a: &mut i64 = set(z0, 5);
    set(z1, 7);
    *a += 10;
});
assert_eq!(data, [15, 7]);
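For contrast, a hedged sketch of roughly the same example with today's MaybeUninit: the writes themselves are safe, but the final step still needs unsafe, because the compiler cannot verify that every element was initialized.

use std::mem::MaybeUninit;

fn main() {
    let mut x: [MaybeUninit<i64>; 2] = [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let a: &mut i64 = x[0].write(5);
    *a += 10; // last use of `a`, so the borrow of `x` ends here
    x[1].write(7);
    // SAFETY: both elements were written above.
    let data: [i64; 2] = unsafe { std::mem::transmute(x) };
    assert_eq!(data, [15, 7]);
}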