WHAT-IF: Reading uninit RAM was not UB?

atagunov · October 19, 2020, 12:44pm

Rust has MaybeUninit<> but reading truly uninit memory is UB.
LLVM freeze can eliminate it at a tiny performance cost.
Let's indulge in "can we" first and "should we" later.

Yesterday I suggested

type mu8 = MaybeUninit<u8>;

and a coercion:

impl mu8 {
  // applied automatically each time
  // - mu8 is assigned to u8
  // - mu8 is used in an expression like a + 1
  fn freeze(self) -> u8 { /* LLVM freeze */ }
}

This makes reading mu8 safe and removes UB.
Less UB and less unsafe is a win, isn't it?

The idea is to stop using the unsafe facilities offered by MaybeUninit for reading mu8 entirely, in both safe and unsafe code.

Further steps to imagine

Apply said autocoercion to all (primitive?) types for which all bit patterns are valid.

Use mut mu8 for local variables as a sort of extreme optimization: say a loop executes at least once and a variable is assigned inside the loop but never read there; then we don't really need to assign an initial value.

Allow new safe syntax:

struct A {b : B}
let a : MaybeUninit<A> = ...
a.b // would be typed as MaybeUninit<B>

Most ambitious: grow typestate muscle/syntax to track which parts of a struct are uninit to allow gradual initialization.

Typestate-light: if the compiler can prove that at a certain point in control flow a local MaybeUninit variable has been assigned to, it treats the variable as having the "underlying" type from that point on.

Use &MaybeUninit for out-only parameters (but somehow treat them as fully init afterwards using only safe code???...)

Introduce reverse autocoercion from u8 to mu8 and in other similar cases.
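For comparison with the out-only-parameter and gradual-initialization items above, here is a minimal sketch of what can already be written today. The structs mirror the A and B from the post; the fields `x` and `n` and the functions `init_b` and `build_a` are hypothetical names chosen for illustration. Note that field-by-field initialization still needs `unsafe`, which is exactly the gap the typestate idea would aim to close.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Debug, PartialEq)]
struct B {
    x: u8,
}

#[derive(Debug, PartialEq)]
struct A {
    b: B,
    n: u32,
}

/// Out-only parameter: the callee never reads `out`, it only writes.
/// `MaybeUninit::write` returns `&mut B`, so the caller gets an
/// initialized reference back without any `unsafe`.
fn init_b(out: &mut MaybeUninit<B>, x: u8) -> &mut B {
    out.write(B { x })
}

/// Gradual initialization: each field is written through a raw pointer,
/// and the value is only treated as an `A` once every field is set.
fn build_a(x: u8, n: u32) -> A {
    let mut a = MaybeUninit::<A>::uninit();
    let p = a.as_mut_ptr();
    // SAFETY: `p` points to properly aligned storage for an `A`. Both
    // fields are written (and never read) before `assume_init` is called.
    unsafe {
        addr_of_mut!((*p).b).write(B { x });
        addr_of_mut!((*p).n).write(n);
        a.assume_init()
    }
}
```

The safe half (`init_b`) works because the initialized reference is derived from the write itself; the compiler does not have to track any typestate.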

P.S. I'm aware of security implications but would like to discuss feasibility first. I'm also aware older LLVM versions supported by Rust don't provide freeze.

vorner · October 19, 2020, 1:10pm

On a high level, being able to read into uninitialized buffers safely seems nice and worth exploring.

I'm a bit worried about one thing. I wouldn't enjoy it if MaybeUninit<u8> became some kind of language-level exception. If this goes ahead, I'd prefer if this got solved in a similar way to null-pointer optimization (e.g. Option is in no way special) and anything where all bit patterns are valid could benefit.

And I'd expect there would be a lot of devils hiding in the details.

chrisd · October 19, 2020, 1:17pm

Something I'm unclear on is what it means to read uninitialized memory. Isn't the range of behaviours very platform- and situation-specific?

atagunov · October 19, 2020, 1:20pm

This topic is about LLVM freeze. Each time you freeze-read an uninit u8 you get a potentially different u8.

scottmcm · October 19, 2020, 1:25pm

You keep jumping all the way to having this be an invisible coercion. There's no way that's happening as one step.

I strongly suggest simplifying this down to just having methods on MaybeUninit. That's something that can be experimented with.

I had a conversation about this on zulip a few weeks ago. It's probably something like

impl<T> MaybeUninit<T> {
    // `freeze` doesn't enforce validity invariants, so it can't return a `T` and can't be safe.
    pub fn freeze_unchecked(&self) -> MaybeUninit<T>;

    // The safe transmute work will give us a way to bound this to types with no invariants
    // (This must be aware of safety invariants too, not just validity ones.)
    pub fn freeze(&self) -> T where T: TransmuteFrom<[u8; size_of::<T>()]>;
}
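To make the "bound this to types with no invariants" idea concrete, here is a rough sketch that runs on stable Rust. It does not use `TransmuteFrom` (which is not available on stable) and it does not freeze anything: the marker trait `AnyBitPattern` and the function `from_init_bytes` are hypothetical stand-ins, and the input bytes are already initialized. It only illustrates how a trait bound can restrict such a conversion to types for which every bit pattern is valid.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical marker trait: implementors promise that every bit
/// pattern of the right size is a valid value. This is an assumption
/// standing in for the safe-transmute bound mentioned above.
unsafe trait AnyBitPattern: Copy {}

// SAFETY: all bit patterns are valid for these integer types.
unsafe impl AnyBitPattern for u8 {}
unsafe impl AnyBitPattern for u32 {}
// Deliberately no impl for `bool`: only 0 and 1 are valid there.

/// Builds a `T` from initialized bytes; returns `None` on a size mismatch.
fn from_init_bytes<T: AnyBitPattern>(bytes: &[u8]) -> Option<T> {
    if bytes.len() != size_of::<T>() {
        return None;
    }
    // SAFETY: the length matches `size_of::<T>()`, `T` accepts any bit
    // pattern, and `read_unaligned` does not require alignment.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const T) })
}
```

Calling `from_init_bytes::<bool>(..)` fails to compile because `bool` does not implement the marker trait, which is the property the bounded `freeze` signature is after.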

Any conversations about the ergonomics of using this stuff would need to wait on experience with actually using it.

atagunov · October 19, 2020, 1:37pm

My hope is that sketching far-on-the-horizon goalposts enabled by MaybeUninit::freeze would provide extra motivation to add it (in case this hasn't been decided yet) and better inform design choices around it, while also being fun.

RalfJung · October 19, 2020, 9:57pm

Since we are becoming technical here I have to correct this statement: as you can see in the reference, there is no general "reading uninit memory is UB" clause for Rust. It depends on the type of the read. If you read uninit memory at type MaybeUninit<T>, it is actually okay and even safe to read uninit memory. If you read at type bool, I hope we all agree that that will never be okay.

For u8 and other integer types (I don't think the single-byte type should be special here in any way), this is subject of discussion, but my personal opinion is that it should be UB.
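The distinction between the type of the read can be sketched as follows. The function names `copy_uninit` and `roundtrip` are hypothetical; the point is only that a copy at type MaybeUninit<u8> needs no `unsafe`, while a read at type u8 goes through `assume_init` and relies on the value actually being initialized.

```rust
use std::mem::MaybeUninit;

/// Copies a truly uninitialized byte around at type `MaybeUninit<u8>`.
/// This is safe code: nothing is ever read at type `u8`.
fn copy_uninit() -> MaybeUninit<u8> {
    let slot = MaybeUninit::<u8>::uninit();
    let copy = slot; // a read at type MaybeUninit<u8>
    copy
}

/// Reads at type `u8`. This needs `unsafe`, and it is only sound here
/// because the slot was initialized first.
fn roundtrip(v: u8) -> u8 {
    let slot = MaybeUninit::new(v);
    let copy = slot;
    // SAFETY: `slot` was created with `MaybeUninit::new`, so `copy`
    // holds an initialized `u8`.
    unsafe { copy.assume_init() }
}
```

Calling `assume_init` on the result of `copy_uninit` is the case under discussion in this thread, so the sketch deliberately does not do that.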

jplatte · October 21, 2020, 3:28pm

I had the impression that all these freeze ideas don't work in practice because of things like MADV_FREE.

Or does that only apply to a subset of things one might want to do with LLVM's freeze operation?

Also, kinda OT but if it applies to everything one would do with that operation, has this been discussed with the LLVM developers?

atagunov · October 21, 2020, 5:17pm

The semantics that freeze-reading truly uninitialized RAM at an integer type would have is "get some unpredictable integer value that is safe for any kind of use".

Safe valid code enabled by freeze

E.g. once MaybeUninit::freeze is added you will be able to write

let mut mu : MaybeUninit<u8> = MaybeUninit::uninit();
// or equivalently let mut mu = MaybeUninit::<u8>::uninit();
...
let a : u8 = mu.freeze();
// I would rather want to express exactly the same thing like this:
// let a : u8 = mu;
// this would be the "invisible coercion" which I suggested to discuss
// and which Scott said can perhaps be added later but not immediately

...// use a in any way
let b : u8 = mu.freeze();
...// use both a and b in any way

Here both a and b will be regular integer variables. You just won't be able to predict what values they get. a would be safe to use and keep its value throughout execution of the program. It will not unpredictably change. Doing if (a < 15) .. will not be UB. Same for b. But there will be no guarantee that a equals b. MADV_FREE will be one possible reason for a to end up being different from b.
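The semantics described here can be illustrated with a toy model. This is not how LLVM implements freeze and it is not a proposed API: `AbstractByte`, `Chooser` and this `freeze` function are hypothetical names, and a small xorshift generator stands in for "the implementation may pick any value". The model only captures the two properties claimed above: an initialized byte passes through unchanged, and an uninitialized byte becomes some ordinary u8 that stays fixed once chosen.

```rust
/// Toy model of a byte in memory: either initialized or not.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

/// Stand-in for nondeterministic choice (xorshift32, illustration only).
struct Chooser(u32);

impl Chooser {
    fn new(seed: u32) -> Self {
        Chooser(seed | 1) // xorshift state must be nonzero
    }

    fn pick(&mut self) -> u8 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 17;
        self.0 ^= self.0 << 5;
        (self.0 >> 8) as u8
    }
}

/// Model of a freeze-read: the result is always a plain `u8`.
fn freeze(byte: AbstractByte, chooser: &mut Chooser) -> u8 {
    match byte {
        AbstractByte::Init(v) => v,
        // Each freeze of an uninit byte makes a fresh choice, so two
        // freezes of the same location need not agree.
        AbstractByte::Uninit => chooser.pick(),
    }
}
```

In this model, comparing a frozen value against 15 is an ordinary integer comparison, while two separate freezes of the same `Uninit` byte are free to disagree.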

Extremely dangerous code without freeze

Let me contrast this with the following code that can be written today:

let mut mu : MaybeUninit<u8> = MaybeUninit::uninit();
...
let a : u8 = unsafe { mu.assume_init() };
...// use a

Here there is no freeze and so LLVM can figure out that a is in fact poison. Therefore a can change unpredictably throughout execution of the program and more worryingly doing something like if (a < 15) ... will be UB.

Actually yes, MADV_FREE is yet another reason that can cause a to change unpredictably in this dangerous bit of code.

What really won't work

I think what really will not work is freezing large chunks of RAM. I'm not actually sure if it will be possible to freeze a value of type MaybeUninit<[u8; N]> or [MaybeUninit<u8>; N]. Well maybe, but at the cost of copying the whole thing which defeats the purpose. Or maybe not at all.

However it should be entirely possible to freeze individual integer values the moment they are read from this buffer. Freeze is just a sort of barrier that precludes LLVM from going really wild even if it can detect that the buffer just read is truly uninit.

dhm · October 22, 2020, 10:24am

Less unsafe definitely is; but less UB means fewer compiler optimizations, so slower code; not everybody may agree that that is a win.

The main use-case / offender that leads to the current status quo being far from ideal is the Read trait. That being said:

there are library crates that fix that, in almost-non-unsafe-and-still-zero-cost-fashion, such as:

there is an RFC that at a tiny runtime cost (mainly a field to track how much memory has been initialized), offers a completely non-unsafe API:

https://github.com/sfackler/rfcs/blob/read-buf/text/0000-read-buf.md
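The "field to track how much memory has been initialized" idea can be sketched roughly like this. This is not the RFC's API: `TrackedBuf`, `append` and `filled_bytes` are hypothetical names, and the sketch only shows how a counter lets a wrapper expose a safe interface over an uninitialized buffer while keeping the one `unsafe` block internal.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical wrapper: an uninitialized buffer plus a count of how
/// many leading bytes have been written so far.
struct TrackedBuf<'a> {
    buf: &'a mut [MaybeUninit<u8>],
    filled: usize,
}

impl<'a> TrackedBuf<'a> {
    fn new(buf: &'a mut [MaybeUninit<u8>]) -> Self {
        TrackedBuf { buf, filled: 0 }
    }

    /// Copies as many bytes from `src` as still fit; returns that count.
    fn append(&mut self, src: &[u8]) -> usize {
        let n = src.len().min(self.buf.len() - self.filled);
        let dst = &mut self.buf[self.filled..self.filled + n];
        for (slot, &byte) in dst.iter_mut().zip(src) {
            slot.write(byte);
        }
        self.filled += n;
        n
    }

    /// Returns only the prefix that is known to be initialized.
    fn filled_bytes(&self) -> &[u8] {
        let init = &self.buf[..self.filled];
        // SAFETY: the first `filled` slots were written by `append`, and
        // `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe { slice::from_raw_parts(init.as_ptr() as *const u8, init.len()) }
    }
}
```

The runtime cost is the `filled` counter and the bounds arithmetic around it; callers never see uninitialized bytes at type u8.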

atagunov · October 22, 2020, 12:30pm

I'm not suggesting replacing u8 with mu8 in all codebases.
That would slow applications down.

However, would replacing u8 with mu8 in all IO buffers, and replacing all unsafe reads from them with freeze-reads, cause a measurable performance degradation? My gut feeling is IO buffers are usually read sequentially and only once.

Okay, I've never written a compression library or a crypto one. Do you think it would get noticeably slower there?

dhm · October 22, 2020, 12:49pm

I honestly don't know, I was just mentioning that the general "less UB ⇒ win" is not that clear

Some kind of FrozenCell<T : ?Sized> that would wrap a MaybeUninit<T> and use @scottmcm's .freeze() method under the hood might be a way to emulate your suggestion while making this "less-UB" be opt-in

At which point I'd be looking forward to FrozenCell causing an ICE at some point

Aloso · October 25, 2020, 12:26pm

This was my idea as well. The freeze method could take self by value, so it can't be called multiple times. This would prevent problems like this:

RalfJung · October 27, 2020, 3:39pm

No it would not, MaybeUninit<u8> is Copy.
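A short sketch of why by-value `self` does not help. The function `consume` is a hypothetical stand-in for a by-value method, and `OneShot` is a hypothetical non-Copy wrapper added for contrast; neither is part of any proposal in this thread.

```rust
use std::mem::{size_of_val, MaybeUninit};

/// Stand-in for a method taking `self` by value. It only inspects the
/// size, so it never reads the possibly-uninitialized contents.
fn consume(slot: MaybeUninit<u8>) -> usize {
    size_of_val(&slot)
}

/// Because `MaybeUninit<u8>` is `Copy`, passing it by value copies it
/// and the original stays usable: both calls compile.
fn consume_twice() -> usize {
    let mu = MaybeUninit::<u8>::uninit();
    consume(mu) + consume(mu)
}

/// A wrapper that does not derive `Copy` is moved on use, so a second
/// call to `take` on the same value would be rejected by the compiler.
struct OneShot(MaybeUninit<u8>);

impl OneShot {
    fn take(self) -> MaybeUninit<u8> {
        self.0
    }
}
```

So a single-use guarantee would need a non-Copy wrapper type along these lines rather than a by-value method on MaybeUninit itself.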

system · January 25, 2021, 3:39pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
