Rust has MaybeUninit<> but reading truly uninit memory is UB.
LLVM freeze can eliminate it at a tiny performance cost.
Let's indulge in "can we" first and "should we" later.
impl mu8 {
// applied automatically each time
// - mu8 is assigned to u8
// - mu8 is used in an expression like a + 1
fn freeze(self) -> u8 { /* LLVM freeze */ }
}
This makes reading mu8 safe and removes UB.
Less UB and less unsafe is a win, isn't it?
The idea is to stop using the unsafe facilities offered by MaybeUninit for reading mu8 entirely, in both safe and unsafe code.
Further steps to imagine
Apply said autocoercion to all (primitive?) types for which all bit patterns are valid.
Use mut mu8 for local variables as a sort of extreme optimization: say a loop executes at least once, and a variable is assigned inside the loop but never read there; then we don't really need to assign it an initial value.
Allow new safe syntax:
struct A {b : B}
let a : MaybeUninit<A> = ...
a.b // would be typed as MaybeUninit<B>
Most ambitious: grow typestate muscle/syntax to track which parts of a struct are uninit to allow gradual initialization.
Typestate-light: if the compiler can prove that at a certain point in control flow a local MaybeUninit variable has been assigned, it treats the variable as having the "underlying" type from that point on.
Use &MaybeUninit for out-only parameters (but somehow treat them as fully init afterwards using only safe code???...)
Introduce reverse autocoercion from u8 to mu8 and in other similar cases.
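For comparison, here is a minimal sketch of how two of the items above (the loop-assigned local and the out-only parameter) can be spelled with today's stable MaybeUninit API. The function names `last_square`, `fill` and `demo_out_param` are made up for illustration. Note that the out-parameter needs `&mut MaybeUninit<T>` rather than `&MaybeUninit<T>`, and that the loop case still needs one unsafe block, which is exactly what the proposal would like to get rid of.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: returns the square of the last index in `1..=n`.
/// The local `last` is only assigned inside the loop, never read there.
fn last_square(n: u32) -> u32 {
    assert!(n >= 1, "the loop must run at least once");
    let mut last: MaybeUninit<u32> = MaybeUninit::uninit();
    for i in 1..=n {
        last.write(i * i);
    }
    // SAFETY: n >= 1, so the loop body ran and `last` has been written.
    unsafe { last.assume_init() }
}

/// Hypothetical out-only parameter: the callee initializes the slot.
/// `MaybeUninit::write` hands back a `&mut u32` to the now-initialized value,
/// so the caller can keep going in safe code.
fn fill(out: &mut MaybeUninit<u32>) -> &mut u32 {
    out.write(42)
}

fn demo_out_param() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    let value: &mut u32 = fill(&mut slot);
    *value += 1;
    *value
}
```

Returning the reference from `write` is the closest thing available today to "treat it as fully init afterwards using only safe code"; the original `slot` itself is still typed as MaybeUninit.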
P.S. I'm aware of security implications but would like to discuss feasibility first. I'm also aware older LLVM versions supported by Rust don't provide freeze.
On a high level, being able to read into uninitialized buffers safely seems nice and worth exploring.
I'm a bit worried about one thing. I wouldn't enjoy it if MaybeUninit<u8> became a special case in the language. If this goes ahead, I'd prefer it to be solved in a similar way to the null-pointer optimization (e.g. Option is in no way special), so that anything where all bit patterns are valid could benefit.
And I'd expect there would be a lot of devils hiding in the details.
impl<T> MaybeUninit<T> {
// `freeze` doesn't enforce validity invariants, so it can't return a `T` and can't be safe.
pub fn freeze_unchecked(&self) -> MaybeUninit<T>;
// The safe transmute work will give us a way to bound this to types with no invariants
// (This must be aware of safety invariants too, not just validity ones.)
pub fn freeze(&self) -> T where T: TransmuteFrom<[u8; size_of::<T>()]>;
}
Any conversations about the ergonomics of using this stuff would need to wait on experience with actually using it.
Since we are becoming technical here I have to correct this statement: as you can see in the reference, there is no general "reading uninit memory is UB" clause for Rust. It depends on the type of the read. If you read uninit memory at type MaybeUninit<T>, it is actually okay and even safe to read uninit memory. If you read at type bool, I hope we all agree that that will never be okay.
For u8 and other integer types (I don't think the single-byte type should be special here in any way), this is a subject of discussion, but my personal opinion is that it should be UB.
The semantics that freeze-reading truly uninitialized RAM at an integer type would have is "get some unpredictable integer value that is safe for any kind of use".
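To illustrate the point about the type of the read, here is a small sketch (the function name `copy_then_init` is made up). Copying an uninitialized value around at type MaybeUninit<u8> needs no unsafe at all; unsafe only appears at the point where the value is claimed to be a real u8, and that claim is only justified here because of the preceding write.

```rust
use std::mem::MaybeUninit;

fn copy_then_init() -> u8 {
    let src: MaybeUninit<u8> = MaybeUninit::uninit();

    // Reading (copying) uninit memory at type MaybeUninit<u8> is safe code.
    let mut dst: MaybeUninit<u8> = src;
    let _another_copy: MaybeUninit<u8> = src;

    // Initialize before reading at type u8.
    dst.write(7);

    // SAFETY: `dst` was initialized by the `write` call above.
    unsafe { dst.assume_init() }
}
```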
Safe valid code enabled by freeze
E.g. once MaybeUninit::freeze is added you will be able to write
let mut mu : MaybeUninit<u8> = MaybeUninit::uninit();
// or equivalently let mut mu = MaybeUninit::<u8>::uninit();
...
let a : u8 = mu.freeze();
// I would rather want to express exactly the same thing like this:
// let a : u8 = mu;
// this would be the "invisible coercion" which I suggested to discuss
// and which Scott said can perhaps be added later but not immediately
...// use a in any way
let b : u8 = mu.freeze();
...// use both a and b in any way
Here both a and b will be regular integer variables. You just won't be able to predict what values they get. a would be safe to use and would keep its value throughout execution of the program. It will not unpredictably change. Doing if (a < 15) .. will not be UB. Same for b. But there will be no guarantee that a equals b. MADV_FREE will be one possible reason for a to end up being different from b.
Extremely dangerous code without freeze
Let me contrast this with the following code that can be written today:
let mut mu : MaybeUninit<u8> = MaybeUninit::uninit();
...
let a : u8 = unsafe { mu.assume_init() };
...// use a
Here there is no freeze, and so LLVM can figure out that a is in fact poison. Therefore a can change unpredictably throughout execution of the program and, more worryingly, doing something like if (a < 15) ... will be UB.
Actually yes, MADV_FREE is yet another reason that can cause a to change unpredictably in this dangerous bit of code.
What really won't work
I think what really will not work is freezing large chunks of RAM. I'm not actually sure if it will be possible to freeze a value of type MaybeUninit<[u8; N]> or [MaybeUninit<u8>; N]. Well maybe, but at the cost of copying the whole thing which defeats the purpose. Or maybe not at all.
However it should be entirely possible to freeze individual integer values the moment they are read from this buffer. Freeze is just a sort of barrier that precludes LLVM from going really wild even if it can detect that the buffer just read is truly uninit.
I honestly don't know, I was just mentioning that the general "less UB ⇒ win" is not that clear.
Some kind of FrozenCell<T : ?Sized> that would wrap a MaybeUninit<T> and use @scottmcm's .freeze() method under the hood might be a way to emulate your suggestion while making this "less-UB" be opt-in.
At which point I'd be looking forward to FrozenCell causing an ICE at some point