`MaybeInvalid<T>` - separate concepts of uninitialized memory and invalid values

Currently, uninitialized memory is dealt with using a MaybeUninit<T> type. But:

  1. Uninitialized memory is difficult to reason about
  2. It can also store improperly initialized values

So it might be better to dedicate MaybeUninit to the first point only and separate the second into a distinct type.

Idea

Create a new type that can store T without proper initialization, but unlike MaybeUninit it must contain a fixed value:

| Type            | Valid value | Invalid value | Non-fixed value |
|-----------------|-------------|---------------|-----------------|
| MaybeUninit<T>  | yes         | yes           | yes             |
| MaybeInvalid<T> | yes         | yes           | no              |
| T               | yes         | no            | no              |

#[repr(transparent)]
pub union MaybeInvalid<T> {
    invalid: (),
    value: ManuallyDrop<T>,
}

impl<T> MaybeInvalid<T> {
    // ... API similar to MaybeUninit<T>
}
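The union above can already be tried out in user space. The following is a minimal sketch, not part of the proposal: it uses `#[repr(C)]` instead of `#[repr(transparent)]` (transparent unions are unstable), and the method names `new`, `zeroed` and `assume_valid` are hypothetical, modeled on `MaybeUninit`.

```rust
use std::mem::{size_of, ManuallyDrop};
use std::ptr;

/// User-space stand-in for the proposed `MaybeInvalid<T>` (hypothetical).
/// `#[repr(C)]` is used because `#[repr(transparent)]` on unions is unstable.
#[repr(C)]
pub union MaybeInvalid<T> {
    invalid: (),
    value: ManuallyDrop<T>,
}

impl<T> MaybeInvalid<T> {
    /// Wraps a valid value.
    pub fn new(value: T) -> Self {
        Self { value: ManuallyDrop::new(value) }
    }

    /// All-zero bytes: initialized, but not necessarily a valid `T`
    /// (for example, a null reference).
    pub fn zeroed() -> Self {
        let mut this = Self { invalid: () };
        // SAFETY: like `MaybeUninit`, a union with a `()` field may hold
        // arbitrary bytes, so overwriting it with zeros is fine.
        unsafe { ptr::write_bytes((&mut this as *mut Self).cast::<u8>(), 0, size_of::<Self>()) };
        this
    }

    /// # Safety
    /// The stored bytes must form a valid `T`.
    pub unsafe fn assume_valid(self) -> T {
        unsafe { ManuallyDrop::into_inner(self.value) }
    }
}

fn main() {
    // Same size as `T`, just like `MaybeUninit<T>`.
    assert_eq!(size_of::<MaybeInvalid<&u8>>(), size_of::<&u8>());

    let x = MaybeInvalid::new(42u32);
    assert_eq!(unsafe { x.assume_valid() }, 42);

    // All-zero bytes are a valid u64, so this is sound...
    let z = MaybeInvalid::<u64>::zeroed();
    assert_eq!(unsafe { z.assume_valid() }, 0);
    // ...whereas `MaybeInvalid::<&u8>::zeroed().assume_valid()` would be UB.
}
```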

MaybeUninit would get new functions to interact with it:

impl<T> MaybeUninit<T> {
    // ... existing API
    pub fn written(value: MaybeInvalid<T>) -> Self;
    pub unsafe fn assume_written(self) -> MaybeInvalid<T>;
    pub unsafe fn assume_written_ref(&self) -> &MaybeInvalid<T>;
    pub unsafe fn assume_written_mut(&mut self) -> &mut MaybeInvalid<T>;
    pub fn write_raw(&mut self, value: MaybeInvalid<T>) -> &mut MaybeInvalid<T>;
}
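Since inherent methods cannot be added to the standard `MaybeUninit` from outside `std`, the conversions can be sketched as free functions. The names mirror the proposed methods but are hypothetical, and the `MaybeInvalid` union is the same user-space stand-in as before (`#[repr(C)]` instead of `#[repr(transparent)]`).

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// Hypothetical user-space stand-in for the proposed type.
#[repr(C)]
pub union MaybeInvalid<T> {
    invalid: (),
    value: ManuallyDrop<T>,
}

/// Stand-in for `MaybeUninit::written`: per the table, every `MaybeInvalid<T>`
/// is also a `MaybeUninit<T>`, so this direction needs no `unsafe` from the caller.
pub fn written<T>(value: MaybeInvalid<T>) -> MaybeUninit<T> {
    let mut slot = MaybeUninit::<T>::uninit();
    // SAFETY: both types have the size and alignment of `T`.
    unsafe { slot.as_mut_ptr().cast::<MaybeInvalid<T>>().write(value) };
    slot
}

/// Stand-in for `MaybeUninit::assume_written`.
/// # Safety
/// Every byte of `slot` must be initialized (a "fixed" value).
pub unsafe fn assume_written<T>(slot: MaybeUninit<T>) -> MaybeInvalid<T> {
    unsafe { slot.as_ptr().cast::<MaybeInvalid<T>>().read() }
}

/// Stand-in for `MaybeUninit::write_raw`: overwrites the slot (without
/// dropping the old contents) and returns a view of the new contents.
pub fn write_raw<T>(slot: &mut MaybeUninit<T>, value: MaybeInvalid<T>) -> &mut MaybeInvalid<T> {
    let p = slot.as_mut_ptr().cast::<MaybeInvalid<T>>();
    // SAFETY: `p` points to writable, properly aligned storage for `T`.
    unsafe {
        p.write(value);
        &mut *p
    }
}

fn main() {
    let mut slot = written(MaybeInvalid { value: ManuallyDrop::new(7u16) });
    let view = write_raw(&mut slot, MaybeInvalid { value: ManuallyDrop::new(9u16) });
    assert_eq!(unsafe { *view.value }, 9);

    // The slot was fully written, so converting back is sound.
    let back = unsafe { assume_written(slot) };
    assert_eq!(unsafe { ManuallyDrop::into_inner(back.value) }, 9);
}
```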

Case 1: enum optimization

We could define a trait that gives the compiler "an example" of an invalid value of this type:

/// ## Safety
/// INVALID's value must be actually invalid:
/// impossible to obtain in safe Rust
pub unsafe trait Invalid: Sized {
    const INVALID: MaybeInvalid<Self>;
}

// example
unsafe impl<'a, T> Invalid for &'a T {
    const INVALID: MaybeInvalid<Self> = MaybeInvalid::zeroed();
}

This could be used to optimize enum layout for arbitrary user types, guaranteeing for every T: Invalid:
assert_eq!(size_of::<T>(), size_of::<Option<T>>());
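For comparison, on stable Rust today a user type only gets this layout optimization by containing a type the compiler already knows has invalid values. The sketch below checks this; the type names `Plain` and `Id` are made up for illustration. The proposed `Invalid` trait would let a type declare such a niche directly instead.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// A user type where every bit pattern is valid: `Option` needs extra space.
#[allow(dead_code)]
struct Plain(u32);

/// A transparent wrapper around `NonZeroU32` inherits its niche (the value 0).
#[allow(dead_code)]
#[repr(transparent)]
struct Id(NonZeroU32);

fn main() {
    // These are documented guarantees of `Option`.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<NonZeroU32>());
    assert_eq!(size_of::<Option<Id>>(), size_of::<Id>());

    // All 2^32 bit patterns are valid `Plain`s, so `None` needs room elsewhere.
    assert!(size_of::<Option<Plain>>() > size_of::<Plain>());
}
```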

Case 2: freezing uninitialized values

LLVM's freeze instruction, which has come up a couple of times on this forum (17188, 22254, 13231 from a quick search), would become trivial and safe:

impl<T> MaybeUninit<T> {
    // ...
    pub fn freeze(self) -> MaybeInvalid<T>;
}
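Rust has no `freeze` today. A conservative stand-in (not LLVM's freeze, and not what the proposal describes) is to overwrite the uninitialized bytes with zeros: freeze may yield any fixed bytes, and all zeros is one such outcome, so code that is correct for an arbitrary frozen value is also correct for this. The helper name `freeze_bytes` is hypothetical; in effect it does what `MaybeUninit::zeroed` does.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for `MaybeUninit::freeze` on byte arrays:
/// it picks the fixed value "all zeros" instead of whatever was in memory.
fn freeze_bytes<const N: usize>(mut buf: MaybeUninit<[u8; N]>) -> [u8; N] {
    // SAFETY: `write_bytes(0, 1)` initializes all `size_of::<[u8; N]>()` bytes,
    // and any initialized bytes form a valid `[u8; N]`.
    unsafe {
        buf.as_mut_ptr().write_bytes(0, 1);
        buf.assume_init()
    }
}

fn main() {
    let buf = freeze_bytes(MaybeUninit::<[u8; 256]>::uninit());
    assert_eq!(buf.len(), 256);
    // Every byte now has a fixed value that may be read.
    assert!(buf.iter().all(|&b| b == 0));
}
```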

And if we had some auto trait for always-valid types, then one could create an "uninitialized" buffer without using unsafe:

let mut buf = MaybeUninit::<[u8; 256]>::uninit().freeze().valid();

That trait would be mutually exclusive with Invalid from case 1, though, which is probably fine: Drop and Copy are already mutually exclusive, although those are fundamental traits.
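The "always-valid" trait cannot be written as an auto trait on stable Rust, but a hand-implemented unsafe marker trait gives a feel for the safe `valid()` conversion. Everything here is hypothetical (`AllBitPatternsValid`, `zeroed`, `valid`); crates such as bytemuck (`AnyBitPattern`) and zerocopy (`FromBytes`) provide comparable marker traits. The sketch does not try to express the mutual exclusion with `Invalid`, and it uses zeroing in place of `freeze`.

```rust
use std::mem::{size_of, ManuallyDrop};
use std::ptr;

/// Hypothetical marker: every fully initialized byte pattern is a valid `Self`.
/// # Safety
/// `Self` must have no padding and no invalid initialized bit patterns.
pub unsafe trait AllBitPatternsValid: Copy {}
unsafe impl AllBitPatternsValid for u8 {}
unsafe impl AllBitPatternsValid for u32 {}
unsafe impl<T: AllBitPatternsValid, const N: usize> AllBitPatternsValid for [T; N] {}

/// Hypothetical user-space stand-in for the proposed type.
#[repr(C)]
pub union MaybeInvalid<T> {
    invalid: (),
    value: ManuallyDrop<T>,
}

impl<T: AllBitPatternsValid> MaybeInvalid<T> {
    /// All-zero bytes: initialized, so a "fixed" value.
    pub fn zeroed() -> Self {
        let mut this = Self { invalid: () };
        // SAFETY: the union may hold arbitrary bytes.
        unsafe { ptr::write_bytes((&mut this as *mut Self).cast::<u8>(), 0, size_of::<Self>()) };
        this
    }

    /// Safe, because the only constructor above produces initialized bytes,
    /// and `AllBitPatternsValid` says any initialized bytes are a valid `T`.
    pub fn valid(self) -> T {
        unsafe { ManuallyDrop::into_inner(self.value) }
    }
}

fn main() {
    // Analogue of `MaybeUninit::<[u8; 256]>::uninit().freeze().valid()`.
    let buf = MaybeInvalid::<[u8; 256]>::zeroed().valid();
    assert!(buf.iter().all(|&b| b == 0));
    assert_eq!(MaybeInvalid::<u32>::zeroed().valid(), 0);
}
```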


Sounds like you might be interested in

I don’t think I’m fully understanding how this is supposed to make freeze any easier, but I’m also not quite getting the proposed type’s properties in the first place.

Note that even for valid types, uninitialized bytes are often an option. For example for a type like MaybeUninit<u8> itself; or for the padding bytes in a struct (if it has any padding bytes).

In my mental model, a value in Rust (only speaking of the shallow data, not data behind pointers) basically consists of just a sequence of bytes[1], but every byte of memory can hold not only the 256 concrete numerical values (0 to 255), but also a 257th value that we can call “uninitialized”. (In LLVM this corresponds to the value “undef”.) In Rust types, such a byte is thus represented by MaybeUninit<u8>.
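The byte model described above can be written down as a tiny Rust enum. This is only an illustration: `AbstractByte` is a made-up name for this sketch, and, as in footnote [1], provenance is ignored.

```rust
/// One byte of the abstract machine: one of 256 concrete values, or uninitialized.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

/// All possible states of a single abstract byte.
fn all_byte_states() -> Vec<AbstractByte> {
    (0..=255u8)
        .map(AbstractByte::Init)
        .chain(std::iter::once(AbstractByte::Uninit))
        .collect()
}

fn main() {
    let states = all_byte_states();
    // 256 initialized values plus "uninitialized".
    assert_eq!(states.len(), 257);
    assert!(states.contains(&AbstractByte::Uninit));
}
```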

A struct or enum value, consisting of a (usually fixed-length) sequence of bytes, then usually comes with additional restrictions on the possible values of its bytes. These restrictions could be independent for each byte, but they can also depend on each other.

For example u8 is one byte, and restricted to not be undef.

For people worrying about undef not existing at run-time: that’s not really an issue in the abstract machine model. There, undef really is a value separate from all other possible values of a byte; it’s just that the rules forbid you from “reading” undef (in the sense of trying to read it as any concrete byte value 0 through 255), which means that when lowering to concrete machine code, it never actually needs to be written or read at run-time.

Now, MaybeUninit<T> lifts all the byte-level restrictions that the type T imposes on its (shallow) data. In this mental model, MaybeUninit does nothing else but allow invalid values. Whether or not uninit byte values are allowed depends on the type, i.e. for certain T, in certain bytes (and perhaps under certain conditions on what the other bytes are), undef is either allowed or not allowed in plain T; of course it’s always allowed anywhere in MaybeUninit<T>, because that allows all invalid values, too.
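To make this concrete, here is a sketch that treats each type's validity as a predicate on its bytes (same made-up `AbstractByte` model, provenance ignored). Note that `bool` rejecting `Uninit` is not a separate rule; it simply falls outside the set of allowed byte sequences, while `MaybeUninit` accepts every sequence of the right length.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}
use AbstractByte::{Init, Uninit};

/// `u8`: one byte, any initialized value.
fn valid_u8(bytes: &[AbstractByte]) -> bool {
    matches!(bytes, [Init(_)])
}

/// `bool`: one byte, exactly 0x00 or 0x01.
fn valid_bool(bytes: &[AbstractByte]) -> bool {
    matches!(bytes, [Init(0 | 1)])
}

/// `MaybeUninit<T>`: the right number of bytes, no other restriction.
fn valid_maybe_uninit(bytes: &[AbstractByte], size: usize) -> bool {
    bytes.len() == size
}

fn main() {
    assert!(valid_u8(&[Init(200)]));
    assert!(!valid_u8(&[Uninit]));

    assert!(valid_bool(&[Init(1)]));
    assert!(!valid_bool(&[Init(2)])); // initialized, but not a valid bool
    assert!(!valid_bool(&[Uninit])); // rejected by the very same predicate

    assert!(valid_maybe_uninit(&[Init(2)], 1));
    assert!(valid_maybe_uninit(&[Uninit], 1));
}
```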

But now you propose MaybeInvalid<T> and write down that “non-fixed value” should still be disallowed (by “non-fixed” I suppose you’re referring to undef byte values?). This doesn’t fit my mental model well, because T itself may already allow undef in some bytes. So I’d assume you mean that it allows additional invalid values that aren’t undefined, but never adds additional undefined values?

But what are the exact rules?

Some example questions that precise rules will have to be able to give an answer for:

Is something like [2, undef] supposed to be a value that MaybeInvalid<(bool, MaybeUninit<u8>)>[2] is allowed to take or not? The byte 2 is an invalid but non-undef value for bool, and the second byte is undefined, but the inner MaybeUninit<u8> means that it’s also valid…

How about an Option<u8>, assuming a layout of [tag, data] and tag 0 for None, 1 for Some? The Option<u8> then allows values [0, undef] and [1, 0], …, [1, 255]. Does MaybeInvalid<Option<u8>> allow [1, undef] or not? Does it add a value like [0, 0] which can then safely be transmuted back into [u8; 2] (i.e. the second byte is allowed to be read back out, and has not become undef), or doesn’t it do that?

How about Result<bool, bool>, assuming a repr-C version of Result with a layout of [tag, data] and tag 0 for Ok, 1 for Err? Would MaybeInvalid<Result<bool, bool>> add a value of [undef, 2]? (Note that [undef, 2] could in principle be a valid choice for a niche, because its second byte still distinguishes it from all valid values of the repr-C-like Result<bool, bool>, which are [0, 0], [0, 1], [1, 0], [1, 1].)
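The layouts in these questions are stated assumptions, so a model can only check them under those assumptions. Reusing the made-up `AbstractByte` model, this sketch encodes the valid representations of the assumed [tag, data] layouts for `Option<u8>` and `Result<bool, bool>`. It does not answer what `MaybeInvalid` should allow; it only makes explicit that the candidates in question fall outside the valid sets.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}
use AbstractByte::{Init, Uninit};

/// `Option<u8>` with the assumed layout [tag, data], tag 0 = None, 1 = Some.
/// `None` leaves the data byte unconstrained; `Some` needs an initialized byte.
fn valid_option_u8(bytes: &[AbstractByte]) -> bool {
    match bytes {
        [Init(0), _] => true,
        [Init(1), Init(_)] => true,
        _ => false,
    }
}

/// repr(C)-like `Result<bool, bool>` with layout [tag, data], tag 0 = Ok, 1 = Err.
fn valid_result_bool_bool(bytes: &[AbstractByte]) -> bool {
    matches!(bytes, [Init(0 | 1), Init(0 | 1)])
}

fn main() {
    assert!(valid_option_u8(&[Init(0), Uninit]));
    assert!(valid_option_u8(&[Init(1), Init(255)]));
    // [1, undef] is not a valid Option<u8> under this layout.
    assert!(!valid_option_u8(&[Init(1), Uninit]));

    // [undef, 2] is not a valid Result<bool, bool> ...
    let candidate = [Uninit, Init(2)];
    assert!(!valid_result_bool_bool(&candidate));

    // ... yet its second byte alone differs from all four valid values.
    let valid: Vec<[AbstractByte; 2]> = (0u8..=1)
        .flat_map(|t| (0u8..=1).map(move |d| [Init(t), Init(d)]))
        .collect();
    assert_eq!(valid.len(), 4);
    assert!(valid.iter().all(|v| valid_result_bool_bool(v)));
    assert!(valid.iter().all(|v| v[1] != candidate[1]));
}
```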


  1. ignoring additional information like provenance

  2. assuming straightforward, repr-C-like layout of the tuple!


Technically, poison, IIRC. (TBH, even though I read LLVM’s documentation about the difference between undef, poison, and freeze(poison) not all that long ago, I’ve already forgotten the details, other than the fact that they’re trying to move towards using poison.)

I thought “poison” too, but then read the docs, and thought “undef” is more accurate.

And the generated LLVM IR seems to agree with this assessment, too 🙂


This proposal seems to be based on a misunderstanding about how uninitialized memory factors into the Rust validity rules. There is no special rule in Rust that says uninitialized memory is inherently forbidden, or anything like that. It's just a state memory can be in. Some of our types require memory to be in certain states. For instance, i32 requires memory to be in an arbitrary initialized state, while bool requires memory to be either 0x00 or 0x01. Those are the same kind of requirement. We don't say "first of all, the bool must be initialized, and then also it must be 0x00 or 0x01". Instead we just have a set of valid representations, done. This distinction you try to draw between "invalid value" and "non-fixed value" does not exist. So, your proposal suggests to add a fundamentally new concept to the language, a new distinction that we currently do not have or need. Why do you think we should add more concepts? Things are already complicated enough, we don't need even more concepts.

Also, you need to be careful with terms like "fixed value". If the memory contents are 0xFF, and the type is bool, then this memory has no value at that type (it is "invalid" at the type), and therefore it also makes no sense to talk about the value being "fixed". I assume what you mean is that the underlying representation is "fixed" in the sense of being fully initialized.

However, that brings us to the next question. What does that type even do when there is padding? (u8, u16) allows the padding byte to be uninitialized. Does MaybeInvalid<(u8, u16)> somehow require the padding byte to be initialized? That would make no sense at all. But if you allow it to be uninitialized then the description of MaybeInvalid makes little sense.
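The padding in question is easy to observe. This sketch only checks sizes and assumes a target where `u16` is 2-byte aligned (true on all mainstream platforms); it shows that `(u8, u16)` contains at least one byte that belongs to no field, which a `MaybeInvalid<(u8, u16)>` would have to say something about.

```rust
use std::mem::{align_of, size_of};

fn main() {
    // The fields need 1 + 2 = 3 bytes.
    let field_bytes = size_of::<u8>() + size_of::<u16>();
    assert_eq!(field_bytes, 3);

    // The tuple is at least as aligned as `u16`, and a type's size is always
    // a multiple of its alignment, so with 2-byte alignment it cannot be 3.
    assert!(align_of::<(u8, u16)>() >= align_of::<u16>());
    assert_eq!(size_of::<(u8, u16)>() % align_of::<(u8, u16)>(), 0);

    // Hence at least one padding byte.
    assert!(size_of::<(u8, u16)>() > field_bytes);
}
```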

To learn more about values and representation, here's a talk I gave last year at RustWeek:
