mem::uninitialized, `!` and trap representations

I think the real solution is to teach Rust more about uninitialized values, like with my stateful MIR. This example is one of the reasons why it is necessary to be able to reason about uninitializedness per-field (another is partial drop).


This issue touches upon it

We just have to define semantics a bit more for unions, and they could be good to use for this? Maybe just for #[repr(C)] unions? As pointed out in the issue, a union with () still leaves rustc with the option to use layout optimizations.

I don’t get why people bring up unions with these sorts of things. Unions are dynamic and a barrier to compiler and human reasoning alike. Here the compiler needs to know that while initializedness varies, it varies statically and is thus known and constant at each point of the CFG.


OK. I have a different use case then, where I need a region in which the uninitialized part varies dynamically.


That would solve the “alternate RVO ABI” problem. However, it would not solve the “gather^Wscatter” problem, or the “SmallVec” problem. Using a union with () nicely solves these problems, but is a breaking change and requires an unstable feature.

See, for example, here. This is essentially a small vector, as others have mentioned. If returning uninitialized values were unsafe behavior, it would totally ruin modularity.

In general, if a field is not supposed to be used at all in unsafe code, I want to make that field uninitialized, not just for performance but for easier debugging. Valgrind will complain at uses of uninitialized values, but not at uses of “sane defaults” like 0. Additionally, it just makes code more clear: if I see that something is uninitialized, it is obvious that I shouldn’t use it, but if something is 0, I don’t know what will happen. Declaring it illegal to return uninitialized values makes it illegal to have a new method, which makes modularity worse and leads to more bug-ridden code: I have to separately verify every place where an object is constructed to make sure that the right fields are filled out, when a new method could do that automatically. Passing uninitialized values to functions should also be legal, especially behind pointers.

Essentially, the rules you suggest make it nearly impossible to use partially uninitialized structures in a modular way, and we ought to support those structures.
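To make the `new` argument concrete, here is a minimal sketch of a partially uninitialized structure built in a constructor and handed back to the caller. It assumes today's `std::mem::MaybeUninit`, which did not exist when this thread was written; the `SmallStack` type and its methods are hypothetical names for illustration only.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity stack. Slots at index >= `len` are never read.
pub struct SmallStack {
    len: usize,
    slots: [MaybeUninit<u32>; 4],
}

impl SmallStack {
    /// Returns a value whose `slots` are all uninitialized.
    pub fn new() -> Self {
        SmallStack {
            len: 0,
            slots: [MaybeUninit::uninit(); 4],
        }
    }

    /// Writes into the first unused slot; returns false when full.
    pub fn push(&mut self, value: u32) -> bool {
        if self.len == self.slots.len() {
            return false;
        }
        self.slots[self.len].write(value);
        self.len += 1;
        true
    }

    /// Reads back the most recently written slot, if any.
    pub fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: every slot below the old `len` was written by `push`.
        Some(unsafe { self.slots[self.len].assume_init() })
    }
}
```

The point of the sketch is that only `new`, `push` and `pop` need to be audited; callers never see which slots are filled.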


There are a few pieces of code that I definitely expect to be safe that are not under your rule:

unsafe fn my_uninit<T>() -> T {
    mem::uninitialized()
}
// This is like ptr::read, but takes a reference (so is somewhat safer),
// should compile to the same code locally (the compiler can optimize
// away writes of uninitialized values), and tells anyone reading this
// code that I won't use whatever is behind the reference anymore.
unsafe fn read_final<T>(ptr: &mut T) -> T {
    mem::replace(ptr, mem::uninitialized())
}
// This is literally the implementation of `ptr::read` in the standard
// library today. Note that it involves passing a reference to an
// uninitialized value to a function.
pub unsafe fn read<T>(src: *const T) -> T {
    let mut tmp: T = mem::uninitialized();
    copy_nonoverlapping(src, &mut tmp, 1);
    tmp
}
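For comparison, the same two helpers can be sketched against today's `std::mem::MaybeUninit`, so that the temporary is never typed as an initialized `T`. This is only an illustration under that assumption, not the standard library's actual code; `read_sketch` and `read_final_sketch` are hypothetical names.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Sketch of a `ptr::read`-like function using `MaybeUninit` as the temporary.
///
/// # Safety
/// `src` must be valid for reads and properly aligned.
pub unsafe fn read_sketch<T>(src: *const T) -> T {
    let mut tmp = MaybeUninit::<T>::uninit();
    unsafe {
        // Copy the bytes of `*src` into the uninitialized temporary.
        ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
        // The temporary now holds a bitwise copy of a valid `T`.
        tmp.assume_init()
    }
}

/// Sketch of `read_final`: moves the value out from behind the reference.
///
/// # Safety
/// The caller must not use or drop `*slot` afterwards.
pub unsafe fn read_final_sketch<T>(slot: &mut T) -> T {
    unsafe { ptr::read(slot as *mut T) }
}
```

Unlike the `mem::replace` version, `read_final_sketch` never materializes an uninitialized `T` by value, which sidesteps the question this thread is debating.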

I agree that this should be valid, and the compiler should be permitted to lift data.data[0] out of the loop as an optimization. This is actually a place where making the rules for when you can use uninitialized data stricter hurts optimization - unless it is legal to read uninitialized data, the compiler can’t do LICM properly here.


I think the rule for use of invalid data (which includes, but is not limited to, uninitialized data) can be very simple: everything but inspecting the data is valid. Primitive operations on invalid values return arbitrary, possibly invalid results, and branching based on invalid data (as in, e.g., match mem::uninitialized() or if mem::uninitialized()) is undefined behavior. This

  • matches intuition, since it treats uninitialized and invalid data just like anything else.
  • has a nice parametricity property: if a function (e.g. fn foo<T>(bar: T)) is fully generic on a type, then it is valid to pass uninitialized/invalid data to that function. This can be used to be fully confident that functions like ptr::read, ptr::write, mem::replace, mem::swap, Vec::push, etc. are all safe to execute on uninitialized/invalid data.
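The parametricity point can be sketched with a fully generic function applied to a slot that is not initialized. The sketch assumes today's `std::mem::MaybeUninit` as the carrier type, which is stricter than the rule proposed above; `move_into_fresh_slot` is a hypothetical name.

```rust
use std::mem::{self, MaybeUninit};

/// Moves `value` through an uninitialized slot using only generic code.
pub fn move_into_fresh_slot(value: u32) -> u32 {
    let mut full = MaybeUninit::new(value);
    let mut empty = MaybeUninit::<u32>::uninit();
    // `mem::swap` is generic: it moves bytes and never inspects them.
    mem::swap(&mut full, &mut empty);
    // `empty` now holds `value`; `full` is uninitialized and is never read.
    // SAFETY: `empty` received the initialized contents of `full`.
    unsafe { empty.assume_init() }
}
```

Because `mem::swap` cannot branch on the contents of a generic argument, the uninitialized half is copied around without ever being inspected.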

Yes SmallVec is dynamic. Which is the “gather” problem?


Also, while it’s way, way complex, stateful MIR + type-level nats + existentials can deal with SmallVec.

That’s actually the “scatter” problem. My mistake.

fn pi(index: u32) -> u32 { /* permutation */ }
fn gather() -> [u32; 1024] {
    // Every slot is written exactly once below, because `pi` is a permutation.
    let mut result: [u32; 1024] = unsafe { mem::uninitialized() };
    for i in 0..1024 {
        result[pi(i) as usize] = i;
    }
    result
}
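As a point of comparison, the scatter loop can be sketched with an array of `MaybeUninit<u32>` from today's standard library, which was not available when this was posted. The concrete permutation in `pi` is an assumption made up for the example (an affine map with a multiplier coprime to 1024); the original leaves it unspecified.

```rust
use std::mem::{self, MaybeUninit};

const N: usize = 1024;

/// Hypothetical permutation of 0..N: 5 is coprime to 1024, so this is a bijection.
fn pi(index: usize) -> usize {
    (index * 5 + 3) % N
}

/// Fills every slot exactly once, in permuted order, without zeroing first.
pub fn scatter() -> [u32; N] {
    let mut result = [MaybeUninit::<u32>::uninit(); N];
    for i in 0..N {
        result[pi(i)].write(i as u32);
    }
    // SAFETY: `pi` is a permutation, so every slot has been written, and
    // `MaybeUninit<u32>` has the same layout as `u32`.
    unsafe { mem::transmute::<[MaybeUninit<u32>; N], [u32; N]>(result) }
}
```

The compiler still cannot check that `pi` is a permutation; that obligation stays with the author of the `unsafe` block.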

Yeah that’s very hard to make safe. The best one could do with stateful MIR is

fn pi(index: u32) -> u32 { /* permutation */ }
fn gather() -> [u32; 1024] {
    let mut result : [Uninit<u32::Size>; 1024] = mem::uninitialized();
    'a {
        let refs = result.map_borrow_out(|r: &out T| r); 
        // result: [Borrow<'a, Uninit<_>, u32>; 1024]
        for i in 0..1024 {
            let r = unsafe { copy_instead_of_move(refs[pi(i)]) };
            *r = i;
        }
        unsafe { mem::forget(refs) };
    }
    // result: [u32; 1024]
    result
}

which isn’t much

I think we can solve the SmallVec problem by introducing a MaybeUninitialized library type.

union MaybeUninitialized<T> {
    value: T,
    // to solve https://github.com/rust-lang/rust/issues/36394,
    // can be implemented today with a #[lang="maybe_uninitialized"]
    buf: [u8; mem::size_of::<T>()]
}

impl<T> MaybeUninitialized<T> {
    fn new_uninitialized() -> Self {
        unsafe { mem::uninitialized() }
    }

    fn new(value: T) -> Self {
        MaybeUninitialized { value: value }
    }

    // Borrowing a union field requires `unsafe`; the pointer itself is not read here.
    fn get_mut(&mut self) -> *mut T {
        unsafe { &mut self.value }
    }

    fn get(&self) -> *const T {
        unsafe { &self.value }
    }
    // more APIs might be implemented
}

I’m still not sure why enum layout optimization is a bad thing. If we make ELO better, and start to allow it inside of padding, then it’s a nice optimization that

Option<StructWithPadding>
== StructWithPadding
== Option<MaybeInitialized<StructWithPadding>> 
== MaybeInitialized<StructWithPadding>

which could be accomplished with

union MaybeInitialized<T> {
  valid: T,
  invalid: (),
}

and which could be implemented today.
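A minimal sketch of how the `()`-variant union can be written in current Rust follows. It assumes the payload is wrapped in `ManuallyDrop`, since stable unions do not accept fields that may need dropping; the names `MaybeInit`, `as_ptr` and `as_mut_ptr` are hypothetical.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// Sketch of a "maybe initialized" slot as a union with a unit variant.
pub union MaybeInit<T> {
    value: ManuallyDrop<T>,
    uninit: (),
}

impl<T> MaybeInit<T> {
    /// Creates a slot without initializing the payload.
    pub fn new_uninitialized() -> Self {
        MaybeInit { uninit: () }
    }

    /// Creates a slot holding `value`.
    pub fn new(value: T) -> Self {
        MaybeInit { value: ManuallyDrop::new(value) }
    }

    /// Address of the payload; computing it does not read the payload.
    pub fn as_ptr(&self) -> *const T {
        // `ManuallyDrop<T>` is `repr(transparent)`, so the cast is sound.
        unsafe { ptr::addr_of!(self.value) as *const T }
    }

    /// Mutable address of the payload, e.g. for a later `ptr::write`.
    pub fn as_mut_ptr(&mut self) -> *mut T {
        unsafe { ptr::addr_of_mut!(self.value) as *mut T }
    }
}
```

A caller would typically pair `new_uninitialized` with `as_mut_ptr().write(..)` and only then read through `as_ptr()`; the union never drops its payload on its own.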


However, this solution by itself does not restrict references to uninhabited types in any way - you can play with your &! as much as you want, as long as you don’t actually dereference it.

I don’t get the point of distinguishing between ! and &! in this way - they’re both equally uninhabited, any representation of either is invalid. If you can have a &! in scope so long as you don’t inspect it then you should be able to have a ! in scope so long as you don’t inspect it.

I think the rule for use of invalid data (which includes, but is not limited to uninitialized data) can be very simple: everything but inspecting the data is valid.

Though if the compiler can no longer assume that data is valid - even when it’s not being inspected - then it can no longer assume that functions which return ! actually diverge. This would fix bugs around using mem::uninitialized::<!> but it would break everything else which uses !.
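To illustrate what code relying on that assumption looks like, here is a small sketch using the standard library's empty enum `std::convert::Infallible` as a stand-in for `!`; the function names are hypothetical.

```rust
use std::convert::Infallible;

/// The `Err` arm needs no body: a value of an uninhabited type cannot exist.
pub fn unwrap_infallible(r: Result<u32, Infallible>) -> u32 {
    match r {
        Ok(v) => v,
        Err(never) => match never {},
    }
}

/// A diverging function: the type checker accepts `give_up()` where a `u32`
/// is expected precisely because it is assumed never to return.
fn give_up() -> ! {
    panic!("no value")
}

pub fn value_or_give_up(x: Option<u32>) -> u32 {
    match x {
        Some(v) => v,
        None => give_up(),
    }
}
```

If an uninhabited value could be conjured and passed around, the empty `match` and the call to `give_up()` would both have to fall through to something, and neither has anything to fall through to.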

One solution for this would be to distinguish between safe and unsafe code and say a safe block or function always returns a valid value but an unsafe block or function can return any chunk of bits of the right size. This means that an unsafe function (like mem::uninitialized) can return a ! without diverging but it’s UB to ever read the !.

A much better solution would be to deprecate mem::uninitialized, introduce &uninit pointers and MaybeInitialized but this would be a lot more backwards-incompatible.

Edit: The conversation around the 2017 roadmap had a strong vibe of “We won’t bother to implement/fix language features or specify semantics until we need to” - but that’s how you get into situations like this. I really think that any deep changes to the language which aren’t purely extensions, things like !, linear types or new pointer types, need to be figured out ASAP before we build any more of an ecosystem on top of this thing.


That’s what MaybeInitialized is for. If you are working with fire, at least put a warning sign.

Both of these examples WILL be miscompiled if, say, T = &u32. dereferencable + undef = trouble.

This is, of course, totally fine, as long as mem::uninitialized is legal. The trap representation is never loaded in any way.

Of course the compiler can and does do LICM - it’s not like your computer literally explodes when you load a trap representation. Even at the source-code level, you can call ptr::read(data as *const MaybeInitialized<T>).

Since basically forever, the rule in Rust is that you can’t have invalid values (a boolean with the value 42), but can have references to invalid values (a &bool that points to the number 42) as long as you don’t dereference them. This seems to give the most clear and useful semantics (specifying invalid values seems to require extra effort, and forbidding invalid pointees seems to require extra effort).

I don’t see any good reason for special treatment of uninhabited types in this context. They are 0-sized types with 1 trap representation and 0 non-trap representations.

The problem is that an &StructWithPadding can have arbitrary bit-patterns in its padding (that is, unless we place some type-based restriction on the padding’s content, like #[single_repr] does), and copying it to an &mut StructWithPadding must be allowed to use memcpy, which would copy these arbitrary bit patterns into your enum’s discriminant.

And in any case, the reason I don’t like the “closed” definition for MaybeUninitialized is because it has the effect of “basically works, but corrupts your data in some edge cases for no particularly good reason”.
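The layout trade-off can be observed with the `std::mem::MaybeUninit` type that later landed in the standard library, which takes the "open" side of this argument: it exposes no niche, so a wrapping `Option` has to store its discriminant separately. The helper name `layout_report` is hypothetical, and the sketch uses `bool` rather than a struct with padding because that is the case the standard library documentation itself shows.

```rust
use std::mem::{size_of, MaybeUninit};

/// Sizes of `bool`, `Option<bool>`, `MaybeUninit<bool>`
/// and `Option<MaybeUninit<bool>>`, in that order.
pub fn layout_report() -> [usize; 4] {
    [
        size_of::<bool>(),
        // `Option` can hide its discriminant in an invalid `bool` bit pattern.
        size_of::<Option<bool>>(),
        // `MaybeUninit<T>` always has the same size as `T`.
        size_of::<MaybeUninit<bool>>(),
        // Any byte may appear here, so there is no spare pattern to reuse.
        size_of::<Option<MaybeUninit<bool>>>(),
    ]
}
```

On current compilers the last entry is larger than the third, which is the price of never letting stray bytes be mistaken for a discriminant.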

Do you have a source for that claim? The reference says that dangling references are not allowed. It does not go into detail as to what constitutes dangling, but I’m sure it’s stricter than “contains the address of an allocated section of memory.”

&!, as far as my limited analysis goes, is either always dangling or never is. “Always dangling” makes more sense, because we already have a perfectly good never-dangling reference called &().

Don’t give much weight to the reference - it was never properly fact-checked, and that section specifically is pretty random. In any case, dangling references are references that don’t refer to any valid allocation. The data behind these references has nothing to do with it.

Because ! is zero-sized, &! is indeed never dangling. That’s a simple consequence.

In any case, dangling references are references that don’t refer to any valid allocation

Why is it useful to bring allocation into the definition? I would have thought a dangling reference is any reference that no longer points to valid data. If the memory behind a reference got freed and then something else got allocated at the same address, does that make the reference no longer dangling?

No safe code can produce a &bool which points to 42. So surely the compiler can always assume that a &bool doesn’t point to 42 (not that it can tell without dereferencing it). In the case of &! though, it can tell without dereferencing that the reference is dangling.

Very interesting thread so far. I think this is an interesting test case for the question of “to what extent can we break existing code if there is a good reason”.

It seems inarguable that, if we have !, the most overall consistent semantics is to say that functions cannot return !, and to migrate people so that they use unions in place of mem::uninitialized::<!>. I would imagine we might want a targeted lint saying that calling uninitialized for some type T that may not be instantiable (it seems that these same concerns apply to empty enums, I would think) is deprecated, and to prefer unions. (The idea here was to try to avoid tagging too many projects that are using uninitialized as a poor man’s out pointer). The fact that uninitialized breaks in a loud way (panic) is a very good thing here, I would think. In any case, it seems like we would also want some kind of warning period – in general I don’t feel like we have the “warning-period-then-deprecate” rhythm working especially smoothly right now.

OTOH, I think that @arielb1’s original thoughts have a certain appeal as well. They seem to fit into the more “access-based” way of thinking, and they certainly validate a lot of code that, at first glance, seems quite reasonable to me. My biggest concern is that while this makes some set of code work, it also makes other very reasonable patterns, like those cited by @gereeter, illegal. I’m not sure if this is a win.

That definitely can’t be right, because there’s lots of code that does the 0x01 as *const () as &() transformation. Vec, for example, does it. There’s no allocation at address 1.
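A small sketch of that pattern as it can be written with today's standard library: `NonNull::dangling()` yields a non-null, well-aligned address with no allocation behind it, which is enough for a reference to a zero-sized type. The function names are hypothetical.

```rust
use std::ptr::NonNull;

/// Builds a `&()` from an address that no allocator ever handed out.
pub fn unit_ref_from_dangling() -> &'static () {
    let p: NonNull<()> = NonNull::dangling();
    // SAFETY: `()` is zero-sized, so a non-null, aligned pointer is valid
    // for reads of it even though nothing is allocated there.
    unsafe { &*p.as_ptr() }
}

/// An empty `Vec` has not allocated, yet its data pointer is still non-null.
pub fn empty_vec_ptr_is_null() -> bool {
    let v: Vec<u32> = Vec::new();
    v.as_ptr().is_null()
}
```

Both functions rely on the same idea: the pointer has to be non-null and aligned, but for zero bytes of data there is no allocation that it needs to point into.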

Doesn’t every pointer point to a valid zero sized allocation?

I thought ! is -INF sized, which makes any talk of whether &! dangles irrelevant, because you cannot have a value of type &! ever.

@canndrew probably has more info on &!