Official document that states "references must point to initialized data"?

I’m looking at bytes::BufMut::bytes_mut(), which is an unsafe function that intentionally returns a &mut [u8] pointing to an uninitialized buffer. I think it’s unsound, but the Unsafe Code Guidelines don’t say anything about this kind of problem.

Is creating (without reading through) such a reference UB? If so, is there an official document that says so?

What you should be looking for is an official statement that what you/they are doing is allowed. Everything else you basically have to assume is UB, because it means the semantics are not carved out yet and we need some freedom there to do any language design. :wink:

But I understand that's not a helpful answer, so let me instead link you to some in-progress discussion on the topic of references and integers and initialization:

  • This issue discusses whether uninitialized integers count as "valid" -- whether they are "initialized enough" to not cause insta-UB, basically. If this is answered "yes", then there is no trouble at all with uninitialized data at integer type (until you start to "use" it, like in array indexing or any of our unary/binary operators).
  • This issue discusses whether references must point to initialized data.

Also see the docs for MaybeUninit, which state:

(Notice that the rules around uninitialized integers are not finalized yet, but until they are, it is advisable to avoid them.)


I think the &out proposals, which introduce write-only references (as opposed to read-write &mut and read-only &), are also relevant here. Unfortunately, it does not look like they will be introduced anytime soon.

@RalfJung

I think the main question is not the “validity” of uninitialized integers/floats, but whether a temporary &mut T pointing to uninitialized memory should be considered insta-UB, or whether it is acceptable as long as we never read from it.


@newpavlov I introduced that idea myself a while back, but there wasn’t enough motivation for it, especially with MaybeUninit on its way at the time, so it got closed.

&mut MaybeUninit<T> can serve the same purposes as &out T
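As a minimal sketch of that point, an out-parameter can be written today as &mut MaybeUninit<T>. The names Config, fill_config and make_config are hypothetical; the only real APIs used are MaybeUninit::uninit, MaybeUninit::write and MaybeUninit::assume_init.

```rust
use std::mem::MaybeUninit;

/// Hypothetical example type; any `T` works the same way.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub retries: u32,
    pub verbose: bool,
}

/// Plays the role of a `&out Config` parameter: the callee only writes
/// through `out` and never reads its (possibly uninitialized) old contents.
pub fn fill_config(out: &mut MaybeUninit<Config>) -> &mut Config {
    // `MaybeUninit::write` is a safe method: it overwrites without reading
    // or dropping the old contents and returns a `&mut` to the new value.
    out.write(Config { retries: 3, verbose: false })
}

/// Caller side: reserve uninitialized storage, let the callee fill it.
pub fn make_config() -> Config {
    let mut slot = MaybeUninit::<Config>::uninit();
    fill_config(&mut slot);
    // SAFETY: `fill_config` always initializes `slot` before returning.
    unsafe { slot.assume_init() }
}
```

The remaining unsafe step is assume_init, because nothing in the types records that the callee actually performed the write.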


One way to do something similar to &out today is to use library-defined move references (like the ones in refmove), constructed using a function with this signature:

fn initialize<'a, T>(from: &'a mut MaybeUninit<T>, with: T) -> MoveRef<'a, T>
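A minimal sketch of how such a signature could be backed, assuming a move reference is simply a &'a mut T that owns the value it points to (it drops the value when it goes out of scope, or hands it back via into_inner). This MoveRef is a hypothetical stand-in written for illustration, not the type from refmove.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical move reference: it owns the `T` living in borrowed storage.
/// Dropping it drops the `T`; `into_inner` moves the `T` out instead.
pub struct MoveRef<'a, T> {
    slot: &'a mut T,
}

pub fn initialize<'a, T>(from: &'a mut MaybeUninit<T>, with: T) -> MoveRef<'a, T> {
    // `write` initializes the storage and returns a `&'a mut T` to it.
    MoveRef { slot: from.write(with) }
}

impl<'a, T> MoveRef<'a, T> {
    pub fn get(&self) -> &T {
        &*self.slot
    }

    pub fn get_mut(&mut self) -> &mut T {
        &mut *self.slot
    }

    /// Moves the value out, leaving the storage logically uninitialized.
    pub fn into_inner(self) -> T {
        let this = ManuallyDrop::new(self); // suppress our `Drop` impl
        // SAFETY: the slot is initialized and `this` never touches it again.
        unsafe { ptr::read(&*this.slot) }
    }
}

impl<T> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        let p: *mut T = &mut *self.slot;
        // SAFETY: initialized by `initialize` and not moved out (otherwise
        // `into_inner` would have suppressed this drop).
        unsafe { ptr::drop_in_place(p) }
    }
}
```

The caller keeps the MaybeUninit and never calls assume_init on it; ownership of the value travels with the MoveRef instead.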


Most of the time people are asking this for references to integer types, so the discussions often get interleaved. This includes the OP. There are plenty of use-cases for references to uninitialized integers. Many of those also involve "immediate" uninitialized integers, and then creating a reference to them.

In contrast, I have yet to see a convincing case for an &bool pointing to 4. Or for an &! to exist at all.

&out and placement new have a long history, going back to before the 1.0 release in 2015. But AFAIK not much progress has been made in recent years.


In contrast, I have yet to see a convincing case for an &bool pointing to 4. Or for an &! to exist at all.

I can imagine using an &! within a newtype, assuming it cannot be dereferenced, as a method of determining whether instances both hold the same unique value.

    let bar : &! = ...;
    let foo : &! = ...;
    std::ptr::eq(foo, bar)

This allows you to easily reuse unique values, without resorting to incrementing an integer for a unique ID, along with some form of free ID pool...
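For comparison, the same "identity without a counter" idea can be sketched with an ordinary, inhabited pointer type instead of &!. This swaps in Rc::ptr_eq for the &! comparison, and Token is a hypothetical name. It assumes that two simultaneously live heap allocations never share an address, which is why the token allocates one byte rather than a zero-sized value (zero-sized values may share addresses).

```rust
use std::rc::Rc;

/// Hypothetical identity token: equality is pointer identity, not contents.
/// Clones share one identity; separately created tokens never compare equal
/// while both are alive. The allocator effectively acts as the "free ID pool".
#[derive(Clone)]
pub struct Token(Rc<u8>);

impl Token {
    pub fn new() -> Self {
        Token(Rc::new(0))
    }
}

impl PartialEq for Token {
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl Eq for Token {}
```

Like a reference, a Token has no safe constructor that reproduces an existing identity from a plain integer.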

A write-only &out reference would also need some mechanism to convert it to a read-write &mut reference once initialized.

Why? We don’t have a safe way to convert a read-only &T into a read-write &mut T, so why should a write-only &out T be different in this regard? I think &out is mostly about specifying API constraints: e.g. Read::read semantically does not need to read from the provided buffer, but the type system has no tools for specifying such constraints.
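A rough sketch of encoding that constraint with today's types: a hypothetical ReadUninit trait (not part of std) shaped like Read::read, whose buffer is &mut [MaybeUninit<u8>], so the callee cannot assume it is initialized. The trait is unsafe to implement because the caller's unsafe code relies on the "buf[..n] is initialized" promise. SliceSource and read_chunk are also hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical trait shaped like `std::io::Read::read`, but with a
/// "write-only" buffer type.
///
/// # Safety
///
/// Implementations must initialize `buf[..n]` before returning `n`,
/// and `n` must not exceed `buf.len()`.
pub unsafe trait ReadUninit {
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> usize;
}

/// Toy source that hands out bytes from a slice.
pub struct SliceSource<'a> {
    data: &'a [u8],
}

impl<'a> SliceSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        SliceSource { data }
    }
}

// SAFETY: we initialize exactly `n <= buf.len()` bytes before returning `n`.
unsafe impl ReadUninit for SliceSource<'_> {
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> usize {
        let n = buf.len().min(self.data.len());
        for (dst, &src) in buf[..n].iter_mut().zip(&self.data[..n]) {
            *dst = MaybeUninit::new(src); // write-only access to the buffer
        }
        self.data = &self.data[n..];
        n
    }
}

/// Reads one chunk into uninitialized stack memory and copies out the
/// initialized prefix; no `&mut [u8]` to uninitialized bytes is created.
pub fn read_chunk<R: ReadUninit>(src: &mut R) -> Vec<u8> {
    let mut buf = [MaybeUninit::<u8>::uninit(); 8];
    let n = src.read_uninit(&mut buf);
    // SAFETY: the `ReadUninit` contract guarantees `buf[..n]` is initialized.
    let init = unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), n) };
    init.to_vec()
}
```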

With &out, MaybeUninit could’ve had a safe get_out(&out self) -> &out T, and it would’ve been possible to write into an uninitialized variable in safe code.

I think I see what you mean. You’re suggesting that, rather than converting from &out T to &mut T, people would use the &out T to initialize something, then after they’ve fully initialized it, they’d go back to the original T if they need a &T or a &mut T?

That would still require an unsafe operation, I’d assume? Or could we potentially detect full initialization of an object and then allow using it safely, the way we currently can with let x; ...; x = func()?

Yes, to both. The &out T I have in mind is nothing more than a write-only reference; it does not enforce in any way that you fully initialize the object behind it. With MaybeUninit, this would reduce two calls to unsafe functions (get_mut and assume_init) to one (assume_init). But that is not my main point; my main point is that &out would allow us to encode more constraints into our APIs. (And the feeling of "completeness" from having all three r, w and rw references is a nice bonus.)
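For reference, a minimal sketch of the single-unsafe-call pattern with the current std API, where MaybeUninit::write is a safe method that returns a &mut T to the freshly written value. The function name init_counter is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Initializes a value through `MaybeUninit` with a single `unsafe` call.
pub fn init_counter() -> u64 {
    let mut slot = MaybeUninit::<u64>::uninit();
    // Safe: `write` never reads the old (uninitialized) contents and hands
    // back a `&mut u64` that can be used freely afterwards.
    let r: &mut u64 = slot.write(40);
    *r += 2;
    // SAFETY: `slot` was initialized by `write` above.
    unsafe { slot.assume_init() }
}
```

Only assume_init remains unsafe, since the type system still does not track that the write happened.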

This discussion around &out seems very similar to my RFC for &out T and &uninit T. What @newpavlov seems to be describing is similar to what I called &uninit. The main concern was that this introduced too many new pointer types, and that these pointer types were a bit complex (although that is specific to my proposal, a simpler proposal may get more traction).

After thinking about this topic for so long, my views have changed since I first made the RFC.

I don't think that &out is a good idea, because I think it is misleading. It makes &T look like a read-only reference and &mut T look like a read-write reference, and this way of thinking makes it significantly harder for new users to learn Rust. It is also wrong on account of UnsafeCell, which provides a way to mutate a T behind a &T. So &T doesn't actually mean read-only, and teaching that it does is actively harmful.
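A small sketch of the UnsafeCell point: Cell is built on UnsafeCell, and it allows mutation through a plain shared reference, even while several shared references coexist. The function names bump and demo are hypothetical.

```rust
use std::cell::Cell;

/// Takes only a shared reference, yet mutates through it via `Cell`.
pub fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

pub fn demo() -> u32 {
    let c = Cell::new(0);
    let (a, b) = (&c, &c); // two shared references alive at the same time
    bump(a);
    bump(b);
    c.get()
}
```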

Instead, I have found that people understand the compiler better when they think in terms of uniqueness and compile-time locks: &T is a shared reference and &mut T is a unique reference. Then we explain that references are compile-time locks, not simple pointers like in other languages. The fact that &T disallows mutation by default is then easily explained: shared mutability is a hard problem. Once people understand this, they start to understand the problems with their code better, and why things are defined the way they are in std.

I have two examples that are made significantly more clear by thinking in terms of uniqueness.

Send/Sync

The blanket Send impls for references are as follows:

unsafe impl<T: Sync + ?Sized> Send for &T {}
unsafe impl<T: Send + ?Sized> Send for &mut T {}

If you look at this through the lens of &T is a read only reference and &mut T is a read-write reference, this makes no sense. Why do &T and &mut T have different bounds for Send? It is unclear because there is some very important information missing.

If you look at this through the lens of &T is a shared reference and &mut T is a unique reference, then things become more clear. Sync now carries the meaning that if T can be safely shared across threads then a shared reference to T can be sent across thread boundaries. The bound for &mut T also becomes clear, as you can safely send a unique reference to T across thread boundaries if you can send T across threads, because &mut T is a unique compile time lock to T, so it behaves just like T with few exceptions.
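A sketch of those bounds in action, using Cell<i32> as the example type because it is Send but not Sync. The helper assert_send is a common compile-time-check idiom, not a std function, and the other names are hypothetical. The scoped-thread part needs std::thread::scope (Rust 1.63 or later).

```rust
use std::cell::Cell;
use std::thread;

// Compile-time check: this call only compiles if `T: Send`.
fn assert_send<T: Send>() {}

pub fn checks() {
    // Cell<i32>: Send, so a unique reference to it may cross threads...
    assert_send::<&mut Cell<i32>>();
    // ...but `assert_send::<&Cell<i32>>()` would not compile,
    // because `&T: Send` requires `T: Sync`, and Cell is not Sync.
    assert_send::<&i32>(); // i32 is Sync, so &i32 is Send
}

/// Hands a unique reference to another (scoped) thread, which mutates
/// through it; this is fine because nobody else can observe the Cell.
pub fn bump_on_other_thread(cell: &mut Cell<i32>) {
    thread::scope(|s| {
        s.spawn(move || cell.set(cell.get() + 1));
    });
}
```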

IterMut

The other example of how thinking in terms of uniqueness improves understanding is IterMut: specifically, why it requires unsafe to implement for non-trivial collections, whereas Iter does not.

Because &mut T is a unique compile-time lock and Iterator::collect is safe (it lets every yielded &mut T be alive at the same time), you must be able to prove that all &mut T yielded by the iterator are completely disjoint. This is a non-trivial condition, and Rust can't prove it for most collections' implementations of IterMut.

An example of a type that could implement IterMut without any unsafe is a naive linked list, because the Rust compiler can prove that all yielded &mut T are indeed unique. But this is the simplest data structure you could make; pretty much every other data structure must use unsafe (or ride off of some other unsafe implementation of IterMut), because Rust cannot prove that the yielded &mut T are unique.
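A minimal sketch of such a naive singly linked list and its IterMut, written entirely in safe code (this follows a well-known pattern; the names Node and IterMut are local to the example). The key step is take(), which moves the &'a mut Node out of the iterator so the borrow checker can split it into two disjoint borrows.

```rust
/// Naive singly linked list: each node owns the next one.
pub struct Node<T> {
    pub value: T,
    pub next: Option<Box<Node<T>>>,
}

pub struct IterMut<'a, T> {
    next: Option<&'a mut Node<T>>,
}

impl<T> Node<T> {
    pub fn iter_mut(&mut self) -> IterMut<'_, T> {
        IterMut { next: Some(self) }
    }
}

impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<&'a mut T> {
        // Move the `&'a mut Node` out of `self`, then split it into the
        // value and the link to the next node: provably disjoint borrows.
        self.next.take().map(|node| {
            self.next = node.next.as_deref_mut();
            &mut node.value
        })
    }
}
```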

Using another implementation of IterMut shifts the burden of proof to that implementation, so it doesn't really invalidate my point. (Yes, collections are usually implemented on top of raw pointers, which need unsafe anyway, but this doesn't detract from my point either, because Rust still can't prove that you are yielding disjoint &mut T.)
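To illustrate shifting the burden of proof, here is a sketch of a slice iterator that contains no unsafe of its own: the disjointness argument is delegated to the standard library's slice::split_first_mut. SliceIterMut is a hypothetical name.

```rust
use std::mem;

/// Slice iterator without any `unsafe` in this code: disjointness of the
/// yielded references comes from `split_first_mut`.
pub struct SliceIterMut<'a, T> {
    rest: &'a mut [T],
}

impl<'a, T> SliceIterMut<'a, T> {
    pub fn new(slice: &'a mut [T]) -> Self {
        SliceIterMut { rest: slice }
    }
}

impl<'a, T> Iterator for SliceIterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<&'a mut T> {
        // Move the whole `&'a mut [T]` out (leaving an empty slice behind)
        // so both halves keep the full lifetime `'a`.
        let rest = mem::take(&mut self.rest);
        let (first, tail) = rest.split_first_mut()?;
        self.rest = tail;
        Some(first)
    }
}
```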


Because of this, I think it is more important to teach in terms of uniqueness and build constructs around this idea instead of in terms of mutability (which is just a side-effect of safe defaults).

This is something I also thought when I was writing the RFC, but I think it is flawed, because we already have completeness: you have either a shared reference or a unique reference. You can't have something that is neither shared nor unique (unless it is inaccessible, which is useless), or both shared and unique (that just doesn't make sense). So we have all of the references we need (actually, we are still missing one, &own T, which takes ownership of T and allows cheaply owning unsized types).


What about a reference to a memory location representing a write-only register or port? To me, &out, would be great to have to represent that sort of thing. But, that is a different interpretation of &out than others are talking about. It would never be permitted to be turned into an &mut or & because it simply is not readable.

That was what the &out in my RFC was. I’m not too sure about it, because I personally have never used such registers and I hear they are quite rare. I don’t know if it is worth adding a whole new reference type for such a rare thing. You could use *mut T and just never read from it, maybe even with a wrapper like this:

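use std::marker::PhantomData;

/// A write-only handle to a `T`: it can be written through but never read.
pub struct Out<'a, T: ?Sized> {
    ptr: *mut T,
    lifetime: PhantomData<&'a mut T>,
}

impl<T: ?Sized> Out<'_, T> {
    /// # Safety
    ///
    /// `ptr` must be valid for writes and properly aligned for as long as
    /// the returned `Out` is used.
    pub unsafe fn from_raw(ptr: *mut T) -> Self {
        Self { ptr, lifetime: PhantomData }
    }

    pub fn as_ptr(&mut self) -> *mut T {
        self.ptr
    }
}

impl<T> Out<'_, T> {
    /// Writes `value` without reading or dropping the previous contents.
    pub fn write(&mut self, value: T) {
        // SAFETY: `from_raw`'s contract guarantees `ptr` is valid for writes.
        unsafe { self.ptr.write(value) }
    }
}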

This should be usable enough as a &out for such registers/ports. If not, there is always unsafe.
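One caveat worth sketching for the register/port case: memory-mapped I/O writes are normally done with volatile stores, so that the compiler does not merge or elide them. Below is a hypothetical WriteOnly variant of the wrapper using std::ptr::write_volatile; the name, the T: Copy bound and the safe new constructor (handy for testing against ordinary memory) are assumptions for illustration.

```rust
use std::marker::PhantomData;
use std::ptr;

/// Hypothetical write-only register handle using volatile stores.
pub struct WriteOnly<'a, T: Copy> {
    ptr: *mut T,
    _lifetime: PhantomData<&'a mut T>,
}

impl<'a, T: Copy> WriteOnly<'a, T> {
    /// # Safety
    ///
    /// `ptr` must be non-null, properly aligned and valid for writes of `T`
    /// for the whole lifetime `'a` (e.g. a memory-mapped register).
    pub unsafe fn from_raw(ptr: *mut T) -> Self {
        WriteOnly { ptr, _lifetime: PhantomData }
    }

    /// Safe constructor over ordinary memory.
    pub fn new(target: &'a mut T) -> Self {
        WriteOnly { ptr: target, _lifetime: PhantomData }
    }

    /// Every call performs a store; none are merged or optimized away.
    pub fn write(&mut self, value: T) {
        // SAFETY: guaranteed by the constructors' contracts.
        unsafe { ptr::write_volatile(self.ptr, value) }
    }
    // Deliberately no read method: the handle cannot be read through.
}
```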


I have no idea what you are proposing here and why it is any better than usize.


The primary benefit references have over usize is that they lack an arbitrary safe constructor. Given a usize, you can copy the value, send it through some data channel, and create a new usize on the other end; if that usize is given back to the origin, the origin cannot distinguish the two.

If, on the other hand, you use a &..., you can turn it into a usize and send that through the channel, but you cannot safely rebuild the &... on the other end.

That’s not the issue: if you have a &!, then you could dereference it to get an instance of !. Therefore &! must also be uninhabited, which means it should have all the same properties as !.

Even better:

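impl<'a, T: 'a> Out<'a, T> {
    /// Initializes the target and upgrades the handle to a `&'a mut T`,
    /// since the value is now known to be initialized.
    pub fn set(self, value: T) -> &'a mut T {
        unsafe {
            self.ptr.write(value);
            &mut *self.ptr
        }
    }
}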

Using &! to represent a unique id feels very hacky, even if it were valid.

I would just define a type like this instead:


// Not Copy or Clone, so it's definitely unique
#[derive(Debug)]
pub struct UniqueId {
    id: usize,
}

impl UniqueId {
    pub fn new() -> Self {
        use std::sync::atomic::{AtomicUsize, Ordering};
        static UNIQUE_ID: AtomicUsize = AtomicUsize::new(0);
        Self {
            id: UNIQUE_ID.fetch_add(1, Ordering::SeqCst),
        }
    }

    /// Constructs UniqueId from an arbitrary usize.
    ///
    /// # Safety
    ///
    /// Callers must ensure that there is no other `UniqueId` that returns the same usize from `.id()`,
    /// since users of UniqueId are free to assume that it always has a unique id.
    pub unsafe fn new_unchecked(id: usize) -> Self {
        Self { id }
    }

    pub fn id(&self) -> usize {
        self.id
    }
}


This is assuming that you only want an id that’s unique within the same process.


Right, my thinking was largely that dereferencing &! would need to be a statically checked error, which is problematic for generic functions of the form &T -> T… I’m mostly pointing out that there is a case where reference construction semantics are desirable and reference equality is well defined, and where it does not matter whether the reference points to initialized data: an inhabited &! can be useful without triggering undefined behavior. So abolishing references to uninitialized data is not entirely lossless. Tying this to absurdity is unfortunate.