&own sugar/manual RVO

Disclaimer: the title and the post are intentionally provocative; there is no chance this gets implemented soon. Please feel free to ignore it if you see no merit.

The topic on enhanced existentials in return position has prompted me to suggest:

  • make 'a T approximately synonymous with the not-yet-existent &own 'a T
  • make let 'a T = ... mean "allocate T on the stack in the correct scope to match lifetime 'a"
  • allow such 'a to optionally be fn's parameter, e.g. a lifetime from a parent stack frame
  • implement this via an implicit buffer pointer passed into the function, one buffer per lifetime
  • implicitly compute the sizes of all such buffers for each fn that takes them
  • make callers implicitly allocate buffers of the right size on their stack
  • introduce rules that ensure that each such fn is invoked at most once per call site per lifetime used in this manner
  • mark these lifetimes explicitly on fn definition, say via fn foo<super 'a>() -> 'a [u8]
  • the general expectation is that these lifetimes are used in some way in the function's return type
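The buffer-passing scheme in the list above can be approximated by hand in today's Rust. This is only a sketch under assumptions: the names `fill_bytes` and `caller` are hypothetical, the buffer size is fixed at four bytes rather than computed by the compiler, and the explicit `MaybeUninit` parameter stands in for the implicit per-lifetime buffer pointer.

```rust
use std::mem::MaybeUninit;

// Hypothetical hand-written counterpart of `fn fill_bytes<super 'a>() -> 'a [u8]`.
// The caller owns the storage; the callee initializes it and hands back a
// reference tied to the caller-chosen lifetime 'a.
fn fill_bytes<'a>(buf: &'a mut MaybeUninit<[u8; 4]>) -> &'a mut [u8] {
    // `MaybeUninit::write` initializes the slot and returns `&mut [u8; 4]`,
    // which coerces to `&mut [u8]` at the return site.
    buf.write([1, 2, 3, 4])
}

// The caller reserves the buffer in its own stack frame. This is the step
// the proposal would have the compiler perform implicitly.
fn caller() -> u32 {
    let mut buf = MaybeUninit::<[u8; 4]>::uninit();
    let bytes = fill_bytes(&mut buf);
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```

Unlike the proposed 'a T, the returned `&mut [u8]` never runs a destructor for the buffer contents, which is harmless for `u8` but is exactly the gap that &own is meant to close.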

This is very similar to what I was suggesting on that other topic. The only novelty here is the 'a T syntax. On the one hand it is almost exactly synonymous to &own 'a T much discussed and implementable in user code. On the other hand I find that it can be viewed as a significant leap in intuitiveness and makes the whole construct closer mentally to returning (linked graphs of) unsized types by value.

The suggested syntax makes it possible to "reimplement" RVO/-> impl Trait "manually"; it is a more explicit way to get the same effect. The special case where a function returns just one value, and the pointer to that value matches the buffer pointer passed in, could perhaps be optimized internally by the compiler so that the pointer is not needlessly passed back, since the caller already knows it.

My thinking is that when the 'a T syntax is used with let in a fn body, it would compile either to a pointer or, if 'a is local to the current function invocation, to a normal variable on the stack. When 'a T is used as part of a struct or enum, on the other hand, it would always mean what &own 'a T would have meant.

This is unimplementable in Rust's compilation model. Lifetimes are erased before codegen, so there is no knowledge about what scope would be "correct".

Thx, I see. Could we then read the above as suggesting this "sugar expansion" happens earlier?

I'm no expert on compiler internals. But if &own handles value-level deallocation, does codegen really need to know the lifetimes involved? Seems to me that it'd be enough for codegen purposes to treat a let 'a T statement the same as let foo: MaybeUninit<T>;.

IIUC, currently all let bindings within a function body are effectively added up to determine how much stack space to allocate at the beginning of the function (notably, IIUC this is problematic as pointers to dropped values aren't currently invalidated, though unsafe code shouldn't rely on that).

Note: all of the above is assuming alloca isn't involved.

Assuming it's &'a T what's meant here, what does this even mean? What are the semantics of an owning borrow (a contradictio in terminis)?

I should have been clearer. I meant:

  • the implementation of 'a T would be a pointer
  • the syntax for its usage would be the same as used for non-reference types
// allocated in the scope associated with 'a
// possibly in a parent stack frame
// if indeed allocated in a parent stack frame
// our frame will contain a pointer to it
let 'a a = A{ a : 8u };

"Owning reference" &own 'a T would mean:

  • memory occupied by the value is borrowed and may possibly not outlive the lifetime of the reference
  • you still own the content of that memory
  • the pointer that you have in your hands is the only valid pointer to that memory
  • you can write to it
  • you can move out of it
  • assigning to another &own 'a T is a move
  • if an &own 'a T is destroyed (it was assigned to a local variable in a scope and not moved out by the end of that scope), then drop is executed

As pointed out by @dhm, owning references can generally be implemented in user code today.


Above 'a was part of let binding. Could it instead be part of the type?

// struct allocated in a parent frame if 'a is fn's param
// the pointer to the newly allocated struct resides in current stack frame
// implementation of a is &own 'a A
// syntax for a is that of A
let a = 'a A{ a : 8u };

// pair allocated in the current stack frame
// 2nd element of the pair is a pointer
// that pointer points to a struct allocated in the stack frame associated with 'a
let pair = ( 8u, 'a A{ a : 8u } );

Having written all this, I started thinking about Deref coercions.

I see several issues with your explanation as it currently stands:

If I understand you correctly, this would mean a Rustacean cannot distinguish syntactically between the use of a value type and the use of a reference type. This may or may not be problematic.

Personally I consider this a red flag w.r.t. safety. Specifically, it would throw Rust back to the level of safety of C++, in that dangling pointers would no longer be impossible in safe Rust. Currently that kind of unsafety requires, well, unsafe, which makes it explicit that black magic is being performed. And even then a dangling pointer is pretty much always a bug with potentially grave consequences.

This is the other side of the dangling pointer issue: How does owning that value work if it can be dropped before the owning borrow? Today it's pretty much guaranteed that a borrowed value cannot be dropped in safe Rust. At least not without dropping the borrows first.

As for the rest of the items on that list, those look like the semantics of Box<T> to me. Which raises the question: how does this proposal differ from Box<T>? That is unclear to me.

I'm not well-versed in rustc internals, but it would surprise me if that wasn't a huge change to the conceptual model of how Rust works.

Perhaps it's a good moment to stop for a second and ask: what do you want this for, exactly? What problem are you experiencing that this would solve which current constructs cannot?

I might be wrong, but I think this can all be implemented quite safely.

use std::marker::PhantomData;
use std::ptr;

struct OwnRef<'out, T: 'out + ?Sized> {
    ptr: ptr::NonNull<T>,
    marker: PhantomData<&'out mut T>,
}

impl<'out, T: 'out + ?Sized> OwnRef<'out, T> {
    // compiler magic: after this is invoked, normal dropping is suppressed on the referenced object.
    // It is as if we had moved out of that object while the memory region remains mutably borrowed.
    // This means that, on the one hand, the memory area is not usable anymore except through
    // this OwnRef, but it also means that the owner has the obligation not to deallocate it
    // for the duration of the lifetime.
    pub fn new(r: &'out mut T) -> Self {
        OwnRef { ptr: ptr::NonNull::from(r), marker: PhantomData }
    }
}

impl<'out, T : 'out + ?Sized> Drop for OwnRef<'out, T> {
    fn drop(&mut self) {
        unsafe{ ptr::drop_in_place(self.ptr.as_ptr()); }
    }
}

impl<'out, T : 'out + ?Sized> Deref for OwnRef<'out, T> ...
..DerefMut..

I believe the idea of having &own references has been around for a while. I also believe the semantics of those has been understood as I described above. I honestly don't see the reason the concepts of owning memory (having the obligation and means to deallocate it) and owning content of that memory (say owning a block of heap referenced from the said memory) cannot be decoupled.

Surely Drop::drop() can be invoked long before the actual stack frame is popped or free invoked on the relevant block of heap memory. Surely Drop::drop() can be invoked from a few stack frames down from where the memory is deallocated. Provided that Drop::drop() is not invoked after the memory is actually deallocated. And provided that after Drop::drop() that memory remains unusable.

The lifetime engraved in an &own (aka OwnRef) reference is the lifetime of the memory. Of course whoever holds such an &own (aka OwnRef) cannot hold onto it or use it longer than the lifetime engraved in the type. So they have to dispose of their ownership (dispose of that OwnRef) before that lifetime expires, and that's when drop gets called. But this can happen a few stack frames down from the stack frame in which the memory is actually allocated.
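The separation between "who owns the memory" and "who drops the content" can be demonstrated with what std offers today. The following is a minimal sketch, not the proposed feature: `Noisy`, `consume` and `drop_in_callee` are hypothetical names, and the compiler-enforced guarantee of &own is replaced by an unsafe contract.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

// Bumps a shared counter when its destructor runs.
struct Noisy<'c>(&'c Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Hypothetical callee that owns the content of a slot it did not allocate.
//
// Safety: the slot must hold an initialized value, and the caller must not
// use or drop that value again after this call.
unsafe fn consume(slot: &mut ManuallyDrop<Noisy<'_>>) {
    // The destructor runs here, one frame below the frame that owns the memory.
    unsafe { ManuallyDrop::drop(slot) }
}

fn drop_in_callee() -> (u32, u32) {
    let drops = Cell::new(0);
    // The memory for the value lives in this frame.
    let mut slot = ManuallyDrop::new(Noisy(&drops));
    let before = drops.get();
    // SAFETY: `slot` is not touched again after this call.
    unsafe { consume(&mut slot) };
    let after = drops.get();
    // `slot` goes out of scope here without running the destructor again.
    (before, after)
}
```

The stack slot outlives the value stored in it, which is the ordering described above: drop first, deallocation later and further up the stack.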

I believe the idea has been around for a while and is uncontroversial. I'm actually surprised by the amount of conversation we're having about it. It appears very obviously safe and correct to me. I was under the impression that it is not yet implemented chiefly because nobody felt it would yield enough benefit.

Yes indeed this blurs the line between a reference and a value. This is indeed what I have brought up for discussion. Philosophically what does an identifier that refers to a struct on the stack signify? It can be viewed as signifying the address of such a struct. From this point of view 'a S identifier would signify just the same - but a few stack frames up. Is that good or bad? Open to opinion it seems to me.

In other words, an identifier that refers to a struct on the stack is an lvalue. So an identifier referencing an 'a S is an lvalue as well.

What do I want? Good question, and always a good one to ask. I guess I want to talk, to ponder, to consider possibilities. I have almost no horse in this race, but I really wanted to discuss and consider to death what @petertodd suggested in the neighboring topic. I've taken his idea and put my spin on it. What would it feel like if it really was implemented?

In the process I did answer the question of what it might be good for:

By "may possibly not outlive", I think @atagunov means that the memory is only guaranteed to last as long as the reference: it may last no longer than the reference, but never shorter.

Here are the key parts of an example user-space implementation:

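use std::marker::PhantomData;
use std::mem::ManuallyDrop;
use std::ops::{Deref, DerefMut};
use std::ptr::{self, NonNull};

#[repr(transparent)]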
pub struct RefOwn<'a, T: 'a + ?Sized> {
    marker: PhantomData<&'a T>,
    ptr: NonNull<T>,
}

impl<'a, T: 'a + ?Sized> RefOwn<'a, T> {
    /// Creates a new `RefOwn`
    /// 
    /// # Safety
    ///
    /// `owned` must not be accessed in any way after this call, except
    /// via the `RefOwn` wrapper.
    pub unsafe fn new_unchecked(owned: &'a mut ManuallyDrop<T>) -> Self {
        Self {
            marker: PhantomData,
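            // Go through `DerefMut` rather than `NonNull::cast`, which
            // only accepts sized targets and so would reject `T: ?Sized`.
            ptr: NonNull::from(&mut **owned),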
        }
    }
}

impl<T: ?Sized> Drop for RefOwn<'_, T> {
    fn drop(&mut self) {
        unsafe { ptr::drop_in_place(self.ptr.as_ptr()) }
    }
}

impl<T: ?Sized> Deref for RefOwn<'_, T> {
    type Target = T;

    fn deref(&self) -> &T {
        unsafe {
            &*self.ptr.as_ptr()
        }
    }
}

impl<T: ?Sized> DerefMut for RefOwn<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe {
            &mut *self.ptr.as_ptr()
        }
    }
}

The key thing is that the lifetime of the logical value is that of the RefOwn<'a, T> / &'a own T itself, whereas the memory is of lifetime 'a. That means the borrow checker ensures that the memory the pointer points to is valid for at least as long as the T value is live.

An &own T reference is somewhat like a box. Except that the underlying "memory allocation" isn't limited to the heap: it's anywhere in memory, or even parts of an object.

For example, slices could gain the following:

impl<T> [T] {
    pub fn split_at_own(&own self, mid: usize) -> (&own [T], &own [T]) {
        /* ... */
    }
}

That lets you take a contiguous slice of things that you own, and split that into two parts. The actual memory it came from is guaranteed valid by the borrow checker for the lifetime of the &own references.
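A user-space approximation of `split_at_own` is possible on top of a `RefOwn`-style wrapper. The block below is a sketch under assumptions: it redefines a minimal `RefOwn` so that it stands alone, and `from_array` and `split_demo` are hypothetical helpers that are not part of the code quoted in this thread.

```rust
use std::marker::PhantomData;
use std::mem::{self, ManuallyDrop};
use std::ptr::{self, NonNull};
use std::rc::Rc;

// Minimal stand-in for the thread's `RefOwn`.
pub struct RefOwn<'a, T: ?Sized> {
    marker: PhantomData<&'a mut T>,
    ptr: NonNull<T>,
}

impl<T: ?Sized> Drop for RefOwn<'_, T> {
    fn drop(&mut self) {
        // SAFETY: a `RefOwn` uniquely owns the initialized value it points to.
        unsafe { ptr::drop_in_place(self.ptr.as_ptr()) }
    }
}

impl<'a, T> RefOwn<'a, [T]> {
    /// Hypothetical constructor.
    ///
    /// Safety: the array must not be used or dropped again except through
    /// the returned `RefOwn`.
    pub unsafe fn from_array<const N: usize>(owned: &'a mut ManuallyDrop<[T; N]>) -> Self {
        let slice: &mut [T] = &mut **owned;
        RefOwn { marker: PhantomData, ptr: NonNull::from(slice) }
    }

    /// Splits one owning slice into two; each half drops its own elements.
    pub fn split_at_own(self, mid: usize) -> (Self, Self) {
        // SAFETY: `self` uniquely owns the initialized elements for 'a.
        let whole: &'a mut [T] = unsafe { &mut *self.ptr.as_ptr() };
        // Panics if `mid > len`; `self` is still intact then and drops normally.
        let (left, right) = whole.split_at_mut(mid);
        let halves = (
            RefOwn { marker: PhantomData, ptr: NonNull::from(left) },
            RefOwn { marker: PhantomData, ptr: NonNull::from(right) },
        );
        // Ownership of the elements has passed to the halves.
        mem::forget(self);
        halves
    }
}

fn split_demo() -> (usize, usize, usize) {
    let token = Rc::new(());
    let mut storage = ManuallyDrop::new([token.clone(), token.clone(), token.clone()]);
    let before = Rc::strong_count(&token);
    // SAFETY: `storage` is only reachable through `owned` from here on.
    let owned = unsafe { RefOwn::from_array(&mut storage) };
    let (left, right) = owned.split_at_own(1);
    drop(left); // drops one element
    let after_left = Rc::strong_count(&token);
    drop(right); // drops the remaining two
    (before, after_left, Rc::strong_count(&token))
}
```

The array itself stays in `split_demo`'s frame the whole time; only the responsibility for dropping the elements moves around.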

I'd like to interject with my experience: this is pretty darn inconvenient, even if not technically problematic. Not knowing where a reference is in a chain of types makes certain kinds of reasoning much harder (again, especially when it interacts with unsafe).

Thank you for the explanation. So if I understand it correctly, in terms of mechanics it essentially acts like a scoped Box<T>, where both the &'a own T borrow and the T value behind it are essentially dropped at the same time i.e. at the end of the scope.

One last question: I have never used such a construct. What kinds of things would this be useful for? I just found the owning_ref crate, is this similar to that in terms of use cases and conceptual scope?

No, owning_ref is for something very different.

I use RefOwn to abstract over how a value is allocated. One of my projects is basically copy-on-write data persistence/serialization. A key bit of its API is Ref, which is similar to Cow:

pub enum Ref<'a, T: ?Sized + IntoOwned> {
    Borrowed(&'a T),
    Owned(T::Owned),
}

...this is used in any function that works with data that may be stored on disk rather than in memory. For example, the Box equivalent in my library, Bag, has methods like:

impl<P: Ptr, T: ?Sized> Bag<T, P>
where T: IntoOwned + LoadRef,
      P::Zone: AsZone<T::Zone>,
{
    fn get<'a>(&'a self) -> Ref<'a, T> {
        /*  ... */
    }

    fn take(self) -> T::Owned {
        /* ... */
    }
}

Depending on whether or not the T value is resident in memory ("clean" vs "dirty"), Bag::get will either return a reference or an owned value within the Ref<T>. Since Ref<T> implements Deref<Target=T>, most code doesn't need to care about the difference. Similarly, Bag::take can return an owned value efficiently, even if T is unsized.
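To illustrate why most code doesn't need to care which variant it got, here is a self-contained sketch of a `Ref` with a `Deref` impl. It is an assumption-laden reduction, not the library's code: `IntoOwned` is cut down to its associated type, the `str` impl and the `describe` function are hypothetical, and the `Take` bound is omitted.

```rust
use std::borrow::Borrow;
use std::ops::Deref;

// Simplified stand-in for the thread's `IntoOwned`: only the associated
// type is kept, because that is all `Ref` needs here.
pub trait IntoOwned {
    type Owned: Borrow<Self>;
}

// Hypothetical impl chosen for the demo.
impl IntoOwned for str {
    type Owned = String;
}

pub enum Ref<'a, T: ?Sized + IntoOwned> {
    Borrowed(&'a T),
    Owned(T::Owned),
}

impl<T: ?Sized + IntoOwned> Deref for Ref<'_, T> {
    type Target = T;

    fn deref(&self) -> &T {
        match self {
            Ref::Borrowed(r) => r,
            // The owned form can always be viewed as a `T`.
            Ref::Owned(o) => <T::Owned as Borrow<T>>::borrow(o),
        }
    }
}

// Consumer code is written once against `T` and works for both variants.
fn describe(r: Ref<'_, str>) -> usize {
    r.len()
}
```

Calling `describe(Ref::Borrowed("disk"))` and `describe(Ref::Owned(String::from("memory")))` goes through the same code path, which mirrors how `Cow` behaves in std.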

IntoOwned is similar to ToOwned. But there is a really important difference: IntoOwned can be implemented even if you can't make a copy of a value:

trait IntoOwned {
    type Owned : Borrow<Self> + Take<Self>;

    fn into_owned(self: RefOwn<Self>) -> Self::Owned;
}

That allows a blanket implementation on all sized types:

impl<T> IntoOwned for T {
    type Owned = T;

    fn into_owned(self: RefOwn<Self>) -> Self {
        let this: &mut Self = RefOwn::leak(self);
        unsafe { ptr::read(this) }
    }
}

...with specific implementations for specific unsized types such as slices:

impl<T> IntoOwned for [T] {
    type Owned = Vec<T>;

    fn into_owned(self: RefOwn<Self>) -> Vec<T> {
        /* a bunch of slightly hairy unsafe code */
    }
}

Of course, the above could be done with pointers instead of RefOwn. But having it makes the safety boundaries a fair bit cleaner: for various reasons, without RefOwn I'd need to mark quite a few traits and functions in my crate as unsafe, as they would otherwise have to have safety contracts that can be encapsulated by RefOwn.

I actually did try doing all of the above initially with ToOwned. But the T: Clone requirement in the blanket implementation was just too restrictive, as in my library I can't guarantee in general that data can be cloned. Kind of unfortunate, as IntoOwned really should be a supertrait of ToOwned.

The Take trait mentioned above is another place where RefOwn gets used:

trait Take<T: ?Sized + IntoOwned> {
    fn take_with<F, R>(self, f: F) -> R
        where F: FnOnce(RefOwn<T>) -> R;
}

It's like Borrow, but for taking ownership of a value. Again, it has a blanket implementation:

impl<T> Take<T> for T {
    fn take_with<F, R>(self, f: F) -> R
        where F: FnOnce(RefOwn<T>) -> R
    {
        let mut this = ManuallyDrop::new(self);
        unsafe {
            // `new_unchecked` takes `&mut ManuallyDrop<T>`, so pass the wrapper itself
            f(RefOwn::new_unchecked(&mut this))
        }
    }
}
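Putting `take_with` and a sized `into_owned` together shows how a value that cannot be cloned is still moved out through the owning reference. This is a sketch under assumptions: the trait machinery is flattened into a free function and an inherent method, `RefOwn::leak` is given the behaviour its use in the blanket impl suggests (give up ownership without running the destructor), and `Token` and `take_demo` are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::{self, ManuallyDrop};
use std::ptr::{self, NonNull};

pub struct RefOwn<'a, T: ?Sized> {
    marker: PhantomData<&'a mut T>,
    ptr: NonNull<T>,
}

impl<'a, T: ?Sized> RefOwn<'a, T> {
    /// Safety: `owned` must not be accessed again except via the `RefOwn`.
    pub unsafe fn new_unchecked(owned: &'a mut ManuallyDrop<T>) -> Self {
        RefOwn { marker: PhantomData, ptr: NonNull::from(&mut **owned) }
    }

    /// Assumed semantics: release ownership without dropping the value.
    pub fn leak(this: Self) -> &'a mut T {
        let ptr = this.ptr;
        mem::forget(this);
        // SAFETY: the pointer is valid and unique for 'a.
        unsafe { &mut *ptr.as_ptr() }
    }
}

impl<'a, T> RefOwn<'a, T> {
    /// Sized-only counterpart of the blanket `IntoOwned::into_owned`.
    pub fn into_owned(self) -> T {
        let this: &mut T = RefOwn::leak(self);
        // SAFETY: the slot is never read or dropped again after `leak`.
        unsafe { ptr::read(this) }
    }
}

impl<T: ?Sized> Drop for RefOwn<'_, T> {
    fn drop(&mut self) {
        // SAFETY: a `RefOwn` uniquely owns the initialized value it points to.
        unsafe { ptr::drop_in_place(self.ptr.as_ptr()) }
    }
}

/// Free-function version of `Take::take_with` for sized values.
pub fn take_with<T, F, R>(value: T, f: F) -> R
where
    F: FnOnce(RefOwn<'_, T>) -> R,
{
    let mut this = ManuallyDrop::new(value);
    // SAFETY: `this` is only reachable through the `RefOwn` from here on.
    unsafe { f(RefOwn::new_unchecked(&mut this)) }
}

// Deliberately not `Clone`.
struct Token(String);

fn take_demo() -> String {
    take_with(Token(String::from("unique")), |owned| owned.into_owned().0)
}
```

If the closure simply drops the `RefOwn` instead of calling `into_owned`, the value's destructor runs inside the closure, so nothing is leaked or dropped twice in either case.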

...and a rather trivial implementation for RefOwn<T>!

Finally, I'll point out that you can hack together something sort of similar with the unstable allocator arguments that just got added to Box:

fn into_owned<A: AllocRef>(self: Box<Self, A>) -> Self::Owned;

But I'll leave that as a homework problem!