Pre-Pre-RFC: core::slice::OwnedSlice

I suppose I'll just start with the use-case I've found for this: CPython has a sort of calling convention called "vectorcall", which instead of the normal calling convention of PyTupleObject* args, PyDictObject* kwargs accepts a PyObject** argv vector, a length, and an immutable tuple that shows which objects in argv are actually kwargs. In thinking about how to implement this in RustPython (currently, we use a similar struct to the standard CC that holds Vec<PyObjectRef>, IndexMap<String, PyObjectRef>) I realized that there's no way to make it equivalent to CPython (without writing unsafe code) - there's no type in std to represent a borrowed slice (in the lifetime sense) that still has ownership (in the drop sense) over its data. We'd have to either pass a &[PyObjectRef], which would involve having to clone the arguments out of the argument vector (not necessarily cheap - we use atomic refcounts for multithreading), or pass a Vec<PyObjectRef>, which we're doing already.

So, core::slice::OwnedSlice:

Guide-level explanation

OwnedSlice is a slice type that borrows memory but semantically owns its contents.

fn process_strings(v: OwnedSlice<'_, String>) {
    for owned_string in v {
        process(owned_string);
    }
}

let mut strings: Vec<String> = get_strings();
// or maybe also a Vec::split_owned_slice(&mut self, range_start: usize) -> OwnedSlice<'_, T>
// that can just set_len(range_start)
let mut drain = strings.drain(5..);
process_strings(drain.as_owned_slice());

Reference-level explanation

pub struct OwnedSlice<'a, T> {
    data: *mut [T],
    _marker: PhantomData<(&'a mut T, T)>, // idk how to variance
}

impl<'a, T> OwnedSlice<'a, T> {
    /// SAFETY: `data` must not be used after being passed to OwnedSlice, as it will have been dropped
    pub unsafe fn from_slice(data: &'a mut [T]);

    pub fn as_slice(&self) -> &'a [T];
    pub fn as_mut_slice(&mut self) -> &'a mut [T];
}

impl<'a, T> Deref for OwnedSlice<'a, T> {
    type Target = [T];
}
impl<'a, T> DerefMut for OwnedSlice<'a, T> {}

impl<'a, T> IntoIterator for OwnedSlice<'a, T> {
    type IntoIter = OwnedIntoIter<'a, T>;
    type Item = T;
}

unsafe impl<'a, #[may_dangle] T> Drop for OwnedSlice<'a, T> {
    fn drop(&mut self) { ptr::drop_in_place(self.data) }
}

Prior Art

std::vec::Drain: similar, and the main mechanism for passing around owned borrowed slices in std, but only works for things that have already been allocated in specifically a std::vec::Vec

bumpalo::boxed::Box<'a, [T]>: nearly identical (I believe) to this, but only works for things allocated in a bumpalo::Bump

2 Likes

This is basically a special case of "&own" owned references. OwnedSlice<'a, T> is effectively &own [T].

I think your impl as written is unsound (may_dangle then drop_in_place), and it's not actually useful over &mut unless you can actually semantically move out of the reference.

1 Like

Ah, yeah, I was just about to add a future possibilities: generic OwnedRef<'a, T: ?Sized>. And you can move out of it with the IntoIterator impl, but I suppose remove/drain/other functions from Vec would be useful as well.

Also re: #[may_dangle], I couldn't remember exactly how it functions, but it seems like Vec has #[may_dangle] and it does basically the same thing for its destructor.

I think drop_in_place is fine:

https://rust-lang.github.io/rfcs/1327-dropck-param-eyepatch.html#the-eyepatch-attribute

When used on a type, e.g. #[may_dangle] T , the programmer is asserting the only uses of values of that type will be to move or drop them.

1 Like

I find the name OwnedSlice confusing, because that describes Box<[T]>.

9 Likes

Wouldn't unsized locals solve this problem without any need for weird things like "owning references"? If one could create a slice without being obligated to put it behind a borrow, then it could also be passed around by value just fine, like a primitive array can, except that its size would be dynamic.

1 Like

Possibly, but only if you could be sure the compiler would optimize away all the stack copies, which could otherwise be a problem for something as hot as this.

For example, even in a simple example where a function accepts a [T] by value and then passes it on to another function untouched… under the current implementation, the generated assembly makes a full copy of the slice on that function's own stack. This involves computing the size, doing a stack probe, then making an external call to memcpy. Since at an assembly level the slice passed "by value" is really just a pointer, it could be optimized to pass on the same pointer to the second function, but it currently is not.

2 Likes

Sure. I could however imagine that unsized arguments (or at least slices-by-value) get a new, special-cased calling convention, which defines them to always be passed as a (pointer, length) pair, avoiding reliance on optimizations. Would that be feasible?

This is indeed a &'_ own [T] / &'_ move [T], which can be implemented by a user library:

3 Likes

Ah, so there is a crate for it. I thought there had to be one, but couldn’t think of a good search term. owning_ref exists but is something completely different.

Coerces a StackBox<[T; N]> into a StackBox<[T; N] .

I suppose that should be "into a StackBox<[T]>"?

2 Likes

:no_mouth: Just letting hints to see if "the astute reader would catch them" :grin: Fixed, thanks :slightly_smiling_face:

1 Like

Here's an "owned slice" implementation I was using as part of a merge sort implementation:

use std::alloc::{dealloc, Layout};
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
use std::ops::{Deref, Index, IndexMut, DerefMut};
use std::ptr::{NonNull, slice_from_raw_parts, slice_from_raw_parts_mut};

pub trait VecExt {
    type Item;

    fn consume<U>(self, f: impl for<'a> FnOnce(OwnedSlice<'a, Self::Item>) -> U) -> U;
}

impl<T> VecExt for Vec<T> {
    type Item = T;

    fn consume<U>(self, f: impl for<'a> FnOnce(OwnedSlice<'a, Self::Item>) -> U) -> U {
        let mut vec = ManuallyDrop::new(self);
        let (mem, slice) = unsafe {
            let mem = VacatedMemory {
                ptr: NonNull::new_unchecked(vec.as_mut_ptr()),
                capacity: vec.capacity(),
                phantom: PhantomData,
            };
            let slice = OwnedSlice {
                ptr: NonNull::new_unchecked(vec.as_mut_ptr()),
                len: vec.len(),
                phantom: PhantomData,
            };
            (mem, slice)
        };
        let result = f(slice);
        drop(mem);
        result
    }
}

pub struct VacatedMemory<'a, T> {
    ptr: NonNull<T>,
    capacity: usize,
    phantom: PhantomData<&'a T>, // not completely sure about the variance here!!!
}

impl<'a, T> Drop for VacatedMemory<'a, T> {
    fn drop(&mut self) {
        let layout = Layout::array::<T>(self.capacity).unwrap();
        unsafe {
            dealloc(self.ptr.as_ptr() as *mut u8, layout)
        }
    }
}

pub struct OwnedSlice<'a, T> {
    ptr: NonNull<T>,
    len: usize,
    phantom: PhantomData<(T, &'a ())>,
}

impl<'a, T> OwnedSlice<'a, T> {
    pub fn split_at_owned(self, index: usize) -> (OwnedSlice<'a, T>, OwnedSlice<'a, T>) {
        if index >= self.len {
            panic!("Attempted to split past end of slice: len={}, index={}", self.len, index);
        }

        let me = ManuallyDrop::new(self);

        (
            OwnedSlice {
                ptr: me.ptr,
                len: index,
                phantom: PhantomData,
            },
            OwnedSlice {
                ptr: unsafe { nonnull_add(me.ptr, index) },
                len: me.len - index,
                phantom: Default::default(),
            }
        )
    }
}

impl<'a, T> Deref for OwnedSlice<'a, T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        unsafe {
            &*slice_from_raw_parts(self.ptr.as_ptr(), self.len)
        }
    }
}

impl<'a, T> DerefMut for OwnedSlice<'a, T> {
    fn deref_mut(&mut self) -> &mut [T] {
        unsafe {
            &mut *slice_from_raw_parts_mut(self.ptr.as_ptr(), self.len)
        }
    }
}

impl<'a, T> Iterator for OwnedSlice<'a, T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.len == 0 {
            None
        } else {
            unsafe {
                let result = self.ptr.as_ptr().read();
                self.ptr = nonnull_add(self.ptr, 1);
                self.len -= 1;
                Some(result)
            }
        }
    }
}

impl<'a, T> Drop for OwnedSlice<'a, T> {
    fn drop(&mut self) {
        for _ in self {}
    }
}


unsafe fn nonnull_add<T>(ptr: NonNull<T>, count: usize) -> NonNull<T> {
    NonNull::new_unchecked(ptr.as_ptr().add(count))
}


#[cfg(test)]
mod tests {
    use crate::owned_slice::VecExt;

    #[test]
    fn test_owned_slice_from_vec_split_at_owned() {
        let vec = vec![Box::new(1), Box::new(2), Box::new(3), Box::new(4), Box::new(5)];
        vec.consume(|slice| {
            assert_eq!(slice.len(), 5);
            let (mut a, mut b) = slice.split_at_owned(2);
            assert_eq!(a[1], Box::new(2));
            assert_eq!(a.len(), 2);
            assert_eq!(b.len(), 3);
            assert_eq!(a.next(), Some(Box::new(1)));
            assert_eq!(a.len(), 1);
            assert_eq!(b.next(), Some(Box::new(3)));
            assert_eq!(b.len(), 2);
            assert_eq!(b.next(), Some(Box::new(4)));
            assert_eq!(b.len(), 1);
            assert_eq!(a.next(), Some(Box::new(2)));
            assert_eq!(a.len(), 0);
            assert_eq!(a.next(), None);
            assert_eq!(a.len(), 0);
            assert_eq!(b.next(), Some(Box::new(5)));
            assert_eq!(b.len(), 0);
            assert_eq!(b.next(), None);
            assert_eq!(b.len(), 0);
            assert_eq!(a.next(), None);
            assert_eq!(a.len(), 0);
            assert_eq!(b.next(), None);
            assert_eq!(b.len(), 0);
        });
    }
}

And a usage example:

pub fn merge_sort<T: Ord>(data: Vec<T>) -> Vec<T> {
    data.consume(merge_sort_slice)
}

pub fn merge_sort_slice<T: Ord>(data: OwnedSlice<T>) -> Vec<T> {
    if data.len() <= 1 {
        data.collect()
    } else {
        let mid = data.len() / 2;

        let (left, right) = data.split_at_owned(mid);

        let left_sorted = merge_sort_slice(left);
        let right_sorted = merge_sort_slice(right);

        merge(left_sorted, right_sorted)
    }
}

I considered using vecshard but didn't want the overhead of refcounting.