Is there any reason that `Vec::split_off` use copy and no in-place operation available?

The source of Vec::split_off use ptr::copy_nonoverlapping to split a vector into two:


    pub fn split_off(&mut self, at: usize) -> Self
    where
        A: Clone,
    {
        #[cold]
        #[inline(never)]
        fn assert_failed(at: usize, len: usize) -> ! {
            panic!("`at` split index (is {at}) should be <= len (is {len})");
        }

        if at > self.len() {
            assert_failed(at, self.len());
        }

        if at == 0 {
            // the new vector can take over the original buffer and avoid the copy
            return mem::replace(
                self,
                Vec::with_capacity_in(self.capacity(), self.allocator().clone()),
            );
        }

        let other_len = self.len - at;
        let mut other = Vec::with_capacity_in(other_len, self.allocator().clone());

        // Unsafely `set_len` and copy items to `other`.
        unsafe {
            self.set_len(at);
            other.set_len(other_len);

            ptr::copy_nonoverlapping(self.as_ptr().add(at), other.as_mut_ptr(), other.len());
        }
        other
    }

which assume T must be Copy, seems quite strange.

Are there any reason that the split_off not perform in-place?

Is it something related to the memory allocator?

Actually, that's a different meaning of the word “copy” in Rust, so T: Copy is not a requirement.

Moving data in Rust happens via

  • (shallow) copying of data to a different place
  • “invalidating” the original value (making sure its no longer used, and no destructors are called)

You usually won't do this manually, but unsafe code like this one sometimes does. The first step is what the ptr::copy_nonoverlapping in this code does, and it works with any type as Vec elements. (And the set_len makes sure the original values are invalidated.)

The reason why there's any moving of elements to a different place happening in the first place is because every Vec always owns its own separate allocation, and you can't just split up an allocation into two in-place.

12 Likes

Where did you get this from? Neither the bounds on the impl nor those on the method include T: Copy.

Yes, because it can't be done. Again, look at the signature, the return type in particular. The vector owns its elements, so for splitting off the tail, a new vector must be allocated and elements must be moved into it. How do you propose one creates two vectors from one in place?

1 Like

sorry for my careless..

I both confused A:Allocator with Vec's generics T since they are both generics and confused Clone with Copy since they both started with C..


the struct of Vec is a pointer, its cap, and its length.

if we modify the cap and length properly while ignore the memory allocator, we could have two Vec in one place:

before:

Vec{RawVec{pointer,cap},size}

after:

Vec{RawVec{pointer,split},split} and Vec{RawVec{pointer+split,cap-split},size-split}

without considering memory allocator, that could be fine.


Actually I'm considering the following question:

I have a function consume a Vec into a value, which uses divide and conquer algorithm like:

use rug::Integer;
fn cal2_div(v:&mut [Integer])->Integer{
    let l=v.len();
    if l>=2{
        let mid=l>>1;
        cal2_div(&mut v[0..mid])*cal2_div(&mut v[mid..l])
    }else{std::mem::take(&mut v[0])}
}

Here we must use std::mem::take to obtain the true value, which create an unnecessary default value to make rustc happy.

If we use Vec here, no more Default::default is called and the performance may better.

Is there some method to consume a vector by devide and conquer without extra allocations?

or, do we need a "consumable vector" that could be divide into two "consumable vector" inplace and we could obtain its element without insert a default value?

What you want is an array of storage cells that can be emptied individually — which for a Vec is not the case (you need to always keep the full cells as a contiguous slice in the beginning of the array). The reason for this is ownership: the Vec needs to know which boxes are still full when the Vec is dropped, because the Vec is the owner and therefore responsible; it therefore keeps a len field to know that the first len boxes are full.

The slices that you show above — which work nicely for divide and conquer — are even dumber than a Vec; they assume that all their boxes are always filled.

Using standard Rust tools there are two ways you can get rid of the Integer::new() that you want to avoid (although I’m not sure that’s an actual cost, given that it is a const fn):

  • The safe variant is to use &mut [Option<Integer>] and Option::take instead of std::mem::take.
  • If those eight bytes of overhead per Integer are too much, then you can go unsafe and use MaybeUninit — which is essentially the same but the compiler doesn’t help you regarding which box is still filled and which not.
1 Like

But you can't just ignore the memory allocator if the very thing you are doing is dealing with memory given out by the allocator.

You are probably looking for a slice and <[T]>::split_at() for performing the simple pointer arithmetic you described. Or you could just use indices and split up the vector into sub-ranges based on those indices only once, at the very end of your algorithm.

6 Likes

Exactly, without considering memory allocators. But that's a crucial detail: it's invalid to free a pointer pointing in the middle of another allocation, and it's also invalid to free a pointer with the wrong length (for the first half). Moreover Vec guarantees it won't do any weird stuff like coordinating with other Vecs to split allocations, and people rely on this, so you can't silently break it.

10 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.