TL;DR: Sometimes, cloning a Box<T> happens in two steps: first T is cloned on the stack, and then the cloned data is copied from the stack to the heap. This can cause stack overflows and serious performance issues if T is very large. I propose adding a new method to the Clone trait, clone_into_uninit(), which the Clone implementation of Box<T> can call. Developers could then implement it to guarantee that boxed data is cloned directly on the heap, without going through the stack.
I maintain a crate (circular-buffer) that contains a CircularBuffer struct which is essentially a wrapper around an arbitrarily-sized array. A CircularBuffer can live on the stack, or on the heap via Box<CircularBuffer>. One problem that I noticed is that cloning a large Box<CircularBuffer> overflows the stack, because the clone is first created on the stack, and then copied to the heap.
This problem can be reproduced with this minimal example:
use std::hint::black_box;

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Clone for MyStruct<N> {
    fn clone(&self) -> Self {
        black_box(Self(self.0))
    }
}

fn main() {
    let x = Box::new(MyStruct([0u32; 100_000_000]));
    let _ = x.clone();
}
which causes the following on my system:
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
This can be reproduced on rustc stable (1.72.0) and nightly (1.74.0-nightly) with release builds.
This problem is not specific to Clone: if you replace Box::new(MyStruct([0u32; 100_000_000])) with Box::new(black_box(MyStruct([0u32; 100_000_000]))) you will also get a stack overflow at initialization. The workaround is to use Box::new_uninit() (or equivalent code), and in fact, for my CircularBuffer struct, I provide two methods for initialization: CircularBuffer::new(), which returns a CircularBuffer on the stack, and CircularBuffer::boxed(), which returns a Box<CircularBuffer> allocated directly on the heap.
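As a minimal sketch of the heap-only initialization described above, here is what a boxed() constructor could look like for the MyStruct type from the example. It uses the raw allocator API as "equivalent code" rather than Box::new_uninit(), so it does not depend on that method being available in a given toolchain. The boxed() method on MyStruct is hypothetical and is not taken from the circular-buffer crate.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use std::ptr::NonNull;

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> MyStruct<N> {
    /// Hypothetical constructor (not part of the original example): builds a
    /// zero-filled value directly on the heap, with no stack temporary.
    fn boxed() -> Box<Self> {
        let layout = Layout::new::<Self>();
        let ptr: *mut Self = if layout.size() == 0 {
            // Zero-sized types must not go through the allocator.
            NonNull::<Self>::dangling().as_ptr()
        } else {
            // SAFETY: `layout` has a non-zero size.
            let p = unsafe { alloc_zeroed(layout).cast::<Self>() };
            if p.is_null() {
                handle_alloc_error(layout);
            }
            p
        };
        // SAFETY: all-zero bytes are a valid `[u32; N]`, and the memory comes
        // from the global allocator with the layout of `Self` (or is a
        // dangling, aligned pointer when `Self` is zero-sized).
        unsafe { Box::from_raw(ptr) }
    }
}
```

With this, MyStruct::<100_000_000>::boxed() never materializes the array on the stack, because the only value that is moved around is the pointer.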
However, a Box<CircularBuffer> (or a Box<MyStruct> in the example above) cannot be reliably cloned through the Clone trait, because there is no way to specialize Clone for Box<CircularBuffer> (this may become possible in the future with specialization). The only workaround is to wrap Box<CircularBuffer> in a newtype and provide a custom Clone implementation on the newtype, but I think this is overkill: Box<T> should "just work".
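For reference, the newtype workaround could be sketched roughly as follows, again using MyStruct from the example. BoxedStruct, alloc_raw() and zeroed() are hypothetical names introduced only for this illustration, and the bitwise copy is valid only because the payload is a plain [u32; N].

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr::{self, NonNull};

struct MyStruct<const N: usize>([u32; N]);

/// Hypothetical newtype that exists only to carry a custom `Clone`.
struct BoxedStruct<const N: usize>(Box<MyStruct<N>>);

/// Allocates uninitialized heap storage for one `MyStruct<N>`.
fn alloc_raw<const N: usize>() -> *mut MyStruct<N> {
    let layout = Layout::new::<MyStruct<N>>();
    if layout.size() == 0 {
        // Zero-sized types must not go through the allocator.
        return NonNull::<MyStruct<N>>::dangling().as_ptr();
    }
    // SAFETY: `layout` has a non-zero size.
    let p = unsafe { alloc(layout).cast::<MyStruct<N>>() };
    if p.is_null() {
        handle_alloc_error(layout);
    }
    p
}

impl<const N: usize> BoxedStruct<N> {
    /// Zero-filled value created directly on the heap.
    fn zeroed() -> Self {
        let p = alloc_raw::<N>();
        // SAFETY: `p` is valid for writing one `MyStruct<N>`, all-zero bytes
        // are a valid `[u32; N]`, and `p` has the layout `Box` expects.
        unsafe {
            ptr::write_bytes(p, 0, 1);
            BoxedStruct(Box::from_raw(p))
        }
    }
}

impl<const N: usize> Clone for BoxedStruct<N> {
    fn clone(&self) -> Self {
        let src: *const MyStruct<N> = &*self.0;
        let dst = alloc_raw::<N>();
        // SAFETY: `src` and `dst` are distinct allocations valid for one
        // `MyStruct<N>`, and `[u32; N]` is `Copy`, so a bitwise copy is a
        // valid clone. The data goes heap to heap, never through the stack.
        unsafe {
            ptr::copy_nonoverlapping(src, dst, 1);
            BoxedStruct(Box::from_raw(dst))
        }
    }
}
```

A real wrapper would also need Deref, DerefMut and probably several forwarding trait implementations to be pleasant to use, which is a large part of why this route feels heavy.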
Box<T> could "just work" if the Clone trait offered a new clone_into_uninit() method:
pub trait Clone {
    fn clone(&self) -> Self;
    fn clone_from(&mut self, source: &Self) { ... }
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) { ... }
}
If such a method existed, the Clone implementation for Box<T> could become something like this:
impl<T: Clone> Clone for Box<T> {
    fn clone(&self) -> Self {
        let mut uninit = Self::new_uninit();
        // `**self` is the boxed T; calling through `*self` would resolve to
        // the method on Box<T> itself.
        (**self).clone_into_uninit(&mut uninit);
        unsafe {
            uninit.assume_init()
        }
    }
    // ...other methods...
}
Developers who want to ensure that cloning happens directly on the heap can then implement clone_into_uninit() for their own types. Once they do, cloning a Box<T> of such a type never creates the clone on the stack.
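Since a method cannot be added to std's Clone from outside the standard library, the idea can only be approximated today. The sketch below models it with a separate, hypothetical CloneIntoUninit trait and a free function clone_box() standing in for Box<T>'s Clone implementation; new_uninit_box() is a stand-in for Box::new_uninit() written against the raw allocator. None of these names exist in std.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;
use std::ptr::{self, NonNull};

/// Hypothetical stand-in for the proposed `Clone::clone_into_uninit()`.
trait CloneIntoUninit: Clone {
    /// Default: clone on the stack, then move the result into `uninit`.
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) {
        uninit.write(self.clone());
    }
}

/// Stand-in for `Box::new_uninit()`, written against the raw allocator.
fn new_uninit_box<T>() -> Box<MaybeUninit<T>> {
    let layout = Layout::new::<MaybeUninit<T>>();
    let raw = if layout.size() == 0 {
        // Zero-sized types must not go through the allocator.
        NonNull::<MaybeUninit<T>>::dangling().as_ptr()
    } else {
        // SAFETY: `layout` has a non-zero size.
        let p = unsafe { alloc(layout).cast::<MaybeUninit<T>>() };
        if p.is_null() {
            handle_alloc_error(layout);
        }
        p
    };
    // SAFETY: `MaybeUninit<T>` needs no initialization, and `raw` has the
    // layout `Box` expects.
    unsafe { Box::from_raw(raw) }
}

/// What `<Box<T> as Clone>::clone()` could do if the method existed.
fn clone_box<T: CloneIntoUninit>(this: &Box<T>) -> Box<T> {
    let mut uninit = new_uninit_box::<T>();
    (**this).clone_into_uninit(&mut uninit);
    // SAFETY: this sketch trusts every implementation to fully initialize
    // `uninit`; `MaybeUninit<T>` has the same layout as `T`.
    unsafe { Box::from_raw(Box::into_raw(uninit).cast::<T>()) }
}

#[derive(Clone)]
struct MyStruct<const N: usize>([u32; N]);

// Opt-in: copy straight from `self` to the destination, no temporary.
impl<const N: usize> CloneIntoUninit for MyStruct<N> {
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) {
        // SAFETY: both pointers are valid for one `Self` and cannot overlap;
        // `[u32; N]` is `Copy`, so a bitwise copy is a valid clone.
        unsafe { ptr::copy_nonoverlapping(self as *const Self, uninit.as_mut_ptr(), 1) }
    }
}

// Types that do not care simply keep the default behaviour.
impl CloneIntoUninit for String {}
```

Note that the sketch keeps the method safe, as in the signature proposed above, which means clone_box() has to trust implementations to actually initialize the destination. Whether the method should be unsafe to implement is a design question this sketch leaves open.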
Default could gain an analogous into_uninit() method, which would allow a Box<T> to be initialized directly on the heap.
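The Default variant can be approximated the same way. In the sketch below, DefaultIntoUninit and default_box() are hypothetical names: the trait stands in for a Default::into_uninit() method, and default_box() stands in for what Box<T>'s Default implementation could do with it.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;
use std::ptr::NonNull;

/// Hypothetical stand-in for the suggested `Default::into_uninit()`.
trait DefaultIntoUninit: Default {
    /// Default: build the value on the stack, then move it into `uninit`.
    fn into_uninit(uninit: &mut MaybeUninit<Self>) {
        uninit.write(Self::default());
    }
}

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Default for MyStruct<N> {
    fn default() -> Self {
        MyStruct([0; N])
    }
}

// Opt-in: zero the destination in place instead of building a temporary.
impl<const N: usize> DefaultIntoUninit for MyStruct<N> {
    fn into_uninit(uninit: &mut MaybeUninit<Self>) {
        // SAFETY: `uninit` is valid for writing one `Self`, and all-zero
        // bytes are a valid `[u32; N]`, matching what `default()` returns.
        unsafe { uninit.as_mut_ptr().write_bytes(0, 1) }
    }
}

/// Hypothetical helper: a default `Box<T>` built without a stack temporary.
fn default_box<T: DefaultIntoUninit>() -> Box<T> {
    let layout = Layout::new::<MaybeUninit<T>>();
    let raw = if layout.size() == 0 {
        // Zero-sized types must not go through the allocator.
        NonNull::<MaybeUninit<T>>::dangling().as_ptr()
    } else {
        // SAFETY: `layout` has a non-zero size.
        let p = unsafe { alloc(layout).cast::<MaybeUninit<T>>() };
        if p.is_null() {
            handle_alloc_error(layout);
        }
        p
    };
    // SAFETY: `MaybeUninit<T>` needs no initialization, and `raw` has the
    // layout `Box` expects.
    let mut uninit: Box<MaybeUninit<T>> = unsafe { Box::from_raw(raw) };
    T::into_uninit(&mut uninit);
    // SAFETY: this sketch trusts `into_uninit()` to fully initialize the
    // value; `MaybeUninit<T>` has the same layout as `T`.
    unsafe { Box::from_raw(Box::into_raw(uninit).cast::<T>()) }
}
```

Here default_box::<MyStruct<100_000_000>>() would produce the zero-filled value without the array ever living on the stack, while types that keep the default into_uninit() behave as they do today.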