TL;DR: Sometimes, cloning a `Box<T>` happens in two steps: first `T` is cloned on the stack, and then the cloned data is copied from the stack to the heap. This can cause stack overflows and serious performance issues if `T` is very large. I propose adding a new method to the `Clone` trait, `clone_into_uninit()`. The `Clone` implementation of `Box<T>` could use it, giving developers a way to make sure that boxed data is cloned directly on the heap and not on the stack.
I maintain a crate (`circular-buffer`) that contains a `CircularBuffer` struct, which is essentially a wrapper around an arbitrarily sized array. A `CircularBuffer` can live on the stack, or on the heap via `Box<CircularBuffer>`. One problem I noticed is that cloning a large `Box<CircularBuffer>` overflows the stack, because the clone is first created on the stack and then copied to the heap.
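This problem can be reproduced with this minimal example:

```rust
use std::hint::black_box;

// A wrapper around a (potentially very large) array.
struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Clone for MyStruct<N> {
    fn clone(&self) -> Self {
        // `black_box` discourages the optimizer from eliding the temporary.
        black_box(Self(self.0))
    }
}

fn main() {
    // 100,000,000 `u32`s = 400 MB, far larger than the default stack.
    let x = Box::new(MyStruct([0u32; 100_000_000]));
    let _ = x.clone();
}
```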
which causes the following on my system:

```text
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
```
This can be reproduced on rustc stable (1.72.0) and nightly (1.74.0-nightly) with release builds.
This problem is not specific to `Clone`: if you replace `Box::new(MyStruct([0u32; 100_000_000]))` with `Box::new(black_box(MyStruct([0u32; 100_000_000])))`, you will also get a stack overflow, this time at initialization. The workaround is to use `Box::new_uninit()` (or equivalent code). In fact, my `CircularBuffer` struct provides two constructors: `CircularBuffer::new()`, which returns a `CircularBuffer` on the stack, and `CircularBuffer::boxed()`, which returns a `Box<CircularBuffer>` allocated directly on the heap.
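For illustration, here is one way to write a heap-only constructor in the spirit of `CircularBuffer::boxed()`, applied to the `MyStruct` example. This is a sketch, not the crate's actual code: the name `boxed_zeroed()` is hypothetical, and it uses `std::alloc::alloc_zeroed` instead of `Box::new_uninit()` (which was nightly-only in the Rust versions mentioned above), relying on the fact that an all-zero bit pattern is a valid `[u32; N]`.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use std::ptr::NonNull;

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> MyStruct<N> {
    /// Hypothetical heap-only constructor: allocates a zero-filled
    /// `MyStruct<N>` without ever building a `[u32; N]` on the stack.
    fn boxed_zeroed() -> Box<Self> {
        let layout = Layout::new::<Self>();
        if layout.size() == 0 {
            // `alloc_zeroed` must not be called with a zero size; a dangling,
            // well-aligned pointer is a valid `Box` for a zero-sized type.
            return unsafe { Box::from_raw(NonNull::dangling().as_ptr()) };
        }
        // SAFETY: `layout` has a non-zero size, all-zero bits are a valid
        // `[u32; N]`, and the memory comes from the global allocator with
        // exactly the layout that `Box<Self>` will use to free it.
        unsafe {
            let ptr = alloc_zeroed(layout).cast::<Self>();
            if ptr.is_null() {
                handle_alloc_error(layout);
            }
            Box::from_raw(ptr)
        }
    }
}
```

For example, `MyStruct::<10_000_000>::boxed_zeroed()` returns a 40 MB zero-filled buffer on the heap without materializing the array on the stack first.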
However, a `Box<CircularBuffer>` (or `Box<MyStruct>` in the example above) cannot be cloned without risking a stack overflow, because there is no way to specialize `Clone` for `Box<CircularBuffer>`: the standard library already provides a blanket `impl<T: Clone> Clone for Box<T>` (this may become possible in the future with specialization). The only possible workaround is to wrap `Box<CircularBuffer>` in a newtype and provide a custom `Clone` implementation on the newtype, but I think that this is overkill; in my opinion, `Box<T>` should "just work".
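A sketch of that newtype workaround, again using `MyStruct` for brevity. `BoxedMyStruct` and `boxed_zeroed()` are hypothetical names. The clone first allocates a zero-filled buffer directly on the heap and then copies the array contents heap-to-heap with `copy_from_slice`, so no `[u32; N]` temporary is created.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use std::ptr::NonNull;

struct MyStruct<const N: usize>([u32; N]);

/// Hypothetical newtype: it exists only so that it can have its own
/// `Clone` impl, since `impl Clone for Box<MyStruct<N>>` would overlap
/// with the standard library's blanket impl.
struct BoxedMyStruct<const N: usize>(Box<MyStruct<N>>);

/// Hypothetical helper: allocates a zero-filled `MyStruct<N>` on the heap.
fn boxed_zeroed<const N: usize>() -> Box<MyStruct<N>> {
    let layout = Layout::new::<MyStruct<N>>();
    if layout.size() == 0 {
        // SAFETY: a dangling, well-aligned pointer is valid for a zero-sized Box.
        return unsafe { Box::from_raw(NonNull::dangling().as_ptr()) };
    }
    // SAFETY: non-zero size; all-zero bits are a valid `[u32; N]`; the
    // pointer comes from the global allocator with the layout `Box` expects.
    unsafe {
        let ptr = alloc_zeroed(layout).cast::<MyStruct<N>>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    }
}

impl<const N: usize> Clone for BoxedMyStruct<N> {
    fn clone(&self) -> Self {
        let mut new = boxed_zeroed::<N>();
        // Heap-to-heap copy of the array contents; nothing large on the stack.
        new.0.copy_from_slice(&self.0 .0);
        BoxedMyStruct(new)
    }
}
```

Every user has to go through the wrapper (`a.0 .0[i]` instead of `a.0[i]`), which is the ergonomic cost that makes this approach feel like overkill.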
This could be possible if the `Clone` trait offered a new `clone_into_uninit()` method:

```rust
use std::mem::MaybeUninit;

pub trait Clone: Sized {
    fn clone(&self) -> Self;
    fn clone_from(&mut self, source: &Self) { ... }
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) { ... }
}
```
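The default body of `clone_into_uninit()` is not spelled out here. One plausible choice, shown below as an assumption, is to clone as usual and move the result into `uninit`, so that types which do not override the method keep behaving exactly as they do today. Because `std::clone::Clone` cannot be extended outside the standard library, the sketch uses a separate, hypothetical `CloneIntoUninit` trait.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the proposed `Clone::clone_into_uninit()`.
trait CloneIntoUninit: Clone {
    /// Assumed default: clone normally (possibly through a stack temporary)
    /// and move the result into `uninit`.
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) {
        uninit.write(self.clone());
    }
}

// Types that do not override the method get the existing behavior.
impl CloneIntoUninit for String {}
```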
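If such a method existed, the `Clone` implementation for `Box<T>` could become something like this:

```rust
impl<T: Clone> Clone for Box<T> {
    fn clone(&self) -> Self {
        // Allocate uninitialized memory for a `T` directly on the heap.
        let mut uninit = Box::<T>::new_uninit();
        // Clone the inner `T` (not the `Box`) straight into that allocation.
        (**self).clone_into_uninit(&mut *uninit);
        // SAFETY: `clone_into_uninit` must fully initialize `uninit`.
        unsafe { uninit.assume_init() }
    }
    // ...other methods...
}
```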
Developers who want to ensure that cloning happens directly on the heap, without going through the stack, could then implement `clone_into_uninit()` for their types and be assured that cloning a `Box<T>` never creates a temporary copy of `T` on the stack.
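To make this concrete, here is a stable-Rust emulation under stated assumptions. The hypothetical `CloneIntoUninit` trait stands in for the proposed `Clone` method, `MyStruct` overrides it to copy its array straight into the destination with `ptr::copy_nonoverlapping`, and the hypothetical `clone_boxed()` plays the role of `Box<T>`'s `Clone` implementation. `clone_boxed()` trusts every implementation to fully initialize `uninit`; since the method is safe to implement, a real design would need to decide how that contract is upheld.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;
use std::ptr::{self, NonNull};

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Clone for MyStruct<N> {
    fn clone(&self) -> Self {
        Self(self.0)
    }
}

/// Hypothetical stand-in for the proposed `Clone::clone_into_uninit()`.
trait CloneIntoUninit: Clone {
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) {
        uninit.write(self.clone());
    }
}

impl<const N: usize> CloneIntoUninit for MyStruct<N> {
    // Override: copy the array directly into `uninit`, so no
    // `MyStruct<N>` temporary is ever built on the stack.
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) {
        // SAFETY: `addr_of_mut!` takes the field's address without creating a
        // reference to uninitialized memory; source and destination are
        // distinct and each holds exactly `N` `u32`s.
        unsafe {
            let dst = ptr::addr_of_mut!((*uninit.as_mut_ptr()).0).cast::<u32>();
            ptr::copy_nonoverlapping(self.0.as_ptr(), dst, N);
        }
    }
}

/// Hypothetical helper mimicking the proposed `Box<T>::clone`:
/// allocate uninitialized heap memory, then clone directly into it.
fn clone_boxed<T: CloneIntoUninit>(src: &T) -> Box<T> {
    let layout = Layout::new::<T>();
    let mut uninit: Box<MaybeUninit<T>> = if layout.size() == 0 {
        // SAFETY: a dangling, well-aligned pointer is valid for a zero-sized Box.
        unsafe { Box::from_raw(NonNull::dangling().as_ptr()) }
    } else {
        // SAFETY: non-zero size; `MaybeUninit<T>` has the same layout as `T`.
        unsafe {
            let p = alloc(layout).cast::<MaybeUninit<T>>();
            if p.is_null() {
                handle_alloc_error(layout);
            }
            Box::from_raw(p)
        }
    };
    src.clone_into_uninit(&mut *uninit);
    // SAFETY: assumes `clone_into_uninit` fully initialized the value.
    unsafe { Box::from_raw(Box::into_raw(uninit).cast::<T>()) }
}
```

Given `x: Box<MyStruct<N>>`, `clone_boxed(&*x)` produces the heap clone; the explicit `&*x` is needed because `Box<MyStruct<N>>` itself does not implement the stand-in trait.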
Similarly, `Default` could gain a new `into_uninit()` method to allow initializing a `Box<T>` directly on the heap.
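A corresponding sketch for `Default`, again as a separate trait because `std::default::Default` cannot be extended here. The trait name `DefaultIntoUninit`, the signature chosen for `into_uninit()`, its default body, and the `default_boxed()` helper are all assumptions; `MyStruct` overrides `into_uninit()` to zero its array in place with `ptr::write_bytes`.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;
use std::ptr::{self, NonNull};

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Default for MyStruct<N> {
    fn default() -> Self {
        Self([0; N])
    }
}

/// Hypothetical stand-in for the suggested `Default::into_uninit()`.
trait DefaultIntoUninit: Default {
    /// Assumed default: build the value normally, then move it into `uninit`.
    fn into_uninit(uninit: &mut MaybeUninit<Self>) {
        uninit.write(Self::default());
    }
}

impl<const N: usize> DefaultIntoUninit for MyStruct<N> {
    // Override: zero the array in place; all-zero bits are a valid `[u32; N]`.
    fn into_uninit(uninit: &mut MaybeUninit<Self>) {
        // SAFETY: writes exactly `N` `u32`s into the field inside `uninit`.
        unsafe {
            let dst = ptr::addr_of_mut!((*uninit.as_mut_ptr()).0).cast::<u32>();
            ptr::write_bytes(dst, 0, N);
        }
    }
}

/// Hypothetical helper: a `Box<T>` whose contents are initialized on the heap.
fn default_boxed<T: DefaultIntoUninit>() -> Box<T> {
    let layout = Layout::new::<T>();
    let mut uninit: Box<MaybeUninit<T>> = if layout.size() == 0 {
        // SAFETY: a dangling, well-aligned pointer is valid for a zero-sized Box.
        unsafe { Box::from_raw(NonNull::dangling().as_ptr()) }
    } else {
        // SAFETY: non-zero size; `MaybeUninit<T>` has the same layout as `T`.
        unsafe {
            let p = alloc(layout).cast::<MaybeUninit<T>>();
            if p.is_null() {
                handle_alloc_error(layout);
            }
            Box::from_raw(p)
        }
    };
    T::into_uninit(&mut *uninit);
    // SAFETY: assumes `into_uninit` fully initialized the value.
    unsafe { Box::from_raw(Box::into_raw(uninit).cast::<T>()) }
}
```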