Allow cloning into MaybeUninit (to allow cloning boxes directly on the heap)

TL;DR: Sometimes, cloning a Box<T> happens in two steps: first T is cloned on the stack, and then the cloned data is copied from the stack to the heap. This can cause stack overflows and serious performance issues if T is very large. I propose to add a new method to the Clone trait: clone_into_uninit(), which can be leveraged by the Clone implementation of Box<T> to offer developers the opportunity of making sure that cloning of boxed data happens directly on the heap and not on the stack.


I maintain a crate (circular-buffer) that contains a CircularBuffer struct which is essentially a wrapper around an arbitrarily-sized array. A CircularBuffer can live on the stack, or on the heap via Box<CircularBuffer>. One problem that I noticed is that cloning a large Box<CircularBuffer> overflows the stack, because the clone is first created on the stack, and then copied to the heap.

This problem can be reproduced with this minimal example:

use std::hint::black_box;

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Clone for MyStruct<N> {
    fn clone(&self) -> Self {
        black_box(Self(self.0))
    }
}

fn main() {
    let x = Box::new(MyStruct([0u32; 100_000_000]));
    let _ = x.clone();
}

which causes the following on my system:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

This can be reproduced on rustc stable (1.72.0) and nightly (1.74.0-nightly) with release builds.

This problem is not specific to Clone: if you replace Box::new(MyStruct([0u32; 100_000_000])) with Box::new(black_box(MyStruct([0u32; 100_000_000]))) you will also get a stack overflow at initialization. The workaround is to use Box::new_uninit() (or equivalent code), and in fact, for my CircularBuffer struct, I provide two methods for initialization: CircularBuffer::new(), which returns a CircularBuffer on the stack, and CircularBuffer::boxed(), which returns a Box<CircularBuffer> allocated directly on the heap.

However, the Box<CircularBuffer> (or Box<MyStruct> in the example above) cannot be reliably cloned using the Clone trait, because there is no way to specialize Clone for Box<CircularBuffer> (this may be possible in the future with specialization). The only workaround possible is to use the newtype pattern to wrap Box<CircularBuffer> and provide a custom Clone implementation on the newtype, but I personally think that this is overkill and in my opinion Box<T> should "just work".

This could be possible if the Clone trait offered a new clone_into_uninit() method:

pub trait Clone {
    fn clone(&self) -> Self;
    fn clone_from(&mut self, source: &Self) { ... }
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) { ... }
}

If such method existed, then the Clone implementation for Box<T> could become something like this:

impl<T: Clone> Clone for Box<T> {
    fn clone(&self) -> Self {
        let mut uninit = Self::new_uninit();
        (*self).clone_into_uninit(&mut uninit);
        unsafe {
            uninit.assume_init()
        }
    }

    // ...other methods...
}

Developers who want to ensure that cloning happens directly on the heap, without going through the stack, can implement their own clone_into_uninit() for their types and they can be assured that cloning a Box<T> will never perform any cloning on the stack.

Similarly, Default could gain a new into_uninit() method to allow initialization of Box<T> directly on the heap.

2 Likes

The unsafe annotation seems a bit off. To me it seems the clone_into_uninit method is not unsafe to call, but unsafe to implement. I'm not sure if any good workarounds exist for saying "this trait is unsafe to implement unless the default implementation of so-and-so method is used".

3 Likes

Perhaps the functionality should be provided by an unsafe trait CloneIntoUninit: Clone instead.

1 Like

The stdlib already has an internal specialization trait that applies this optimization to Copy types, see:

And in fact Box's Clone implementation already looks very similar to the proposed one:


The problem though is how to expose this outside the stdlib. The proposed additional method on Clone looks very wrong because, as @steffahn said, the safety requirement should be on the implementation and not on the caller. Given that in the future trait implementations may omit the unsafe keyword I would consider the proposed extension to Clone unsound.

On the other hand however an extra trait as @HjVT proposed would likely mean exposing specialization, which is also unwanted.

2 Likes

What this really needs is out-references, to make it fully safe (even if the implementation ends up using unsafe internally in some cases).

fn clone_into(&self, place: &out Self) { ... }

I wonder if there's a library version of out-references that could be used, similar to how ReadBuf is used to provide a safe interface for IO to uninit buffers.

1 Like

This doesn't guarantee that the implementation writes into the out reference though. So a perfectly safe implementation would be the noop, whuch clearly won't work as intended

An &out has a requirement that it is written to before leaving scope builtin, a pure library implementation might not be able to but would have to have some way to check and panic at runtime.

Hmm… :thinking:

pub struct UnsafeMethodImpl { _private: () }
/// Safety: `UnsafeMethodImpl` may only be constructed directly in a trait implementation when
/// overriding the safe default implementation of a method with `UnsafeMethodImpl`
/// return type.
/// The defining trait must have defined safety conditions in an `Implementation Safety`
/// section of the method’s documentation, and implementor that is overriding
/// the method and is calling `we_overrode_the_safe_default_method_implementation`
/// has to follow these safety conditions.
/// The trait method and its implementation must be an `unsafe fn` itself
/// and must require in its `Safety` that callers must
/// immediately discard the `UnsafeMethodImpl` token that the method in question returns.
pub unsafe fn we_overrode_the_safe_default_method_implementation() -> UnsafeMethodImpl {
    UnsafeMethodImpl { _private: () }
}
pub trait Clone {
    fn clone(&self) -> Self;
    fn clone_from(&mut self, source: &Self) { ... }
    /// Implementation Safety: Implementors overriding this method
    /// must ensure that `clone_into_uninit` only returns successfully (i.e. without panic)
    /// when the value in `uninit` has been completely initialized to a valid value
    /// of type `Self`.
    ///
    /// Safety: Callers must immediately discard the `UnsafeMethodImpl` token
    /// returned from this method.
    unsafe fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) -> UnsafeMethodImpl {
        let c = self.clone();
        uninit.write(c);
        // Safety: `uninit` has been initialized as required, and we are implementing
        // an `unsafe fn` that requires users to discard the return value.
        unsafe { we_overrode_the_safe_default_method_implementation() }
    }
}

it’s a bit clunky[1], I’ll admit :grin: but maybe it’s sound?


  1. some new types with new API is involved… clone_into_uninit is only unsafe because of its return value otherwise being re-usable in other method implementations… Safety conditions referring to other item’s documentation… ↩︎

1 Like

You can emulate a fn foo(place: &out T) with something like a fn foo(place: Uninit<'a, T>) -> Init<'a, T>, where:

  • Uninit<'a, T> and Init<'a, T> are invariant over 'a
  • Init<'a, T> can only be created by writing to a Uninit<'a, T>
  • you can't manually create a Uninit<'a, T>

I think these 3 conditions should be sufficient to guarantee that if foo returns then place has been written to. I remember seeing a crate that implemented this idea for pinned/inplace initialization but I can't find it anymore.

Why not just

    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) -> &mut Self {
        let c = self.clone();
        uninit.write(c)
    }

In order to use the &mut Self as witness that the MaybeUninit slot was actually initialized[1], you need to check that the reference is ptr::eq; consider that an overriden impl could be Box::leak(Box::new(self.clone())) or similar to return an unrelated reference. It is extremely easy to forget to do this validation. A similar footgun resulted in changing the unstable feature(read_buf) APIs.

Plus, that still relies on optimization to cut out the stack allocation. I expect OP wants to avoid that reliance, as they have scenarios where it isn't happening in the current Box::clone which is essentially doing that.


  1. If you know the function being called, you can trust it. If it's a trait method, however, you cannot, and need to at least tolerate any possible implementation permitted by the signature. This is where an extension trait is useful, because it can be unsafe to implement (making it sound to trust documented impl safety requirements) and/or blanket implemented and impossible to override. Some sort of final modifier for default-provided trait methods which prevents overriding the default impl would also be sufficient. ↩︎

3 Likes

Right, I was thinking that one could pass an initialized structure to clone_into_uninit and cause memory leaks (or other resource leaks), but that's not UB. I updated the post to remove the unsafe.

I think this is an interesting approach. I actually wanted to propose something very similar myself. This would force implementors to carefully think about whether something has been thoroughly initialized.

However I'm not sure if this is strictly necessary: writing to a MaybeUninit already requires unsafe code, and that perhaps is a strong enough barrier to make people stop and think on whether their initialization code is thorough enough.

no that’s not quite the case. Reading from MaybeUninit requires unsafe code, but for writing, there’s even an appropriately named (and safe!) method.

2 Likes

Right, right, sorry. This is why I shouldn't write stuff on the internet too late at night or too early in the morning. In that case a Uninit/Init approach may be helpful to communicate expectations, although I'm not 100% sure it's worth it (which is the reason why I omitted in my original post).

There are many known cases where large Box causes stack overflow, including Box::new, so the problem is not limited to cloning.

I think the best way to fix this would be to improve rustc code generation to always carefully avoid stack and initialize Boxes in place. It's already a special type to the compiler.

I don't think that will help? ie in this situation: let b = Box::new(big_structure.clone())
the actual problem is in .clone(), no matter how much more magic you add to Box::new, it won't be able to make clone return by reference instead of by value.

Guaranteed RVO would solve this as far as I understand.

After re-reading: The guaranteed copy elision would need to be far more powerful that what's proposed in the RFC to solve the problem here.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.