Allow cloning into MaybeUninit (to allow cloning boxes directly on the heap)

andrea · August 28, 2023, 8:55am

TL;DR: Sometimes, cloning a Box<T> happens in two steps: first T is cloned on the stack, and then the cloned data is copied from the stack to the heap. This can cause stack overflows and serious performance issues if T is very large. I propose to add a new method to the Clone trait: clone_into_uninit(), which can be leveraged by the Clone implementation of Box<T> to offer developers the opportunity of making sure that cloning of boxed data happens directly on the heap and not on the stack.

I maintain a crate (circular-buffer) that contains a CircularBuffer struct which is essentially a wrapper around an arbitrarily-sized array. A CircularBuffer can live on the stack, or on the heap via Box<CircularBuffer>. One problem that I noticed is that cloning a large Box<CircularBuffer> overflows the stack, because the clone is first created on the stack, and then copied to the heap.

This problem can be reproduced with this minimal example:

use std::hint::black_box;

struct MyStruct<const N: usize>([u32; N]);

impl<const N: usize> Clone for MyStruct<N> {
    fn clone(&self) -> Self {
        black_box(Self(self.0))
    }
}

fn main() {
    let x = Box::new(MyStruct([0u32; 100_000_000]));
    let _ = x.clone();
}

which causes the following on my system:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

This can be reproduced on rustc stable (1.72.0) and nightly (1.74.0-nightly) with release builds.

This problem is not specific to Clone: if you replace Box::new(MyStruct([0u32; 100_000_000])) with Box::new(black_box(MyStruct([0u32; 100_000_000]))) you will also get a stack overflow at initialization. The workaround is to use Box::new_uninit() (or equivalent code), and in fact, for my CircularBuffer struct, I provide two methods for initialization: CircularBuffer::new(), which returns a CircularBuffer on the stack, and CircularBuffer::boxed(), which returns a Box<CircularBuffer> allocated directly on the heap.

However, the Box<CircularBuffer> (or Box<MyStruct> in the example above) cannot be reliably cloned using the Clone trait, because there is no way to specialize Clone for Box<CircularBuffer> (this may be possible in the future with specialization). The only workaround possible is to use the newtype pattern to wrap Box<CircularBuffer> and provide a custom Clone implementation on the newtype, but I personally think that this is overkill and in my opinion Box<T> should "just work".

This could be possible if the Clone trait offered a new clone_into_uninit() method:

pub trait Clone {
    fn clone(&self) -> Self;
    fn clone_from(&mut self, source: &Self) { ... }
    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) { ... }
}

If such method existed, then the Clone implementation for Box<T> could become something like this:

impl<T: Clone> Clone for Box<T> {
    fn clone(&self) -> Self {
        let mut uninit = Self::new_uninit();
        (*self).clone_into_uninit(&mut uninit);
        unsafe {
            uninit.assume_init()
        }
    }

    // ...other methods...
}

Developers who want to ensure that cloning happens directly on the heap, without going through the stack, can implement their own clone_into_uninit() for their types and they can be assured that cloning a Box<T> will never perform any cloning on the stack.

Similarly, Default could gain a new into_uninit() method to allow initialization of Box<T> directly on the heap.

steffahn · August 28, 2023, 9:17am

The unsafe annotation seems a bit off. To me it seems the clone_into_uninit method is not unsafe to call, but unsafe to implement. I'm not sure if any good workarounds exist for saying "this trait is unsafe to implement unless the default implementation of so-and-so method is used".

HjVT · August 28, 2023, 9:23am

Perhaps the functionality should be provided by an unsafe trait CloneIntoUninit: Clone instead.

SkiFire13 · August 28, 2023, 11:57am

The stdlib already has an internal specialization trait that applies this optimization to Copy types, see:

github.com

rust-lang/rust/blob/1bd043098e05839afb557bd7a2858cb09a4054ca/library/alloc/src/alloc.rs#L421


      
                      core::panicking::panic_nounwind_fmt(
                          format_args!("memory allocation of {size} bytes failed"),
                          /* force_no_backtrace */ false,
                      )
                  }
              }
          }
          
          /// Specialize clones into pre-allocated, uninitialized memory.
          /// Used by `Box::clone` and `Rc`/`Arc::make_mut`.
          pub(crate) trait WriteCloneIntoRaw: Sized {
              unsafe fn write_clone_into_raw(&self, target: *mut Self);
          }
          
          impl<T: Clone> WriteCloneIntoRaw for T {
              #[inline]
              default unsafe fn write_clone_into_raw(&self, target: *mut Self) {
                  // Having allocated *first* may allow the optimizer to create
                  // the cloned value in-place, skipping the local and move.
                  unsafe { target.write(self.clone()) };
              }

And in fact Box's Clone implementation already looks very similar to the proposed one:

github.com

rust-lang/rust/blob/1bd043098e05839afb557bd7a2858cb09a4054ca/library/alloc/src/boxed.rs#L1293-L1300


      
          fn clone(&self) -> Self {
              // Pre-allocate memory to allow writing the cloned value directly.
              let mut boxed = Self::new_uninit_in(self.1.clone());
              unsafe {
                  (**self).write_clone_into_raw(boxed.as_mut_ptr());
                  boxed.assume_init()
              }
          }

The problem though is how to expose this outside the stdlib. The proposed additional method on Clone looks very wrong because, as @steffahn said, the safety requirement should be on the implementation and not on the caller. Given that in the future trait implementations may omit the unsafe keyword I would consider the proposed extension to Clone unsound.

On the other hand however an extra trait as @HjVT proposed would likely mean exposing specialization, which is also unwanted.

Nemo157 · August 28, 2023, 1:42pm

What this really needs is out-references, to make it fully safe (even if the implementation ends up using unsafe internally in some cases).

fn clone_into(&self, place: &out Self) { ... }

I wonder if there's a library version of out-references that could be used, similar to how ReadBuf is used to provide a safe interface for IO to uninit buffers.

RustyYato · August 28, 2023, 1:58pm

This doesn't guarantee that the implementation writes into the out reference though. So a perfectly safe implementation would be the noop, whuch clearly won't work as intended

Nemo157 · August 28, 2023, 2:00pm

An &out has a requirement that it is written to before leaving scope builtin, a pure library implementation might not be able to but would have to have some way to check and panic at runtime.

steffahn · August 28, 2023, 2:10pm

Hmm…

pub struct UnsafeMethodImpl { _private: () }
/// Safety: `UnsafeMethodImpl` may only be constructed directly in a trait implementation when
/// overriding the safe default implementation of a method with `UnsafeMethodImpl`
/// return type.
/// The defining trait must have defined safety conditions in an `Implementation Safety`
/// section of the method’s documentation, and implementor that is overriding
/// the method and is calling `we_overrode_the_safe_default_method_implementation`
/// has to follow these safety conditions.
/// The trait method and its implementation must be an `unsafe fn` itself
/// and must require in its `Safety` that callers must
/// immediately discard the `UnsafeMethodImpl` token that the method in question returns.
pub unsafe fn we_overrode_the_safe_default_method_implementation() -> UnsafeMethodImpl {
    UnsafeMethodImpl { _private: () }
}
pub trait Clone {
    fn clone(&self) -> Self;
    fn clone_from(&mut self, source: &Self) { ... }
    /// Implementation Safety: Implementors overriding this method
    /// must ensure that `clone_into_uninit` only returns successfully (i.e. without panic)
    /// when the value in `uninit` has been completely initialized to a valid value
    /// of type `Self`.
    ///
    /// Safety: Callers must immediately discard the `UnsafeMethodImpl` token
    /// returned from this method.
    unsafe fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) -> UnsafeMethodImpl {
        let c = self.clone();
        uninit.write(c);
        // Safety: `uninit` has been initialized as required, and we are implementing
        // an `unsafe fn` that requires users to discard the return value.
        unsafe { we_overrode_the_safe_default_method_implementation() }
    }
}

it’s a bit clunky^[1], I’ll admit but maybe it’s sound?

some new types with new API is involved… clone_into_uninit is only unsafe because of its return value otherwise being re-usable in other method implementations… Safety conditions referring to other item’s documentation… ↩︎

SkiFire13 · August 28, 2023, 2:37pm

You can emulate a fn foo(place: &out T) with something like a fn foo(place: Uninit<'a, T>) -> Init<'a, T>, where:

Uninit<'a, T> and Init<'a, T> are invariant over 'a
Init<'a, T> can only be created by writing to a Uninit<'a, T>
you can't manually create a Uninit<'a, T>

I think these 3 conditions should be sufficient to guarantee that if foo returns then place has been written to. I remember seeing a crate that implemented this idea for pinned/inplace initialization but I can't find it anymore.

pitaj · August 28, 2023, 3:37pm

Why not just

    fn clone_into_uninit(&self, uninit: &mut MaybeUninit<Self>) -> &mut Self {
        let c = self.clone();
        uninit.write(c)
    }

CAD97 · August 28, 2023, 4:10pm

In order to use the &mut Self as witness that the MaybeUninit slot was actually initialized^[1], you need to check that the reference is ptr::eq; consider that an overriden impl could be Box::leak(Box::new(self.clone())) or similar to return an unrelated reference. It is extremely easy to forget to do this validation. A similar footgun resulted in changing the unstable feature(read_buf) APIs.

Plus, that still relies on optimization to cut out the stack allocation. I expect OP wants to avoid that reliance, as they have scenarios where it isn't happening in the current Box::clone which is essentially doing that.

If you know the function being called, you can trust it. If it's a trait method, however, you cannot, and need to at least tolerate any possible implementation permitted by the signature. This is where an extension trait is useful, because it can be unsafe to implement (making it sound to trust documented impl safety requirements) and/or blanket implemented and impossible to override. Some sort of final modifier for default-provided trait methods which prevents overriding the default impl would also be sufficient. ↩︎

andrea · August 28, 2023, 4:32pm

Right, I was thinking that one could pass an initialized structure to clone_into_uninit and cause memory leaks (or other resource leaks), but that's not UB. I updated the post to remove the unsafe.

andrea · August 28, 2023, 4:42pm

I think this is an interesting approach. I actually wanted to propose something very similar myself. This would force implementors to carefully think about whether something has been thoroughly initialized.

However I'm not sure if this is strictly necessary: writing to a MaybeUninit already requires unsafe code, and that perhaps is a strong enough barrier to make people stop and think on whether their initialization code is thorough enough.

steffahn · August 28, 2023, 5:00pm

no that’s not quite the case. Reading from MaybeUninit requires unsafe code, but for writing, there’s even an appropriately named (and safe!) method.

andrea · August 28, 2023, 6:11pm

Right, right, sorry. This is why I shouldn't write stuff on the internet too late at night or too early in the morning. In that case a Uninit/Init approach may be helpful to communicate expectations, although I'm not 100% sure it's worth it (which is the reason why I omitted in my original post).

kornel · August 29, 2023, 10:46am

There are many known cases where large Box causes stack overflow, including Box::new, so the problem is not limited to cloning.

I think the best way to fix this would be to improve rustc code generation to always carefully avoid stack and initialize Boxes in place. It's already a special type to the compiler.

HjVT · August 29, 2023, 12:10pm

I don't think that will help? ie in this situation: let b = Box::new(big_structure.clone())
the actual problem is in .clone(), no matter how much more magic you add to Box::new, it won't be able to make clone return by reference instead of by value.

sahnehaeubchen · August 29, 2023, 2:02pm

github.com/rust-lang/rfcs

RFC: Placement by return

rust-lang:master ← PoignardAzur:placement-by-return

opened 12:01PM - 17 Mar 20 UTC

PoignardAzur

+1115 -0

[Rendered](https://github.com/PoignardAzur/placement-by-return/blob/placement-by…-return/text/0000-placement-by-return.md). --- **Glossary:** - **GCE:** [Guaranteed Copy Elision](https://stackoverflow.com/questions/38043319/how-does-guaranteed-copy-elision-work). - **NRVO:** [Named Return Value Optimization](https://shaharmike.com/cpp/rvo/). - **DST:** [Dynamically-Sized Type](https://doc.rust-lang.org/reference/dynamically-sized-types.html).

Guaranteed RVO would solve this as far as I understand.

After re-reading: The guaranteed copy elision would need to be far more powerful that what's proposed in the RFC to solve the problem here.

system · November 27, 2023, 2:02pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Rc and internal mutability libs	22	2485	March 21, 2020
Compiler doesn't know that MaybeUninit<T> and T have the same size libs	3	772	March 27, 2021
Patch testing for 1.27.2 announcements	1	2715	March 25, 2019
Removal of all unstable placement features announcements	21	5948	March 25, 2019
`<[T]>::as_uninit_mut` libs	6	1009	February 28, 2022

Allow cloning into MaybeUninit (to allow cloning boxes directly on the heap)

Related Topics