Pre-RFC: minimal, flexible placement and returning unsized types

Edit: this has been moved to an RFC

This document is also on HackMD. I'll try to keep them in sync.

Pre-RFC: Minimal, flexible placement and returning unsized types

Implement "placement" with no new syntax, by extending the existing capabilities of ordinary return. This involves copying Guaranteed Copy Elision rules pretty much wholesale from C++17, adding functions like fn new_with<F: FnOnce() -> T, T: ?Sized>(f: F) -> Self to Box and fn push_with<F: FnOnce() -> T>(&mut self, f: F) to Vec to allow performing the allocation before evaluating F, providing raw access to the return slot for functions as an unsafe feature, and allowing unsized object to be returned directly by compiling such functions into a special kind of "generator."

Starting with the questions given at the end of the old RFC's mortician's note:

  • Does the new RFC support DSTs? serde and fallible creation? Yes on DSTs. On fallible creation, it punts it into the future section.
  • Does the new RFC have a lot of traits? Is it justified? It introduces no new traits at all.
  • Can the new RFC handle cases where allocation fails? Does this align with wider language plans (if any) for fallible allocation? Yes.
  • are there upcoming/potential language features that could affect the design of the new RFC? e.g. custom allocators, NoMove, HKTs? What would the implications be? Not really. Pin can have a new_with function just like anyone else, custom allocators would happen entirely behind this, true HKT's are probably never going to be added, and associated type constructors aren't going to affect this proposal since the proposal defines no new traits or types that would use them.

Motivation

Rust has a dysfunctional relationship with objects that are large or variable in size. It can accept them as parameters pretty well using references, but creating them is horrible:

  • A function pretty much has to use Vec to create huge arrays, even if the array is fixed size. The way you'd want to do it, Box::new([0; 1_000_000]), will allocate the array on the stack and then copy it into the Box. This same form of copying shows up in tons of API's, like serde's Serialize trait.
  • There's no safe way to create gigantic, singular structs at all. If your 1M array is wrapped somehow, you pretty much have to allocate the memory by hand and transmute.
  • You can't return bare unsized types. You can create them locally, and you can pass them as arguments, but not return them.

As far as existing emplacement proposals go, this one was written with the following requirements in mind:

  • It needs to be possible to wrap it in a safe API. Safe API examples are given for built-in data structures, including a full sketch of the implementation for Box, including exception safety.
  • It needs to support already-idiomatic constructors like fn new() -> GiantStruct { GiantStruct { ... } } Since this proposal is defined in terms of Guaranteed Copy Elision, this is a gimme.
  • It needs to be possible to in-place populate data structures that cannot be written using a single literal expression. The write_return_with intrinsic allows this to be done in an unsafe way. Sketches for APIs built on top of them are also given in the future-possibilities section.
  • It needs to avoid adding egregious overhead in cases where the values being populated are small (in other words, if the value being initialized is the size of a pointer or smaller, it needs to be possible for the compiler to optimize away the outptr). Since this proposal is written in terms of Guaranteed Copy Elision, this is a gimme. The exception of the "weird bypass functions" read_return_with and write_return_with may seem to break this; see the example desugarings here for info on how these functions work when the copy is not actually elided.
  • It needs to solve most of the listed problems with the old proposals. Since this one actually goes the distance and defines when copy elision will kick in, it fixes the biggest problems that the old box literal system had. It is also written in terms of present-day Rust, using impl Trait, and with the current direction of Rust in mind.

Guide-level explanation

If you need to allocate a very large data structure, or a DST, you should prefer using Box::new_with over Box::new. For example:

let boxed_array = Box::new_with(|| [0; 1_000_000]); // instead of Box::new([0; 1_000_000])
let boxed_data_struct = Box::new_with(DataStruct::new); // instead of Box::new(DataStruct::new())

The new_with function will perform the allocation first, then evaluate the closure, placing the result directly within it. The new function, on the other hand, evaluates the argument then performs the allocation and copies the value into it.

A similar function exist for Vec and other data structures, with names like push_with and insert_with.

When writing constructors, you should create large data structures directly on the return line. For example:

// This function will allocate its return value directly in the return slot.
fn good() -> [i32; 1_000_000] {
    [1; 1_000_000]
}
// This function will return a raw slice, and can also be called as `Box::new_with(good_dst)`
fn good_dst() -> [i32] {
    let n = 1_000_000;
    [1; n]
}
// This function will copy the array when it returns.
fn bad() -> [i32; 1_000_000] {
    let mut arr = [0; 1_000_000];
    for i in 0..1_000_000 {
        arr[i] = i;
    }
    arr
}
// This function will compile successfully, but it will allocate the array on the stack,
// which is probably very bad.
fn bad_dst() {
    fn takes_dst(_k: [i32]) {}
    fn returns_dst() -> [i32] { let n = 1_000_000; [1; n] }
    takes_dst(returns_dst())
}

This is guaranteed to see through if expressions, function calls, literals, unsafe blocks, and the return statement. It is not guaranteed to see through variable assignment or break with value.

// "good" functions will write their results directly in the caller-provided slot
fn good() -> Struct {
    if something() { Struct { ... } } else { Struct { ... } }
}
fn good() -> Struct {
    loop { return Struct { ... } }
}
fn good() -> Struct2 {
    Struct2 { member: Struct { ... } }
}
// "bad" functions will not necessarily do that
fn bad() -> Struct {
    loop { break Struct { ... } }
}
fn bad() -> Struct {
    let q = Struct { ... };
    q
}
fn bad() -> Struct2 {
    let q = Struct { ... }
    Struct2 { member: q }
}

In other words, Rust does not currently guarantee Named Return Value Optimization.

In the rustonomicon, it should mention that the write_return_with intrinsic can be used to build a function that's equivalent to bad():

use std::mem::{write_return_with, ReturnSlot};
use std::alloc::Layout;
fn not_bad() -> [i32] {
    unsafe {
        write_return_with(Layout::new::<[i32; 1_000_000]>(), |arr: *mut u8| {
            let arr: &mut [i32] = slice::from_raw_parts_mut(arr, 1_000_000);
            for i in 0..1_000_000 {
                arr[i] = i;
            }
            arr
        })
    }
}

Reference-level explanation

Did you say I can return unsized types?

A function that directly returns an unsized type should be compiled into two functions, essentially as a special kind of generator. Like this, basically:

// sugar
fn my_function() -> str {
    *"hello world"
}
fn function_that_calls_my_function() -> str {
    println!("Hi there!");
    my_function()
}
// desugar my_function
// this is written in order to aid understanding, not because these are good APIs
use std::alloc::Layout;
struct __MyFunction__Internal {
  local_0: &'static str,
}
impl __MyFunction__Internal {
  /* "unsize coroutines" are different from normal generators because their behavior is much more
    restricted. They yield exactly one Layout, and then they are finished by writing to a pointer. */
  unsafe fn start(state: *mut __MyFunction__Internal) -> Layout {
    /* As usual for generators, locals (including anonymous ones) get desugared into member
      variables of an anonymous struct. In this case, the local variable is just a string literal. */
    state.local_0 = "hello world";
    Layout::for_value("hello world")
  }
  unsafe fn finish(state: *mut __MyFunction__Internal, slot: *mut u8) -> &mut str {
    ptr::copy(state.local_0, slot.as_mut_ptr(), mem::size_of_val(state.local_0));
    /* Notice how I also have to return a properly-built fat pointer?
       This isn't very important for string slices, but it's mandatory for trait objects, because I need to
       supply a vtable. */
    str::from_raw_parts_mut(slot, state.local_0.len())
  }
}
// desugar function_that_calls_my_function
use std::alloc::Layout;
struct __FunctionThatCallsMyFunction__Internal {
  delegate: __MyFunction__Internal,
}
impl __FunctionThatCallsMyFunction__Internal {
  unsafe fn start(state: *mut __FunctionThatCallsMyFunction__Internal ) -> Layout {
    /* first, run everything leading up to the return */
    println!("Hi, there!");
    /* then, desugar the return (in this case, by delegating */
    __MyFunction__Internal::start(&mut state.delegate)
  }
  unsafe fn finish(state: *mut __FunctionThatCallsMyFunction__Internal, slot: *mut u8) -> &mut str {
    __MyFunction__Internal::start(&mut state.delegate, slot)
  }
}

This interface ends up putting some pretty harsh limitations on what functions that return unsized types can do. An unsized type-returning function can always return the following kinds of expression:

  • Constants and dereferencing constants (as shown above). These are desugared by yielding the layout of the literal and returning the literal by copying it.
  • Directly returning the value of another function that also returns the same unsized type. These are desugared by forwarding, as shown above with function_that_calls_my_function.
  • RFC 1909 variable-length array literals. These are desugared by yielding the length variable, and returned by writing the value repeatedly though ptr offsets and ptr write.
  • Unsized coersions. These are desugared by storing the sized type as function state, yielding the layout of the sized type, and returning by copying.
  • Blocks, unsafe blocks, and branches that have acceptable expressions in their tail position.

As is typical for generators, these functions may need to be desugared into simple "state machines" if they return branches or have more than one exit point.

fn with_branch() -> Option<Error> {
    if coin_flip() {
        None as Option<!>
    } else {
        Some("tails")
    }
}
fn with_multiple_exit_points() -> [i32] {
    while keep_going() {
        if got_cancelation() {
            return *&[]; // returning constant
        }
    }
    let n = 100;
    [1; n] // returning variable-length-array expression
}
// desugar with_branch
// this is written in order to aid understanding, not because these are good APIs
use std::alloc::Layout;
enum __WithBranch__Internal {
  S0(Option<!>),
  S1(Option<&'static str>),
}
impl __WithBranch__Internal {
  unsafe fn start(state: *mut __WithBranch__Internal) -> Layout {
    if coin_flip() {
        *state = __WithBranch__Internal::S0(None);
        Layout::new::<Option<!>>()
    } else {
        *state = __WithBranch__Internal::S1(Some("tails"));
        Layout::new::<Option<&'static str>>()
    }
  }
  unsafe fn finish(state: *mut __MyFunction__Internal, slot: *mut u8) -> &mut [i32] {
    match *state {
        __WithBranch__Internal::S0(value) => {
            ptr::copy(literal, slot, mem::size_of_value(value));
            slice::from_raw_parts_mut(slot, value.len())
        }
        __WithBranch_Internal::S1(value) => {
            ptr::copy(literal, slot, mem::size_of_value(value));
            slice::from_raw_parts_mut(slot, value.len())
        }
    }
  }
}
// desugar with_multiple_exit_points
enum __WithMultipleExitPoints__Internal {
  S0(&[i32]),
  S1(i32, usize),
}
impl __WithMultipleExitPoints__Internal {
  unsafe fn start(state: *mut __WithMultipleExitPoints__Internal) -> Layout {
    while keep_going() {
        if got_cancelation() {
            *state = __WithMultipleExitPoints__Internal::S0(&[]);
            return Layout::for_value(&[]);
        }
    }
    let n = 100;
    *state = __WithMultipleExitPoints__Internal::S1(1, n);
    Layout::from_size_align_unchecked(n * mem::size_of::<i32>(), mem::align_of::<i32>())
  }
  unsafe fn finish(state: *mut __WithMultipleExitPoints__Internal, slot: *mut u8) -> &mut Option<Error> {
    match *state {
        __WithMultipleExitPoints__Internal::S0(literal) => {
            ptr::copy(literal, slot, mem::size_of_value(literal));
            /* as you can see, I need to extract the vtable pointer */
            let t: TraitObject = mem::transmute(&mut literal as &mut Option<Error>);
            mem::transmute(TraitObject { data: slot, ..t })
        }
        __WithMultipleExitPoints_Internal::S1(value, n) => {
            for i in 0..n { ptr::write(slot.offset(i), value) };
            let t: TraitObject = mem::transmute(&mut literal as &mut Option<Error>);
            mem::transmute(TraitObject { data: slot, ..t })
        }
    }
  }
}

Important point: returning unsized variables is not allowed.

// fn invalid() -> [i32] {
//     let n = 100;
//     let k = [1; n];
//     k // ERROR: cannot return unsized variable
// }

This is because there's no way to break all such functions in half the way we need to. Where would we put k between the calls to start() and the call to finish()? We can't just put it in the struct, because we need to know how big a generator's stack frames will be in order to generate its state machine struct.

On the other hand, a function that returns an unsized value can have its return value assigned to a local variable (the compiler will start() the function to get a layout for it, then alloca() the space, then finish() the function into that space). This is a natural thing to support, and it's great for trait objects, but it's probably a horrible idea to do it with dynamically-sized slices, and should be linted against.

fn valid_but_terrible() {
    let n: [i32] = returns_slice();
    takes_slice(n);
}

How this works with sized types

Any function that returns a sized type can be trivially adapted to the "unsized return generator" interface of start() and finish(), by having start return a constant and have finish do all the work.

fn return_a_sized_value() -> i32 {
    println!("got random value");
    4 // determined by a fair dice roll
}
// desugar return_a_sized_value
// this is written in order to aid understanding, not because these are good APIs
use std::alloc::Layout;
struct __ReturnASizedValue__Internal;
impl __ReturnASizedValue__Internal {
  unsafe fn start(state: *mut __WithBranch__Internal) -> Layout {
    // there's only one possible layout for sized types; that's the definition of being "sized"
    Layout::new::<i32>()
  }
  unsafe fn finish(state: *mut __MyFunction__Internal, slot: *mut u8) -> &mut i32 {
    // just invoke the function and copy its return value into the slot
    // this mostly gets the behavior we want, but see below for info about copy elision
    ptr::copy(&return_a_sized_value(), slot, mem::size_of::<i32>());
    slot as *mut i32 as &mut i32
  }
}

This is how it would conceptually work whenever a facility designed to handle placement is invoked on a function that returns a sized value. This is not how it should be implemented, though, because we don't want a bunch of unnecessary copying.

Absolutely minimum viable copy elision

To make sure this functionality is useful for sized types, we should probably also guarantee some amount of copy elision:

  • Directly returning the result of another function that also returns the same type.
  • Blocks, unsafe blocks, and branches that have acceptable expressions in their tail position.
  • Struct and array literals.

The first two cases are chosen as "must-have" copy elision because they allow functions that use the unsafe methods to be composed with other functions without introducing overhead. The last rule is added because it's trivial to guarantee, and because the line between "tuple struct literal" and "function call" is blurry anyway.

Guaranteed Copy Elision only kicks in when a GCE-applicable expression is directly returned from a function, either by being the function's last expression or by being the expression part of a return statement.

The unsafe method of writing return values

The std::mem module exposes two symmetrical intrinsics that can allow you to achieve Copy Elision in any circumstance.

write_return_with

pub unsafe fn write_return_with<T, F: for<'a> FnOnce(*mut u8)>(f: F) -> T {
    write_unsized_return_with(Layout::new::<T>(), |p| {f(p); &mut *(p as *mut T)})
}
extern "intrinsic" pub unsafe fn write_unsized_return_with<T: ?Sized, F: for<'a> FnOnce(*mut u8) -> &mut T>(layout: Layout, f: F) -> T;

Directly access the caller-provided return slot.

The return slot is an uninitialized memory space provided by a function's caller to write the return value, and implicitly passed as a pointer. A function which directly returns the result of evaluating another function, like fn foo() -> ReturnedValue { bar() }, may simply pass the slot along, thus avoiding expensive copies. In cases where it is impossible to implement a function by directly returning the value of a function call or a primitive expression, directly accessing the return slot and writing it directly may be necessary.

Return slots are not always used; small values like pointers, numbers, and booleans may be returned by filling a register instead. In this case, write_return_with will implicitly create a "return slot", pass a pointer to it to f, and then copy the return value from the "return slot" to the register.

The pointer returned by write_unsized_return_with's callback should be the same as the one provided to it, along any necessary unsize information (such as the slice's length or the trait object's vtable pointer). Additionally, when using write_unsized_return_with for sized types, the provided layout must be exactly the same as the one produced by Layout::new<T>(). These two values must be correct even when they're redundant. A function that returns a slice may return a slice that is smaller than the requested allocation Layout, in case it is unable to predict the amount of data that will be available to it, but when producing a trait object, it must know exactly the right size for its allocation.

Panicking

f is allowed to panic. If it panics, the underlying memory should still be freed (the built-in collections are well-behaved), but the return slot implementation will assume that the allocated memory was not initialized, and shall not call T's destructor. If f allocates additional resources itself before panicking, it is responsible for freeing it.

If allocation fails, write_return_with may panic without calling f. The return slot may also be pre-allocated, resulting in the allocation failure before the call to write_return_with is reached.

IMPORTANT IMPLEMENTATION NOTE: Because of the way unsized return generators are codegenned as generators, it would be possible to tell that write_unsized_return_with wasn't actually panicking by wrapping its invocation in catch_panic. To ensure the user cannot do this, the closure passed to catch_panic must return a sized type; we still technically won't be unwinding through their stack frames, but we will be calling the drop functions with is_panicking set to true, so they won't be able to tell. Additionally, of course, the return slot for sized types is always pre-allocated, so this function will never panic in that case.

Example: raw_as_bytes_with

mod str {
  /// This is a function adapter. Usage:
  ///
  ///     fn get_str() -> str;
  ///     let get_bytes = str::raw_as_bytes_with(get_str);
  ///     get_bytes()
  pub fn raw_as_bytes_with<Args, F: FnOnce<Args, Output=str>>(f: F) -> impl FnOnce<Args, Output=[u8]> {
    unsafe {
      struct ConvertFn<F>(F);
      impl<Args, F: FnOnce<Args, Output=str>> FnOnce<Args> for ConvertFn<F> {
        type Output = [u8];
        fn call(self, a: Args) -> [u8] {
          let finish = read_unsized_return_with(|| self.0.call(a));
          write_unsized_return_with(
            finish.layout(),
            |slot: &mut MaybeUninit<[u8]>| finish.finish(MaybeUninit::from_mut_ptr(slot.as_mut_ptr() as *mut str))) as *mut str as *mut [u8] as &mut [u8]
        }
      }
    }
  }
}

read_return_with

pub unsafe fn read_return_with<'a, T, F: FnOnce() -> T>(f: F, slot: &mut MaybeUninit<T>) {
    let finish = read_unsized_return_with(f);
    debug_assert!(finish.layout() = Layout::new::<T>());
    finish.finish(slot);
}
#[lang(read_unsized_return_with_finish)]
pub trait ReadUnsizedReturnWithFinish<T: ?Sized> {
    pub fn finish(self, &mut MaybeUninit<T>);
    pub fn layout(&self) -> Layout;
}
extern "intrinsic" pub unsafe fn read_unsized_return_with<'a, T: ?Sized, F: FnOnce() -> T>(f: F) -> impl ReadUnsizedReturnWithFinish<T>;

Directly supply a return slot. See [write_return_with] for information on what return slots are.

Example: Box::new_with

struct BoxUninit<T: ?Sized> {
    p: Option<NonNull<MaybeUnint<T>>>,
}
impl<T: ?Sized> Drop for BoxUninit<T> {
    fn drop(&mut self) {
        if let Some(p) = self.p {
            System.dealloc(p.as_mut_ptr() as *mut u8, Layout::for_value(&*p);
        }
    }
}
struct Box<T: ?Sized> {
    p: NonNull<T>,
}
impl<T: ?Sized> Box<T> {
    fn new_with<F: FnOnce() -> T>(f: F) -> Self {
        unsafe {
            let mut uninit = BoxUninit { p: None };
            let state = read_unsized_return_with(f);
            let p = NonNull::from_mut_ptr(GlobalAlloc.alloc(finish.layout()));
            uninit.p = Some(p);
            state.finish(p.as_mut_ptr() as *mut MaybeUninit<T> as &mut MaybeUninit<T>);
            forget(uninit);
            Box { p }
        }
    }
}
impl<T: ?Sized> Drop for Box<T> {
    fn drop(&mut self) {
        let layout = Layout::for_value(&*p);
        drop_in_place(&mut *p);
        System.dealloc(p.as_mut_ptr() as *mut u8, layout);
    }
}

Example: Vec::extend_from_raw_slice_with

Copy-and-paste this example to create String::extend_from_raw_str_with.

impl<T> Vec<T> {
    pub fn extend_from_raw_slice_with<F: FnOnce() -> [T]>(&mut self, f: F) {
        let finish = read_unsized_return_with(f);
        let layout = finish.layout();
        debug_assert_eq!(layout.size() % size_of::<T>(), 0);
        debug_assert_eq!(layout.align(), align_of::<T>());
        let count = layout.size() / size_of::<T>();
        self.0.reserve(count);
        let p = ((&mut self[..]) as *mut [u8]).offset(self.len());
        // this slice may be smaller than the given allocation, as described above
        let slice = finish.finish(p as *mut MaybeUninit<[u8]> as &mut MaybeUninit<[u8]>);
        debug_assert!(slice.len() <= count);
        self.set_len(self.len() + slice.len());
    }
}

Lint: incorrect use of write_return_with

All usage of write_return_with should call it directly in their return clause, or in an unsafe block directly in their return clause.

fn good() -> BigDataStructure {
    unsafe {
        write_return_with(Layout::new::<BigDataStructure>(), |slot| {
            populate_big_data_structure(slot);
        })
    }
}
fn bad() -> BigDataStructure {
    unsafe {
        let k = write_return_with(Layout::new::<BigDataStructure>(), |slot| {
            populate_big_data_structure(slot);
        });
        k
    }
}
fn also_bad_even_though_it_technically_works() -> BigDataStructure {
    unsafe {
        if coin_flip() {
            write_return_with(Layout::new::<BigDataStructure>(), |slot| {
                populate_big_data_structure(slot);
            })
        } else {
            unimplemented!()
        }
    }
}

Assigning the result of write_return_with to a local variable, like in bad(), is essentially a no-op. The data structure is being emplaced into k, and then it's being immediately copied on return. If you actually intend to manually initialize a local value, just use MaybeUninit like a normal person.

Since it's always possible to write a function with write_return_with in the return clause of a function or an unsafe block which is itself in the return clause of a function (if you need a conditional somewhere, you can just put the conditional inside the closure), the lint should just require you to do that.

How do the return slot functions work when the copy is not actually elided?

When the copy is not elided (which is only ever the case for values with a statically known size), they simply use temporaries for their implementation. The return slot functions still have to "work" when the copy is not elided, so that they can be used in generic contexts.

// Imagine these functions being used only when T is sized and less than one pointer.
#[inline(always)]
unsafe fn write_return_with<T, F: FnOnce(*mut u8)>(f: F) -> T {
    let slot: MaybeUninit<T> = MaybeUninit::empty();
    f(&mut slot as *mut MaybeUninit<T> as *mut u8);
    slot.take()
}
#[inline(always)]
unsafe fn read_return_with<'a, T, F: FnOnce() -> T>(f: F, slot: *mut u8) {
    let value = f();
    ptr::write(&value, slot.get() as *mut T);
}

New functions added to existing types

impl<T: ?Sized> Box<T> {
    fn new_with<F: FnOnce() -> T>(f: F) -> Self;
}
impl<T> Vec<T> {
    fn from_raw_slice_with<F: FnOnce() -> [T]>(f: F);
    fn push_with<F: FnOnce() -> T>(&mut self, f: F);
    fn insert_with<F: FnOnce() -> T>(&mut self, index: usize, f: F);
    fn extend_from_raw_slice_with<F: FnOnce() -> [T]>(&mut self, f: F);
}
impl <K: Eq + Hash, V, S: BuildHasher> HashMap<K, V, S> {
    /// Unlike the regular `insert` function, this one returns a `&mut` to the new one,
    /// not an optional old one.
    /// This is because the other one can't be returned without copying the old value
    /// from the map to the return slot, which is wasteful for large objects.
    fn insert_with<'a, FV: FnOnce() -> V>(&'a mut self, k: K, fv: FV) -> &'a mut V;
}
impl <K: Eq + Hash, V, S: BuildHasher> hash_map::OccupiedEntry<K, V, S> {
    fn insert_with<FV: FnOnce() -> V>(&mut self, k: K, fv: FV) -> V;
}
impl <K: Eq + Hash, V, S: BuildHasher> hash_map::VacantEntry<K, V, S> {
    fn insert_with<'a, FV: FnOnce() -> V>(&'a mut self, k: K, fv: FV) -> &'a mut V;
}
impl String {
    fn from_raw_str_with<F: FnOnce() -> str>(f: F);
    fn extend_from_raw_str_with<F: FnOnce() -> str>(&mut self, f: F);
}
mod str {
    // Yes, this is a generic function adapter, up to and including use of the unstable Fn* trait.
    // Use it like raw_as_bytes_with(fn_that_returns_raw_str)
    fn raw_as_bytes_with<Args, F: FnOnce<Args, Output=str>>(f: F) -> impl FnOnce<Args, Output=[u8]>;
}

Drawbacks

The biggest issue with Guaranteed Copy Elision is that it's actually kind of hard to specify it. The abstract machine, after all, doesn't specify the amount of memory that a function call occupies; that's part of the ABI. So how do you phrase GCE in an ABI-agnostic way? The C++ answer is "a convoluted combination of requirements about when the address of an object may change and when a move constructor may be called". The specification of GCE in the to-be-written Rust specification will probably be just as bad, since while it's pretty obvious how it works for unsized types (which cannot be moved without invoking an allocator, and we can certainly specify when that's allowed), we also want to guarantee it for sized types.

There have also been people recommending a more do-what-I-mean approach where Box::new(f()) is guaranteed to perform copy elision. That would induce the absolute minimal churn, though how you'd handle intermediate functions like fn foo(t: T) { box::new(t) } is beyond me.

The third drawback, and I personally think this is the worst drawback, is that it's invisible. This means that when a user accidentally writes code that is performs copies, it isn't always obvious that they messed up. There isn't any way to get existing code to achieve GCE without making it invisible, and I wanted to avoid churn in everyone's new() functions, as described in rationale-and-alternatives. Presumably, it can be solved using optional lints.

Another major drawback is that it doesn't compose well with pattern matching (or ?). Imagine you have a function fn foo() -> Result<Struct, Error> and you want to do something like Box::new(foo()?), but you want to directly place it. This is impossible; not just because with closures Box::new_with(|| foo()?) won't do what you want, but any system where the function being called writes its entire result through a pointer (like the old Placer proposal) cannot have the function being called write part of its result in one place (the Ok value should be written into the Box<Struct>) while other parts are written in other places (the discriminator should be written to a slot on the stack, since there isn't room for it). You can only assemble a Box<Result<Struct, Error>>, or abandon Result entirely and use an error handler callback or a panic/catch.

Also, like all placement proposals, it involves adding a new API surface area to most of the built-in data structures. Since there are known problems with how this proposal works with error handling, the GCE-based version may end up being replaced with an approach that does.

Additionally, this proposal deliberately does not implement NRVO. This means people will end up writing stilted code, or just using the unsafe bypass functions, instead of doing what's most readable.

Rationale and alternatives

Why not use an alternative ABI for returning Result?

The biggest drawback to this proposal, like most placement proposals, is that it works poorly with Result. You can emplace a function returning Result<T> into a Box<Result<T>>, and you cannot emplace it into a Box<T>. Making this work would require the inner payload to be written to the box, while the tag is written to somewhere on the stack.

This proposal does not define an alternative ABI for results because it would be work poorly with read_return_with and write_return_with. Both of these functions expose the return slot as a raw pointer, which means the return slot has to be in a continuous region of memory. This proposal does not, strictly speaking, lock out alternative ABIs for Result, but it does impose copying whenever such ABIs are used along with the return_with functions. These functions have to work with results, because they have to work in generic contexts.

Using an alternative ABI remains practical as a solution for the "peanut butter" conditional cost (the return_with functions will be rarely used in most Rust code, especially when the code is just propogating error flags), but the way these functions work precludes using an alternative ABI as a solution for placement.

Vs older emplacement proposals

The previous emplacement RFC was rejected for the following reasons, most of which this new RFC addresses:

  • The implementation does not fulfil the design goals

    • Place an object at a specific address

      This is literally what read_return_with does.

    • Allocate objects in arenas (references the original C++ goals)

      This is what new_with and related functions achieve.

    • Be competitive with the implementation in C++

      Passing a closure is essentially the same thing as passing an initializer list, so it should have the same performance as C++ emplacement.

  • The functionality of placement is unpredictable

    This is probably the least-well-addressed concern. This RFC spends a lot of its text spelling out when GCE can kick in, which means it should be possible for a coder (or a lint) to tell whether a value is being copied or not, but it's still invisible.

    The other way it tries to address predictability is by offering the explicit-mode write_return_with intrinsic. Unfortunately, it's unsafe.

  • Specific unresolved questions

    • make placement work with fallible creation

      This RFC honestly has terrible support for this, mostly because it's painted itself into a corner. The sized-value-returning functions all have to be emplaceable, and when they return a Result, they return it by writing it directly through a pointer all at once. Making placement work really well with fallible creation rould require the Result to be split up, with the discriminant going into a stack frame or something, while the payload goes into the allocated area.

    • trait design

      The ReadUnsizedReturnWithFinish trait will probably be somewhat controversial. It is an unsafe abstraction, but that's not a good excuse for a poor design. OTOH, I don't know of a better design to use.

      For providing a way to allocate unsized types by return, I don't think its design could be any better: you're passing a layout, and getting a pointer that you can fill with your data.

      The sized type story is the annoying one, since we're using an abstraction that pretends you're allocating space at the callee-time when you really allocated the space before it was even called and the compiler has to generate a fake implementation just to keep up the illusion. The only-for-sized-types wrapper functions tries to paper over that, so it at least isn't requiring write_return_slot_with users to provide a Layout that only has one valid value anyway.

    • support for DST/unsized structs

      Check!

  • More speculative unresolved questions include:

    • better trait design with in the context of future language features

    I suppose the read/write return slot with functions would probably be affected by future proposals for custom unsized types. Since they use fat pointers, it should work no matter what they are, but there might eventually be a better way to handle that.

    • interaction between custom allocators and placement

      This interface should support any allocator, since it uses a "supply your own pointer" philosophy for read_return_with users like collections.

Vs other possible tweaks to this RFC

The design for this proposal is meant to allow already-idiomatic "constructors" like this one to be allocated directly into a container. Any proposal that wishes to pull that off has to involve GCE, including more elaborate ones like RFC-8089. The choice of modeling that part of the proposal after existing parts of C++17 was made for familiarity and ease of implementation. Clang already does this stuff in LLVM, so we know it's possible.

The weird bypass functions, write_return_with and read_return_with, both accept closures because the only alternatives I could think of involved even weirder calling convention hacks, even weirder type system hacks (fn get_return_slot<T>() -> *mut T, for example, would be required to ensure that T is the same as the return type of the current function), or new language features like Go's nameable return slot.

The idea of compiling functions that return unsized types into pseudo-generators came from the Rust Discord server. Thanks @rpjohnst!

This RFC supports returning DSTs for two reasons:

  • It allows emplacing dynamic byte blobs, which is really important for use cases like Mio and Serde.
  • It is one of the main requirements set out in the older RFC's mortician's note.

The specific design for DST returning is, of course, optimized for emplacement, and the main use case is for emplacing big byte blobs. It supports returning trait objects as well, but that's not really what it's for, and it kind of shows. A proposal using implicit boxing would be more useful for such a case, though at that point it seems unnecessary (if you're given a function that returns a bare trait object, you can call it with Box::new_with and get on with your life).

Prior art

Any place where my proposal is ambiguous? Let me know and I'll try to make it the same as C++17 Guaranteed Copy Elision.

It was after I came up with the idea that I realized write_return_with is basically a feature in the Go language, but Go has dedicated syntax for it and zero-initializes the return slot. I don't know of any prior art for read_return_with; it's not really all that similar to "placement new", even though it's arguably achieving the same goal.

The _with suffix is based on functions like get_or_insert_with. They're basically filling the same role as C++ emplace_ methods.

Unresolved questions

Commence the bikeshedding for alternative names and designs for the new functions:

  • write_return_with could be named get_return_slot, and read_return_with could be named put_return_slot.
    • Or we could use the existing names, but with "slot" in them, like write_return_slot_with.
    • I personally oppose having "slot" in the function names, because for small values that get returned in registers, there isn't actually a return slot anywhere.
    • read_return_with seems like a particularly bad name, since it writes to the supplied pointer.
  • What happens if a read_return_with user simply doesn't call finish()? It would be as if it had panicked, but the destructors would be called with is_panicking set to false.
    • This is the best way to support non-panicking fallible allocation, so it should probably be allowed. Existing allowed functions cannot experience this problem, since they return sized types, sized types have a no-op start(), and do all of their work in finish(), they won't experience breakage. Additionally, this sort of happy nonsense is already possible in async functions and generators, so most data types will already be able to handle this. Just don't return an unsized type if you don't want to act like a stackless coroutine.
  • Right now, the API exclusively uses raw pointers, for syntactic simplicity. Maybe it should use MaybeUninit?
  • NRVO is not impossible. In fact, since the implementation of RVO is almost certainly going to be based on MIR or LLVM IR, it's almost impossible not to provide NRVO in cases where the "named" variant desugars to exactly the same MIR code as the "unnamed" variant that this proposal guarantees RVO for. How much do we want to guarantee, vs just "accidentally providing it" because of the way the compiler is implemented?

Future possibilities

Additional lints

In an attempt to directly address the problem of "it's too implicit", an HIR lint might be used to detect functions that are copying large amounts of data around. Deciding the cutoffs for this thing sounds like a mess, and it should probably be off by default for just that reason. Worse yet, we're stuck deciding when to warn on a struct literal where the struct itself gets returned in-place, but none of its components do. Honestly, the more I think about it, the more it seems like the "big copy" lint is just a gigantic quagmire of edge cases.

Additionally, a few much-simpler lints can push users in the direction of getting GCE. For example, since break-with-value isn't GCE'ed but return short-circuiting is, a lint should recommend returning from top-level loops instead of breaking from them. Similarly, a lint should recommend assigning struct members inline instead of going through local variables.

Safe abstractions on top of write_return_with

While this RFC specifies a bunch of cases where existing containers should incorporate read_unsized_return_with to allow in-place construction of container members, it doesn't specify any built-in functions that should use write_(unsized_)return_with exclusively.

Functions which return potentially-large data structures that they construct will probably wind up using it. For example, std::mem::zeroed can be implemented like this:

unsafe fn zeroed<T>() -> T {
    write_return_with(|slot: *mut u8| {
        for i in 0 .. size_of::<T>() {
            *slot.offset(i) = 0;
        }
    });
}

Once type-level integers are provided, a function for constructing arrays in-place would also be possible:

fn fill_array<T, F: Fn() -> T, const N: usize>(f: F) -> [T; N] {
    unsafe {
        write_return_with(|slot: *mut u8| {
            let start = slot as *mut T;
            let filled = Filled(start, start);
            for _ in 0 .. N {
                read_return_with(&f, filled.1 as *mut u8);
                filled.1 = filled.1.offset(1);
            }
            forget(filled);
        });
    }
}
// This struct is used to take care of freeing the already-created items
// in case `f` panics in the middle of filling the array.
struct Filled<T>(*mut T, *mut T);
impl<T> Drop for Filled<T> {
    fn drop(&mut self) {
        while self.0 < self.1 {
            drop_in_place(self.0);
            self.0 = self.0.offset(1);
        }
    }
}

Integration with futures, streams, serde, and other I/O stuff

It's an annoying fact that this RFC does not compose terribly well with Result, Option, or other pattern-matching constructs, specifically because it's not possible to write the value and the discriminator to separate parts of memory (ideally, the error flags would be written to a stack variable or register somewhere, while the data would be written directly to the heap- or kernel-backed buffer somewhere). Future and Stream, in particular, wraps the result of polling in the Poll enum, so while it's certainly possible to allocate such a result into a Box<Poll<T>> without copying, its not possible to get a Box<T> without copying given the existing futures API.

Mapping doesn't work, either, because it will pass the payload of a future as a parameter, which means that futures have to allocate storage for their contents. Sized types work as usual, and Future<[u8]> would wind up allocating stack space for the byte blob in order to pass it to the mapping function as a move-ref, as described in RFC 1901 "unsized rvalues".

The only real option for handling zero-copy data, in any of these cases, is to build these things with errors treated as side-channels. None of this is hard, but it is annoying and not very dataflow-y. It's largely inherent to the goal of having zero-copy I/O while the type system doesn't support keeping pattern-matching semantics orthogonal to memory representation.

trait Read {
    /// Read into a raw slice. This allows the reader to determine the proper size to read, unlike the
    /// other `read` function, which allow the caller to do so.
    ///
    /// # Example
    ///
    ///     fn read_to_end<R: Read>(r: mut Read, v: &mut Vec<u8>) -> Result<(), Error> {
    ///     let mut prev_len = v.len();
    ///     let mut err = None;
    ///     v.extend_from_raw_slice_with(|| reader.read_to_raw_slice(&mut err));
    ///     while err.is_none() && v.len() != prev_len {
    ///         prev_len = v.len();
    ///         v.extend_from_raw_slice_with(|| reader.read_to_raw_slice(&mut err));
    ///     }
    ///     if let Some(err) { Err(err) } else { Ok(()) }
    ///     }
    fn read_to_raw_slice(&mut self, err: &mut Option<Error>) -> [u8] {
        unsafe {
            mem::write_unsized_return_with(
                // This function should be overloaded with better default buffer sizes.
                // For example, BufReader can just use the size of its own buffer,
                // and File can use fstat.
                Layout::from_size_align_unchecked(super::DEFAULT_BUF_SIZE, 1),
                |slot: *mut u8| {
                    let slice = slice::from_raw_parts(slot, super::DEFAULT_BUF_SIZE);
                    match self.read(slice) {
                        Ok(count) => {
                            assert!(count <= super::DEFAULT_BUF_SIZE),
                            slice[..count]
                        }
                        Err(e) => {
                            *err = e;
                            []
                        }
                    }
                }
            )
        }
    }
}

The default implementation is kind of dumb, but a properly specialized implementation would be able to know how big its buffer should be instead of requiring the caller to guess (or use something like fstat to communicate to the caller manually), and allowing you to read directly into buffers without those buffers being able to give you an &mut [u8], like the internal buffers maintained by Serde or Tokio. Additionally, because the Read implementation can request a buffer, scribble all over it, and then return an error anyhow, it's much better doing it this way than by returning a Result anyway.

Presumably, the asynchronous systems could yield non-blocking Read implementations as streams or futures, but, based on previous discussion of what a good I/O abstraction would be, we probably want to avoid using a get-a-byte abstraction for everything.

/// A non-blocking, non-copying, unbounded iterator of values.
/// This trait should be used instead of Iterator for I/O backed streams,
/// since it can report error values separately from result values,
/// and it can be used with async functions and futures-based reactors.
trait Stream {
    type Item;
    type Error;
    /// Fetch the next batch of values.
    ///
    /// This function has six possible results:
    ///  * returns an empty slice, result is `Async::Pending` means the async operation is in progress
    ///  * returns an empty slice, result is `Async::Ready(None)` means the stream has terminated and `poll_next` should never be called again
    ///  * returns an empty slice, result is `Async::Ready(Some(_))` means there was a fatal error
    ///  * returns a non-empty slice, result is `Async::Pending` is a violation of the API contract and should lead to a panic
    ///  * returns a non-empty slice, result is `Async::Ready(None)` means that the underlying async operation is done, and the caller should call `poll_next` again immediately to get the rest of the data
    ///  * returns a non-empty slice, result is `Async::Ready(Some(_))` means that a fatal error occurred and the results should be ignored and dropped without further processing
    fn poll_next(&mut self, cx: &mut Context, result: &mut Async<Option<Error>>) -> [Item];
}

The requirements here are the same as Read:

  • You need to be able to hand more than one value into some recipient, at once. Byte-at-a-time reading is not acceptable.
  • You need to be able to implement the API using direct I/O, writing the result directly to wherever the consumer wants it written. Reading a Vec<> is not acceptable, though it seems easy enough to write an adapter that will convert to one. The goal here is to request a page and DMA directly into it (in this case, the implementation will probably have to use weird special-cases to fall back to software I/O if the slice isn't aligned).
  • We want the same API to be usable for types other than u8.
  • We need to be able to request our output buffer, scribble all over it, and then yield an error.
  • It needs to allow infinite streams, so just returning a single raw slice and calling it done won't be enough.

That last point isn't so bad if you're just populating a Vec or other data structure that allows you to just append forever, but many abstractions, like Serde, would actually have to directly support Streams themselves.

trait Serializer {
  /// This function uses an intermediate buffer by default,
  /// but third-party implementations can use a zero-copy implementation.
  /// The produced format is identical to using `serialize_bytes`.
  async fn serialize_byte_stream<S: Stream<Item=u8>>(self, stream: S) -> Result<Self::Ok, Self::Error> {
    let mut v = Vec::new();
    let mut e = Async::Ready(None);
    let mut prev_len = 1;
    while prev_len != v.len() {
      prev_len = v.len();
      // completely hand-waving the right way to await a stream,
      // by basically asking the stream for a Future<Output=()> and awaiting on it.
      await stream.unit();
      v.extend_from_raw_slice_with(|| stream.poll_next(&mut e));
    }
    if let Async::Ready(Some(e)) = e { return Err(e.into()); }
    self.serialize_bytes(&v[..])
  }
}
8 Likes

I came up with a perhaps-similar ABI for DST returns a while back: https://github.com/rust-lang/rust/issues/48055#issuecomment-415980600.

I think mine is easier to implement, however? Sadly, I don’t have the time right now to compare them, and I haven’t thought about my own proposal for a while.

1 Like

I’m … torn about your comment over there. You yourself listed a big problem with DST-return-through-continuation-passing: you either have to implicitly box (which means it won’t work in no_std) or you have to deal with a call stack that always grows (which is obviously unacceptable).

Another problem with implicit boxing is that it doesn’t work with emplacement. It seems like your purpose in writing that comment was returning dyn Trait objects directly, while my main purpose in this pre-RFC was returning raw [u8] slices in order to emplace massive byte blobs that might not even fit in the data stack.

On the other hand, CPS is an academically-sound transformation that allows normal, unconstrained, non-generator-approved code to work. Implicit boxing just works, while compiling into a generator often doesn’t. That kind of thing is a big deal.

We might let the user decide this, using an atte like #[must_use]. We can have #[must_rvo] attribute which is applicable to functions, and issues a warning if rvo doesn’t occur. Library authors should also be able to apply this attr to closure parameters.

3 Likes

I haven't fully read the pre-RFC, but I've thought about this idea a bunch, and it really appeals to me – despite the limitations, such as...

It's not impossible to make it compose.

The base proposal is essentially defining a special ABI for returning large structs from a function. Well, there's no rule that you can't define a special ABI for returning a Result from a function, which is entirely different from the standard in-memory representation of Result. It could desugar like this (only if the returned type is large):

fn foo() -> Result<BigStruct, SomeError>

->

fn foo(ok_out: *mut BigStruct, err_out: *mut SomeError) -> bool

where the bool return value would indicate the variant. It wouldn't be necessary to special-case Result; a similar transformation could be applied generically to any enum.

Then, to the list of "operations that preserve placement", you can add constructing a variant of an enum: for instance, return Ok(foo) would desugar to *ok_out = foo; return true;, where *ok_out = foo can itself be subject to copy elision.

Calling another function with the same return type would also be okay: return bar() would desugar to return bar(ok_out, err_out) (same idea as the base proposal, but with multiple output parameters).

However... there is one significant limitation to this scheme. If you call a method on the Result, the &self parameter would have to point to a full Result object, and there is no generic way to split it up as part of the ABI (since a method could do things with the reference other than dereference it). On the other hand, all of the methods on Result are meant to be inlined; they could be marked #[inline(always)], and MIR-based inlining would make it feasible to avoid materializing the full Result (or even LLVM inlining + memcpy elision, but I think that would be tricky).

Alternately, or more generally, the idea of a "fat enum reference" could be added to the language. For example, &fat Result<T, E> could be represented as (bool, *const T, *const E), although it would probably be best to make the specifics implementation-defined. Then methods on Result could take &fat self instead of self.

If that seems insufficiently justified, well, there are other use cases for the general concept of alternate data representations. If you have a struct like, say:

struct MyStruct (
    a: Option<u64>,
    b: Option<u64>,
    c: Option<u64>,
    d: Option<u64>,
    ...
}

currently it wastes a lot of memory: due to alignment requirements, each Option's tag consumes 8 bytes, even though it's semantically a single bit. It would be nice to have an attribute to allow the compiler to lay it out like

    a_is_some: bool,
    b_is_some: bool,
    c_is_some: bool,
    d_is_some: bool,
    a_val: u64,
    b_val: u64,
    c_val: u64,
    d_val: u64,

The downside would be that you could no longer take a reference to one of the fields. If you try to do so, the compiler could implicitly copy it to the stack first, but the resulting reference wouldn't have the same lifetime as the original, which is why an explicit annotation would be needed.

Anyway, that's a different feature, but I think it's closely related in concept. In fact, with the above layout you could create a fat reference to a field, although I'm not sure if that's something that should be guaranteed...

4 Likes

Oh, and –

This wouldn’t just enable returning unsized objects. It would also provide a nice framework for truly immovable types, if we choose to take that route.

…Yes, it’s the same dead horse I’ve been beating for a long time now. But it’s not too late. The final Pin design is in better shape than the one I’ve criticized in the past, but it’s far from a complete solution for immovability: it does not support types that are always immovable rather than only after creation, and without compiler support it makes things like defining structs with immovable fields unnecessarily hacky/unsafe. The latter needs to be fixed eventually, somehow. The most obvious approach is to define a native &pin and alias Pin<&T> to &pin T, or something along those lines. But an alternative is to create ?Move and deprecate Pin entirely.

If this placement design is adopted, I think using it as a base for ?Move will be compelling. You could return immovable types from functions, and manipulate them with the same APIs you’re proposing here – e.g. constructing a Box of an immovable type using Box::new_with. The restrictions on handling immovable types would be roughly the same as what you propose for functions returning unsized types. And not coincidentally, the approach to adding them backwards compatibly would be to make existing uses of ?Sized imply ?Move.

But for now I’ll file that under “future directions”.

3 Likes

I've thought about that, and will put a note about why it doesn't work that way in the "Rationale and alternatives" section. Suffice to say it works poorly with (read|write)_return_with.

Hmm... interesting.

For starters, I think read_return_with would work fine with an alternative ABI. If a function is desugared to

fn foo(ok_out: *mut BigStruct, err_out: *mut SomeError) -> bool

and you have a *mut Result<BigStruct, SomeError> you want it to write to, you can just pass pointers into your Result as ok_out and err_out. read_return_with could do this automatically.

But yep, write_return_with would be a problem. However... I'm not sure write_return_with needs to exist, at least in such a generic form. Suppose we add guaranteed named return value optimization, i.e. your 'bad' examples would now be guaranteed to optimize:

fn bad() -> Struct {
    let q = Struct { ... };
    q
}
fn bad() -> Struct2 {
    let q = Struct { ... }
    // OK to take references to q here!
    Struct2 { member: q }
}

Then write_return_with would not be needed to initialize large structs.

It wouldn't be needed for arrays, either. By the same logic, this would also be guaranteed to optimize:

fn not_bad() -> [i32] {
    let mut arr = [0i32; 1_000_000];
    for i in 0..1_000_000 {
        arr[i] = i;
    }
    arr
}

Even ignoring the Result issue, it seems like a clear benefit to be able to do those things without needing unsafe code or low-level APIs. Including NRVO would also make the design less surprising, avoiding a "gotcha" if you refactor an expression to introduce a temporary variable, a change that's usually harmless. What's the downside?

With NRVO, what use cases remain for write_return_with? I imagine there may be some, but they may be solvable with more specialized APIs that would avoid incompatibility with an alternate Result return ABI.

2 Likes

But what happens when the function doesn't return a result? How do you make it so that people can use the return_with functions in generic code?

I don't think it's possible to avoid returns-by-copy in general. Let's take your last not_bad example, and switch it into a Result and do a bit more complicated stuff.

fn not_bad() -> Result<[i32; 1_000_000], [i32; 8]> {
    let mut arr = [0i32; 1_000_000];
    for i in 0..1_000_000 {
        arr[i] = i;
    }
    if coin_flip() {
        Ok(arr)
    } else {
        Err([arr[7], arr[6], arr[5], arr[4], arr[3], arr[2], arr[1], arr[0]])
    }
}

In order to avoid copying, the compiler would have to compile it into this set of pseudocode steps:

  • First, it would need to perform the for loop, writing each list item directly into the return slot prepared by the caller. It does not yet populate the Result discriminant, and as such is treating the return slot as scratch space. This is fine.
  • Next, it does the coin flip.
  • If it returns true,
    • it should write the Ok discriminant,
    • and then jump into the caller.
  • Otherwise,
    • it writes the Err discriminant
    • It it reads from arr, which is in the caller's stack frame, while simultaneously writing its new return value, which is in the same spot in the caller's stack frame. <-- this is where the problem is
    • and then jump into the caller
1 Like

Yeah, you can’t always avoid copying when there are multiple returns of different variables.

In your case, it is solvable: the compiler would have to create a temporary [i32; 8] on the stack, construct it there, and right before returning, copy it into place (overwriting arr).

But if you have something like

fn yikes() -> Result<Huge, AlsoHuge> {
    let huge: Huge = ...;
    let also_huge: AlsoHuge = ...;
    if coin_flip(&huge, &also_huge) {
        Ok(huge)
    } else {
        Err(also_huge)
    }
}

it’s impossible to avoid a copy of at least one of the huge variables.

But it’s not particularly common for a Result’s Ok and Err types to both be huge. The tricky part is coming up with a definition of what circumstances elision is guaranteed under that’s robust, understandable, and stable across compiler upgrades. But I think it’s doable, especially if there’s a way to ask the compiler to error out if it can’t elide (which, btw, would be mandatory for my hypothetical immovable types).

3 Likes

I don't understand this question. I agreed that my idea would be incompatible with write_return_with in generic code. For read_return_with, the user code is supplying a pointer. If the function takes no hidden result pointer, as you noted, read_return_with would write to the pointer itself after calling the function. If it takes one, it passes on the pointer directly. And if it takes multiple, like ok_out and err_out, read_return_with would supply the appropriate offsets from the user's pointer.

I don't think it's okay for vital unsafe abstractions, that we expect people to build libraries on top of, to not work in generic code. The UCG working group has been talking about deprecating existing parts of the language, like mem::uninitialized() and static mut, because of poor composition in generic code.

Speaking of which, I just realized that you can use write_return_with to implement mem::uninitialized in surface Rust syntax.

pub unsafe fn uninitialized<T>() -> T {
    write_return_with(|_| {})
}
1 Like

Sorry, to clarify: I definitely don't want a function that magically doesn't work in generic code. I was suggesting that write_return_with be removed. Most use cases don't need it if you have NRVO, and if there are remaining use cases, they may be solvable with, as I said, more specialized APIs (...not sure exactly those would look like, but it depends on the use cases in question :slight_smile: ).

I do kinda like this proposal in that a) it moves the needle and b) it's very clear about what it's not going to try and do. My key hesitation, as mentioned in the proposal is how implicit it is.

I liked the idea from @matklad to have an attribute that allows you to enforce that your return is RVOd, though I wonder if there are other useful interactions you'd want to enforce - imagine a push_with function on Vec and you want to make sure that the function you pass in actually has RVO leveraged in the 'right' way. I think you'd need collaboration from the callee, and even then I don't know how you'd express it.

FWIW, it was intended less as a requirement and more a consideration given that some people mentioned this aspect as being important (my own interest lies in other areas as it happens).

1 Like

@aidanhs @matklad

I don’t like “explicit RVO” or “must RVO” because they don’t help with the how much question. Take a structure like this

struct TaggedThingy<T: ?Sized> {
    tag: Tag,
    content: T,
}

and then imagine a function that puts a variable-length array inside it

#[require_rvo]
fn this_function_performs_lots_of_copying() {
    let n = 1_000_000_000;
    let content = [0; n];
    let tag = Tag::Trusted; // whatever this means
    TaggedThingy {
        tag, content
    }
}

As you can figure out, the TaggedThingy is returned by RVO, but the inner array is copied. Worse, while it’s clear in this case what should be done, it’s not possible to know in general. If the developer knows that the VLA will always be small, then they can just go ahead and copy it, but if it’s possible that the VLA will be big, then it should be disallowed. This isn’t the provenance of the type system; this kind of stuff belongs in Clippy.

I don't think this work for serde and faillible creation (as you mentioned for Result). I guess you'll need to clarify this.

My two cents idea on this is an out T qualifier that allows you to call generic methods with some kind of preallocated memory. For example:

fn new() -> Result<out T, E> { ... }

gets processed as

fn new(&uninit T) -> Result<(), E> { ... }

(This was what I wrote for a stalled private proposal, and this assumed NRVO and also has other kind of problems.)


The reason I stopped working on placement proposals is that I couldn't find a concrete use case. Big arrays? Arrays are terrible without const generics anyway. Gigantic structs? People tend to just use Vec anyway.

Do you have any thoughts regarding use cases?

They're fine if they're variable length arrays. That's why this proposal goes to so much hassle to define how returning a bare slice works.

The how about putting the attribute on a let binding to ensure NRVO? That should clearly answer how much.

fn foo() -> T {
    #[must_nrvo]
    let a = MaybeUninit::uninit();
    // ... initialize a
    unsafe { a.assume_init() }
}

Potentially even allowing it on substructures of the return.

fn bar() -> TaggedThingy<T> {
    let n = 1_000_000_000;
    #[must_nrvo]
    let content = [0; n];
    #[must_nrvo]
    let tag = Tag::Tusted;
    #[must_nrvo]
    let ret = TaggedThingy {
        tag, content
    };
    ret
}

I also has an idea similar to this but written differently.

fn new() -> Result<&'return T, E> { ... }

That way the written return type is the actually return type. A binding with #[must_nrvo] being the only thing with 'return lifetime. The caller would only use the returned ref to debug_assert_eq!() it to the passed in pointer.

1 Like

Two reasons:

  • Tag::Trusted probably shouldn't be subject to RVO. It's a C-like enum. It's faster to just copy it than to mess with output pointers.

  • If you accidentally leave out one link in the chain, then you lose all of the guarantees. That's not much better than implicit RVO.

What passed-in pointer? Your new() function has no parameters. Are you talking about something like the pointer passed to read_return_with?

There wouldn't be a out-pointer for every NRVO, just the outermost one. tag would be allocated directly in it's final location. I would only see there being problems with this if it's Copy and used after the outer stuct is constructed but the compiler could just have a default deny lint about a NRVO value being copied.

You only need to be sure to put it on all the big parts. For something small like a tag it doesn't really matter if it has NRVO or not.

Yes. I think something alone the lines of read_return_with would be the only way a placed value returned in a Result could work. The API would need adjustments, and I haven't finalized my idea of what I think it should be but I think it will work.