Alternative design for (some of) RFC 2884

RFC 2884 proposes a design for allowing functions to return unsized or large types types directly on the heap. It does this by allowing callers to provide a pointer to an allocation of the right size to the callee.

With the current design of the RFC, functions that return unsized types are transformed into a generator. When called for the first time, the generator returns the size of the allocation it needs to the caller. When called the second time with a pointer to said allocation, it writes out its return value to that pointer. Unfortunately, this generator-based design imposes two limitations on functions that return unsized types:

  • They can't be converted into a function pointer.
  • They can't store their unsized return value on their stack using alloca.

There is an alternative desugaring for functions that return unsized values that solves both of these issues. Instead of compiling to a generator, a function that returns an unsized value gets an extra implicit "emplacer" parameter, which is a closure that tells it how to construct the allocation it needs. For example,

fn return_unsized(a: u32) -> [u32] {
   [a, a, a]
}

Becomes, approximately:

fn return_unsized(a: u32, emplacer: dyn FnOnce(Layout, <[u32] as Pointee>::Metadata, dyn FnOnce(*mut ())) {
    emplacer(
        Layout::from_size_align(mem::size_of::<u32>() * 3, mem::align_of::<u32>()).unwrap(),
        3,
        |dest| unsafe { ptr::write(dest as *mut [u32; 3], [a, a, a]) },
    )
}

This allows frobnify to construct the allocation for its unsized value without ever yielding to its caller, obviating the need for a generator.

The emplacer closure could be passed in via one of the various "context" proposals that people have suggested.

A more complete desugaring:

fn return_unsized(a: u32) -> [u32] {
   [a, a, a]
}

fn main() {
    // Function with unsized return can coerce to function pointer!
    let fn_ptr: fn(u32) -> [u32] = return_unsized;
    dbg!(Box::new_with::<[u32]>(|| fn_ptr(1)));
}

would desugar to something like this:

#![feature(ptr_metadata)]

extern crate alloc;
use alloc::alloc::{alloc, Layout};
use core::mem::{self, MaybeUninit};
use core::ptr::{self, Pointee};

fn box_new_with<T: ?Sized>(
    unsized_ret: impl FnOnce(dyn FnOnce(Layout, <T as Pointee>::Metadata, dyn FnOnce(*mut ()))),
) -> Box<T> {
    let mut uninit_box = MaybeUninit::uninit();
    unsized_ret(|layout, meta, closure| {
        let box_ptr = unsafe { alloc(layout) as *mut () };
        closure(box_ptr);
        let init_box = unsafe { Box::from_raw(ptr::from_raw_parts_mut(box_ptr, meta)) };
        uninit_box.write(init_box);
    });

    return unsafe { uninit_box.assume_init() };
}

fn return_unsized(
    a: u32,
    emplacer: dyn FnOnce(Layout, <[u32] as Pointee>::Metadata, dyn FnOnce(*mut ())),
) {
    // a more complicated function could do stuff here

    // call to "emplacer" must be last thing `return_unsized` does
    emplacer(
        Layout::from_size_align(mem::size_of::<u32>() * 3, mem::align_of::<u32>()).unwrap(),
        3,
        |dest| unsafe { ptr::write(dest as *mut [u32; 3], [a, a, a]) },
    )
}

fn main() {
    let fn_ptr: fn(
        u32,
        dyn FnOnce(Layout, <[u32] as Pointee>::Metadata, dyn FnOnce(*mut ())),
    ) = return_unsized;
    dbg!(box_new_with::<[u32]>(|e| fn_ptr(1, e)));
}

(Playground for closest equivalent that works on today's Rust)

When called without an implicit emplacer, functions that return unsized types would use a default emplacer provided by Rust that allocates the unsized value at the end of the callee's stack, and then ensures it is copied back to the caller's stack when the function returns. This may require functions that return unsized values to have a different calling convention. The special "magic alloca emplacer" could also be exposed as an intrinsic that other emplacers can call.

2 Likes

Nit: I expect you meant "on the stack."

edit: oh maybe not -- is it actually doing heap allocation? Nevermind then, sorry.

I mean "on the heap"—or "in an arena", or with whatever other allocation method the caller provides. Though I would expect that a program could use the same mechanisms to return them on the stack, too; if no read_return_with is called (original RFC) / no emplacer is provided (this proposal), the default could be to use alloca.

1 Like

I don't think this design is compatible with stack allocations in the caller's context (aka alloca), because, at the point the emplacer is called, the stack frame that needs to be enlarged will be buried underneath the unsized-returner's frame and the emplacer's frame. And stack frames may contain internal references to other things in the same or any lower frame, so they're pinned -- you can't fix this with a memmove().

The generator-based design doesn't have this problem because the caller will be the topmost stack frame at the time the alloca needs to happen.

4 Likes

The callee would alloca at the end of its own stack, and when the callee returns the caller would move the allocation back (apparently Ada does this?). That loses guaranteed copy elision, unfortunately. Maybe there could be a way for the callee to specify whether it needs copy elision… @y86-dev's post seems relevant here

---the data that is now below the stack pointer must be assumed to have been overwritten by a signal handler or a side trip through the dynamic loader or some other such invisible operation that does stuff with that memory.

3 Likes

Rust can just not move the stack pointer until the data is copied?

I feel like this approach will also result in a lot of intermediate copies.

How would this interact with Box::new? It does not allow unsized values at the moment (is that going to change?). If it changes how do we avoid copying potentially GBs to the stack and then to the heap? If e.g. we introduce an indirection?

fn handle(foo: dyn Foo) -> Box<dyn Foo> {
    Box::new(foo)
}

fn main() {
    let foo = handle(create_foo());
}

What about this?

fn handle(foo: dyn Foo) -> Result<Box<dyn Foo>, dyn Foo> {
    match Box::try_new_uninit() {
        Ok(mut b) => Ok(b.write(foo)),
        Err(AllocError) => Err(foo)
    }
}

Could we even prevent copying here?

This approach is no different from the RFC's original approach in that, to guarantee avoiding intermediate copies, you still need to use Box::new_with.

As for fallible initialization, I could imagine some sort of try keyword generics solution:

try? fn return_unsized(
    a: u32,
    emplacer: dyn FnOnce(Layout, <[u32] as Pointee>::Metadata, dyn FnOnce(*mut ())) -> 
    (if try { Result<(), AllocError> } else { () }),
) ->  (if try { Result<(), AllocError> } else { () }) {
    emplacer(
        Layout::from_size_align(mem::size_of::<u32>() * 3, mem::align_of::<u32>()).unwrap(),
        3,
        |dest| unsafe { ptr::write(dest as *mut [u32; 3], [a, a, a]) },
    )?
}

For your second example, you would need some way to write:

fn box_new_with_maybe<T: ?Sized>(
    unsized_ret: impl FnOnce(&mut dyn FnMut(Layout, <T as Pointee>::Metadata, &mut dyn FnMut(*mut ()))),
) -> Result<Box<T>, T> {
    let mut uninit_box = MaybeUninit::uninit();
    let mut boxed = false;
    let mut magic_place_at_bottom_of_stack: MaybeUninit<T>;
    unsized_ret(&mut |layout, meta, closure| {
        match unsafe { Global::allocate(layout) } {
            Ok(box_ptr) => {
                closure(box_ptr);
                let init_box = unsafe { Box::from_raw(ptr::from_raw_parts_mut(box_ptr, meta)) };
                uninit_box.write(init_box);
                boxed = true;
            }
            Err(alloc_error) => {
                let emplacer = alloca_emplacer_copy_back_to!(magic_place_at_bottom_of_stack);
                emplacer(layout, meta, closure)
            }
        }
    });

    
    if boxed {
        Ok(unsafe { uninit_box.assume_init() })
    } else {
        Err(unsafe { magic_place_at_bottom_of_stack.assume_init() })
    }
}

that looks really complicated...

What would be the main application of unsized return values (except futures)? The RFC is a bit lacking in the motivation section about that point (to be fair, it is mainly designed to solve another issue).

In practice I think what I wrote would not be supported at all. Copying the Box (2 x usize, ptr + metadata) is probably not going to be a big deal anyway, especially compared to the cost of allocating. Unless you were asking about something different with "prevent copying"?

  • returning dyn Trait, dyn Trait methods that return impl Trait
  • slices with small number of elements (for example, function that returns ~ 5-10 u32s) - or even a large number, if they get emplaced directly on the heap

If you don't care if the Box is moved, and just want to avoid copying its contents and only copy when heap allocation fails, I think we can get away with only a little magic. What you would do is, inside your own emplacer, construct the Result, with whatever variant it ends up having, and store it as a local variable (with feature(unsized_locals)). Then, as the last statement in your own emplacer, you would call magic_alloca_emplacer to emplace the Result you constructed.

I've gotten a little bit confused about what's being proposed, but I still don't see a way to achieve all three of these goals simultaneously with your design: (1) the returned object can be emplaced into alloca space in the caller's stack frame; (2) memory cells on the inactive side of the stack pointer are never considered to hold live data [required since the kernel could deliver a signal/AST/APC/whatever at any point]; (3) the normal calling convention is used [required for unsized-returning functions to be convertible to function pointers].

It might be easier for you to see the problem I'm seeing if you write down the assembly language that the compiler would have to generate to implement your proposal. Try to do it for both x86 and for a branch-and-link architecture.

I don't see how normal calling convention is required at all. A pointer to a function that returns an unsized type would be of different type than a pointer to a function that returns a sized type, so they can have different codegen and calling convention, no?

(To be clear, most of the code examples I have posted are approximate desugarings to demonstrate and prototype how this feature might work, not an exact specification.)

Yeah.

To give a bit of history: the idea of using generators or some other scheme to enable functions to return unsized values is one that has come up independently a few times (as in, I saw it on forums a few times before I wrote PR #2884).

This RFC is based on a proposal by notriddle, who designed it as a solution for writing to uninitialized memory and the Read trait in general. I rewrote the proposal to include RPIT-in-dyn-trait which was the use case I'd had in mind at the time after reading Niko Matsakis's posts.

(RPIT: return-position impl Trait)

In retrospect, if I were writing the RFC now, I'd focus mostly on RPIT, with a few other use-cases (eg returning unsized arrays) as potential future expansions.

Unsized returns aren't a good fit for IO: they require you to know the exact size of your return type ahead of time, and IO calls might write less than the buffer you reserved. Also, most IO calls return Result types, and handling Result with an unsized Ok variant is an open problem. And since the RFC was posted, another proposal was accepted for reading to uninitialized buffers that better matched the use-case.

Also, if I were writing the RFC today, I wouldn't start with a specific implementation in mind, and instead I'd try to list multiple possible implementation strategies (eg with generators, with closures, with static info) and their respective limitations and drawbacks.

I think the possibility space there hasn't really been explored, and I think even if the lang team decides to go with unsized returns, it won't be in the way RFC 2884 describes.

3 Likes

Has anyone ever created a proc-macro prototype of the generator approach? I think it often uncovers problems and helps to understand the underlying issue better.


@Jules-Bertholet I think I now understand your approach better. To me it seems as if it is a more general in-place constructor. There are some more details I am not sure about (e.g. how do you handle more complex logic deciding which other unsized value returning function to run).

Last time I looked at this RFC I thought that it would be better to split the in-place, fallible, pinned constructor feature from returning from returning unsized values. As I am also particularly interested in a performant (similar to just writing unsafe code to initialize the raw pointer directly) solution, I am worried, that the complexity introduced by all of that dyn closure stuff can be reliably optimized away (a prototype would help here too).


That makes sense, thanks for the history.

1 Like

I made one! unsized-vec::UnsizedVec<T> is like Vec<T>, but T can be ?Sized.

3 Likes

I have rewritten this crate, and extracted the placement machinery into a dependency.