Returning a large type into a `Box` without copies

I've been under the impression that rustc tends to return large types from functions by allocating the space for it before the function call, passing the address of the allocated memory to the function, and writing the return value inside the function through that pointer.
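To make that mental model concrete, here is a hand-written sketch of the idea. It is only an illustration of the calling convention described above, not something rustc guarantees or literally generates; the names `make`, `make_into` and the small `N` are made up for the example.

```rust
use std::mem::MaybeUninit;

const N: usize = 1024; // small on purpose so both versions are safe to run

// What the source code says: return the array by value.
fn make() -> [u32; N] {
    [7; N]
}

// Hand-written model of the lowered form: the caller owns the storage
// and passes a pointer to it; the callee writes the result through it.
fn make_into(out: &mut MaybeUninit<[u32; N]>) {
    let base = out.as_mut_ptr() as *mut u32;
    for i in 0..N {
        // SAFETY: i < N, so the write stays inside the caller's slot.
        unsafe { base.add(i).write(7) };
    }
}

fn main() {
    let by_value = make();

    let mut slot = MaybeUninit::<[u32; N]>::uninit();
    make_into(&mut slot);
    // SAFETY: make_into wrote all N elements.
    let by_pointer = unsafe { slot.assume_init() };

    assert_eq!(by_value, by_pointer);
}
```

If the caller's slot happens to be the heap allocation of a Box, no stack copy of the value is needed, which is the behavior the question is about.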

In the following example, calling the functions foo_ok and bar_ok suggests that the same thing also happens when the function call expression is passed as the argument to a std::boxed::box_new call: if it didn't, you'd expect both foo_ok and bar_ok to cause a stack overflow, and they don't.

But in the same example code, the slightly different functions foo_bad and bar_bad do cause a stack overflow when used in the same way. Can someone explain this behavior to me? By the way, compiling in debug or release mode doesn't seem to make a difference. You can try the following code on the playground.

#![feature(liballoc_internals)]
#![allow(dead_code)]
#![allow(internal_features)]
#![allow(unused_variables)]

fn foo_ok() -> [u32; 10_000_000] {
    let arr = [10; 10_000_000];
    arr
}

fn foo_bad() -> [u32; 10_000_000] {
    let mut arr = [5; 10_000_000];
    arr[0] = 42;
    arr
}

fn bar_ok(take_five: bool) -> [u32; 10_000_000] {
    if take_five {
        [5; 10_000_000]
    } else {
        [0; 10_000_000]
    }
}

fn bar_bad(take_five: bool) -> [u32; 10_000_000] {
    let arr = if take_five {
        [5; 10_000_000]
    } else {
        [0; 10_000_000]
    };
    arr
}

fn main() {
    {
        let foo = std::boxed::box_new(foo_ok());
        let bar_true = std::boxed::box_new(bar_ok(true));
        let bar_false = std::boxed::box_new(bar_ok(false));

        // Prints 10 5 0
        println!("{} {} {}", foo[0], bar_true[0], bar_false[0]);
    }

    {
        // Any one of the following lines causes stack overflow
        // let foo = std::boxed::box_new(foo_bad());
        // let bar_true = std::boxed::box_new(bar_bad(true));
        // let bar_false = std::boxed::box_new(bar_bad(false));
    }
}

There are currently multiple proposals in flight for "initializer expressions" or "placement new", which would allow initializing a new value directly in a Box without putting it on the stack first.

So, it's a simple case of std::boxed::box_new not being fully implemented for all its intended use cases, but it (or some other syntax/function for the same purpose) is being worked on.

Currently the most reliable way to avoid the stack copy is to create an uninitialized box, write the values into it (one by one for large arrays or slices), and then call assume_init.
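A minimal sketch of that approach, assuming Rust 1.82 or newer (where `Box::new_uninit` and `Box::assume_init` are stable). The function name `boxed_array` and the values written are made up to mirror foo_bad from the question.

```rust
use std::mem::MaybeUninit;

const N: usize = 10_000_000;

// Builds the same array as foo_bad, but directly on the heap.
fn boxed_array() -> Box<[u32; N]> {
    // Allocate uninitialized heap memory; no [u32; N] is created on the stack.
    let mut boxed: Box<MaybeUninit<[u32; N]>> = Box::new_uninit();
    let base = boxed.as_mut_ptr() as *mut u32;
    for i in 0..N {
        // SAFETY: i < N, so the pointer stays inside the allocation.
        unsafe { base.add(i).write(5) };
    }
    // SAFETY: index 0 is inside the allocation.
    unsafe { base.write(42) };
    // SAFETY: every one of the N elements has been written above.
    unsafe { boxed.assume_init() }
}

fn main() {
    let arr = boxed_array();
    println!("{} {}", arr[0], arr[1]);
}
```

Only the box pointer is moved around here, so this should not depend on the optimizer eliding a large copy.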

Dead while waiting for a language feature :skull:

The most recent work I am aware of is Public view of rust-lang | Zulip team chat, but I haven't yet seen it go anywhere public.

In my production code, I usually have fn(Pin<&mut Uninit>) -> Pin<&mut Init> signatures, but you lose the Drop and will most likely need some (Initialized) bool in Uninit to ensure the user does not try to reuse the allocation.
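A simplified sketch of that kind of in-place initializer, leaving out Pin and the reuse guard. It is not the poster's actual code: `Big`, `init_big` and `LEN` are hypothetical names, and it assumes Rust 1.82 or newer for `Box::new_uninit`.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

const LEN: usize = 1_000_000;

// Hypothetical large type, for illustration only.
struct Big {
    data: [u32; LEN],
    used: usize,
}

// Initializes `slot` in place and returns a reference to the initialized value.
fn init_big(slot: &mut MaybeUninit<Big>) -> &mut Big {
    let p = slot.as_mut_ptr();
    unsafe {
        // Write field by field through raw pointers; no `Big` is built on the stack.
        let data = addr_of_mut!((*p).data) as *mut u32;
        for i in 0..LEN {
            // SAFETY: i < LEN, so the write stays inside the `data` field.
            data.add(i).write(0);
        }
        addr_of_mut!((*p).used).write(0);
        // SAFETY: both fields are now initialized.
        slot.assume_init_mut()
    }
}

fn main() {
    let mut boxed: Box<MaybeUninit<Big>> = Box::new_uninit();
    init_big(&mut boxed).used = 3;
    // SAFETY: init_big initialized every field.
    let big: Box<Big> = unsafe { boxed.assume_init() };
    println!("{} {}", big.data[0], big.used);
}
```

Note that `MaybeUninit` never runs the destructor of its contents, which is the "losing the Drop" caveat: if `assume_init` were never called, a `Big` with a Drop impl would simply not be dropped.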

BTW, if your actual use case is large arrays, build them up in vectors or with iterator APIs. There's no good reason to use [u8; 10_000_000] ever. (Or, really, even [RealisticNontrivialType; 2_000].)

use std::iter::repeat_n;

let foo: Box<[u8]> = repeat_n(0, 9_999_999).chain([1, 2]).collect();
dbg!(foo.len());

works even in debug mode https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=8438fa1a1436b2a12e78dde9eab67482
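If a fixed-size array type is still wanted at the end, one option is to build the data in a Vec and convert the boxed slice afterwards. This is a sketch under that assumption; `boxed_array_from_vec` is a made-up name, and the values mirror foo_bad from the question.

```rust
use std::convert::TryFrom;

const N: usize = 10_000_000;

fn boxed_array_from_vec() -> Box<[u32; N]> {
    // vec! fills the heap allocation directly; no array lives on the stack.
    let mut v = vec![5u32; N];
    v[0] = 42;
    let slice: Box<[u32]> = v.into_boxed_slice();
    // Conversion from Box<[T]> to Box<[T; N]> fails only if the length differs.
    match Box::<[u32; N]>::try_from(slice) {
        Ok(arr) => arr,
        Err(_) => unreachable!("length is N by construction"),
    }
}

fn main() {
    let arr = boxed_array_from_vec();
    println!("{} {}", arr[0], arr[1]);
}
```

This stays entirely in safe code, at the cost of going through the slice type first.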

I would add a bit of nuance.

In embedded there are use cases for statically allocated arenas and buffers when you don't have dynamic allocation. But those are more likely to be 8 or 16k on the upper end (most likely smaller than that).

When working with microphone sound data over I2S, I needed a static buffer to DMA the samples into; it ended up being 8k, treated as a ring buffer of 4x 2k buffers. And since it is a static, it won't run into the copying issues as far as I know.

Framebuffers for ui tend to be 200kb+ sometimes, but you don't usually move them by value :grinning_face_with_smiling_eyes:

And that's still about 3 orders of magnitude smaller than the type I said.

There's absolutely a place for types like [u8; 1024] or even [u16x8; 256].

But if you're doing megabytes at a time, did you really need the array type, especially outside of a box?

@Ddystopia mentioned frame buffers. That does seem like a relevant use case. Same for textures etc. (Of course frame buffers on embedded are a bit smaller than on the latest GPU. I have not worked with graphics on embedded so I don't know the actual values. But their 200k doesn't seem unreasonable. And as they said: you don't move them by value for obvious reasons.)

And "outside box" by ref or mut ref is the norm on embedded of course. Not because alloc is expensive (it is, but if you are dealing with MB anyway, it isn't that bad), but because without an MMU you are dealing with physical addresses, which makes it much harder to deal with memory fragmentation, for example. You definitely don't want to waste any of your precious RAM.

And without an MMU you lack the safety net of an OS, so traditionally in C if you don't allocate and free, you can't have use-after-free (much less of a concern in Rust of course).

Oh, turns out format_args! can now be stored in a let binding, and it makes use of super let!

No one actually answered the question. I am also interested in why the placement init works sometimes but not others. It should not have to be explicit because that would violate Separation of Concerns -- explicit placement init is (should be) reserved for manual optimisation and IMO is not related to the original question.

Rust is generally really bad at optimizing moves... It is really bothersome on embedded, where the stack is limited, while Rust can easily make stack usage 3-10 times larger than needed.