Mandatory inlined functions for unsized types and 'super lifetime?

Proposal

Allow functions to be declared as inline fn, which would require them to be inlined into the caller. This makes it easy to do two key things: return unsized types, and return references to the stack.

Here's an example of both capabilities at the same time. I'm using the reserved 'super keyword for the special "caller" reference:

impl<T: ?Sized> Pin<T> {
    pub inline const fn value(mut value: T) -> Pin<&'super mut T> {
        // SAFETY: value is on a fixed position in the stack
        // and is inaccessible except via Pin wrapper 
        unsafe { Pin::new_unchecked(&mut value) }
    }
}

Example usage:

fn main() {
    let generator = Pin::value(|| {
        let mut i = 0;
        loop {
            yield i + 1;
            i += 1;
        }
    });

   // do stuff with the generator 
}

Similarly, this would allow wrapper types like ManuallyDrop to take and return unsized values:

impl<T: ?Sized> ManuallyDrop<T> {
    pub inline const fn new(inner: T) -> Self {
        ManuallyDrop { inner }
    }
}

...and could be used to allow alloca to be implemented:

pub unsafe inline fn alloca(layout: Layout) -> &'super [u8] {
   // platform specific stack manipulation asm goes here
}

Semantics

An inline function would be syntactic sugar for effectively cutting and pasting the body of the function into the caller, similar to the way macros work right now.

The key difference between inline fn and #[inline(always)] is that the latter is just a (strong) hint that may still be ignored by the compiler, e.g. in the case of a function that recurses. An inline function would not be allowed to have a call graph with recursion and would never show up in stack frames. (How it should interact with #[track_caller] etc. is an interesting question!)
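To illustrate the "hint" point in today's Rust: the sketch below puts #[inline(always)] on a recursive function, and the compiler accepts it even though the recursion cannot be inlined into every call site. The function name factorial is only an illustration and not part of the proposal.

```rust
/// Recursive on purpose: the attribute is accepted, but it can only be a
/// hint here, because an unbounded recursion cannot be fully inlined.
#[inline(always)]
pub fn factorial(n: u64) -> u64 {
    if n <= 1 {
        1
    } else {
        n * factorial(n - 1)
    }
}
```

Under the proposed inline fn, a definition like this would presumably be rejected at compile time instead.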

I'm frankly not a compiler expert, so I'm sure someone can chime in on how that simple model wouldn't quite work with unsized function arguments and unsized return values. But the 'super lifetime by itself seems valuable to me on its own.

Implementation

Again, I'll defer to actual experts on the viability of this. 🙂

I think your use cases are best solved by a combination of other features.

  • The Pin::value method can be implemented as a macro (see futures::pin_mut).

  • Better support for creating/returning unsized values will be addressed by the unsized_locals feature

  • alloca would probably need to be implemented as a compiler builtin, so that it can be properly integrated with stack probes.
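As a concrete version of the first bullet, here is a minimal sketch using the standard library's std::pin::pin! macro, which pins a value to the current stack frame much like futures::pin_mut does. The Counter type and bump_twice function are hypothetical names made up for the example.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// A type that opts out of `Unpin`, so it must stay pinned once pinned.
struct Counter {
    n: u32,
    _pin: PhantomPinned,
}

impl Counter {
    fn bump(self: Pin<&mut Self>) -> u32 {
        // SAFETY: only a plain field is mutated; the value is never moved out.
        let this = unsafe { self.get_unchecked_mut() };
        this.n += 1;
        this.n
    }
}

pub fn bump_twice() -> u32 {
    // `pin!` stores the value in this frame and yields a `Pin<&mut Counter>`.
    let mut counter = pin!(Counter { n: 0, _pin: PhantomPinned });
    counter.as_mut().bump();
    counter.as_mut().bump()
}
```

The pinned reference cannot outlive the enclosing scope, which is the same restriction the proposed 'super lifetime is meant to express in a function signature.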

While macros can provide some of this functionality now, because they are much more general they are harder to understand and reason about. There also is no path for them to be part of a trait. An inline fn on the other hand could certainly be, which could have allowed:

trait Clone {
    inline fn clone(&self) -> Self;
}

impl<T: Clone> Clone for [T] {
    inline fn clone(&self) -> Self {
        let uninit = [MaybeUninit::uninit(); self.len()];
        /* do clone, handle drop-on-panic, etc. */
        unsafe { mem::transmute(uninit) }
    }
}

As for returning unsized values, that's actually where I came up with the idea for this: the best proposal I've seen so far for returning unsized values is effectively to return a special type of generator that 1) determines how much stack space needs to be allocated, and 2) actually generates the value with a closure. I would argue this is a much simpler alternative that doesn't require the unsafe code that the "return_with" proposal does, while still making it clear what is and isn't inlined into the caller.

The only other proposal I'm aware of is to do a CPS conversion on the function and return a continuation: https://github.com/rust-lang/rust/issues/48055#issuecomment-415980600 But that strikes me as much more complex to implement, and much less obvious to the programmer as to what exactly is going on.

This is a really interesting proposal. Do you have thoughts on what scope the inlined function should have? Should it create a new scope? Or be merged into its caller-site's scope (not for accessing variables, but for creating new ones)?

The reason I ask is that one of the places I'd like easier ergonomics, which I think this could possibly provide, is functions which return a reference which may need a backing variable. Things like:

fn upper_if_odd(s: &str) -> &str {
    let upper;
    if s.len() % 2 == 1 {
        upper = s.to_uppercase();
        &upper
    } else {
        s
    }
}

This code doesn't work today, because ownership of upper can't be transferred to the caller and gets dropped at the end of the function. You can work around this with something like std::borrow::Cow, or by writing a macro, but I find that I sometimes don't extract a function where I would in other languages where I wouldn't have this issue, and that makes me a bit sad.

That said, implicit magic scope expansion is pretty magical, but something along the lines of the super lifetime could potentially be a route to a solution here, as an instruction to place something in the calling stack frame...

I think scope is a really interesting question for the proposal, particularly since destructors can have side effects. The most restrictive, and least risky, way to do it would be to have the scope end, with everything dropped prior to the function "returning". That'd be sufficient for the unsized return values use case, but not the 'super lifetime use case:

inline fn make_slice(len: usize) -> [u8] {
    let inner = DoesSomethingOnDrop;

    let r = [0u8; len];
    r
    // inner dropped here
}

Note how you could even enforce a requirement that all values in scope be manually dropped, so as to avoid a compatibility question with other potential semantics. Similar to how we required union fields to be Copy, and more recently, ManuallyDrop.
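For reference, the "everything is dropped before the function returns" rule is what ordinary functions already do. The sketch below records the order of events in today's Rust with a sized return value; Noisy, make_value and drop_trace are hypothetical names used only for this illustration.

```rust
use std::cell::RefCell;

/// Pushes its name onto a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn make_value(log: &RefCell<Vec<&'static str>>) -> [u8; 4] {
    let _inner = Noisy { name: "inner dropped", log };
    let r = [0u8; 4];
    r
    // `_inner` is dropped here, before control returns to the caller
}

pub fn drop_trace() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    let _value = make_value(&log);
    log.borrow_mut().push("caller resumed");
    log.into_inner()
}
```

The open question in the thread is which of these locals, if any, an inline fn should be allowed to keep alive past that point.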

Beyond that, I think there are basically three options:

  1. Treat the function body as part of the caller's scope, and thus drop everything when that scope ends.
  2. Drop everything except values referenced by a 'super lifetime.
  3. Require values that will escape the scope to be explicitly declared as such within the inline function:
impl<T: ?Sized> Pin<T> {
    inline fn value(value: T) -> Pin<&'super mut T> {
        let super mut value = value;
        unsafe { Pin::new_unchecked(&mut value) }
    }
}

I think option #3 is my favorite, as it's the most explicit.

There is a library prototype implementation of this proposal with the ::with_locals crate:

Examples

1 - Pin::value()

impl<T> Pin<T> {
    #[with]
    pub fn value (local: T) -> Pin<&'ref mut T>
    {
        pin_mut!(local);
        local
    }
}

#[with]
fn main ()
{
    let mut generator: Pin<&'ref mut _> = Pin::value(|| {
        let mut i = 0;
        loop {
            yield i + 1;
            i += 1;
        }
    });

    // do stuff with the generator 
}

2 - Clone unsized value:

#[with] does work with trait methods:

trait CloneUnsized {
    #[with]
    fn clone (self: &'_ Self) -> StackBox<'ref, Self>;
}

Implementing this for a generic [T] of unbounded length will still require special features, such as unsized_locals or alloca; both of which, to be honest, worry me: we are talking here about the bad case of alloca, unbounded stack allocations 😬

So here comes a saner version:

trait CloneUnsizedBounded {
    type CapacityError;

    #[with]
    fn clone_bounded<const MAX: usize> (self: &'_ Self)
      -> Result<StackBox<'ref, Self>, Self::CapacityError>
    ;
}

impl<T : Clone> CloneUnsizedBounded for T {
    type CapacityError = ::core::convert::Infallible;

    #[with]
    fn clone_bounded<const MAX: usize> (self: &'_ T)
      -> Result<StackBox<'ref, T>, Self::CapacityError>
    {
        Ok(stackbox!(self.clone()))
    }
}

impl<T : Clone> CloneUnsizedBounded for [T] {
    type CapacityError = (/* … */);

    #[with]
    fn clone_bounded<const MAX: usize> (self: &'_ [T])
      -> Result<StackBox<'ref, Self>, Self::CapacityError>
    {
        let buf: &mut [MaybeUninit<T>] =
            uninit_array![T; MAX]
                .get_mut(.. self.len())
                .ok_or((/* … */))?
        ;
        // `zip` yields pairs, so the closure has to destructure a tuple
        buf.iter_mut().zip(self).for_each(|(out, in_)| *out = MaybeUninit::new(in_.clone()));
        let buf: &mut ManuallyDrop<[T]> = unsafe {
            // Safety: the `.len()` elems have been initialized.
            ::core::mem::transmute(buf)
        };
        Ok(unsafe {
            // Safety: the local function won't drop the given elems.
            StackBox::new(buf)
        })
    }
}
  • All of the code above can be done with today's Rust. Maybe not the const generic part, although it is not far off, but that part could be replaced with a generic parameter from ::typenum (see also uninit_array!).

    • I haven't yet written the StackBox<'frame, T> abstraction, but it is basically &'frame mut T with move semantics (no reborrowing), and thus the ability to drop the pointee in place when the wrapper is dropped (so it is implemented using &mut ManuallyDrop<T>). It requires a macro to be constructed in the sized case, or an unsafe fn constructor in the general case, as showcased above.
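The bounded-clone idea can be tried in today's Rust without #[with] or StackBox if the caller supplies the buffer explicitly. This is only a sketch under that assumption: clone_into_buf is a hypothetical helper, it returns a plain &mut [T] rather than an owning pointer, and so the cloned elements are never dropped (acceptable for types without drop glue, a leak otherwise).

```rust
use std::mem::MaybeUninit;

/// Clones `src` into the front of a caller-provided, fixed-capacity buffer.
/// Returns `None` when `src` does not fit into `MAX` elements.
pub fn clone_into_buf<'b, T: Clone, const MAX: usize>(
    src: &[T],
    buf: &'b mut [MaybeUninit<T>; MAX],
) -> Option<&'b mut [T]> {
    let dst = buf.get_mut(..src.len())?;
    for (out, item) in dst.iter_mut().zip(src) {
        out.write(item.clone());
    }
    // SAFETY: every element of `dst` was initialized by the loop above, and
    // `MaybeUninit<T>` has the same layout as `T`.
    Some(unsafe { &mut *(dst as *mut [MaybeUninit<T>] as *mut [T]) })
}
```

The capacity is part of the buffer's type, so the stack usage is bounded at compile time, which is the property the CloneUnsizedBounded trait above is after.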

3 - Input unsized values

This is mainly a matter of using the StackBox abstraction, or a language-blessed version of &move owning pointers.


EDIT / Addendum

Note that I am not saying that your proposal doesn't have merit, quite the opposite! But as with most proposals, having prototypes to play and experiment with is a good way to avoid deliberating over hypotheticals. The ::with_locals crate is something concrete that already exists out there, and that showcases how all this could be implemented.

I'd definitely love to have some language support for &move references, since the to-be-done StackBox will encounter the limitation of only being able to work with a limited set of hardcoded traits, w.r.t. trait objects; which is a pity.

The whole CloneUnsized proposal is, imho, especially interesting for trait objects, where there is no danger of going unbounded.

  • Example.
    #[with(dyn_safe = true)]
    trait CloneTraitObject<DynTrait : ?Sized> {
        fn clone_local (self: &'_ Self)
          -> StackBox<'ref, DynTrait>
        ;
    }
    
    trait AnyClone : Any + CloneTraitObject<dyn AnyClone + 'static> {}
    impl<T : ?Sized> AnyClone for T where Self
        : Any + CloneTraitObject<dyn AnyClone + 'static>
    {}
    
    impl dyn AnyClone + 'static { /* downcasting stuff */ }
    
    #[with(dyn_safe = true)]
    impl<T : Any + Clone> CloneTraitObject<dyn AnyClone + 'static> for T {
        fn clone_local (self: &'_ T)
          -> StackBox<'ref, dyn AnyClone + 'static>
        {
            stackbox!(self.clone()) as StackBox<'_, dyn AnyClone + 'static>
        }
    }
    
Heh, I really shouldn't be surprised you can implement all that with proc_macro! I have some code in a serialization/deserialization crate I'm writing along kinda similar lines. I also have a &move equivalent called RefOwn:

pub trait Get {
    fn get_then<T: ?Sized + Pointee, F, R>(&self, f: F) -> R
        where F: FnOnce(RefOwn<T>) -> R;
}

(&own references are a feature I'm realizing would be really useful in a lot of situations! A &put for uninitialized memory that you write to once would be nice too, potentially useful for a stackbox-like concept.)

My concern with any kind of CPS transform in the language itself for unsized return values is that the actual code after compilation will either be equivalent to just inlining the function, or will radically change the call graph. In the first case, I think it'd be simpler and more explicit to just have inline functions; in the second case, I'm sure people will run into situations where they wish it were more explicit. 🙂

For example, consider the case where an inline fn is called within a loop:

inline fn ret_unsized(i: usize) -> dyn fmt::Debug {
    /* ... */
}

for i in 0 .. n {
    let x: dyn fmt::Debug = ret_unsized(i);
    dbg!(x)
}

So long as stack-frame allocas are undone after a variable goes out of scope, it's pretty easy to understand what the above might compile down to, even without any optimizations. Not so easy with any kind of CPS transform.
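For comparison, this is roughly what the continuation-passing shape looks like when written by hand in today's Rust: instead of returning the unsized value, the function hands a reference to a callback. The names with_unsized and render are hypothetical, and this is a sketch of the general pattern rather than of any specific CPS proposal.

```rust
use std::fmt::Debug;

/// "Returns" a `dyn Debug` by passing it to the continuation `k`.
fn with_unsized<R>(i: usize, k: impl FnOnce(&dyn Debug) -> R) -> R {
    if i == 0 {
        k(&"zero")
    } else {
        k(&i)
    }
}

/// The loop body from the example above moves into a closure.
pub fn render(n: usize) -> Vec<String> {
    (0..n)
        .map(|i| with_unsized(i, |x| format!("{:?}", x)))
        .collect()
}
```

Note how the caller's loop body ends up nested inside the callee's frame, which is the call-graph change the paragraph above is worried about.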

In fact, the combination of &own references and inline fn can avoid the alloca problem in that example:

inline fn ret_unsized(i: usize) -> &'super own dyn fmt::Debug {
    if i == 0 {
        let super r = "zero";
        &own r
    } else {
        let super r = i;
        &own r
    }
}

for i in 0 .. n {
    let x: &own dyn fmt::Debug = ret_unsized(i);
    dbg!(x)
}

Since ret_unsized returns a sized type, without alloca the stack space it uses must be 1) bounded, and 2) reused.

Thinking about this more, the combination of let super variables and the 'super lifetime does not need inline fn, as caller-allocated super bindings can be put into a data structure provided by the caller. Basically, return slot pointers on steroids. Consider the following:

fn foo_or_bar(flag: bool) -> &'super dyn Debug {
    if flag {
        let super foo: Foo = Foo::new();
        &foo
    } else {
        let super bar: Bar = Bar::new();
        &bar
    }
}

We know statically that there are two codepaths that need to allocate variables in the caller's stack frame, so we can transform that function into one that takes an enum:

enum Super {
    Uninit,
    Foo(Foo),
    Bar(Bar),
}

fn foo_or_bar<'callee>(flag: bool, mut callee: &'callee mut Super) -> &'callee dyn Debug {
    if flag {
        *callee = Super::Foo(Foo::new());
        if let Super::Foo(foo) = callee {
            foo
        } else {
            unreachable!()
        }
    } else {
        *callee = Super::Bar(Bar::new());
        if let Super::Bar(bar) = callee {
            bar
        } else {
            unreachable!()
        }
    }
}

fn call_foo_or_bar(flag: bool) {
        let mut callee = Super::Uninit;
        let r: &dyn Debug = foo_or_bar(flag, &mut callee);
        dbg!(r);
}

Furthermore, if one 'super returning function calls another 'super returning function, the straightforward thing to do is to combine the scratchpad datastructures together. For example:

fn foo(foo: Foo) -> &'super Foo {
    /* ... */
}

fn bar_or_foo(f: bool) -> &'super dyn Any {
   /* ... */
}

would transform to:

enum FooScratchpad {
    Uninit,
    Foo(Foo),
}

fn foo(foo: Foo, scratchpad: &mut FooScratchpad) -> &'super Foo {
    /* ... */
}


enum BarScratchpad {
    Uninit,
    Bar(Bar),
    Foo(FooScratchpad),
}

fn bar_or_foo(f: bool, scratchpad: &mut BarScratchpad) -> &'super dyn Any {
   /* ... */
}
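The nested transformation can be written out in today's Rust if the caller passes the scratchpad explicitly. In this sketch the function bodies are made up for illustration (the originals are elided), Foo and Bar are placeholder tuple structs, and the return type is &dyn Debug rather than &dyn Any so that the result can be printed.

```rust
use std::fmt::Debug;

#[derive(Debug)]
pub struct Foo(u32);
#[derive(Debug)]
pub struct Bar(u32);

pub enum FooScratchpad {
    Uninit,
    Foo(Foo),
}

pub enum BarScratchpad {
    Uninit,
    Bar(Bar),
    // The callee's scratchpad is embedded in the caller's scratchpad.
    Foo(FooScratchpad),
}

fn foo(value: Foo, pad: &mut FooScratchpad) -> &Foo {
    *pad = FooScratchpad::Foo(value);
    match pad {
        FooScratchpad::Foo(stored) => stored,
        FooScratchpad::Uninit => unreachable!(),
    }
}

pub fn bar_or_foo(f: bool, pad: &mut BarScratchpad) -> &dyn Debug {
    if f {
        *pad = BarScratchpad::Bar(Bar(9));
        match pad {
            BarScratchpad::Bar(stored) => stored,
            _ => unreachable!(),
        }
    } else {
        // Reserve the nested slot, then let `foo` fill it in.
        *pad = BarScratchpad::Foo(FooScratchpad::Uninit);
        match pad {
            BarScratchpad::Foo(inner) => foo(Foo(7), inner),
            _ => unreachable!(),
        }
    }
}
```

A caller would declare `let mut pad = BarScratchpad::Uninit;` in its own frame and pass `&mut pad`; the returned reference then borrows from that local, which is the role 'super plays in the proposal.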

Scope and Dropping

Again, the simplest approach to scope/drop would be to just forbid types with drop glue from being declared with let super.

In particular, in the nested version of this idea, you could wind up having drop called quite high up the call stack. This may be surprising! It might be better to treat all &'super references as akin to &own, and have the returned reference be responsible for calling drop, with all other variables in scope always dropped prior to the function returning. That way the return value would still reflect 100% of what might be dropped after the function returns.

Unsized Return Values

Without special syntax, this "allocate the maximum possible space needed" approach should also work for unsized return values, with the actual return value being a &own reference. Maybe I'm missing something, but I don't seem to see this approach having been proposed in RFC1909, or the discussion around it.

When you think about it, it'd actually be quite similar to how impl Future and so-on works in case of deeply nested futures.

Object Safety

This technique could even be used with trait methods! Since the above transformation means we know the maximum size of the r-value, we can store that in the vtable and allocate the necessary space dynamically with alloca!

Expanding scopes of bindings is an interesting idea, but as-written, the lifetimes in this signature itself don't work:

fn upper_if_odd(s: &str) -> &str {
    let upper;
    ...

This says that no matter how long-lived the original s is, we're going to return a &str that lives just as long as it. Or maybe that's what the 'super lifetime is supposed to help with?

fn upper_if_odd(s: &str) -> &'super str {
    let super upper;
    ...

But I'm not quite sure what 'super is intended to mean. Is it the lifetime of the caller function? The lifetime of the smallest containing scope in the caller function? Either one has problems. It can't be the lifetime of the caller function, because the number of bindings needed wouldn't be known at compile time:

fn broken_1(input: &[&str]) {
  let mut uppers: Vec<&str> = Vec::new();
  for s in input {
    uppers.push(upper_if_odd(s));
  }
}

But if it's the lifetime of the smallest containing scope, then it has some very awkward limitations:

fn broken_2(input:&str) {
  // lifetime error because the output of `upper_if_odd`
  // doesn't outlive the `else` block!
  let foo = if some_condition { input } else { upper_if_odd(input) };
}

I'm very concerned that this feature doesn't have enough generality, and thus doesn't play nicely with other Rust patterns (like the above). At the very least, if we're making a way to declare variables that are promoted into an ancestor scope, it should be possible to specify which ancestor scope to use, perhaps something like this:

fn not_broken_2(input:&str) 'a: {
  let foo = if some_condition { input } else { upper_if_odd::<'a>(input) };
}

fn upper_if_odd::<'binding>(s: &str) -> &'binding str {
    let 'binding upper;
    ...

But the thing is… We already have a feature for producing a value in the callee whose storage location and scope is determined by the caller. It's "return values".

fn not_broken_2_but_in_current_rust(input:&str) {
  let cow;
  let foo = if some_condition {
    input
  } else {
    cow = upper_if_odd(input);
    &*cow
  };
}

fn upper_if_odd(s: &str) -> Cow<str> {
    ...

I would much rather improve this approach, through generalizing current Rust features (like guaranteed return value optimization and making it easier to return self-referential structs), rather than add a new complex feature, unless the new feature has some very compelling additional advantages.
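The Cow-based version can be filled in as follows. This is a sketch in current Rust: the body of upper_if_odd is one plausible implementation, and all_uppers and pick are hypothetical callers that mirror broken_1 and not_broken_2 from above.

```rust
use std::borrow::Cow;

/// Borrows the input when nothing changes, allocates only when it does.
pub fn upper_if_odd(s: &str) -> Cow<'_, str> {
    if s.len() % 2 == 1 {
        Cow::Owned(s.to_uppercase())
    } else {
        Cow::Borrowed(s)
    }
}

/// The `broken_1` shape: each element carries its own storage if needed.
pub fn all_uppers<'a>(input: &[&'a str]) -> Vec<Cow<'a, str>> {
    input.iter().map(|&s| upper_if_odd(s)).collect()
}

/// The `not_broken_2` shape: the caller decides where the storage lives.
pub fn pick(input: &str, keep_input: bool) -> String {
    let cow; // declared in the outer scope so the borrow below can outlive the `else` block
    let foo: &str = if keep_input {
        input
    } else {
        cow = upper_if_odd(input);
        &*cow
    };
    foo.to_owned()
}
```

Because the return value owns its storage when it has to, the number of bindings no longer needs to be known at compile time, which is what broke the 'super reading of broken_1.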

I guess the most inconvenient thing about not_broken_2_but_in_current_rust is the fact that we have to make an explicit binding for cow even though we don't care about it for anything other than the fact that we need foo to reference it. That problem could use some attention, possibly through a feature for promoting temporaries (which has been discussed before, and would have benefits beyond the inline function issue). IIRC the main challenge was the fact that temporaries are currently guaranteed to be dropped, so we would need an ergonomic explicit way to specify that you want to extend its lifetime. Probably not like this, but just for example:

fn not_broken_2_with_promoting_temporaries(input:&str) {
  let foo = if some_condition {
    input
  } else {
    &*#[promote_this_temporary_as_needed] upper_if_odd(input)
  };
}

I was thinking smallest containing scope originally. But I hadn't originally considered how that'd interact with narrower scopes like if statements and the like, and you make a good point re: ergonomics.

The simplest way to salvage the concept would be if, from the caller's perspective, you could pick the scope the 'super lifetime applied to by binding the return value to a name, with the default being the smallest enclosing scope:

fn upper_if_odd(s: &str) -> &'super str {
    if s.len() % 2 == 1 {
        let super upper: String = s.to_uppercase();
        &upper
    } else {
        s
    }
}

fn fixed_2(input: &str) {
    let upper;

    let foo = if some_condition {
        input
    } else {
        upper = upper_if_odd(input);
        &upper
    };
    /* ... */

    // String dropped here
}

However, the more I think about it, the less I like the hidden drop. That particular example isn't a good one either, as once you need the rebinding, why not just return an impl Borrow<str>, and use Either internally? We're dealing with a heap allocation anyway, so there isn't much need for return value optimization.

let super and self-referential structs

If let super allows us to name stack space in our caller, 'super lets us reference that stack space, and &own T has ownership of T, we can return both an owned value and a reference to within it at the same time:

fn to_upper(s: &'super str) -> (&'super own Option<String>, &'super str) {
    let mut super r: Option<String> = None;
    if s.len() % 2 == 1 {
        (&own r, s)
    } else {
        r = Some(s.to_uppercase());
        let r = &own r;
        (r, r.as_ref().unwrap())
    }
}

fn use_to_upper(s: &str) {
    let maybe_string;
    let s = if some_condition {
        s
    } else {
        let (opt, r) = to_upper(s);
        maybe_string = opt;
        r
    };
    /* do stuff */
    // the Option<String> is dropped here when the &own Option<String> goes out of scope
}

Needs more thought into what exactly the semantics should be, and that's kinda verbose compared to the original. Also, I'm glossing over the exact lifetimes a bit. But it's explicit as to what is happening, and plausible.

Secondly, I'll point out that return value optimization in general has issues when you want to return part of a larger object: let super could allow you to be explicit as to what you want to put in the caller's stack frame, even in cases where you're only going to logically return a subset of the data. E.g. if I return the latter half of ([u8; 1_000_000], [u8; 1_000_000]) I may still want both parts allocated once, at the top of the call stack, so as to avoid copying it over and over.

Hmm.. would unsized_locals actually help here? The RFC says

That is an interesting point. So, that scenario is basically, memory-wise, equivalent to returning the entire ([u8; 1_000_000], [u8; 1_000_000]), but for the programmer, equivalent to returning just the second half. The fact that the whole object goes in the parent stack frame is just an implementation detail. It's decoupling the logical return value from the memory-efficiency return value.

I'm still not sure what semantics would be able to accomplish that in a good way, but it seems worth thinking about.

It really just looks like you want what is already provided by so-called return value optimization (RVO).

I'm strongly against changing the language in any direction that complicates reasoning with lifetimes – lifetimes are already hard enough, and I don't even want to imagine how much unsound unsafe code would result from functions that are allowed to violate normal lifetime rules.

As almost always, if you need to violate the memory management principles of Rust, it's overwhelmingly likely that you are doing something wrong, and you should redesign your algorithms and refactor your code instead of trying to bend the language so that it allows for sloppier code.

It would be great if you could provide a concrete, practical use case where your proposal allows something that isn't possible with well-structured, safe code along with rustc's and LLVM's existing optimizations.

I think it's enough if let super simply means that the value is put in caller-provided memory, and that returning a value (or part of a value) bound with let super is recursive.

Consider this example:

fn make_pair() -> ([Foo; 1000], [Bar; 1000]) {
    /* ... */
}

fn first_half() -> [Foo; 1000] {
    let super pair: ([Foo; 1000], [Bar; 1000]) = make_pair();
    pair.0
}

fn call_half_pair() -> Foo {
    let mut half_pair: [Foo; 1000] = first_half();
    mem::replace(&mut half_pair[0], Foo::default())
}

Assuming that Foo and Bar have drop glue, and returning &own to indicate returned ownership (the actual asm doesn't need to return a pointer, as it has a known offset), that would desugar as:

fn make_pair(r: &own MaybeUninit<([Foo; 1000], [Bar; 1000])>) -> &own ([Foo; 1000], [Bar; 1000]) {
     /* ... */
}

fn first_half(r: &own MaybeUninit<([Foo; 1000], [Bar; 1000])>) -> &own [Foo; 1000] {
     let pair = make_pair(r);
     pair.0
     // [Bar; 1000] dropped here
}

fn call_half_pair(r: &own MaybeUninit<Foo>) -> &own Foo {
    // scratch space for the whole pair lives in this frame
    let scratch: MaybeUninit<([Foo; 1000], [Bar; 1000])> = MaybeUninit::uninit();
    let half_pair: &own [Foo; 1000] = first_half(&own scratch);

    let returned_foo = mem::replace(&mut half_pair[0], Foo::default());

    /* copy returned_foo into r, etc. */

    // half_pair dropped here
}

Critically, note how the last function, call_half_pair, did not use let super, which means it has to copy the one Foo that it does return into the caller-provided return slot, as usual.

Now, this may look like "spooky action at a distance", because suddenly callers have to provide some unknown amount of stack space that isn't visible in the function definition. But remember, that's how it works already! There's no way to know the total amount of stack space a function call will use up front from the function declaration. Asking the caller to provide it from their stack frame just expedites the process. You could even provide some low level unsafe intrinsics to determine the layout of the required space (remember that the maximum possible size is known up front if let super is restricted to sized types) and provide it from somewhere other than the stack. That might actually be worthwhile for things similar to Box::new_with, even if you have to shrink the actual allocation later.
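The closest thing available today is an explicit out-parameter: the caller reserves space for the whole pair once, and the callees return a reference to the part they logically return. This sketch uses a named struct and a small array length so it is cheap to run; Pair, N and first_byte are hypothetical names chosen for the example.

```rust
const N: usize = 1024;

pub struct Pair {
    pub foo: [u8; N],
    pub bar: [u8; N],
}

/// Writes the pair directly into caller-provided storage.
fn make_pair(slot: &mut Pair) {
    slot.foo.fill(1);
    slot.bar.fill(2);
}

/// Logically returns only the first half; the second half stays in `slot`.
pub fn first_half(slot: &mut Pair) -> &mut [u8; N] {
    make_pair(slot);
    &mut slot.foo
}

/// Does not forward the slot, so the single byte it returns is copied out.
pub fn first_byte() -> u8 {
    let mut slot = Pair { foo: [0; N], bar: [0; N] };
    let half = first_half(&mut slot);
    std::mem::replace(&mut half[0], 0)
}
```

The difference from let super is that here the slot type shows up in every signature, whereas the proposal would have the compiler thread it through implicitly.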

I'm not. There are a few ideas I'm talking about in this thread. An important category of them is being able to return unsized types. Currently, Rust functions can only return sized values, and it's not at all clear what the best way to support unsized values is. RVO can't help that use case.

Secondly, RVO can't decide what is the best trade-off when you're logically returning part of a large value. Read the two comments above yours.

My intent here is to explore adding new forms of lifetime and memory management to the Rust language, in part to avoid the need for unsafe code.

Just for the sake of argument, how would this work? You have the caller frame on top, and the callee frame below it (the stack grows down on x86-64, if my memory serves). Where will this Option<String> go? Is it an alloca below the callee frame?

let super at the ABI level means the caller has to provide a pointer to some number of bytes of memory from its own stack frame (or potentially somewhere else). It's a similar idea to how large values are actually returned by writing to a pointer provided by the caller.

So no alloca: rather the opposite actually, as the let super requirements for the entire call tree would be known in advance at compile time.

..ahh, that can be done because the function is inlined, I see. Been a slow-thinker today.

Actually no! My first writeup at the very top of this thread was to do this via inlining. But I then realized I didn't actually need that, as calls to non-inlined functions can just supply the necessary stack space in the frame of the caller via a pointer.

Now, if you do need alloca, I don't see a way to avoid inlining (or something entirely different like returning a closure, CPS, etc.). But many things don't need it.