Mandatory inlined functions for unsized types and 'super lifetime?

My take. @petertodd may prefer a different flavor

fn f1() {
   // implicitly allocates a buffer for [u8; 100] and passes it to f2
   let a = f2();
}

// similarly to how -> impl Trait communicates the size of the resulting object,
// this function communicates to its callers that it needs to be implicitly
// passed a pointer to a buffer of size [u8; 100]
fn f2<super 'a>() -> &'a [u8] {
   ...
   // implicitly allocates a 2nd buffer for [u8; 1000] and
   // passes both implicit buffers to f3
   let (a, b): (&'a [u8], &[u8]) = f3();
   ...
   a
}

// similarly to how -> impl Trait communicates the size of the resulting object,
// this function communicates to its callers that it needs to be implicitly
// passed two pointers to buffers, of sizes [u8; 100] and [u8; 1000] respectively
fn f3<super 'a, super 'b>() -> (&'a [u8], &'b [u8]) {
   ...
   let super 'a a = [0u8; 100]; // allocated in f1's stack frame
   let super 'b b = [0u8; 1000]; // allocated in f2's stack frame
   ...
   (&a, &b)
}

The main thing needed to do that (and imho it wouldn't really require that much sugar) would be existential types in argument position.

We could achieve that with type slot_a = impl Sized; and then taking an &'a mut Option<slot_a> that we .get_or_insert()-initialize in the callee's body. But I have tested that, and currently a get_or_insert() does not count as a defining use of the existential type; instead, it results in a cycle error.
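
For reference, here is a minimal sketch of that pattern on stable Rust, with a concrete [u8; 100] standing in for the existential type (fill and caller are hypothetical names):

fn fill<'a>(slot: &'a mut Option<[u8; 100]>) -> &'a [u8] {
    // initialize the caller-owned slot on first use
    let buf = slot.get_or_insert([0u8; 100]);
    &buf[..]
}

fn caller() {
    let mut slot = None; // the storage lives in the caller's frame
    let a: &[u8] = fill(&mut slot);
    assert_eq!(a.len(), 100);
}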


The issue I see with this sugar is the lack of control over when exactly the caller-allocated memory is allocated. For instance, how would that work with:

let mut v = vec![];
for _ in 0 .. n {
    v.push(f2());
}

We agree that this should not compile, right? (The None slot would be reused across iterations.)

And yet an if cond() { v.push(f2()); } ideally would compile (the None slot could be hoisted outside the if).

  • this would, however, result in that much stack usage even when cond() == false.

So the sugar should, at least, offer some way to manually provide the slot, so as to control its lifetime.
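
To make the manual-slot control concrete, here is a sketch on stable Rust, reusing the hypothetical fill function from the sketch above:

fn demo(cond: bool) {
    let mut slot = None; // declared before `v`, so it outlives the pushed borrow
    let mut v: Vec<&[u8]> = Vec::new();
    if cond {
        v.push(fill(&mut slot)); // OK: the single slot is filled at most once
    }
    // The loop version, `for _ in 0..n { v.push(fill(&mut slot)); }`, is
    // rejected: the one slot would be mutably borrowed on every iteration.
    assert_eq!(v.len(), cond as usize);
}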

And I feel that we don't gain that much compared to manually writing the &mut Option slots; the way to gain some sugar could come from a proc-macro at that point:

#[with_slots(a: 'a)] // fn f2<'a> (a: &'a mut Option<_>, …) -> &'a [u8]
fn f2 (…) -> &'a [u8]
{
    let (a, b) = #[with_slots(a, _)] f3(…); // f3((a, &mut None), …);
    …
}

#[with_slots(a: 'a, b: 'b)] // f3<'a, 'b> ((a: …, b: …), …)
fn f3 (…) -> (&'a [u8], &'b [u8])
{
    set!(a = …); // let a = a.get_or_insert(…);
    set!(b = …);
    // …
    (a, b)
}

fn advanced (v: &mut Vec<_>)
{
    slot!(a); // let ref mut a = None;
    if cond() {
        v.push(#[with_slots(a)] f2(…));
        // or:
        v.push(f2(a, …));
    }
}

My naive plan was (sorry for not spelling it out earlier):

  • f2() in my example is a shorthand for f2::<'a>()
  • f1() when invoking f2() "provides" this lifetime somehow
  • if f2() is invoked in a loop it is an error to "provide" a lifetime that outlives a single loop iteration
  • the choice of this lifetime determines when the memory slot is "allocated" and "freed"

What about using labelled scopes for this?

fn foo1<'a>(x: usize) -> &'a Data {
    // Creates a `Data` value in reserved space on the caller's stack, with lifetime 'a.
    let data: Data = outer_stack!('a, Data::new(x));
    &data
}

fn foo2<'a>(x: usize) -> &'a Data {
    let data: Data = outer_stack!('a, Data::new(x));
    &data
}

fn bar() {
    for i in 0..10 {
        'a: {
            let rd: &Data = if cond(i) {
                foo1::<'a>(i)
            } else {
                foo2::<'a>(i)
            };
            something(rd)
        } // *rd dropped at the end of the 'a scope
    }
}

I hoped it could be even more automatic:

  • if the caller never uses that lifetime it can "allocate" the buffer immediately before invoking the callee and "free" it immediately after
  • if the caller is using the lifetime then it can "allocate"/"free" according to the bounds of that lifetime

Errors:

  • if that lifetime outlives the caller's own invocation and is not the caller's own super lifetime parameter
  • if the callee is invoked an unknown number of times within the duration of that lifetime, as in a loop

So in your example, the exact placement of "allocation"/"release" of the two buffers involved (one for foo1 and one for foo2) would be dictated by when the lifetime associated with rd starts and ends.

The more I think about this, the more I feel like "return references" are the right answer here.

We currently have the following matrix:

            read-only         read-write            write-only
owned       let foo = bar;    let mut foo = bar;    let foo;
reference   &foo              &mut foo              ???

It feels like what we're trying to do in this thread is fill in the last cell in that matrix. I'll strawman this as &write (or perhaps &uninit).

Concretely, this code works fine because the compiler can do enough analysis to verify that v is initialised before any possible reads:

fn main() {
    let mut r = "blah";

    let v;
    if r.len() % 2 == 1 {
        v = r.to_uppercase();
        r = &v;
    }
    println!("{}", r);
}

But there's currently no formalisation for saying that what a reference points to may be uninitialised, nor for performing this analysis across a function boundary.
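
A minimal illustration of today's restriction (this is a deliberately failing example): borrowing a possibly-uninitialised binding is rejected outright, so the analysis stops at the borrow.

fn main() {
    let v: String;
    let r = &v; // ERROR: `v` is possibly uninitialised, so it cannot be borrowed
    let _ = r;
}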

MaybeUninit is working along the same lines, but feels like it's catering towards a very different kind of analysis. MaybeUninit is saying "Here's some memory, the caller will manage its initialisation out of band" (hence its unsafety).

&write is saying "The compiler can already do initialisation analysis; I'd like it to expand the scope of where it does that analysis".

You can currently write:

fn main() {
    let mut v = String::new();
    println!("{}", upper_if_odd("blah", &mut v));
}

fn upper_if_odd<'a>(s: &'a str, backing_store: &'a mut String) -> &'a str {
    if s.len() % 2 == 1 {
        *backing_store = s.to_uppercase();
        backing_store
    } else {
        s
    }
}

&write would enable writing something like:

fn main() {
    let v;
    println!("{}", upper_if_odd("blah", &write v));
}

fn upper_if_odd<'a>(s: &'a str, backing_store: &'a write String) -> &'a str {
    if s.len() % 2 == 1 {
        *backing_store = s.to_uppercase();
        backing_store
    } else {
        s
    }
}

The caller declares the binding, so has control over scope and dropping, and the &write effectively instructs the compiler to treat upper_if_odd as inlined for the purposes of its initialisation analysis.

By framing this as a kind of borrow rather than a new kind of returned lifetime, and by treating it as an instruction to existing analysis, this feels like a much smaller and more consistent addition to the language.

Its concrete benefits over &mut are:

  • Avoids the boilerplate and overhead of unnecessary Options (contrast the sketch after this list).
  • Avoids adding extra unsafe with MaybeUninits.
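
For contrast, a sketch of the Option-based workaround available today, where the drop flag is effectively the Option discriminant:

fn upper_if_odd<'a>(s: &'a str, backing_store: &'a mut Option<String>) -> &'a str {
    if s.len() % 2 == 1 {
        backing_store.insert(s.to_uppercase())
    } else {
        s
    }
}

fn main() {
    let mut v = None; // the boilerplate the `&write` version would avoid
    println!("{}", upper_if_odd("blah", &mut v));
}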

I feel like this is low enough boilerplate to avoid needing the extra sugar @dhm is suggesting around with_slots...

It opens up the possibility of &write impl Trait, and if we end up with auto-generated sum types (à la RFC 2414 and some IRLO threads), these could extend fairly naturally to &write, driving the ergonomics for multiply-sized values in a consistent way, similar to the automatic enum derivation @petertodd was describing.

I'm not sure I see a clear path to unsized support, though...


A while ago I put together the hoist_temporaries crate and an accompanying blog post as an experiment to delve into exactly this temporaries boilerplate problem :slight_smile: Thoughts very welcome!

Hi, intriguing indeed. Re your 3rd example: how can the code/compiler decide when/if to drop that upper-cased String? Doesn't it seem that main cannot possibly know whether v has been assigned?

P.S. My suggestion doesn't solve this any better; it requires that a new &own kind of reference be introduced, so that such a String could be returned up the call stack as a reference.

P.P.S. It does seem like all of this is about playing with typestate... not that I know much about type theory in general or typestate in particular.

Whether something needs to be dropped is actually (surprisingly!) tracked at runtime: see Drop Flags - The Rustonomicon. In many cases this can be optimised away to static drops, but that's an optimisation rather than a guarantee :slight_smile:
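
A minimal sketch showing a runtime drop flag in action (Noisy is a hypothetical type):

struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) { println!("dropped"); }
}

fn main() {
    let v;
    if std::env::args().count() > 1 {
        v = Noisy;
        let _ = &v;
    }
    // whether `v` is dropped here depends on the branch taken at runtime,
    // so the compiler tracks an initialisation flag for it
}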

Hmm, okay. Isn't it the case, however, that in your 3rd example it is upper_if_odd() that knows whether v has been assigned, but it is main() that needs to decide whether or not to drop?

If no information is passed implicitly between the two, wouldn't the boolean flag needed to decide on dropping be lost as upper_if_odd() exits, and thus be unknown to main()?

If &write were added, I suppose one would also wish to use v in main() directly once it has been assigned in upper_if_odd(). It would be quite unexpected, however, for one's ability to do this to depend on internal details of upper_if_odd(). Rust tradition seems to demand that this valuable information be encoded in upper_if_odd()'s signature instead...

Indeed, that's why I used &mut Option<_> references, which can be implemented in user code, instead of &out references. Plus, when the optimizer does get knowledge of the functions involved, it can optimize away the check of the drop flag:

#[no_mangle] pub
fn f1 ()
{
    // `a` is an array of elems with drop glue.
    match lib::f2(&mut None) { a => unsafe {
        ::libc::write(1, a.as_ptr().cast(), 0);
    }}
}

becomes:

f1:
	pushq	%rbx
	subq	$16, %rsp
	movb	$1, (%rsp)
	leaq	1(%rsp), %rbx
	movq	$0, 1(%rsp)
	movw	$0, 9(%rsp)
	movl	$1, %edi
	movq	%rbx, %rsi
	xorl	%edx, %edx
	callq	*write@GOTPCREL(%rip)
	movq	%rbx, %rdi
	callq	core::ptr::drop_in_place
	addq	$16, %rsp
	popq	%rbx
	retq

So, as you can see, the flag check can be elided in some situations. When the buffer has no drop glue, it will be elided too. But when the function that initializes the caller value is not inlined, and the value has drop glue, then indeed a flag check may be necessary.


Regarding language sugar, I much prefer @illicitonion's suggestion of let a; … &out a (with an a: &'a out <existential impl Sized> for the callee) to the less usual super 'a notation :slightly_smiling_face:. But in the meantime, we do not need sugar; we just need that existential impl Sized in argument position to work. Once we get that, we'll be able to write this kind of pattern and experiment with it. Then (and imho, only then) would it be useful to discuss any sugaring of the pattern.
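
For concreteness, this is the shape that currently fails (a sketch using the unstable type_alias_impl_trait feature; the annotated get_or_insert is meant to be the defining use, but today it reports a cycle instead):

#![feature(type_alias_impl_trait)]
type slot_a = impl Sized;

fn f2<'a>(slot: &'a mut Option<slot_a>) -> &'a [u8] {
    // intended as the defining use of `slot_a`; currently a cycle error
    let buf: &mut [u8; 100] = slot.get_or_insert([0u8; 100]);
    &buf[..]
}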


Nice. For the sake of argument, the following imaginary style

fn foo<super 'a>() -> &'a own [u8] {...}

can be seen as offering more flexibility:

  • a single implicitly passed buffer can be used to construct several objects (connected, or perhaps returned as a tuple)
  • it allows expressing that fn foo() always returns an object (&own)
  • it also allows expressing that fn foo() optionally returns an object (Option<&own ..>)

Compared to &'a out <existential impl Sized>, it can also be seen as less economical when there's just one object to return, since a whole pointer needs to go both in (the implicit buffer) and out (the &own return).

To be clear, note how with &own, you can also do:

fn foo(dst: &mut MaybeUninit<[Foo; MAX_SIZE_NEEDED]>) -> &own [Foo] {...}

I'm using Foo rather than u8 to make clear that, in the event of a panic, the &own would not be returned, resulting in the actually-initialized parts of dst getting dropped somewhere within foo: either by the &own value within the function, or by some slice-initialization mechanism prior to the &own being created.

This is all more boilerplate, of course. But with the exception of my original inline fn idea, which allows alloca to be used, that is pretty much what any of my proposals would actually compile down to.

It's also something that can be implemented today: &own can of course be written as a library. You'll miss out on some nice features like re-borrowing and coercions, and it's nicer if you have the nightly-only arbitrary self types. But the basic concept can be experimented with today.
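
As a minimal, hedged sketch of what such a library type could look like (Own is a hypothetical name; it misses re-borrowing, coercions, etc.):

use core::mem::ManuallyDrop;
use core::ops::{Deref, DerefMut};

// Borrows the storage but owns the value: dropping `Own` drops the
// value in place while leaving the storage to the original owner.
struct Own<'a, T: ?Sized>(&'a mut ManuallyDrop<T>);

impl<T: ?Sized> Deref for Own<'_, T> {
    type Target = T;
    fn deref(&self) -> &T { &**self.0 }
}

impl<T: ?Sized> DerefMut for Own<'_, T> {
    fn deref_mut(&mut self) -> &mut T { &mut **self.0 }
}

impl<T: ?Sized> Drop for Own<'_, T> {
    fn drop(&mut self) {
        // Safety: `Own` is the unique owner of the value (not the storage),
        // and the storage must not be read as a live `T` afterwards.
        unsafe { ManuallyDrop::drop(self.0) }
    }
}

fn demo() {
    let mut slot = ManuallyDrop::new(String::from("hi"));
    let owned = Own(&mut slot);
    assert_eq!(*owned, "hi"); // derefs like a normal reference
} // the String is dropped here by `Own`; `slot` must not be used again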

Here's a very incomplete, quick-n-dirty, implementation from one of my projects: https://github.com/petertodd/proofmarshal/blob/edb5f47b488e314acc8988d38687d4631c623e2f/hoard/src/owned/refown.rs

Used here, among other places, with arbitrary self types for convenience: https://github.com/petertodd/proofmarshal/blob/edb5f47b488e314acc8988d38687d4631c623e2f/hoard/src/owned/mod.rs#L14

IIRC some other people have written prototypes too.


Right. It is exactly this kind of non-uniformity that bothers me. As always, I was first thinking about how it could break unsafe code, but this example points out another scary thing: a function that pretends to return something but doesn't actually return anything (since it merely writes to a buffer), while still looking like a normal function call.

That is definitely way too much magic (I don't want my programming language and source code to lie to me). Furthermore, @atagunov's example still doesn't show anything that's not possible today by passing a &mut [T] explicitly, which, incidentally, describes the behavior of the function far more clearly.

I would expect that to work. No memory is being "allocated" in the usual sense in any of the proposals I've written about in this thread. Rather, at compile time, space is being reserved on the stack, just like any other return value.

The one exception is inline fn with alloca. And even then, I'd expect a usable implementation to undo the alloca allocation after the returned value is used. So in your example, the stack pointer would be reset after v.push(f2()).

I'm not sure what you're trying to say here. At the ABI level, all return values that can't fit into a register or two are actually returned by writing to a return value pointer. You could call that a "buffer".

Other than inline assembly, I don't see how any reasonable, non-platform-specific, unsafe code would ever observe the low-level details of exactly how return values are actually returned at the ABI level.

For a function returning an owned [u8], there are two possible cases:

  1. The maximum length is known at compile time.
  2. The length is known only at run time, with no (reasonable) maximum known at compile time.

In the first case, all these let super proposals are equivalent to returning a FixedVec:

use std::mem::MaybeUninit;

struct FixedVec<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize,
}

At the ABI level, the compiler reserves space on the stack, and the caller provides a return-slot pointer for the FixedVec to be written to. Transfer of logical ownership is signified by a successful (non-panicking) function return.
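
To sketch how the caller would then view the initialised prefix as an unsized [T] (as_slice is a hypothetical helper; it assumes the len invariant is upheld):

impl<T, const N: usize> FixedVec<T, N> {
    fn as_slice(&self) -> &[T] {
        // Safety: the first `len` elements have been initialized.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr().cast::<T>(), self.len) }
    }
}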

For the latter case, I've proposed at the top of this thread that the function be inlined, allowing alloca to be used to allocate however much stack space is needed. Obviously, this has the potential for a stack overflow. But that's true in almost any language, including Rust.

So, I'm not really clear on why you describe this as "magic" or the programming language "lying to you": these ideas are all relatively small extensions to existing semantics that don't change the fundamentals of how function calls work under the hood. After all, we already have fn() -> impl Trait.

Your proposal is not ABI level. It affects the human-readable surface of the language. I have no problem with how the compiler implements returning large values or RVO; I have a problem with making that into a leaky abstraction.

If the intention is to be able to return unsized locals by value, then the proper solution is to allow unsized locals to be passed around by value, without introducing special cases into the lifetime model. That's why the abstraction is leaky: it conflates machine-generated lifetimes of raw object representations with human-readable lifetimes of abstract entities in the source.
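
For what it's worth, nightly Rust already lets one experiment with exactly that, under unstable feature gates (a sketch; the feature names and semantics may change):

#![feature(unsized_locals, unsized_fn_params)]

// An unsized parameter taken by value; the call below moves the
// slice out of the box without re-boxing it.
fn consume(x: [u8]) -> usize {
    x.len()
}

fn main() {
    let boxed: Box<[u8]> = Box::new([1, 2, 3]);
    println!("{}", consume(*boxed));
}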

This proposal is not really comparable to impl Trait in this sense, because impl Trait values can be used just like any other value (cf. the context sensitivity of the Vec::push() example above). If anything, impl Trait hides information and adds abstraction, instead of adding information and removing abstraction.

After @H2CO3's complaints, I believe it could be enough to just add annotations. We already have the annotations #[no_mangle] and #[track_caller], and we have extern "C" { ... } blocks. For example, we could try to write:

#[extend_lifetimes_into_caller_space]
fn foo() -> &Data {
   let data = Data::new();
   &data
}

fn main() {
   let first: &Data = foo();
   for _ in 0..10 {
      let some_space: _;
      let second = if cond() {
         // We may want to require an annotation to make it survive the immediate scope.
         #[bind_lifetimes_extensions(some_space)]
         foo()
      } else {
         #[bind_lifetimes_extensions(some_space)]
         other_foo()
      };
      // some_space gets dropped, and with it the `Data`s created by foo()/other_foo().
   }
}

To be clear, what do you mean by "the context sensitivity of the Vec::push() example above"?

I have re-read the thread and I think I have been missing some point. As @elidupree says, we can already return either a reference or owned data by using Cows.
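
For instance, the earlier upper_if_odd example can be expressed with a Cow today:

use std::borrow::Cow;

fn upper_if_odd(s: &str) -> Cow<'_, str> {
    if s.len() % 2 == 1 {
        // the caller receives, and eventually drops, the owned String
        Cow::Owned(s.to_uppercase())
    } else {
        Cow::Borrowed(s)
    }
}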

  • If the issue is that we want to allocate a different type than the one we want to reference, then we could generalize Cow. For example, to allocate (A, B) and get &A, we could impl CowFor<A> for (A, B) (see the sketch after this list). This could help with the comment above.
  • If the issue is that we do not get enough return-value optimizations, then we could help the compiler with annotations hinting at which variables could be allocated in the caller's stack, but without changing any meaning.
  • If the issue is that we are too lazy to write Cows, or find that they pollute the programming logic, then we could write some macros to wrap them automatically.
  • If the issue is specific to unsized types, as mentioned by @petertodd, then I have yet to see the intended use more clearly.
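
A sketch of that CowFor generalisation from the first bullet (CowFor here is a hypothetical enum rather than a trait):

enum CowFor<'a, A, Storage> {
    Borrowed(&'a A),
    Owned(Storage),
}

// Allocate an (A, B) but hand out references to just the A.
impl<'a, A, B> CowFor<'a, A, (A, B)> {
    fn get(&self) -> &A {
        match self {
            CowFor::Borrowed(a) => a,
            CowFor::Owned((a, _)) => a,
        }
    }
}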

Although I have mixed them before, now I think these are different issues and should be addressed separately.

Sorry, I was referring to dhm's reply.