Mandatory inlined functions for unsized types and 'super lifetime?

A while ago I put together the hoist_temporaries crate and an accompanying blog post as an experiment to delve into exactly the temporaries boilerplate problem :slight_smile: Thoughts very welcome!

Hi, intriguing indeed. Re your 3rd example.. How can the code/compiler decide when/if to drop that upper-case String? Doesn't it seem that main cannot possibly know if v has been assigned?

P.S. my suggestion doesn't solve it any better.. my suggestion requires that a new &own kind of reference be introduced.. so that such a String could be returned up the call stack as a reference..

P.P.S. it does seem like all this is about playing with type state.. as if I knew much about type theory in general and type state in particular..

Whether something needs to be dropped is actually (surprisingly!) tracked at runtime: https://doc.rust-lang.org/nomicon/drop-flags.html - in many cases this can be optimised away to static drops, but that's an optimisation rather than a guarantee :slight_smile:
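For anyone unfamiliar, here's a minimal example (my own illustration, not from the thread) of the kind of code where a static decision is impossible and a runtime drop flag is needed:

struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) { println!("dropped"); }
}

fn main() {
    let x = Noisy;
    if std::env::args().count() > 1 {
        drop(x); // `x` is moved out on this path only
    }
    // Whether `x` still needs dropping here depends on which branch
    // ran, so the compiler consults a hidden runtime flag.
}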

Hmm okay.. Isn't it the case, however, that in your 3rd example it is upper_if_odd() that knows whether v has been assigned, but it is main() that needs to decide whether or not to drop?

If no information is passed implicitly between the two, wouldn't the boolean flag needed to decide on dropping be lost as upper_if_odd() exits, and thus unknown to main()?

If &write were added I suppose one would also wish to use v in main() directly once it has been assigned in upper_if_odd(). It would be quite unexpected, however, for one's ability to do this to depend on internal details of upper_if_odd(). Rust tradition seems to demand that this valuable info be encoded in upper_if_odd()'s signature instead...

Indeed, that's why I used &mut Option references instead of &out references that can be implemented in user code. Plus, when the optimizer does get knowledge of the functions involved, it can optimize away the check of the drop flag:

#[no_mangle]
pub fn f1() {
    // `a` is an array of elements with drop glue.
    match lib::f2(&mut None) { a => unsafe {
        ::libc::write(1, a.as_ptr().cast(), 0);
    }}
}

becomes:

f1:
	pushq	%rbx
	subq	$16, %rsp
	movb	$1, (%rsp)
	leaq	1(%rsp), %rbx
	movq	$0, 1(%rsp)
	movw	$0, 9(%rsp)
	movl	$1, %edi
	movq	%rbx, %rsi
	xorl	%edx, %edx
	callq	*write@GOTPCREL(%rip)
	movq	%rbx, %rdi
	callq	core::ptr::drop_in_place
	addq	$16, %rsp
	popq	%rbx
	retq

So, as you can see, the flag check can be elided in some situations. When the buffer has no drop glue, it will be elided too. But when the function that initializes the caller's value is not inlined, and the value has drop glue, then a flag check may indeed be necessary.
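For reference, here's a minimal sketch of the &mut Option out-parameter pattern being discussed, applied to an upper_if_odd-style function (the function body is my own illustration):

// The callee writes into caller-provided storage and returns a borrow
// of it; the caller's involvement is visible in the signature.
fn upper_if_odd(n: u32, slot: &mut Option<String>) -> Option<&str> {
    if n % 2 == 1 {
        Some(slot.insert(format!("ODD: {n}")).as_str())
    } else {
        None
    }
}

fn main() {
    let mut slot = None;
    if let Some(s) = upper_if_odd(3, &mut slot) {
        println!("{s}");
    }
    // `slot` (and the String, if any) is dropped here, in the caller.
}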


Regarding language sugar, I much prefer @illicitonion's suggestion of let a; … &out a (with an a: &'a out <existential impl Sized> for the callee) over the less usual super 'a notation :slightly_smiling_face:. But in the meantime, we do not need sugar; we just need that existential impl Sized in argument position to work. Once we get that, we'll be able to write this kind of pattern and experiment with it. Then (and imho, only then) would it be useful to discuss any sugaring of the pattern.


Nice. For the sake of argument, the following imaginary style

fn foo<super 'a>() -> &own [u8] {...}

can be seen as offering more flexibility:

  • a single implicitly passed buffer can be used to construct several objects (connected, or perhaps returned as a tuple)
  • it makes it possible to express the fact that fn foo() always returns an object (&own)
  • it also makes it possible to express the fact that fn foo() optionally returns an object (Option<&own ..>)

Compared to &'a out <existential impl Sized>, it can also be seen as less economical when there's just one object to return, as a whole pointer needs to go both in (the implicit buffer) and out (the &own return).

To be clear, note how with &own, you can also do:

fn foo(dst: &mut MaybeUninit<[Foo; MAX_SIZE_NEEDED]>) -> &own [Foo] {...}

I'm using Foo rather than u8 to make clear that, in the event of a panic, the &own would not be returned, resulting in the actually initialized parts of dst getting dropped somewhere within foo: either by the &own value within the function, or by some slice-initialization mechanism prior to the &own being created.

This is all more boilerplate of course. But with the exception of my original inline fn idea, which allows alloca to be used, that is pretty much what any of my proposals would actually compile down to.

It's also something that can be implemented today: &own can of course be written as a library. You'll miss out on some nice features like re-borrowing and coercions, and it's nicer if you have the nightly-only arbitrary self types. But the basic concept can be experimented with today.
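For a feel of what such a library type involves, here's a minimal sketch (my own simplification; the names Own and new_unchecked are hypothetical, and real implementations like the one linked below do more):

use core::mem::ManuallyDrop;
use core::ops::{Deref, DerefMut};

/// An owning reference: borrows its storage for 'a, but owns the
/// value inside and is responsible for dropping it.
pub struct Own<'a, T: ?Sized> {
    slot: &'a mut ManuallyDrop<T>,
}

impl<'a, T: ?Sized> Own<'a, T> {
    /// Safety: the caller must hand over ownership of the value in
    /// `slot` and never use or drop it again.
    pub unsafe fn new_unchecked(slot: &'a mut ManuallyDrop<T>) -> Self {
        Own { slot }
    }
}

impl<T: ?Sized> Deref for Own<'_, T> {
    type Target = T;
    fn deref(&self) -> &T { &**self.slot }
}

impl<T: ?Sized> DerefMut for Own<'_, T> {
    fn deref_mut(&mut self) -> &mut T { &mut **self.slot }
}

impl<T: ?Sized> Drop for Own<'_, T> {
    fn drop(&mut self) {
        // Drop the owned value in place; the storage itself is freed
        // by whoever owns the backing buffer.
        unsafe { ManuallyDrop::drop(self.slot) }
    }
}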

Here's a very incomplete, quick-n-dirty, implementation from one of my projects: https://github.com/petertodd/proofmarshal/blob/edb5f47b488e314acc8988d38687d4631c623e2f/hoard/src/owned/refown.rs

Used here, among other places, with arbitrary self types for convenience: https://github.com/petertodd/proofmarshal/blob/edb5f47b488e314acc8988d38687d4631c623e2f/hoard/src/owned/mod.rs#L14

IIRC some other people have written prototypes too.


Right. It is exactly this kind of non-uniformity that bothers me. As always, I was first thinking about how it could break unsafe code, but this example points out another scary thing: a function that pretends to return something but doesn't actually return anything (since it merely writes to a buffer), while still looking like a normal function call.

That is definitely way too much magic (I don't want my programming language and source code to lie to me) – furthermore, @atagunov's example still doesn't show anything that's not possible today by passing a &mut [T] explicitly – which, incidentally, describes the behavior of the function far more clearly.

I would expect that to work. No memory is being "allocated" in the usual sense in any of the proposals I've written about in this thread. Rather, at compile time, space is being reserved on the stack, just like any other return value.

The one exception is inline fn with alloca. And even then, I'd expect a usable implementation to undo the alloca allocation after the returned value is used. So in your example, the stack pointer would be reset after v.push(f2()).

I'm not sure what you're trying to say here. At the ABI level, all return values that can't fit into a register or two are actually returned by writing to a return value pointer. You could call that a "buffer".

Other than inline assembly, I don't see how any reasonable, non-platform-specific, unsafe code would ever observe the low-level details of exactly how return values are actually returned at the ABI level.
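For example (my illustration of existing behavior, not a proposal):

// Too large for return registers:
fn make_block() -> [u64; 64] {
    [0; 64]
}

// At the ABI level this is already lowered to roughly
//   fn make_block(ret_slot: *mut [u64; 64])
// with `ret_slot` pointing at stack space the *caller* reserved,
// i.e. a "buffer" is involved in every large return value today.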

For a function returning an owned [u8], there are two possible cases:

  1. The maximum length of the slice is known at compile time.
  2. The length is known only at run time, with no (reasonable) maximum known at compile time.

In the first case, all these let super proposals are equivalent to returning a FixedVec:

struct FixedVec<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize,
}

At the ABI level, the compiler reserves space on the stack, and the caller provides a return slot pointer to write to the FixedVec. Transferal of logical ownership is signified by a successful (non-panicking) function return.
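As an illustration of that equivalence, a sketch in today's Rust (the push/as_slice helpers and the upper_if_odd example are my own; a real implementation would also need Drop handling for non-Copy elements):

use core::mem::MaybeUninit;

impl<T, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        // An array of MaybeUninit requires no initialization.
        FixedVec { buf: unsafe { MaybeUninit::uninit().assume_init() }, len: 0 }
    }
    fn push(&mut self, val: T) {
        assert!(self.len < N);
        self.buf[self.len].write(val);
        self.len += 1;
    }
    fn as_slice(&self) -> &[T] {
        // Safety: the first `len` elements are initialized.
        unsafe { core::slice::from_raw_parts(self.buf.as_ptr().cast(), self.len) }
    }
}

// Roughly what `fn upper_if_odd(n: u32) -> [u8]` with a known maximum
// length could desugar to:
fn upper_if_odd(n: u32) -> FixedVec<u8, 3> {
    let mut out = FixedVec::new();
    if n % 2 == 1 {
        for &b in b"ODD" {
            out.push(b);
        }
    }
    out
}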

For the latter case, I've proposed at the top of this thread that the function be inlined, allowing alloca to be used to allocate however much stack space is needed. Obviously, this has the potential for a stack overflow. But that's true in almost any language, including Rust.

So, I'm not really clear on why you describe this as "magic" or the programming language "lying to you": these ideas are all relatively small extensions to existing semantics, that don't change the fundamentals of how function calls work under the hood. After all, we already have fn() -> impl Trait.

Your proposal is not ABI level. It affects the human-readable surface of the language. I have no problem with how the compiler implements returning large values or RVO; I have a problem with making that into a leaky abstraction.

If the intention is to be able to return unsized locals by-value, then the proper solution is to allow unsized locals to be passed around by-value, without the need of introducing special cases into the lifetime model. That's why the abstraction is leaky: it conflates machine-generated lifetimes of raw object representations with human-readable lifetimes of abstract entities in the source.

This proposal is not really comparable to impl Trait in this sense, because impl Trait values can be used just like any other value (cf. the context sensitivity of the Vec::push() example above). If anything, impl Trait hides information and adds abstraction, instead of adding information and removing abstraction.

After @H2CO3's complaints, I believe that it could be enough to just add annotations. We already have the annotations #[no_mangle] and #[track_caller], and we have extern "C" { ... } blocks. For example, we could try to write:

#[extend_lifetimes_into_caller_space]
fn foo() -> &Data {
    let data = Data::new();
    &data
}

fn main() {
    let first: &Data = foo();
    for _ in 0..10 {
        let some_space: _;
        let second = if cond() {
            // We may want to require an annotation to make it survive the immediate scope.
            #[bind_lifetimes_extensions(some_space)]
            foo()
        } else {
            #[bind_lifetimes_extensions(some_space)]
            other_foo()
        };
        // `some_space` gets dropped, and with it the `Data`s created by the `foo()` calls.
    }
}

To be clear, what do you mean by "the context sensitivity of the Vec::push() example above"?

I have re-read the thread and I think I have been missing some point. As @elidupree says, we can already return a reference or data by using Cows.

  • If the issue is that we want to allocate a different type than the one we want to reference, then we could generalize Cow. For example, to allocate (A,B) and get &A we could impl CowFor<A> for (A,B) (see the sketch after this list). This could help with the comment
  • If the issue is that we do not get enough return value optimizations, then we could help the compiler with annotations hinting at which variables could be allocated in the caller's stack, but without changing any meaning.
  • If the issue is that we are too lazy to write Cows, or find that they pollute the programming logic, then we could write some macros to wrap them automatically.
  • If the issue is specific to unsized types, as mentioned by @petertodd, then I have yet to see more clearly the intended use.

Although I have mixed them before, now I think these are different issues and should be addressed separately.
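To make the first bullet concrete, here is a rough sketch of that generalized Cow (GenCow, CowFor, and borrow_as are hypothetical names, not an existing API):

// A Cow-like value that owns some storage type but is borrowed
// *as* a possibly different type A.
enum GenCow<'a, A: ?Sized, Storage: CowFor<A>> {
    Borrowed(&'a A),
    Owned(Storage),
}

trait CowFor<A: ?Sized> {
    fn borrow_as(&self) -> &A;
}

// Allocate an (A, B) pair but hand out references to the A part.
impl<A, B> CowFor<A> for (A, B) {
    fn borrow_as(&self) -> &A {
        &self.0
    }
}

impl<'a, A: ?Sized, S: CowFor<A>> GenCow<'a, A, S> {
    fn get(&self) -> &A {
        match self {
            GenCow::Borrowed(r) => *r,
            GenCow::Owned(s) => s.borrow_as(),
        }
    }
}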

Sorry, I was referring to dhm's reply.

What specifically does this enable? I had been reading the existential impl Sized as mostly sugar (to avoid needing to write out more verbose types, and to make sugar easier to write), rather than fundamentally changing much?


Do you mean to say it could work with inlining and allocas?

Without allocas I don't see how it could work.. f2() allocates an object in its parent's frame and returns a reference to it. The reference is stored in a Vec and outlives a single loop iteration. This is done an unknown number of times. It wouldn't be safe to always return a reference to the same object: the vector would end up containing duplicate references. And we wouldn't know how large a buffer to allocate if we wanted to allocate a new object on each iteration of the loop..

  • -> impl Trait already exists
  • -> [T] could similarly exist

What has been suggested in this thread is a little more general: a way to create several (potentially linked) objects in (grand)-grand-..parents' frames. Whether that is useful enough is a different question.

I guess my example implied some extra niceties:

  • the total buffer size is automatically computed for the caller and used in the callee
  • the caller does not know nor care what types exactly the callee allocates in the buffer
  • multiple values of different types can be constructed in the same buffer without unsafe code
  • the buffer does not need to be initialized by the caller

...but indeed the spirit is very similar to passing in a &mut [..]

Oh sorry, I misunderstood the definition of f2() that @dhm was using: I thought it was referring to returning logical ownership, not a borrowed reference.

Yes, I would expect returning a borrowed reference to not work, even with alloca. Or to be precise, if we choose semantics where it does work, the lifetime of that reference can't escape the loop body. There's actually a very simple reason for this:

fn f2<super 'a, T>() -> &'a T;

What has ownership of T? Something has to own the T value for the reference to be valid. Meanwhile I'd expect this to work, because push is taking ownership of a T value once per iteration:

fn f2<super 'a, T>() -> &'a own T {
    /* ... */
}

fn foo<T>(n: usize) -> Vec<T> {
    let mut v = Vec::new();
    for _ in 0 .. n {
        // hypothetically, pushing the `&own` moves the owned T into the Vec
        v.push(f2());
    }
    v
}

I think the clearest semantics for this is via &'a own T, which makes explicit what is responsible for running drop.

Second, in implementations that do support alloca, I'd suggest ensuring that alloca can only be called once per function call site, and that the effect of that call must be undone before that call site is reached again. That would ensure that total stack usage is bounded even in loops. Actually implementing that rule could be tricky, though; I haven't thought about this super carefully. But it feels like, modulo references, you can at least always come up with some sequence of data moves to undo the effect of an alloca under the above rule. Needs more thought.

Hmm.. I hoped it would be fairly simple. Say, in the caller, for each super lifetime that we parameterize a function invocation with:

  • make sure that this lifetime outlives no enclosing loop
  • limit all closures that it outlives but is enclosed by to implementing FnOnce

Anything else really?


Can't yet put a finger on it.. but it seems this sort of buffer/existential could help support anonymous enum types, for example to return errors Zig-style.

Yes, this doesn't take trait objects into account yet.. But if you limit yourself to monomorphisations only, it seems entirely feasible to examine your whole call graph and compute the biggest number of bytes needed to store an Error which can arise at a given point. Then some parent-level function could provide the buffer to allocate such errors in, say on its stack.. so that they could be returned by value..
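As a very rough manual approximation of that computation in today's Rust (IoError/ParseError are hypothetical error types; the point is that the compiler could derive MAX from the call graph automatically):

use core::mem::{size_of, MaybeUninit};

struct IoError([u8; 24]);
struct ParseError([u8; 48]);

// "The biggest number of bytes you need to store an Error which can
// arise at this point", computed by hand here:
const MAX: usize = {
    let a = size_of::<IoError>();
    let b = size_of::<ParseError>();
    if a > b { a } else { b }
};

// A parent-level function could reserve this much stack space and
// let callees construct their errors in it.
type ErrBuf = [MaybeUninit<u8>; MAX];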

Yes the idea is still very rough
