Indirect borrows in generators

I read @withoutboats' recent post about generators and was thinking about cases where you might want to borrow over a yield point and what could be done about it. It seems to me that a lot of the cases that would be problematic would be when the generator owns some object that has data on the heap and wants to hold a non-consuming iterator over yield points. For example, you’ve got a String and want to hold a Chars<'a> iterator. e.g.

gen fn foo(&self) -> Token {
   let s: String = self.to_string();
   for ch in s.chars() {
     match ch {
       'a' => yield Token::Something,
       …
     }
   }
}

This isn’t valid because s.chars() returns a Chars<'a> which borrows s which is owned by the generator. However, it should actually be fine. Moving the generator will do nothing to the heap allocation of the String and we never mutate the String, so the references held by the Chars<'a> won’t be invalidated.

What I’m wondering is, could the borrow checker be made smart enough to allow this? I was thinking of a special kind of “indirect borrow” that is like a regular borrow, but with the relaxation that a shared, indirect borrow can be held over a yield point. i.e. if String::chars returned a Chars<'indirect a> (or without a new keyword, Chars<'*a>) then the generator could be moved and the borrow over the yield permitted. Effectively, if the function passes the borrow checker as a regular (non-generator) function and also passes the additional requirement that only indirect borrows are held over yield points, then the generator passes the borrow checker.

A fully general system (e.g. this) that allowed arbitrary structs to take advantage of this in order to have self referential structs seems like it would add a lot more complexity to the syntax, but just limiting this to generators holding shared, indirect borrows seems more tractable to me. It would only require one new bit of syntax (declaring a lifetime as indirect).

Changing an existing inherent method or function to mark its return lifetime as indirect, should I think be a non-breaking change. I guess one downside is that changing a trait method’s return type to an indirect lifetime would be a breaking change, so the return value from Deref::deref couldn’t be indirect - although you wouldn't want it to be indirect anyway, since that would prevent implementing Deref for anything that didn’t use heap allocation.

If generators were stabilised with no support for borrowing over yields, something like this could be kept as a potential relaxation of that requirement if not allowing borrows over yields turned out to be too restrictive.

3 Likes

One thing that would help here for Deref is "overconstrained impls," an RFC-accepted (IIRC, unfortunately don't have link) concept that you should be able to implement a trait with a stricter signature than the trait was declared with, and that users of the trait functionality on your concrete type can take advantage of that greater flexibility. Examples would be implementing an unsafe method as safe, returning 'static instead of a limited lifetime, or this case of returning a heap borrow instead of an unrestricted borrow.

Note I said heap borrow, though, instead of indirect; if you only require indirection, you run into the potential problem of smuggling through nonindirect borrows anyway, such as

let x = ...;
let a1 = &x; // direct borrow of x
let a2 = &*a1; // indirect borrow through a1

It might be possible to track this properly, since a2 being live keeps a1 logically live. But it's complicated, because consider

let x: String = ...;
let a1 = &x; // direct borrow of x
let a2 = String::chars(a1); // indirect borrow through a1

This is exactly your example, but just the same involves taking an indirect borrow through a direct borrow. The difference is that the first case is (roughly) &&T -> &T where T is on the local stack, and the latter is (roughly) &Box<T> -> &T and T is not on the local stack (it's on the heap).

Then you also have the escape path of

let x = ...;
let a1 = Box::new(&x);
let a2 = &*a1;
let a3 = &**a2;

where you're putting a local reference onto the heap. So you're going to struggle with any methods on your indirectly borrowed structure, because its methods could potentially have access back to local data.

At a minimum you'd need to extend lifetime subtyping such that &'indirect short &'direct long _ cannot become &'indirect short &'indirect short _ but only &'direct short &'direct short _.

But this is a fractally complicated pit of edge cases that you need to find a system subset within that is probably sound. That's not going to be simple.

1 Like

RFC 3245, right?

1 Like

I was imagining that creating an indirect/heap borrow would be an unsafe operation and that it would only be done in low-level code. e.g. something like RawVec might provide a way to get a &'heap a [T]. Higher level APIs (in Vec and String) could then pass on that type.

So you're going to struggle with any methods on your indirectly borrowed structure, because its methods could potentially have access back to local data.

Something like Chars holds no reference to the String that it came from - it only stores a slice into the String's heap allocation. So there would be no way for it to call methods on the String. If you did try to change Chars to hold a reference to the String as well as to the String's heap storage, then the lifetime on the resulting Chars should have a regular, non-heap lifetime.

This is true for this specific type, but it's only knowable because it's a completely concrete type. For something like Box<T> you don't know what the T is, so you have to solve the problems there.

Additionally, because covariant lifetimes can typically be collapsed together, most types will only carry multiple lifetimes if some of them are invariant (behind mut borrows). This means that most types won't be able to take advantage of indirect lifetimes without breaking changes to disentangle multiple disjoint borrows' lifetimes.

That's the one, yep.

I guess the simplest thing might be to say that the generator only allows 'indirect x where x contains no non-static lifetimes. i.e. you can borrow data that's on the heap, but only if that data itself contains no further borrows.

One more thought: this analysis already needs to basically exist at some basic level such that input references can be held over yield points. I don't recall whether the unstable semicoroutine syntax does this analysis (it supports defining both static/!Unpin and Unpin impl Generator), just trusts the developer to get it correct, or limits you even further to only 'static types across yield points.

(NOT A CONTRIBUTION)

The answer is not necessarily "no" but it would require genuinely novel research. Right now the borrow checker does not have any notion of where a reference points, and has no real way to make the necessary distinction.

If the lifetime is external to the block it's permitted to be stored across yield points in a "static generator", but if it's a lifetime within the block it's not permitted. It's a really simple analysis today that could not support this.

2 Likes

A related issue is how it's impossible to write a function that takes an exclusive borrow and returns a non-exclusive borrow. For example:

fn to_shared(a: &mut u8) -> &u8 {
    a
}

fn main() {
    let mut i = 3;
    let mut ia = to_shared(&mut i);

    // There is no `&mut` around at this point,
    // but the following doesn't compile
    let mut ib = &i;
    dbg!(ia);
}

This is just unsound, or you could call get_mut on a RefCell, convert that to a shared reference and then call .borrow_mut(), getting a shared and an exclusive reference at the same time.

2 Likes

Why would you be allowed to call .borrow_mut()? get_mut takes &mut self, and thus the lifetime analysis you'd need to do to make this even vaguely sound would say that the converted reference still has an exclusive borrow of the RefCell (but not of its contents), and thus you can't use the &mut self methods on RefCell.

I'm not convinced that the analysis you'd have to do to have a sound way of doing this is worth doing (it's very non-trivial, and it runs the risk of breaking code that exploits the current rules), but you could do it soundly if you wanted to.

I wasn't imagining the borrow checker having to figure out where things point, so much as adding an extra boolean flag to borrows then making changes to the standard library to add that flag on selected borrows. The changes to the borrow checker would be around correctly handling supertype/subtype relationships between different borrows with and without that flag.

Here's how I imagine that might work:

    let x = String::new();
    let a1: Box<&'x String> = Box::new(&x);
    let a2: &'indirect(a1) &'x String = &*a1;
    let a3: &'x String = &**a2    

The type of a2 says that it borrows the box (a1), but that it's indirect, so the Box can be moved (but not mutated) without invalidating a2. We've got a reference to a reference. The inner reference is what's in the box. The outer reference is pointing to the contents of the box. Nothing however is pointing at the Box itself, so moving the Box shouldn't be a problem for a2. We do however need to make sure that if we move the Box, that it continues to not be mutated, so wherever it gets moved would need to inherit the fact that there is a borrow of the Box's contents. In the case of generators, this can be achieved by running the borrow checker on the non-generator form of the generator function.

a3 however has dereferenced the indirect borrow, so now we only have the inner borrow, which isn't indirect. At that point only x is borrowed.

I believe this is a special case of lending generators(where the value yielded contains local borrows). Not sure what would it take to support this in the language, though.

Wait, hold on, this is already possible today with lifetimes. See here.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.