Pre-RFC: Encapsulating private lifetimes


#1

I recently read @matklad’s thought-provoking blog post Encapsulating Lifetime of the Field https://matklad.github.io/2018/05/04/encapsulating-lifetime-of-the-field.html. It got me thinking about lifetime ergonomics. I think that while Rust’s ergonomics story has generally improved, lifetimes still have a long way to go in both ergonomics and expressiveness. The blog post highlights a particular ergonomics problem with lifetimes in structs, and I recommend reading it. Here’s an attempt to improve the situation!

It’s late o’clock and I’m getting tired so pardon me if there is a bunch of errors and brainfarts. The rationale and prior art parts still need writing and the whole thing needs feedback, but I think I wrote enough for the readers to get the gist to start receiving feedback.

Summary

Enable data types to hide lifetimes that are not part of their public API: that is, a lifetime is not required to appear in the generic type signature if it is declared “private” to that type, using the syntax priv 'p: 'a in the struct/enum body. All private lifetimes must be declared to outlive some public lifetime. That means that the shortest lifetime that ultimately limits the lifespan of the containing type is always public and a part of the generic type signature of the containing type. As the private lifetime declared as priv 'p: 'a is not in scope outside of the containing type itself, it doesn’t unify with any lifetime except for lifetimes that are derived from it.

Motivation

Rust has a problem with the proliferation of lifetimes. Data types are required to have in their signature the lifetimes of the data types they contain. For example, struct Foo<'a> { s: &'a str } has to declare the lifetime 'a of its field s. When nesting types, this may cause ugly and cumbersome type signatures, such as Context<'f, 'a: 'f>. This problem manifests itself especially with lifetimes in invariant positions (behind &mut T), as they cannot be bundled up under a single lifetime in the signature. They must always be declared separately for soundness reasons. (Bundling them under a short-living lifetime would violate the Liskov substitution principle: one would be able to smuggle a short-living type behind a reference with a long lifetime.)

An example of proliferating lifetimes:

struct Foo<'s> {
    string: &'s mut String,
}

struct Bar<'f, 's: 'f> {
    foo: &'f mut Foo<'s>
}

struct Hoge<'b, 'f: 'b, 's: 'f> {
    bar: &'b mut Bar<'f, 's>
}

// As you see, the declarations are getting longer and longer!
struct Piyo<'p, 'b: 'p, 'f: 'b, 's: 'f> {
    hoge: &'p mut Hoge<'b, 'f, 's>
}

fn main() {
    let mut string = String::new();
    {
        let mut foo = Foo { string: &mut string };
        {
            let mut bar = Bar { foo: &mut foo };
            {
                let mut hoge = Hoge { bar: &mut bar };
                {
                    let mut piyo = Piyo { hoge: &mut hoge };
                }
            }
        }
    }
}

Generally the lifetimes signal useful and valid information, but in some cases the only relevant piece of information from the viewpoint of the user is the shortest lifetime in the signature, as that limits the span the type can live. Sometimes other, longer lifetimes in the signature are also relevant: for example, if a struct allows mutably accessing one of its fields that contains a long lifetime. One can’t soundly provide write access to a field “as if” the lifetime would be shorter than it actually is, so the long lifetime is relevant from the viewpoint of the user-facing API.

However, data types can also contain lifetimes that aren’t relevant to the user. They may be lifetimes of fields that are private and can be considered as implementation details, or they may be lifetimes of user-facing fields, but the user doesn’t care about the actual lifetime.

It would be highly desirable to enable data type authors to stop the proliferation of “excess” lifetimes if they so wish. It would enable more succinct APIs and improve the ergonomics of using types with lifetimes.

Guide-level explanation

(See the code snippet above).

As one can see when nesting types, lifetimes are not easily contained or “encapsulated”: they leak through type abstractions and proliferate. To fix the problem, this RFC provides a way to hide lifetimes in type signatures. This amounts to saying that the exact lifetime is an implementation detail and that users of the type shouldn’t have to think about it. As lifetimes guard the correctness of the lifespans of our values (most often references, protecting us from dangling pointer bugs), not all of them can be hidden. Specifically, every type has a set of lifetimes that are the shortest ones. There might be other lifetimes that outlive the shorter ones, and only those lifetimes can be hidden. Why? Because hiding them doesn’t affect the span our value is allowed to live; it’s already restricted by the shorter ones.

One can hide lifetimes using the syntax priv 'ss: 'f in type definitions. See how this prevents the proliferation in deeply nested types:

struct Foo<'s> {
    string: &'s mut String,
}

struct Bar<'f> {
    priv 'ss: 'f,
    foo: &'f mut Foo<'ss>,
}

struct Hoge<'b> {
    priv 'ff: 'b,
    bar: &'b mut Bar<'ff>,
}

// The signatures stay nice and tidy no matter how deeply we nest!
struct Piyo<'p> {
    priv 'bb: 'p,
    hoge: &'p mut Hoge<'bb>,
}

fn main() {
    let mut string = String::new();
    {
        let mut foo = Foo { string: &mut string };
        {
            let mut bar = Bar { foo: &mut foo };
            {
                let mut hoge = Hoge { bar: &mut bar };
                {
                    let mut piyo = Piyo { hoge: &mut hoge };
                }
            }
        }
    }
}

As you can see, the lifetime that is part of the signature of the type is the lifetime of the deepest scope. Our types contain references to longer-living scopes, but those don’t matter, since by the time those lifetimes end, our types in the deeply nested scopes are long gone.

Note, however, that using private lifetimes brings forth some restrictions too. With public lifetimes you are able to do this:

fn replace_hoge<'a, 'b, 'c, 'd>(piyo: &mut Piyo<'a, 'b, 'c, 'd>, new_hoge: &'a mut Hoge<'b, 'c, 'd>) {
    piyo.hoge = new_hoge;
}

but with private lifetimes:

fn replace_hoge<'a, 'b>(piyo: &mut Piyo<'a>, new_hoge: &'a mut Hoge<'b>) {
    piyo.hoge = new_hoge; // Lifetime mismatch! 'b and 'bb don't match!
}

Why does this happen? Note that because the lifetime 'bb is private, we lose the ability to equate it to the lifetime 'b. The actual lifetime is “erased”, so to speak. If the compiler allowed placing values with arbitrary lifetimes into piyo.hoge, we could try to smuggle a short-lived object in there. If that object were to be deallocated or invalidated in some other way before piyo is, we would have a dangling pointer! So writing to references with private lifetimes is more restricted than usual, because we can’t name the lifetime of the reference.
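For comparison, here is a minimal sketch of the public-lifetime version in today's Rust, reduced to two levels of nesting. The type layout and the helper name replace_demo are assumptions made for illustration. Because 's is nameable, the signature can state that the new Hoge carries exactly the same inner lifetime as the one it replaces, which is the piece of information a private lifetime would conceal.

```rust
// Two-level version of the nesting example; both lifetimes are public.
struct Hoge<'s> {
    name: &'s mut String,
}

struct Piyo<'p, 's: 'p> {
    hoge: &'p mut Hoge<'s>,
}

// `'s` appears in both parameters, so the compiler can check that the
// replacement has exactly the inner lifetime the field expects.
fn replace_hoge<'p, 's>(piyo: &mut Piyo<'p, 's>, new_hoge: &'p mut Hoge<'s>) {
    piyo.hoge = new_hoge;
}

// Hypothetical driver: swaps the Hoge behind `piyo` and writes through it.
fn replace_demo() -> String {
    let mut a = String::from("a");
    let mut b = String::from("b");
    let mut first = Hoge { name: &mut a };
    let mut second = Hoge { name: &mut b };
    {
        let mut piyo = Piyo { hoge: &mut first };
        replace_hoge(&mut piyo, &mut second);
        // The write now goes to `b`, through `second`.
        piyo.hoge.name.push_str(" (via piyo)");
    }
    b
}
```

Calling replace_demo() from main returns the contents of b after the write through piyo. A Hoge whose inner borrow ended before piyo does would be rejected at the replace_hoge call, since &mut makes 's invariant.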

However, that doesn’t mean that we can’t mutate types with private lifetimes at all. Let Hoge have an additional field counter: u32. We can do:

fn mut_hoge<'a>(piyo: &mut Piyo<'a>) {
    piyo.hoge.counter += 1;
}

Now, let’s imagine that Piyo's hoge field is actually Option<&'p mut Hoge<'bb>>. We can also do:

fn mut_hoge<'a>(piyo: &mut Piyo<'a>) {
    let h = piyo.hoge.take();
    piyo.hoge = h.map(|h| { h.counter += 1; h });
}

This only works because h has exactly the same lifetime as hoge! The compiler knows that the lifetime 'bb outlives 'a, so it’s valid everywhere 'a is valid. However, the knowledge of how long it actually lives is erased, so the compiler also ensures that nothing can rely on 'bb living longer than 'a; it is effectively treated as living for the same time, but without being equal to 'a.
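The take-and-put-back pattern can be tried out in today's Rust with public lifetimes. The following is a sketch under the assumption that Hoge has the counter field mentioned above plus a label field that carries the inner lifetime; take_demo is a hypothetical driver. The value moved out of the Option and the value moved back in have the identical type, so no lifetime ever needs to be named or compared.

```rust
struct Hoge<'s> {
    counter: u32,
    label: &'s mut String,
}

struct Piyo<'p, 's: 'p> {
    hoge: Option<&'p mut Hoge<'s>>,
}

// Neither lifetime is named: what comes out of `take` goes straight back in.
fn mut_hoge(piyo: &mut Piyo<'_, '_>) {
    let h = piyo.hoge.take();
    piyo.hoge = h.map(|h| {
        h.counter += 1;
        h.label.push('+');
        h
    });
}

// Hypothetical driver: mutates twice, then inspects the results.
fn take_demo() -> (u32, String) {
    let mut label = String::from("x");
    let mut hoge = Hoge { counter: 0, label: &mut label };
    {
        let mut piyo = Piyo { hoge: Some(&mut hoge) };
        mut_hoge(&mut piyo);
        mut_hoge(&mut piyo);
    }
    let count = hoge.counter;
    (count, label)
}
```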

Reference-level explanation

Note: I might need some help with mapping the corner cases and to understand if there is something that’s very hard to implement with the current borrow checker.

  • Implement the syntax priv 'a: 'b in struct and enum bodies. priv is a reserved keyword that is unused at the moment, so it shouldn’t cause any parsing complications. If we later want to use priv for something else, restricting this form to type declarations and requiring a lifetime tick ' right after the keyword avoids blocking other uses.
  • priv 'a: 'b introduces the lifetime 'a, called a private lifetime, which is in scope inside the type declaration body.
    • Note that 'long: 'short means 'long outlives 'short. According to the Liskov substitution principle, if you can always use B in place of A, then B can be considered a subtype of A. As it is sound to use an (immutable) reference to a longer-living object in place of a reference to a shorter-living object, longer lifetimes in Rust can be considered subtypes of the shorter ones they outlive.
  • The latter part of the outlives relationship (e.g. 'b in priv 'a: 'b) must be a lifetime that appears in the generic type signature of the type, for example 'b in struct Foo<'b> {}. This means that:
    • All private lifetimes always outlive some public lifetime.
    • The smallest lifetime that constrains the type is in the set of public lifetimes.
      • Thus, the tightest limit of the type’s lifespan is never hidden.
  • The type fields are checked according to normal outlives rules
  • Private lifetimes are not nameable outside the type declaration.
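The reading of 'long: 'short as “'long outlives 'short” can be checked in today's Rust. This is a small sketch with hypothetical function names (shorten, pick, outlives_demo): a reference with the longer lifetime is accepted wherever one with the shorter lifetime is expected.

```rust
// A `&'long str` can be handed out as a `&'short str`.
fn shorten<'short, 'long: 'short>(r: &'long str) -> &'short str {
    r
}

// Both branches must produce `&'short str`; `a` is shortened to fit.
fn pick<'short, 'long: 'short>(a: &'long str, b: &'short str) -> &'short str {
    if a.len() >= b.len() { shorten(a) } else { b }
}

// Hypothetical driver mixing an outer-scope and an inner-scope string.
fn outlives_demo() -> String {
    let long_lived = String::from("outer scope");
    let result;
    {
        let short_lived = String::from("inner");
        // The result is copied out before `short_lived` is dropped.
        result = pick(&long_lived, &short_lived).to_string();
    }
    result
}
```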

All in all, this design allows hiding all but the “tightest” lifetime of a type when declaring nested types, or keeping some of the lifetimes public and hiding the rest.

The main reason lifetime proliferation happens is variance, or rather the lack of it, so let us start with some notes on variance. &mut T references are invariant with regard to T; that means that no subtype relation between T’s with different lifetimes can apply. The reason is the unfortunate interaction between mutability and subtyping. Let’s say a reference &mut Foo<'short> is expected. If one were able to use &mut Foo<'long> in place of it, bad things would happen: through a variable typed &mut Foo<'short>, one would be able to replace the original value of type Foo<'long> with a value of type Foo<'short>. The shorter-lived value could then be invalidated while the &mut Foo<'long> still existed, causing UB. Looking at it in a slightly different way, the direction of variance depends on input and output positions, or the read and write capabilities of types; &mut T supports both reading and writing, so it must be invariant.
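The difference between covariant and invariant positions can be seen in today's Rust. The sketch below uses hypothetical types SharedView and MutView: behind a shared reference the inner lifetime can be shortened and folded into one parameter, while behind &mut it has to stay in the signature.

```rust
// `&T` is covariant in `T`: the inner `'long` may be shortened to `'a`,
// so a single lifetime parameter is enough.
struct SharedView<'a> {
    inner: &'a &'a str,
}

fn make_shared<'a, 'long: 'a>(r: &'a &'long str) -> SharedView<'a> {
    SharedView { inner: r }
}

// `&mut T` is invariant in `T`: the inner lifetime must be kept as-is.
struct MutView<'a, 'long: 'a> {
    inner: &'a mut &'long str,
}

fn make_mut<'a, 'long: 'a>(r: &'a mut &'long str) -> MutView<'a, 'long> {
    MutView { inner: r }
}

// This bundling attempt is rejected by the compiler, which is the point:
// fn bundle<'a, 'long: 'a>(r: &'a mut &'long str) -> &'a mut &'a str { r }

// Hypothetical driver: reads through the shared view, writes through the
// mutable one.
fn variance_demo() -> (usize, String) {
    let first = String::from("hello");
    let second = String::from("world!");
    let mut current: &str = first.as_str();
    let shared_len = make_shared(&current).inner.len();
    let view = make_mut(&mut current);
    *view.inner = second.as_str();
    (shared_len, current.to_string())
}
```

If bundle were accepted, a caller could write a &'a str into a slot that other code still believes holds a &'long str, which is the smuggling scenario described above.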

The reason for the proliferation is that because invariant types have incompatible lifetimes, every one must be mentioned – they can’t be “bundled up” under a single lifetime that is the supertype of the rest.

However, with the design proposed in this RFC, the hidden lifetimes need not be mentioned in the signature. They become unnameable outside the types; one could call them existential lifetimes. Since the names of the lifetimes cannot be mentioned, one is not able to come up externally with a lifetime that would be compatible with the private lifetime.

The lifetime checking would be done similarly to how it is done at the moment for functions bounded with for<'a> F: Fn(&'a T): nothing can be assumed about the lifetime except that it lives at least as long as the current scope. It can’t be assumed to live in a wider scope than the public lifetime it outlives, because the exact point at which it expires isn’t known. However, it is allowed to live as long as the public lifetime it outlives.
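The higher-ranked bound mentioned above already behaves this way in today's Rust. Here is a minimal sketch, with with_temp and hrtb_demo as hypothetical names: the closure has to work for every possible 'a, so it can use the reference but cannot let it escape or unify it with any lifetime from the surrounding code.

```rust
// The closure must accept a reference with *any* lifetime `'a`.
// `R` is fixed outside the `for<'a>` binder, so it cannot mention `'a`.
fn with_temp<R>(f: impl for<'a> FnOnce(&'a str) -> R) -> R {
    let local = String::from("temporary");
    f(&local)
}

// Hypothetical driver.
fn hrtb_demo() -> usize {
    // Accepted: the result does not borrow from the argument.
    with_temp(|s| s.len())
    // Rejected: `with_temp(|s| s)` would let the reference escape.
}
```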

Drawbacks

  • Lifetimes confuse people, and the reason why some lifetimes need to be hidden/encapsulated while others can just be “bundled up” within a shorter lifetime (it’s because of the difference in variance) can escape people.
  • There might be no perfect syntax for hiding lifetimes using the current reserved keywords
  • Admittedly the proliferation of lifetimes is a problem, but does the lifetime hiding proposed in this RFC pull its weight as an additional feature?

Rationale and alternatives

TODO

Prior art

TODO. Some notes:

  • The lifetime system of Rust is quite unique; I’m not aware of any prior art.

  • The problems around lifetime proliferation have to do with variance. C#, Kotlin etc.

Unresolved questions

  • I’m not 100% sure that there aren’t corner cases if two fields of the same struct have the same private lifetime. Can one be assigned to another? Is it unsound for these lifetimes to match?
  • What will the exact syntax be?
  • Are there any corner cases that haven’t been thought about?

#2

There’s a lot to chew on here, but if I’m interpreting this correctly, this would also be a massive game-changer in supporting self-referential structs. If the private lifetimes have generative existential semantics, I could possibly eliminate most of the closures in rental, making it behave much more like a natural struct. I’ll have to mull this over some more, but very compelling regardless.


#3

Interesting! I think that if you erase the lifetime with a trait object, as in my original post, you should be able to overwrite it with other trait object with a different concealed lifetime. Why can’t we do the same here, in general?
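As a reminder of what the trait-object approach looks like, here is a rough sketch in today's Rust. The trait Push and the functions replace_foo and conceal_demo are hypothetical names chosen for illustration, not code taken from the blog post. The inner lifetime 's is concealed behind dyn Push, and the field can still be overwritten with another trait object.

```rust
struct Foo<'s> {
    string: &'s mut String,
}

// Hypothetical trait capturing what `Bar` needs from `Foo`.
trait Push {
    fn push(&mut self, c: char);
}

impl<'s> Push for Foo<'s> {
    fn push(&mut self, c: char) {
        self.string.push(c);
    }
}

// Only `'f` is visible; `'s` is concealed behind the trait object.
struct Bar<'f> {
    foo: &'f mut (dyn Push + 'f),
}

// Overwriting works: any trait object that is valid for `'f` is accepted.
fn replace_foo<'f>(bar: &mut Bar<'f>, new_foo: &'f mut (dyn Push + 'f)) {
    bar.foo = new_foo;
}

// Hypothetical driver: pushes through one Foo, swaps, pushes through another.
fn conceal_demo() -> (String, String) {
    let mut a = String::new();
    let mut b = String::new();
    {
        let mut foo_a = Foo { string: &mut a };
        let mut foo_b = Foo { string: &mut b };
        let mut bar = Bar { foo: &mut foo_a };
        bar.foo.push('a');
        replace_foo(&mut bar, &mut foo_b);
        bar.foo.push('b');
    }
    (a, b)
}
```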

With the priv 'a suggestion, I am afraid it breaks down if you have two fields with the same priv lifetime and overwrite one of them.

Perhaps this could work if we hide the lifetime on a per-type basis and not on a per-struct basis? Like struct Foo<'b> { bar: priv<'a: 'b> &'b Bar<'a> }?


#4

Terminology note: I really like how the word “conceal” fits much better than “erase” for this situation.


#5

One weird thing about this is that it makes “private” the explicitly-marked case, whereas every other visibility rule in Rust defaults to private and has to be made pub explicitly. I think it’s not the best to try and formulate this in terms of visibility.


#6

How often does the case where you want to hide a lifetime happen? I don’t think I’ve seen it myself, so is it worth the complexity and effort? It sounds like a corner case to me and as mentioned, marking something as private is kind of against the general rule in Rust, it might have some corner cases itself…


#7

At least from my experience, this case comes up relatively often, especially in “applications”. Though, it took me quite some time to understand where exactly the problem is.


#8

IME the desire to “hide a lifetime” is most common when you want to remove lifetimes completely from the struct’s interface. That is, something like what rental or owning-ref does- a struct that owns something while simultaneously holding references into it.

I’m not sure how this proposal could be applied to something like that. The only possible public lifetime that doesn’t come from the struct declaration is 'static, which doesn’t seem right. I suspect it could be made to work, probably by limiting the information private lifetimes expose even further, as from the perspective of the struct the owned data appears to live forever. But even then it feels wrong.

There are also really two different ways a struct might hold a reference into itself. One allows the struct to move, because the lifetime in question is really that of some heap allocation. The other requires the struct to be pinned, because the lifetime in question is really that of the struct itself.
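The first case can be illustrated with a tiny sketch in today's Rust (heap_address_survives_move is a hypothetical name): moving a Box moves only the pointer, so the address of the heap allocation that an internal reference would point to stays the same.

```rust
// Moving a `Box` does not move its heap allocation.
fn heap_address_survives_move() -> bool {
    let boxed = Box::new(5u32);
    let before = &*boxed as *const u32;
    let moved = boxed; // moves the pointer, not the pointee
    let after = &*moved as *const u32;
    before == after
}
```

A value stored inline in the struct has no such guarantee, which is why the second case needs pinning.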

So perhaps this would be doable with both private lifetimes and a way to name those new “special” lifetimes? Maybe Box/Rc/etc and Pin could expose some sort of “output lifetime.” Box's would last until it’s moved out of or dropped, while Pin's would last until the object itself is dropped.


#9

I suppose I should expand on why a feature like this would be so beneficial to rental and similar use cases.

As you point out, the only lifetime we can currently use that is not a parameter of a struct itself is 'static. This is what rental currently uses internally. All self-ref lifetimes are erased and replaced with 'static. As an aside, this causes problems in that 'static has specific implications that we don’t actually want, since we’re just using it as a fake lifetime anyway, and why I proposed a new 'unsafe lifetime instead, but that got postponed pending elaboration of the semantics of unsafe code in general.

At any rate, we obviously can’t allow users to borrow fields out of the struct with the fake 'static lifetimes, since they’re completely false. Instead, what I have to do is only allow access to the fields via a closure that accepts the field as an argument, bounded with an HRTB lifetime. Since the compiler can’t know what lifetime will actually be passed to the closure, it is effectively existential with respect to the closure’s surrounding context, and the compiler won’t allow anything to unify with it, thus ensuring safety.

A proposal like this, again if I’m reading it correctly, would allow me to produce such existential lifetimes without the need for a closure, ensuring that the compiler would not unify them with anything inappropriate. If the lifetime is covariant, it can be reborrowed down to a shorter lifetime (which is what rental 0.5 allows now), or if it’s invariant, it should only unify with values derived from itself.

Eliminating the need for closures to access the struct fields is a huge ergonomic win, and makes it feel much more like a natural type. To be clear though, the rental macro would still be necessary to produce the type in the first place, to ensure all the invariants are properly upheld.


#10

That is a neat trick; Swift uses a less formal (in the absence of lifetimes) but almost-identical variant of this in its withUnsafePointer family of APIs, by the way.


#11

Ah, excellent. I was kind of approaching things from the opposite direction- the Box/Pin “output lifetimes” might allow you to express the type without the rental macro or even the 'unsafe lifetime.

But, now that I have a better idea of what you’re getting at, a question- doesn’t this proposal make the supertype of private lifetimes public? That is, if you use the 'static trick, wouldn’t the compiler unify private lifetimes with anything? Or is there some sort of invariance trick that would solve that problem?


#12

Yeah, that’s the main sticking point I’ve been thinking about. I believe that, to be usable for my purposes, I’d have to be able to bound the private lifetimes in reverse, such that they’re assumed to not outlive the public ones. This would likely require more where-like syntax, such as priv 'a where 'b: 'a or some such thing.

Thinking about it more, an exists keyword would probably be better for this entire concept, possibly added as part of a new epoch.


#13

Bikeshed: along the lines of in-band lifetimes, maybe we don’t even need a new keyword and could just introduce them in a where clause. Or maybe that’d be too mysterious for this new kind of lifetime.


#14

I agree. I admittedly have not checked in depth, but it intuitively seems like this very neat idea could somehow join forces with the in-band lifetime effort in a fashion which introduces less radical syntax changes.