I recently read @matklad’s thought-provoking blog post Encapsulating Lifetime of the Field https://matklad.github.io/2018/05/04/encapsulating-lifetime-of-the-field.html. This got me thinking about lifetime ergonomics. I think that while Rust’s ergonomics story has generally improved, lifetimes still have a long way to go in both ergonomics and expressiveness. The blog post highlights a certain ergonomics problem with lifetimes in structs, and I recommend reading it. Here’s an attempt to improve the situation!
It’s late o’clock and I’m getting tired, so pardon me if there are a bunch of errors and brainfarts. The rationale and prior art parts still need writing and the whole thing needs feedback, but I think I wrote enough for readers to get the gist and for me to start receiving feedback.
Summary
Enable data types to hide lifetimes that are not part of their public API: that is, a lifetime is not required to appear in the generic type signature if it is declared “private” to that type, using the syntax priv 'p: 'a in the struct/enum body. All private lifetimes must be declared to outlive some public lifetime. That means that the shortest lifetime that ultimately limits the lifespan of the containing type is always public and a part of the generic type signature of the containing type. As the private lifetime declared as priv 'p: 'a is not in scope outside of the containing type itself, it doesn’t unify with any lifetime except for lifetimes that are derived from it.
Motivation
Rust has a problem of proliferating lifetimes. Data types are required to have in their signature the lifetimes of the data types they contain. For example, struct Foo<'a> { s: &'a str }; has to declare the lifetime 'a of its field s. When nesting types, this may cause ugly and cumbersome type signatures, such as Context<'f, 'a: 'f>. This problem manifests itself especially with lifetimes in invariant positions (behind &mut T), as they cannot be bundled up under a single lifetime in the signature. They must always be declared separately for soundness reasons. (Bundling them under a short-living lifetime would violate the Liskov substitution principle: one would be able to smuggle a short-living type behind a reference with a long lifetime.)
An example of proliferating lifetimes:
struct Foo<'s> {
    string: &'s mut String,
}
struct Bar<'f, 's: 'f> {
    foo: &'f mut Foo<'s>
}
struct Hoge<'b, 'f: 'b, 's: 'f> {
    bar: &'b mut Bar<'f, 's>
}
// As you see, the declarations are getting longer and longer!
struct Piyo<'p, 'b: 'p, 'f: 'b, 's: 'f> {
    hoge: &'p mut Hoge<'b, 'f, 's>
}
fn main() {
    let mut string = String::new();
    {
        let mut foo = Foo { string: &mut string };
        {
            let mut bar = Bar { foo: &mut foo };
            {
                let mut hoge = Hoge { bar: &mut bar };
                {
                    let mut piyo = Piyo { hoge: &mut hoge };
                }
            }
        }
    }
}
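For contrast, here is a minimal sketch of the same nesting built from shared references. Because &T is covariant, every layer can be bundled under one lifetime 'a, so the signatures do not grow. The function read_through_nested and the string contents are made up for illustration.

```rust
// Same nesting as above, but with shared (covariant) references:
// every longer lifetime can shrink to the single lifetime 'a.
struct Foo<'a> {
    string: &'a String,
}
struct Bar<'a> {
    foo: &'a Foo<'a>,
}
struct Hoge<'a> {
    bar: &'a Bar<'a>,
}
struct Piyo<'a> {
    hoge: &'a Hoge<'a>,
}

// Hypothetical helper: builds the whole chain and reads through it.
fn read_through_nested() -> String {
    let string = String::from("hi");
    let foo = Foo { string: &string };
    let bar = Bar { foo: &foo };
    let hoge = Hoge { bar: &bar };
    let piyo = Piyo { hoge: &hoge };
    piyo.hoge.bar.foo.string.clone()
}

fn main() {
    println!("{}", read_through_nested());
}
```

Swapping any of these & for &mut makes the single-lifetime version unusable in practice, which is exactly where the proliferation starts.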
Generally the lifetimes signal useful and valid information, but in some cases the only relevant piece of information from the viewpoint of the user is the shortest lifetime in the signature, as that limits the span the type can live. Sometimes other, longer lifetimes in the signature are also relevant: for example, if a struct allows mutably accessing one of its fields that contains a long lifetime. One can’t soundly provide write access to a field “as if” the lifetime would be shorter than it actually is, so the long lifetime is relevant from the viewpoint of the user-facing API.
However, data types can also contain lifetimes that aren’t relevant to the user. They may be lifetimes of fields that are private and can be considered as implementation details, or they may be lifetimes of user-facing fields, but the user doesn’t care about the actual lifetime.
It would be highly desirable to enable data type authors to stop the proliferation of “excess” lifetimes if they so wish. It would enable more succinct APIs and improve the ergonomics of using types with lifetimes.
Guide-level explanation
(See the code snippet above).
As one can see when nesting types, lifetimes are not easily contained or “encapsulated”: they leak through type abstractions and proliferate. To fix the problem, this RFC provides a way to hide lifetimes in type signatures, which amounts to saying that the exact lifetime is an implementation detail and the users of the type shouldn’t have to think about it. As lifetimes guard the correctness of the lifespans of our values (most often references, protecting us from dangling pointer bugs), not all of them can be hidden. Specifically, every type has a set of lifetimes that are the shortest ones. There might be other lifetimes that outlive the shorter ones, and only those kinds of lifetimes can be hidden. Why? Because hiding them doesn’t affect the span our value is allowed to live, which is already restricted by the shorter ones.
One can hide lifetimes using the syntax priv 'ss: 'f in type definitions. See how this prevents the proliferation in deeply nested types:
struct Foo<'s> {
    string: &'s mut String,
}
struct Bar<'f> {
    priv 'ss: 'f,
    foo: &'f mut Foo<'ss>,
}
struct Hoge<'b> {
    priv 'ff: 'b,
    bar: &'b mut Bar<'ff>,
}
// The signatures stay nice and tidy no matter how deeply we nest!
struct Piyo<'p> {
    priv 'bb: 'p,
    hoge: &'p mut Hoge<'bb>,
}
fn main() {
    let mut string = String::new();
    {
        let mut foo = Foo { string: &mut string };
        {
            let mut bar = Bar { foo: &mut foo };
            {
                let mut hoge = Hoge { bar: &mut bar };
                {
                    let mut piyo = Piyo { hoge: &mut hoge };
                }
            }
        }
    }
}
As you can see, the lifetime that is part of the signature of the type is the lifetime of the deepest scope. Our types contain references to longer-living scopes, but those don’t matter, since by the time the lifetime of those ends, our types in the deeply nested scopes are long gone.
Note, however, that using private lifetimes brings forth some restrictions too. With public lifetimes you are able to do this:
fn replace_hoge<'a, 'b, 'c, 'd>(piyo: &mut Piyo<'a, 'b, 'c, 'd>, new_hoge: &'a mut Hoge<'b, 'c, 'd>) {
    piyo.hoge = new_hoge;
}
but with private lifetimes:
fn replace_hoge<'a, 'b>(piyo: &mut Piyo<'a>, new_hoge: &'a mut Hoge<'b>) {
    piyo.hoge = new_hoge; // Lifetime mismatch! 'b and 'bb don't match!
}
Why does this happen? Note that because the lifetime 'bb is private, we lose the ability to equate it to the lifetime 'b. The actual lifetime is “erased”, so to speak. If the compiler allowed placing values with arbitrary lifetimes into piyo.hoge, we could try to smuggle a short-lived object in there. If that object were to be deallocated or invalidated in some other way before piyo is, we would have a dangling pointer! So writing to references with private lifetimes is more restricted than usual, because we can’t name the lifetime of the reference.
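For comparison, here is a reduced two-level sketch of the public-lifetime variant that compiles on today's Rust. The replacement is accepted only because the signature can name 'b and so force both Hoge values to carry the same lifetime. The fields name and counter and the function swap_demo are assumptions made for illustration.

```rust
struct Hoge<'b> {
    name: &'b mut String,
    counter: u32,
}

// Both lifetimes are public, so callers can name and equate them.
struct Piyo<'p, 'b: 'p> {
    hoge: &'p mut Hoge<'b>,
}

// `new_hoge` must be exactly `&'p mut Hoge<'b>`: `&mut` is invariant,
// so a Hoge with a different inner lifetime would be rejected.
fn replace_hoge<'p, 'b>(piyo: &mut Piyo<'p, 'b>, new_hoge: &'p mut Hoge<'b>) {
    piyo.hoge = new_hoge;
}

// Hypothetical driver: swaps the inner reference and reads it back.
fn swap_demo() -> (String, u32) {
    let mut first = String::from("first");
    let mut second = String::from("second");
    let mut hoge_a = Hoge { name: &mut first, counter: 1 };
    let mut hoge_b = Hoge { name: &mut second, counter: 2 };
    let mut piyo = Piyo { hoge: &mut hoge_a };
    replace_hoge(&mut piyo, &mut hoge_b);
    (piyo.hoge.name.clone(), piyo.hoge.counter)
}

fn main() {
    let (name, counter) = swap_demo();
    println!("{name} {counter}");
}
```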
However, that doesn’t mean that we can’t mutate types with private lifetimes at all. Let Hoge have an additional field counter: u32. We can do:
fn mut_hoge<'a>(piyo: &mut Piyo<'a>) {
    piyo.hoge.counter += 1;
}
Now, let’s imagine that Piyo's hoge field is actually Option<&'p mut Hoge<'bb>>. We can also do:
fn mut_hoge<'a>(piyo: &mut Piyo<'a>) {
    let h = piyo.hoge.take();
    piyo.hoge = h.map(|h| { h.counter += 1; h });
}
This only works because h has the exact same lifetime as hoge! The compiler knows that the lifetime 'bb lives longer than 'a, so it’s valid everywhere 'a is valid. However, the knowledge of how long it actually lives is erased, so the compiler also ensures that nothing can rely on 'bb outliving 'a; it is essentially treated as living the same time, but without being equal to 'a.
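The take-and-put-back pattern itself can be tried on today's Rust if the inner lifetime is kept public. This is only a sketch under that assumption: the label field and bump_demo are hypothetical, and the extra parameter 'b stands in for what the proposal would hide as 'bb.

```rust
struct Hoge<'b> {
    label: &'b mut String,
    counter: u32,
}

// With today's Rust the inner lifetime 'b has to stay in the signature.
struct Piyo<'p, 'b: 'p> {
    hoge: Option<&'p mut Hoge<'b>>,
}

fn mut_hoge<'p, 'b>(piyo: &mut Piyo<'p, 'b>) {
    // `h` has exactly the type of the field, so it can be stored back.
    let h = piyo.hoge.take();
    piyo.hoge = h.map(|h| {
        h.counter += 1;
        h
    });
}

// Hypothetical driver: bumps the counter once and reads the result.
fn bump_demo() -> (String, u32) {
    let mut label = String::from("hoge");
    let mut hoge = Hoge { label: &mut label, counter: 0 };
    let mut piyo = Piyo { hoge: Some(&mut hoge) };
    mut_hoge(&mut piyo);
    piyo.hoge
        .map(|h| (h.label.clone(), h.counter))
        .expect("hoge was put back by mut_hoge")
}

fn main() {
    let (label, counter) = bump_demo();
    println!("{label} {counter}");
}
```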
Reference-level explanation
Note: I might need some help with mapping the corner cases and to understand if there is something that’s very hard to implement with the current borrow checker.
- Implement the syntax priv 'a: 'b in struct and enum bodies. priv is a reserved keyword and it is unused at the moment, so it shouldn’t cause any parsing complications. If we want to use priv for something else in the future, limiting this use to type declarations and having the lifetime tick ' come right after it prevents blocking other uses.
- priv 'a: 'b introduces the lifetime 'a, called a private lifetime, that is in scope inside the type declaration body.
- Note that 'long: 'short means 'long outlives 'short. According to the Liskov substitution principle, if you can always use B in place of A, then B can be considered a subtype of A. As it is sound to use an (immutable) reference to a longer-living object in place of a reference to a shorter-living object, longer lifetimes in Rust can be considered subtypes of the shorter ones they outlive.
- The latter part of the outlives relationship (e.g. 'b in priv 'a: 'b) must be a lifetime that appears in the generic type signature of the type, for example 'b in struct Foo<'b> {}. This means that:
  - All private lifetimes always outlive some public lifetime.
  - The smallest lifetime that constrains the type is in the set of public lifetimes.
  - Thus, the tightest limit of the type lifespan is never hidden.
- The type fields are checked according to normal outlives rules.
- Private lifetimes are not nameable outside the type declaration.
All in all, this design allows hiding all but the “most tight” lifetime of a type, when declaring nested types, or leaving part of the lifetimes public and hiding a part.
The main troublemaker behind the lifetime proliferation is variance, or rather the lack of it, so let us start with some notes on variance. &mut T references are invariant with regard to T; that means that no subtype relation between T’s with different lifetimes can happen. The reason is the unfortunate interaction between mutability and subtyping. Let’s say some code expects a reference &mut Foo<'short>. If one were able to pass a &mut Foo<'long> in its place, bad things would happen: through a variable typed &mut Foo<'short>, one is able to replace the original Foo<'long> value with a Foo<'short> value. The shorter-lived value could then be invalidated while the &mut Foo<'long> still exists, causing UB. Thinking about it in a slightly different way, the direction of variance depends on input and output positions, or the read and write capabilities of types; &mut T supports both reading and writing, so it must be invariant.
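A small sketch of the difference on today's Rust: a shared reference may shrink its lifetime freely, while writing through a &mut slot forces the stored and the written lifetime to be the same. The helper names shorten, overwrite and overwrite_demo are made up for illustration, and the rejected case is left as a comment so the block still compiles.

```rust
// Covariance: a longer-lived shared reference can be used where a
// shorter-lived one is expected.
fn shorten<'short, 'long: 'short>(r: &'long str) -> &'short str {
    r
}

// Invariance: the slot and the new value must agree on one lifetime 'a,
// because the slot is both readable and writable.
fn overwrite<'a>(slot: &mut &'a str, value: &'a str) {
    *slot = value;
}

// Hypothetical driver: both strings live long enough, so a common 'a exists.
fn overwrite_demo() -> String {
    let long_lived = String::from("long");
    let other = String::from("other");
    let mut slot: &str = &long_lived;
    overwrite(&mut slot, &other);
    slot.to_string()
}

fn main() {
    println!("{}", shorten("abc"));
    println!("{}", overwrite_demo());

    // Rejected by the borrow checker (`short` does not live long enough):
    // the slot holds a 'static reference, so only 'static values fit.
    //
    // let mut slot: &'static str = "static";
    // {
    //     let short = String::from("short");
    //     overwrite(&mut slot, &short);
    // }
    // println!("{slot}");
}
```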
The reason for the proliferation is that lifetimes in invariant positions are incompatible with each other, so every one of them must be mentioned: they can’t be “bundled up” under a single lifetime that is the supertype of the rest.
However, with the design proposed in this RFC, the hidden lifetimes need not be mentioned in the signature. They become unnameable outside the type; one could call them existential lifetimes. Since the names of the lifetimes cannot be mentioned, one is not able to come up externally with a lifetime that would be compatible with the private lifetime.
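As a loose analogy only, and not what this RFC proposes, today's Rust already erases lifetimes in one place: a trait object dyn Trait + 'f hides the lifetimes inside the concrete type and keeps only the bound 'f. The sketch below assumes a made-up trait Append and a helper build_through_erased_lifetime; it shows Bar<'f> holding a Foo<'s> without mentioning 's, at the cost of dynamic dispatch and of only exposing what the trait exposes.

```rust
// Hypothetical trait standing in for "the part of Foo that Bar needs".
trait Append {
    fn append(&mut self, text: &str);
}

struct Foo<'s> {
    string: &'s mut String,
}

impl<'s> Append for Foo<'s> {
    fn append(&mut self, text: &str) {
        self.string.push_str(text);
    }
}

// Only 'f appears in the signature; the 's of the underlying Foo<'s>
// is erased behind the trait object and is merely known to outlive 'f.
struct Bar<'f> {
    foo: &'f mut (dyn Append + 'f),
}

impl<'f> Bar<'f> {
    fn push(&mut self, text: &str) {
        self.foo.append(text);
    }
}

fn build_through_erased_lifetime() -> String {
    let mut string = String::from("a");
    {
        let mut foo = Foo { string: &mut string };
        // `&mut Foo<'s>` coerces to `&mut (dyn Append + 'f)` here.
        let mut bar = Bar { foo: &mut foo };
        bar.push("b");
    }
    string
}

fn main() {
    println!("{}", build_through_erased_lifetime());
}
```

As with the private lifetimes described above, nothing outside Bar can name the erased lifetime, so nothing outside can replace the inner Foo with one of a different lifetime through the typed field.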
The lifetime checking would be done similarly to how it is done at the moment in functions with a bound such as for<'a> F: Fn(&'a T): nothing can be assumed about the lifetime except that it outlives the current scope. It can’t be assumed to live in a wider scope than the public lifetime it outlives, because the exact point where it expires isn’t known. However, it is allowed to live as long as the public lifetime it outlives.
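To make the comparison concrete, here is a minimal sketch of such a higher-ranked bound on today's Rust. The function with_local is a made-up name; the point is that the closure must work for every lifetime 'a, so it can use the reference during the call but cannot assume it lives any longer.

```rust
// `f` must accept a reference with *any* lifetime 'a, including one that
// is local to this function body.
fn with_local<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a String) -> usize,
{
    let local = String::from("scoped");
    f(&local)
}

fn main() {
    // Fine: the closure only uses `s` while it is being called.
    let n = with_local(|s| s.len());
    println!("{n}");

    // A closure that tried to store `s` in a variable declared outside
    // its body would be rejected, because the caller cannot know how
    // long 'a lasts.
}
```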
Drawbacks
- Lifetimes confuse people, and the reason why some lifetimes need to be hidden/encapsulated while others can just be “bundled up” within a shorter lifetime (it’s because of the difference in variance) can escape people.
- There might be no perfect syntax for hiding lifetimes using the current reserved keywords
- Admittedly the proliferation of lifetimes is a problem, but does the lifetime hiding proposed in this RFC pull its weight as an additional feature?
Rationale and alternatives
TODO
Prior art
TODO. Some notes:
- The lifetime system of Rust is quite unique; I’m not aware of any prior art.
- The problems around lifetime proliferation have to do with variance. C#, Kotlin etc.
Unresolved questions
- I’m not 100% sure that there aren’t corner cases if two fields of the same struct have the same private lifetime. Can one be assigned to another? Is it unsound for these lifetimes to match?
- What will the exact syntax be?
- Are there any corner cases that haven’t been thought about?