Would non-`'static` TypeId be at all possible?

H2CO3 · March 13, 2021, 6:43am

Note: the following is NOT an RFC. I am explicitly not proposing any changes to any aspect of Rust. I don't need the functionality described herein, and I wouldn't have the motivation for it, either. I am just genuinely curious about a couple of technical details that I think are still somewhat unclear to me.

So, while attempting to answer this question on URLO, I realized that I don't have a completely satisfactory and compelling reason to back up and motivate my claim that TypeId essentially has to impose a 'static lifetime bound.

When I first learned about this issue, it seemed instinctively obvious. Lifetimes are just not a thing at runtime, they are context-dependent, they have subtyping relationships, and there is the issue of recursive functions, with some of their lifetimes only valid within the same function call.

However, while I'm clueless as to how I would even go about implementing the TypeId intrinsic into the compiler for non-'static lifetimes, I still couldn't convince myself 100% that context dependence and subtyping do indeed make lifetimes infeasible to encode dynamically. Why is it so? The compiler does erase lifetime information before codegen, but maybe it doesn't have to erase them. So, my best guesses so far were:

A lifetime-dependent TypeId would require global (interprocedural or even cross-crate) analysis of the entire code. The context-dependence of lifetimes means that named lifetimes (except for 'static) are always generic parameters, so they may change depending on the caller. Therefore, it's impossible to assign a single, concrete, globally unique ID to a given lifetime – just like the compiler isn't allowed to assign the same value to a function argument across different invocations of the same function.
Perhaps this is also where recursion comes into play? Maybe, just maybe, recursion means that even if the compiler performed a complete global analysis of the code, type IDs of lifetimes would need to change dynamically within a recursive function, with every new level of the call.

This seems contradictory to me, because it would break the "lifetimes are a purely compile-time construct" assumption. But what if someone were to put addresses of local variables in a global Vec<Box<dyn Any>>? Surely a properly implemented non-'static Any would allow that, while ensuring that downcasting fails outside the scope of the local variable being referenced.
It can be hard to encode subtyping relationships efficiently. An implementation I could think of would be to construct a tree of globally-available TypeIds, where every subtype would point to its supertype.

But this might mean generating a prohibitively large tree. I remember that someone once proposed on IRLO that multi-trait objects be implemented by generating vtables for all possible combinations of all object-safe traits. Unsurprisingly, this was deemed unacceptable as there would then be a combinatorial explosion in compile time and binary sizes. Would the same problem arise with properly subtyped TypeIds?

Are any of these guesses of mine correct or relevant? Could someone more familiar with the internals of type lowering shed some light on the most important concrete issues with lifetime-aware TypeId?

comex · March 13, 2021, 8:04am

Well, that's the problem. Lifetimes are such a purely compile-time construct that they're completely erased at runtime.

Consider this function signature:

// If 'b: 'a, then returns Some(x), else returns None
fn downcast_lifetime<'a, 'b>(x: &'a u32) -> Option<&'b u32>

If there was a version of Any that could dynamically test for lifetimes, you could use it to implement downcast_lifetime.

But how would that actually work? downcast_lifetime can be called with either 'a or 'b as the longer lifetime (or they could be equal). Suppose there is one code path that calls downcast where 'b: 'a holds, and another where it doesn't hold. Then no matter how sophisticated a global analysis the compiler performs, it obviously can't prove statically that 'b: 'a either holds or doesn't hold.

So there are two options: it could encode the lifetime at runtime as part of the reference, or it could try to monomorphize functions based on lifetime.

Encoding it at runtime would come with overhead; every reference would probably have to double in size, to store some integer representing the lifetime along with the actual pointer value. It might be possible for the compiler to optimize away the lifetime value in cases where it's known not to be used – sometimes. If you're using references as local variables or function arguments passed by value, and not doing any dynamic dispatch, it might be possible to elide most unneeded lifetimes... with a sophisticated (and slow) global analysis. But what if you have a Vec<&Foo>? The compiler is not going to magically compress the Vec elements in cases where they can be compressed. It's not that smart.

Monomorphizing functions based on lifetime would be difficult. Some might say it's impossible, since the existence of recursion means that there can be an unbounded number of nested lifetimes. But I think (could be wrong) that you'd 'only' need a separate copy of the function for every possible combination of outlives relationships. The number of possible combinations is finite, sure enough… but it grows exponentially in the number of lifetimes. Now, as long as you stick to static calls, you'd only have to monomorphize the combinations that were actually used. But what about dynamic dispatch? Rust allows lifetime generics in trait objects, precisely because lifetimes are erased. To make this work with monomorphized lifetimes, a trait object vtable would need one slot for every possible combination. That's not going to work.

(Also, even ignoring the exponential growth issue, having to compile functions even two or three times just because they were used with different lifetimes would be a massive compile time hit. Maybe you could reduce the impact by carefully keeping track of when lifetimes do and don't affect the end result.)

With all that said, I believe it is possible to make a version of Any that works with non-'static objects, but where all lifetimes involved are known at compile time. You could convert from Foo<'a> to AnyA<'a> and back, but you couldn't convert from Foo<'static> to AnyA<'a>. (You could convert from Foo<'static> to Foo<'a> first, if it happens to be a covariant lifetime parameter, but there would be no way to recover the 'static lifetime afterwards.)

And yes, AnyA is what Soni was proposing the other day. Except I'm pretty sure that, rather than needing a new higher-kinded types feature, you can implement at least a version of it with just GATs...

quinedot · March 13, 2021, 8:34am

O(n! * cⁿ⁺¹) if equality is considered distinct.

Soni · March 13, 2021, 1:58pm

Do also note that you can get the TypeId of something like for<'a> fn(&'a ()), i.e. you can get a TypeId for what's conceptually a type constructor (but also effectively a type).

comex · March 14, 2021, 6:42am

Actually, I need to correct my last post.

If you choose to encode lifetimes at runtime, you don't have to store each reference's lifetime alongside its pointer value. In fact, doing so would be incorrect. Lifetimes don't belong to values; they belong to types. You can safely convert e.g. &'a &'b u32 to &'a &'a u32, but that changes the lifetime you should get from the inner reference, even though you're not touching the memory.

Instead, you would want to pass lifetime relationships as a hidden function argument(s). For a given generic function, you would have to encode the relationships of all lifetime parameters it has, as well as all lifetimes embedded in types chosen for its generic parameters. Also, trait objects are the one case where lifetimes do belong to values. Thus, trait object references would have to store lifetime information for their underlying type (not to be confused with the lifetime of the reference itself), making them extra fat.

This is slightly more feasible than what I said last time. But it would still be quite expensive, even if you 'just' store the relationships between lifetimes, which would suffice for some purposes. If you want to allow keeping around potentially expired references, like the OP's mention of sticking a local variable in a global Vec<Box<dyn Any>>, and dynamically test whether a reference is still alive, you would have to store more than that. I think you'd have to maintain some kind of global tree of lifetimes at runtime, which is starting to get into "everything is 10 times slower" territory.

steffahn · March 14, 2021, 3:04pm

I don’t fully understand why any internal lifetimes are important for trait objects... a type dyn Trait + 'a has a lifetime in its type and more information than that single lifetime is not really needed to handle the trait object safely.

In particular for

we should not forget that the type Box<dyn Any> is nothing but shorthand for Box<dyn Any + 'static>, with the effect that we wouldn’t gain anything here from an Any trait with implementations for non-'static types; my answer to this question

would be: no, nothing would allow that. Not to say that there isn’t a way to put references to local variables into a global data structure at all with a sufficiently advanced dynamic lifetime system. But it can’t be through ordinary dyn Trait + 'a–style trait objects.

a way to put references to local variables into a global data structure at all with a sufficiently advanced dynamic lifetime system

Another approach—using just non-'static implementations of Any—could involve some kind of scope/guard objects that allow access to a global store. When a scope is left, everything associated to that scope is removed from the global store again. A scope introduces a lifetimes 'a and any Box<T> to be inserted while inside of the scope needs to fulfill T: 'a for conversion into Box<dyn Any + 'a>. There would need to be type comparisons between Box<dyn Any + 'a> and Box<dyn Any + 'b> for different lifetimes 'a and 'b; namely, if two scopes with lifetimes 'a and 'b are using the same store.

Those kind of comparsons would need to be possible anyways because of variance. If you have x: Box<dyn Any + 'a> and y: Box<dyn Any + 'b>, you can treat x and y as having the same type Box<dyn Any + 'c> as long as 'c is shorter than both lifetimes 'a and 'b.

bluss · March 14, 2021, 3:27pm

Because specialization is not stable, I've needed the following function for ad-hoc specialization: Given a type T, is T == f64 or not?

With T: 'static this is easy to implement using the current TypeId. It would also be easy to implement with a more relaxed type id, and it would seemingly be easily sound (because one of the two types in the comparison is completely without lifetimes). However, there is no way with the current type system to constrain a function like is_equal<T, U> to only types where U is "lifetime-free".

Soni · March 14, 2021, 3:49pm

Or into GC territory. ;p

cuviper · March 14, 2021, 4:04pm

Maybe we could have a fallible constructor, returning Some(TypeId) for T: 'static, otherwise None. Alternatively, it could use a wrapper that treats non-static like a NaN state, to avoid the mistake of comparing None == None and thinking that's type equality.

dhm · March 14, 2021, 4:05pm

This: I'd love for a is_equal<T : 'static, U>() -> bool function provided by the stdlib, although I have to admit having needed that to hack into poor man's specialization too

comex · March 14, 2021, 11:06pm

That doesn't quite work. With such an is_equal function, you could implement:

fn is_this_lifetime_static<'a>() -> bool {
    is_equal::<&'static (), &'a ()>()
}

But such a function would have to pull information out of thin air, given that lifetimes are erased.

What you'd need is not just T: 'static, but some kind of constraint that means "T does not even contain any lifetimes".

Not coincidentally, the "specialization mode" mentioned in this blog post as a path for sound specialization, if implemented, would provide a way to express that constraint. "Specialization mode" is a pretty non-descriptive term, so I like to call it "regardless of lifetimes". Using that name, is_equal could have its signature written as:

fn is_equal<T, U>() -> bool
    where regardless_of_lifetimes(T: 'static)
{ /* ... */ }

This would require that the constraint not only holds for T, but would still hold even if any lifetimes contained in T were replaced with any other legal possibility. In this case it means that T can't have any lifetimes. If it did, then you could replace them with a local lifetime and T would no longer be 'static.

Well, with the exception of useless lifetimes which are forced to be 'static, like struct Foo<'a: 'static> { /* fields */ }. T can have those.

That said, if the goal is poor man's specialization, well… if regardless_of_lifetimes is added to the language, it would probably be stabilized at the same time as specialization, making poor man's specialization unnecessary. But who knows if that will ever happen.

Perhaps in the meantime there could be a lang-item function that works like @cuviper's suggestion, returning Some(TypeId) or None – except that it would have to check that T has no lifetimes, not just that it's 'static.

dhm · March 14, 2021, 11:13pm

You're right, I phrased that quite poorly, thanks for pointing it out In my case I had a concrete type T in mind (such as type T = () or type T = *const c_char) which couldn't possibly be the 'static instantiation of a lifetime-infected type (I guess that's what you mean by regardless_of_lifetimes(T : 'static) – I thank you for letting me map this "'static regardless of lifetimes" wave-handed definition to the "specialization mode" terminology)). I agree that correctly distinguishing these two "use-cases" (the problematic one, and the one I had) apart is footgunny so I can understand why this unchecked API of mine isn't available ~~yet~~ (but for the valid use cases it would still be useful in the meantime )

cuviper · March 14, 2021, 11:57pm

It's still ok if there are 'static lifetimes internally, right? I understand why &'a () vs. &'static () is problematic, or Cow<'a, str> vs. Cow<'static, str>, but struct Moo(Cow<'static, str>) should be fine.

comex · March 15, 2021, 6:45am

Yeah… I'm not sure of the right terminology, but I'd define "has no lifetimes" as: *ahem* The canonical name of the type, after fully resolving any generic type parameters, type aliases, and trait projections, doesn't mention any lifetimes – except for ones introduced with for<'a> syntax within the type name itself.

So Moo is eligible, because its name is just Moo; the definition of the struct doesn't matter. But Bar<Cow<'a, str>> is obviously ineligible.

More interestingly, Bar<Cow<'static, str>> should probably also be ineligible. In theory the compiler could allow it, if and only if it knew at compile time that the lifetime really was 'static – i.e., yes if you wrote out Bar<Cow<'static, str>> directly, no if you wrote Bar<Cow<'a, str>>, where 'a is some lifetime parameter to your function, and then your function just happened to get called with 'a as 'static. This is the kind of behavior you get today from unstable specialization. Specialization is unsound, but only in conjunction with associated types; there'd be nothing inherently unsound about having this function work that way. But it feels spooky for a function to change its behavior based on not just what a lifetime parameter is, but what it might have been. It's probably better for the compiler to play dumb, at least to start with.

Oh, and about for<'a> syntax (i.e. higher-ranked types): The type for<'a> fn(&'a u32) -> &'a u32, which is just the expanded form of fn(&u32) -> &u32, should be eligible, because there's no concrete lifetime in there that can vary. (You can remove the for<'a> and change it to, say, fn(&'static u32) -> &'static u32, but as Soni mentioned, that is treated as a separate type for monomorphization purposes.) Same goes for, say, Box<for<'a> dyn FnOnce(&'a u32)>.

Soni · March 15, 2021, 3:13pm

It's actually dyn for<'a> btw, if we recall correctly.

mcy · March 15, 2021, 3:14pm

Itym "the resolved type has no free lifetime variables". That is, if the type's kind incorporates the lifetime kind in any way. So Moo: type but Cow<'static, str>: lifetime -> type in this case, because 'static is an applied parameter of the type.

As you mention, we should exclude cases like Trait<'a>::Assoc = i32.

It's actually dyn for<'a> btw, if we recall correctly.

Both for<'a> $ty and for<'a> $trait are valid; the first is a valid type, the second is a valid bound. I can write for<'a> fn(&'a i32), for example. You can even write for<'a> dyn for<'b> Fn(&'a i32) -> &'b i32.

Soni · March 15, 2021, 3:31pm

type X = for<'a> dyn FnOnce(&'a ()); does not compile.

dhm · March 15, 2021, 4:09pm

Yes, I can't see how Rust would be able to distinguish that type from a monomorphization of 'a = 'static. I think wrapper types would thus be mandatory.

Funnily enough, I know there is currently an instance where Rust distinguishes between an explicit 'static "monorphization choice" and a fixed 'static lifetime (e.g., through a type alias), although I find that to be more of a bug / inconsistency rather than a feature:

macro_rules! StaticStr {() => ( &'static str )}
type StaticStr = &'static str;

fn foo (_: StaticStr!()) -> &str { "" } // Ok.
fn bar (_: StaticStr   ) -> &str { "" } // Fails.

Playground

steffahn · March 15, 2021, 5:59pm

I’m not really understanding the argument here. We’re talking about what should be allowed as the type T in a hypothetical fn is_equal<T: NoFreeLifetimes, U>() functions, right? You explained yourself that &'static () doesn’t work, how is Bar<Cow<'static, str>> any different? Maybe it would work in a setting where Bar is some struct Bar<T: 'static>... but otherwise you’d have the problem of a potential is_equal::<Bar<Cow<'static, str>>, Bar<Cow<'a, str>>> call, wouldn’t you?

comex · March 15, 2021, 8:18pm

That's weird. Though both functions do work if you label the return type explicitly as &'static str, so it's 'just' an issue with lifetime elision.

No, well… I'm talking about what should be allowed in a hypothetical fn some_name_here<T>() -> Option<TypeId> function. But I suppose the difference is less about the function signature, more just a question of naming and user expectations. For a function named is_equal, it's clearly wrong to spuriously return false because it implies the types are not equal. But in theory you could just rename it to is_provably_equal, and document what "provably" means in this context. Or perhaps it could be named is_equal but return Option<bool>, as in 'definitely equal, definitely not equal, or unknown'. Similarly, if the Option<TypeId>-returning function were named type_id_if_static, it would be wrong to spuriously return None, but not if it was named e.g. type_id_if_provably_static.

Still, as I said, these kinds of semantics would be spooky and confusing, so it's probably best to avoid them.

Topic		Replies	Views
Do lifetimes have to be erased at runtime? language design	12	1334	October 24, 2022
Type erasing non-`'static` types libs	18	651	August 8, 2024
[Pre-RFC] TypeId for non-static Types	8	2495	March 25, 2019
Pre-RFC: non-footgun non-static TypeId	16	2452	January 9, 2023
Extremely bad solutions to the Any 'static requirement language design	20	2143	September 7, 2020

Would non-`'static` TypeId be at all possible?

Related topics