Pre-RFC: non-footgun non-static TypeId

Proposal for how to unblock use cases that involve TypeId of potentially non-'static types, while exposing zero of the risks that took down the previous attempt at this.

Background

Type id is currently exposed in the standard library as follows:

// in core::any⸺
pub struct TypeId {…}

impl TypeId {
    pub fn of<T>() -> Self
    where
        T: 'static + ?Sized;
}

impl Copy, Clone, Eq, Ord, Debug, Hash

// in core::intrinsics⸺
#[unstable]
pub extern "rust-intrinsic" fn type_id<T>() -> u64
where
    T: 'static + ?Sized;

RFC 1849 proposed deleting the T: 'static bound from the above. This RFC was accepted by the lang team in 2017, and then unaccepted in 2020 in #41875 without ever having been implemented, attributed to "potential for confusion and misuse".

The concern is that since lifetimes are erased at runtime in Rust, &'a U and &'b U necessarily have the same TypeId regardless of lifetimes. Thus pretty much any use of TypeId with non-'static types where downcasting is involved is going to be unsound.
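For context, here is a minimal sketch of the downcasting that is sound today precisely because everything involved is 'static. The helper names describe and is_string are made up for illustration; Any, TypeId::of and downcast_ref are the existing standard library API.

```rust
use std::any::{Any, TypeId};

// Hypothetical helper: inspects a type-erased value.
// `dyn Any` implies `'static`, so no borrowed data can hide behind it.
fn describe(value: &dyn Any) -> String {
    if let Some(s) = value.downcast_ref::<String>() {
        format!("String({})", s)
    } else if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32({})", n)
    } else {
        "unknown".to_string()
    }
}

// `TypeId::of` only accepts `T: 'static` today, so this bound is required.
fn is_string<T: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<String>()
}
```

Dropping the 'static bound on is_string is exactly what does not compile today, and what RFC 1849 would have allowed.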

Counterproposal

My pre-RFC proposes the following API:

// in core::any⸺
pub struct TypeId {…}

impl TypeId {
    pub fn of<T>() -> Self
    where
        T: 'static + ?Sized,
    {
        TypeId(intrinsics::type_id::<T>())
    }

    pub fn same_as<T>(&self) -> bool
    where
        T: ?Sized,
    {
        intrinsics::all_types_with_this_id_are_static::<T>()
            && self.0 == intrinsics::type_id::<T>()
    }
}

// in core::intrinsics⸺
#[unstable]
pub extern "rust-intrinsic" fn type_id<T>() -> u64
where
    T: ?Sized;

#[unstable]
pub extern "rust-intrinsic" fn all_types_with_this_id_are_static<T>() -> bool
where
    T: ?Sized;

But is this useful? I still see a 'static bound…

Yes! This API would be amazing for Serde.

In Serde, data formats contain code that is generic over what type is being deserialized or serialized, and those generics usually do not have a 'static bound because plenty of non-'static types can be deserialized and serialized.

However, data formats often want to implement special behavior for a small set of special types unique to that data format, for example DateTime in TOML.

fn part_of_deserializer<'de, V>(data: D, visitor: V) -> Result<V::Value>
where
    V: serde::de::Visitor<'de>,
{
    if /** V == DateTimeVisitor<Value = DateTime> */ {
        let datetime: DateTime = deserialize_datetime(data)?;
        Ok(unsafe {
            mem::transmute_copy(&ManuallyDrop::new(datetime))
        })
    } else {
        deserialize_anything_else(data, visitor)
    }
}

This behavior is impossible to express today. With the API from the pre-RFC, the condition becomes implementable as:

    if TypeId::of::<DateTimeVisitor>().same_as::<V>() {

But is this risky to expose?

No! You still cannot get a TypeId for a type that is not 'static!

You can only get a TypeId of a type that is statically 'static, and then check whether some other type (which may or may not be 'static) is the same as the first one and has the same 'static lifetime.

Examples

struct StaticStr(&'static str);

struct BorrowedStr<'a>(&'a str);

// true
assert!(TypeId::of::<StaticStr>().same_as::<StaticStr>());

// false because different type lol
assert!(! TypeId::of::<StaticStr>().same_as::<&'static str>());

// false because all_types_with_this_id_are_static is false
assert!(! TypeId::of::<BorrowedStr<'static>>().same_as::<BorrowedStr<'a>>());

I haven’t thought too deeply about how much of a problem this is, but a potential remaining concern with the API you’re proposing is that a type like

struct Foo(&'static str);

could be changed into

struct Foo<T = &'static str>(T);

with the intention of this being a non-breaking change, however this changes the behavior of

intrinsics::all_types_with_this_id_are_static::<Foo>()

from true to false, and thus breaks use-cases of type_id.same_as::<Foo>().
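A small sketch of the "after" state on stable Rust, assuming the generalized definition above. It only shows that a bare Foo still names Foo<&'static str>, a type that now contains a lifetime position; the proposed intrinsic itself does not exist, so its result is not demonstrated here.

```rust
use std::any::TypeId;

// Hypothetical "after" version of the struct from the post above.
#[allow(dead_code)]
pub struct Foo<T = &'static str>(T);

// In type position, a bare `Foo` means `Foo<&'static str>`.
fn bare_foo_is_default_instantiation() -> bool {
    TypeId::of::<Foo>() == TypeId::of::<Foo<&'static str>>()
}

// Other instantiations are distinct types with distinct ids.
fn differs_from_other_instantiation() -> bool {
    TypeId::of::<Foo>() != TypeId::of::<Foo<String>>()
}
```

Since Foo<&'a str> would share its ID with Foo<&'static str>, an "all types with this ID are 'static" check would presumably start answering false for Foo after the change.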


Just driving by to say that I've been exposed to something like this in the wild (though they had done it unsoundly abusing monomorphization behavior that we'll hopefully close up to some extent).

It was frankly confusing as framed in their system, but the trick is that a type with no "lifetime positions" (in its monomorphic fully-normalized form that is a tree of type constructor applications) can be known to meet a : 'static bound without having to specify that bound statically (so a bit like needs_drop).

In a sense, this can be seen as a version of T: 'static that can be used during trait impl specialization, in that it only matches when the bound holds across all possible choices of lifetimes in T (which gets back to "T has no lifetime positions" almost by definition, since any of them would invalidate 'static).


I agree about integrating with TypeId (and Any, in their case), but I'm not fond of having a separate all_types_with_this_id_are_static intrinsic (my suggestion was TypeId::of_lifetimeless, which would have its own intrinsic returning Option).

Also, the problem with not labeling it specially ("lifetimeless" was my bikeshed) is that it can fail for types that are in fact : 'static.

To complete your example:

assert_eq!(TypeId::of::<&'static str>(), TypeId::of::<&'static str>());
assert!(! TypeId::of::<&'static str>().same_as::<&'static str>());

So by that measure, same_as is too misleading of a name.

Whereas if we added suffixed methods to TypeId/Any, this would make sense IMO:

let any_str = &"foo" as &dyn Any;

// Successes:
assert_eq!(any_str.type_id(), TypeId::of::<&'static str>());
assert_eq!(any_str.downcast_ref::<&'static str>(), Some(&"foo"));

// Failures:
assert_eq!(TypeId::of_lifetimeless::<&'static str>(), None);
assert_eq!(any_str.downcast_lifetimeless_ref::<&'static str>(), None);

Maybe "lifetimeless" is a poor name for the concept, but regardless we should come up with one that's distinguishing enough to not be mistaken as being a helpful shorthand, and thoroughly document it.


Isn't such a change already breaking due to the large amount of inference breakage associated with it? IIRC you can't really generify a struct due to the lack of inference for generic defaults.


Well, the standard library is doing it with Box and Vec, adding an allocator parameter, so I guess it is possible.


It's worse for functions, because foo means foo::<{?0}> with an inference variable. For structs, it's significantly less breaking, since Foo means Foo::<{default}>.

If all implementations existing before the generalization always apply to the generalized form, then this cannot break inference.

If any implementations existing before the generalization are limited to being provided solely for the prior type, using that implementation will at least some of the time allow inference to resolve to the single option.

If any implementations existing before the generalization are provided generically but do not apply to the whole generalization, then inference breakage is possible.

The biggest hurdle is not generalizing the struct itself; it's generalizing the functions which mention the new generics when not covered by the generalized Self. So perhaps the choice of a tuple struct was not the best choice here, since it does replace fn Foo(&'static str) -> &'static str with for<T> fn Foo(T) -> T; doing the generalization for a braced struct is generally considered to be minor/allowed inference breakage. (Again, generalizing the impls requires separate justification of being non-breaking.)


Only if the field is public. Admittedly, I didn't mark the struct public either, so the field being private was somewhat implicit.

There's already a use case for all_types_with_this_id_are_static, motivated entirely differently but with the same outcome: We can't make fn(&'a()) have a 'static lifetime instead of 'a at the moment, even though of course there's nothing preventing such methods from existing outside scopes with 'a lifetime bounds. Otherwise any fn(&'b()) could be safely cast to fn(&'a()) via Any, trivially unsound. This generalizes to other type construction that is contravariant in lifetimes for any other reason including some Box<dyn Trait<'a>> use cases.


It doesn't actually matter whether all types with the ID are static, does it? It seems like if we're adding an intrinsic anyway, it would make more sense to add a purpose-built intrinsic (especially since, as far as I can tell, the rejected RFC purely changed the definition of the type_id intrinsic and not the bounds on TypeId's associated functions):

pub extern "rust-intrinsic" fn type_id_eq_ignoring_lifetimes<T>(type_id: u64) -> bool
where
    T: ?Sized;

Then instead of same_as we could do something like

impl TypeId {
    // eq is probably still a little confusing here
    pub fn eq_ignoring_lifetimes<T>(&self) -> bool
    where
        T: ?Sized,
    {
        intrinsics::type_id_eq_ignoring_lifetimes::<T>(self.0)
    }
}

I think that would avoid at least the point of contention that appears to have gotten the previous RFC unaccepted. Though I'm still kind of unclear about why the 'static bound is important on the intrinsic since non-nightly users can't access it except through TypeId anyway.


I believe it does, in that it's the only way I know of in which such a feature has sound usecases.

IIUC your eq_ignoring_lifetimes means this is unsound:

// R => "runtime", S => "static"
fn try_cast<R, S: 'static>(r: R) -> Result<S, R> {
    if TypeId::of::<S>().eq_ignoring_lifetimes::<R>() {
        Ok(transmute(r)) // realistically, transmute_copy + forget
    } else {
        Err(r)
    }
}

If I do e.g. try_cast::<_, &'static str>("foo".to_string().as_str()), the above would return Ok("foo"), except pointing to the heap, so when the String is deallocated you're pretty much guaranteed UB unless you get rid of (or never again access) the bad &'static str first.

But if it's limited to "all types with the ID are 'static", then you would get Err because &'a str is only 'static when 'a is, not always (and AFAIK you can do casts like this as long as you can guarantee that the types contain no lifetime positions, which is currently isomorphic to "always 'static").

This is the kind of usecase I was referencing in my earlier comment above, from which I linked https://github.com/sagebind/castaway/pull/6#issuecomment-1151011368.
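For comparison, a sketch of the variant that is already expressible on stable Rust, assuming both type parameters carry a 'static bound. It reuses the try_cast name from the snippet above but is not the API under discussion; it only illustrates the transmute_copy pattern in the case where the check is known to be sound.

```rust
use std::any::TypeId;
use std::mem::{self, ManuallyDrop};

// Sound only because *both* R and S are `'static`: equal TypeIds then
// mean R and S are the same type, lifetimes included.
fn try_cast<R: 'static, S: 'static>(r: R) -> Result<S, R> {
    if TypeId::of::<R>() == TypeId::of::<S>() {
        // Prevent `r` from being dropped after its bits are copied out.
        let r = ManuallyDrop::new(r);
        // SAFETY: R and S are the same type, so this is a plain move.
        Ok(unsafe { mem::transmute_copy::<R, S>(&*r) })
    } else {
        Err(r)
    }
}
```

The whole question in this thread is how much of the R: 'static bound can be relaxed without letting a borrowed value through the Ok branch.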


As an amateur type-caster myself, I'd really like to be able to deal with non-'static types in a sane way.

I'd prefer to see something like:

enum TypeId {
    // Type is 'static, safe to use for casting.
    Static(u64),
    // Type is not 'static, don't point this gun at your foot.
    Unsafe(u64),
}

impl Clone, Copy, etc.
impl Eq such that TypeId::Unsafe(x) never compares equal to anything, even itself.

I think this gives dtolnay's same_as<T>(&self) in the comparison between two TypeIds: non-static types can't be determined to be the same type. Does this venture too close to what #41875 was trying to avoid?

Can someone point me to some reading about the unsoundness of downcasting non-'static types? (I think I understand, but I'd like to make sure).


Currently, Rust's only form of subtyping is via lifetimes, so "all types with this TypeId are 'static" ⇒ "all types with this TypeId can be freely transmuted to one another". However, new forms of subtyping might be added in the future, breaking this assumption.

Currently, Rust's only form of subtyping is via lifetimes

This is incorrect; subtyping also extends to HRTB instantiations. For instance, for<'a> fn(&'a i32) -> &'a i32 is a subtype of fn(&'static i32) -> &'static i32, and both types are 'static. However, the two types have different TypeIds, so the implication still holds.
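A quick way to check this on stable Rust, as a sketch. The type aliases and function names are made up for illustration; the two function pointer types are the ones named above.

```rust
use std::any::TypeId;

// Hypothetical aliases for the two function pointer types in question.
type Specific = fn(&'static i32) -> &'static i32;
type HigherRanked = for<'a> fn(&'a i32) -> &'a i32;

fn identity(x: &i32) -> &i32 {
    x
}

// Both types are `'static`, so `TypeId::of` accepts them as-is.
fn type_ids_differ() -> bool {
    TypeId::of::<Specific>() != TypeId::of::<HigherRanked>()
}

// The higher-ranked pointer can be used where the specific one is expected.
fn call_through_specific() -> i32 {
    static VALUE: i32 = 7;
    let general: HigherRanked = identity;
    let specific: Specific = general;
    *specific(&VALUE)
}
```

So the two types are related by subtyping and are both 'static, yet are distinguishable by TypeId.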


The difficulty here (which applies both to lifetime HRTB and potential new forms of subtyping) is that switching a return type, associated const, etc. from a supertype to a subtype should ideally be a non-breaking change. But having different TypeIds for subtypes makes this a breaking change, and making the subtypes have the same TypeId is unsound.


Another issue is that by the time the type_id intrinsic is called, all lifetimes are already erased. This means that it is impossible to give &'static u8 a different TypeId from &'a u8. It would only be possible to use the TypeId::Static variant for types that don't have any lifetime parameters at all.

Eq requires the relation to be reflexive, in other words it requires x == x to hold for all x. This property is why we have Eq in the first place instead of just PartialEq, which doesn't require this.
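To make the reflexivity point concrete, here is a sketch of the enum idea with only PartialEq implemented, in the same way f64 handles NaN. The name TypeKey is hypothetical (chosen to avoid clashing with the real TypeId), and the u64 payloads are placeholders rather than real type IDs.

```rust
// Hypothetical stand-in for the two-variant TypeId suggested above.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum TypeKey {
    Static(u64),
    Unsafe(u64),
}

// Deliberately only PartialEq: `Unsafe` never equals anything, not even
// itself, so the relation is not reflexive and `Eq` must not be implemented.
impl PartialEq for TypeKey {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (TypeKey::Static(a), TypeKey::Static(b)) => a == b,
            _ => false,
        }
    }
}
```

Without Eq (and a consistent Hash), such a type could not be used as a HashMap key, which is one of the main things TypeId is used for today.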


I just want to mention an (admittedly extremely niche) use case for getting TypeIds of non-static types.

I have a TypeEq<L, R> type that is only constructible if its two type arguments are the same type, and allows me to do a limited form of polymorphism in const fns on stable.

I put TypeEq in enums where only one variant can be constructed for any given combination of generic arguments, so to help the compiler remove dead code I use a function (fn reachability_hint(self: TypeEq<L, R>)) that calls unreachable_unchecked when it can prove that the type arguments of TypeEq<L, R> aren't the same type.

The problem is that on stable I can only use size and alignment to prove that L and R are different types, which leaves a lot of opportunities to remove dead code on the table.

Being able to construct and compare TypeIds in const contexts for any type would make it possible to remove virtually all dead branches, except for those that only differ by lifetimes, which is fine with me.
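A rough sketch of the idea, not the actual implementation: the layout of TypeEq and the name definitely_different are assumptions made for illustration, and the unreachable_unchecked call is left out. It shows how little a size-and-alignment check can prove on stable.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// Hypothetical type-equality witness: only constructible when L == R.
pub struct TypeEq<L, R>(PhantomData<(fn(L) -> L, fn(R) -> R)>);

impl<T> TypeEq<T, T> {
    pub const NEW: Self = TypeEq(PhantomData);
}

impl<L, R> TypeEq<L, R> {
    // On stable, layout is the only evidence available in a const fn:
    // differing size or alignment proves the types differ, but equal
    // layout proves nothing.
    pub const fn definitely_different() -> bool {
        size_of::<L>() != size_of::<R>() || align_of::<L>() != align_of::<R>()
    }
}
```

For example, u32 and i32 have the same layout, so this check cannot tell them apart, whereas a const-comparable TypeId could.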

