Extremely bad solutions to the Any 'static requirement

LateNiteMartyParty · June 4, 2020, 8:31pm

Every once in a while there comes a use case where the functionality of Any would be desired in a non-static context. Unfortunately, one of the sticking points in the Rust standard library is that TypeId::of::<T> requires T to be 'static. Here are two of the incredibly fun workarounds I have found.

Dtolnay, the genius they are, exploits the monomorphization of functions to use the function pointers as IDs:

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Hash)]
pub struct TypeId {
    id: usize,
}

impl TypeId {
    pub fn of<T: ?Sized>() -> Self {
        TypeId {
            id: TypeId::of::<T> as usize,
        }
    }
}

My solution is different, and uses type_name. It relies on two things:

type_name conveniently left out the 'static bound
type_name returns a &'static str, which means that that string is valid for the lifetime of the program.

Here it is (all the stuff that makes this not contrived has been omitted):

pub trait AnyLike<'a, T>
where
    T: 'a,
{
    fn type_id(&self) -> usize {
        std::any::type_name::<Self>().as_ptr() as usize
    }
}

Assuming that type_name prints out the fully qualified path of each type (which it is NOT guaranteed to do), this works across crates! I wouldn't recommend it though.

It's hard to believe I need to actually use this to do something legitimate, but I do, and these two solutions are all I have.

Which is fine for me, as I've carefully thought about the assertions and any possible safety issues. This code is primarily going to be used for unit testing, and I know that all of the types I'm using this on will have distinct names even if the paths are squished. Also, none is named with the prefix of another, which I suppose could also produce a false positive.
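For illustration, here is a minimal sketch of the same idea that compares the name strings by content instead of by pointer address, which sidesteps the prefix false positive mentioned above. The helper names `same_type_name` and `name_of_val` are hypothetical, and the result is only as reliable as type_name itself, whose output is documented as best-effort and not guaranteed to be unique or stable.

```rust
use std::any::type_name;

/// Hypothetical helper: best-effort "same type" check that ignores lifetimes.
/// Compares the *contents* of the names, not their addresses.
/// Assumption: distinct types print distinct names, which is NOT guaranteed.
fn same_type_name<A: ?Sized, B: ?Sized>() -> bool {
    type_name::<A>() == type_name::<B>()
}

/// Hypothetical helper: name of the type behind a reference.
/// Note that there is no `'static` bound on `T`.
fn name_of_val<T: ?Sized>(_val: &T) -> &'static str {
    type_name::<T>()
}
```

Something like `name_of_val(&borrowed)` compiles even when `borrowed` is a `&str` tied to a local `String`, which is exactly what TypeId::of refuses to do.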

However, I'm worried that other people might not have thought about their problems as deeply as I have, and we may end up being de facto required to have type_name print the full path all of the time before we know it.

dtolnay · June 4, 2020, 8:52pm

I don't endorse this, despite my name being in the post. The code you pasted is from one of my projects where it is used as a debugging aid only, not compiled into the project unless opted in by a feature called "unstable-debug" for debugging purposes.

dtolnay · June 4, 2020, 8:57pm

The same function type_name::<T> can have different addresses (and actually different behavior too) in different places. See https://github.com/rust-lang/rfcs/issues/1428#issuecomment-450535964 for an example of different behavior of the same function even within the same source file.

LateNiteMartyParty · June 4, 2020, 9:09pm

I hope my post didn't come across as implying that you were tacitly endorsing it. I certainly don't recommend using either. I intended to illustrate some workarounds that have been used in the past, and why I think it is ultimately a problem that type_id has a 'static requirement. I only intended to cite you.

Indeed, type_name can have different addresses. Fortunately for me it's never used twice, so I think I can still use it for this.

I'm personally only using this so that I can add more unit tests, and unit tests breaking in the future is not as important as getting code working and tested now. It will never go into production.

cuviper · June 4, 2020, 9:19pm

Many bad ideas in production are preceded thusly...

steffahn · June 4, 2020, 9:23pm

Would you mind elaborating a bit on the legitimate use case you have? Iʼve only seen hints that itʼs related to unit testing in this thread so far. Iʼm mostly just curious, but I guess presenting your use case here might also help convince people that there is a need for an officially supported way to identify non-static types (while ignoring their lifetimes).

LateNiteMartyParty · June 4, 2020, 9:40pm

Sure, I will do my best. I don't think I can go far into the details but I'll give an overview.

I work in compilers, and I have a function that looks like this:

fn parse<'a>(input: &'a str) -> Box<dyn Expression<'a> + 'a>;

It is incredibly convenient when writing compilers to have the output be an opaque AST node. Plus, by attaching the lifetime I can still avoid string copies.

Edit: I mis-characterized the problem. Here it is again, better explained.

Basically, I would like to do the following. When I define a node, I would like it to automatically derive PartialEq, like so:

#[derive(PartialEq)]
struct Add<'a> { 
    arg1: Box<dyn Expression<'a> + 'a>,
    arg2: Box<dyn Expression<'a> + 'a>,
}

This is so that when I write unit tests, I can do the following:

fn test() { 
    use ast::{add, literal};
    assert_eq!(parse("5 + 5"), add(literal("5"), literal("5")));
}

It's not possible to do this without the workarounds I described. In fact, even with the workaround, equality is only possible if the lifetime is shared, making this safe.

It is possible to do this with Debug to a String, but that requires extra allocation and is confusing because now the node has to derive Debug or Display.
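A minimal sketch of that Debug-based fallback, for comparison. The trait and node types here are simplified, hypothetical stand-ins for the real (redacted) ones, and `debug_eq` is an invented helper name. It needs no type id at all and works for non-'static nodes, at the cost of two String allocations per comparison and a Debug bound on every node.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for the thread's `Expression` trait.
// Requiring `Debug` as a supertrait is the cost of this approach.
trait Expression<'a>: Debug + 'a {}

#[derive(Debug)]
struct Literal<'a>(&'a str);

#[derive(Debug)]
struct Add<'a>(Box<dyn Expression<'a> + 'a>, Box<dyn Expression<'a> + 'a>);

impl<'a> Expression<'a> for Literal<'a> {}
impl<'a> Expression<'a> for Add<'a> {}

/// Compares two nodes by their rendered Debug output.
/// Allocates two Strings per call; both nodes must share the lifetime `'a`.
fn debug_eq<'a>(lhs: &(dyn Expression<'a> + 'a), rhs: &(dyn Expression<'a> + 'a)) -> bool {
    format!("{:?}", lhs) == format!("{:?}", rhs)
}
```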

dhardy · June 6, 2020, 10:49am

IIUC there are two sides to this problem:

type_id is supposed to be unique to a type; since lifetime parameters are part of the type, it should also depend on the lifetime (even though it isn't needed for this application); note that in general it is hard to ensure that a new parameterisation of a type with a different lifetime would generate a different type_id, thus Any takes the conservative restriction of requiring 'static. (Simply ignoring lifetimes would be unsafe by allowing lifetime extension in a downcast.)
Effectively what you want for the comparison is to cast both objects to a fresh lifetime which does not outlive either source lifetime. For this, Rust would need some type of higher-kinded lifetime generics (especially since it is unknown in general how many lifetime parameters the type has). It's the same problem as userspace reborrows.

So, your "hack" here generates a type_id which is independent of the lifetime, thus could be used to extend lifetime in a downcast (violating memory safety).

RustyYato · June 6, 2020, 11:20am

I think that it is fine to generate type ids for non-'static types, so long as you can't use those types in conjunction with Any (or similar traits). So long as we acknowledge that they will only be unique if we ignore lifetimes.

LateNiteMartyParty · June 6, 2020, 8:12pm

Agreed on point one. I would think that generic objects with different lifetime params would have different type_ids.

On point two, I've walked back a bit on this. I don't actually need to cast the objects to a fresh lifetime. Downcasting to an object that has the same lifetime is sufficient, since the comparison is only used when the lifetime is 'static or otherwise the same. Here is my custom downcasting code to illustrate this:

impl<'a, TS> dyn Expr<'a, TS> + 'a
where
    TS: Redacted + 'a,
{
    /// This is important: it must share the same lifetime!
    fn is<T: Expr<'a, TS> + 'a>(&self) -> bool {
        // Code copied from Any but uses pseudo_type_id
    }

    fn downcast_ref<T: Expr<'a, TS> + 'a>(&self) -> Option<&T> {
        // Code copied from Any 
    }
}

Additionally, since the objects effectively "know" their own types via their vtables, it is only necessary to downcast the rhs object. The algorithm for comparison is as follows:

impl<'a, T, TS> NodeEq<'a, TS> for T
where
    T: PartialEq<Self> + Eq + Expr<'a, TS> + 'a,
    TS: Redacted + 'a,
{
    default fn node_eq(&self, rhs: &(dyn Expr<'a, TS> + 'a)) -> bool {
        rhs.downcast_ref::<Self>().map_or(false, |rhs| self == rhs)
    }
}

Which can then be used to implement for<'a, TS> Box<dyn Expr<'a, TS> + 'a>: PartialEq<Self>
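As a point of comparison, here is a sketch of the same comparison algorithm on stable Rust, restricted to 'static nodes. It swaps the pseudo type id and specialization (`default fn`) for std's Any, so it is not the code from this post; `Expr`, `Literal`, and `Neg` are simplified, hypothetical stand-ins, and the TS parameter is dropped.

```rust
use std::any::Any;

// Hypothetical stand-in for `Expr`, limited to 'static types via `Any`.
trait Expr: Any {
    fn as_any(&self) -> &dyn Any;
    /// Only the right-hand side is downcast; `self` already knows its type.
    fn node_eq(&self, rhs: &dyn Expr) -> bool;
}

// Blanket impl for every sized, 'static, comparable node type.
impl<T: PartialEq + Any> Expr for T {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn node_eq(&self, rhs: &dyn Expr) -> bool {
        // `rhs.as_any()` dispatches through the vtable to the concrete node.
        rhs.as_any()
            .downcast_ref::<T>()
            .map_or(false, |rhs| self == rhs)
    }
}

#[derive(PartialEq)]
struct Literal(&'static str);

#[derive(PartialEq)]
struct Neg(i64);
```

Nodes of different concrete types compare unequal because the downcast fails, and nodes of the same type fall through to their own PartialEq.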

RustyYato · June 6, 2020, 10:10pm

This doesn't work because this allows you to transmute lifetimes in safe code. For example, if a type has an unrelated lifetime

struct Struct<'short, 'a : 'short, 'b : 'short> {
   short: &'short u8,
   a: &'a u8,
   b: &'b u8,
}

Your code would allow you to convert from Struct<'short, 'a, 'b> to Struct<'short, 'b, 'a> via dyn Expr<'short, _>. This is unsound.
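To see why a name-based id cannot catch this, here is a small sketch (the two helper functions are hypothetical). Lifetimes are erased before code generation, so both helpers end up asking type_name about the same type, and any id derived from that name treats Struct<'short, 'a, 'b> and Struct<'short, 'b, 'a> as identical.

```rust
use std::any::type_name;

#[allow(dead_code)]
struct Struct<'short, 'a: 'short, 'b: 'short> {
    short: &'short u8,
    a: &'a u8,
    b: &'b u8,
}

/// Name of the type with its lifetimes in the declared order.
fn name_as_written<'short, 'a: 'short, 'b: 'short>(
    _s: &Struct<'short, 'a, 'b>,
) -> &'static str {
    type_name::<Struct<'short, 'a, 'b>>()
}

/// Name of the type with `'a` and `'b` swapped.
fn name_swapped<'short, 'a: 'short, 'b: 'short>(
    _s: &Struct<'short, 'a, 'b>,
) -> &'static str {
    type_name::<Struct<'short, 'b, 'a>>()
}
```

For any value `s`, `name_as_written(&s)` and `name_swapped(&s)` return equal strings, so a downcast keyed on that name would accept the swapped type.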

LateNiteMartyParty · June 6, 2020, 11:17pm

Hmm... you're right, and that's unfortunate for me. I had hoped that Expr having a method with the signature fn ex(self: Box<Self>) would make such constructs impossible to create, but that's clearly not the case.

I still might keep it, because while it is unsound I know that none of the structs that implement Expr have any additional lifetime.

Actually, thinking about this further, I think that it's possible for me to enforce this constraint by adding the following bounds to NodeEq and dyn Expr<'a, TS> + 'a: where 'a: 'static. This would ensure that for

struct Struct<'short, 'a : 'short, 'b : 'short> {
   short: &'short u8,
   a: &'a u8,
   b: &'b u8,
}

that 'short: 'static and 'a: 'static and 'b: 'static.

Since I need these for unit tests, I only need this for the static lifetime, so I think that this might fix the unsoundness? This is probably no different than just implementing these for the 'static lifetime, giving me Any along with it.

RustyYato · June 7, 2020, 12:24am

And we're back at Any

RalfJung · June 7, 2020, 8:33am

Note that Rust compiles functions with unnamed_addr, which means that there is no 1:1 mapping between the address of a function and its source-level identity. Different functions can have the same address (if their assembly is the same and they get merged by LLVM), and potentially the same function might even have different addresses when viewed from different codegen units (not sure about this... it definitely happens for vtables though).

bjorn3 · June 7, 2020, 8:53am

Also happens for generic and #[inline] functions.

LateNiteMartyParty · June 8, 2020, 9:58pm

Unfortunately, it seems like even doing this requires a pseudo-type-id. I tried the following, but the issue was immediately apparent:

impl<TS> dyn Expr<'static, TS> + 'static
where
    TS: TypeSystem + 'static,
{
    /// This is important: it must share the same lifetime!
    pub fn is<T: Expr<'static, TS> + 'static>(&self) -> bool {
        let t = TypeId::of::<T>();
        let concrete = <Self as Any>::type_id(self);
        t == concrete
    }

    pub fn downcast_ref<T: Expr<'static, TS> + 'static>(&self) -> Option<&T> {
        if self.is::<T>() {
            unsafe { Some(&*(self as *const dyn Expr<'static, TS> as *const T)) }
        } else {
            None
        }
    }
}

type_id isn't included in the vtable of Expr because I can't require that Expr: Any. The TypeId given to concrete is the one of dyn Expr<'static, TS>, not the TypeId of the concrete type.
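A stripped-down sketch of that behaviour, with a hypothetical `Expr` that has no lifetime or TS parameter. Because `dyn Expr + 'static` is itself a 'static type, it gets the blanket Any impl, and `<Self as Any>::type_id` is resolved statically for the trait object type rather than dispatched to the concrete node.

```rust
use std::any::{Any, TypeId};

// Hypothetical, simplified stand-ins; `Expr` deliberately does NOT extend `Any`.
trait Expr {}

struct Literal;
impl Expr for Literal {}

impl dyn Expr + 'static {
    /// Same call shape as in the post: `Self` here is `dyn Expr + 'static`,
    /// so this returns the id of the trait object type itself.
    fn id_through_any(&self) -> TypeId {
        <Self as Any>::type_id(self)
    }
}
```

Calling `id_through_any` on a boxed `Literal` yields `TypeId::of::<dyn Expr>()`, never `TypeId::of::<Literal>()`, which is why `is::<T>()` can never succeed this way.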

I'm forced to have Expr include a type_id function, but since Expr is not always static, the type id has to be a pseudo type id.

I don't think there's any way around this unfortunately, but at least thankfully this code shouldn't be unsound, and I can still do what I want. Unfortunately I don't have the ability to automatically derive PartialEq now, which was convenient, but I should be able to write a proc macro that does the same thing.

RustyYato · June 8, 2020, 10:01pm

Why not? If that's the case you can put

fn typeid(&self) -> TypeId where Self: 'static { TypeId::of::<Self>() }

in your Expr trait, and base things off of that. (Do you really need a trait for Expr? I would assume an enum is a better fit, because there are a limited number of possible expression forms.)
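A sketch of how that suggestion could slot into the downcasting code, under simplifying assumptions: the TS parameter is dropped, and `Literal` and `Neg` are hypothetical node types. The same `where Self: 'static` pattern is used by std's own Error trait for its private type_id method. Non-'static nodes can still implement the trait; they just cannot call `typeid`.

```rust
use std::any::TypeId;

trait Expr<'a> {
    /// Dispatched through the vtable, so it reports the concrete node type.
    /// Only callable when the implementing type is 'static.
    fn typeid(&self) -> TypeId
    where
        Self: 'static,
    {
        TypeId::of::<Self>()
    }
}

// Hypothetical node types for illustration.
struct Literal<'a>(&'a str);
struct Neg;

impl<'a> Expr<'a> for Literal<'a> {}
impl<'a> Expr<'a> for Neg {}

impl dyn Expr<'static> + 'static {
    fn is<T: Expr<'static> + 'static>(&self) -> bool {
        TypeId::of::<T>() == self.typeid()
    }

    fn downcast_ref<T: Expr<'static> + 'static>(&self) -> Option<&T> {
        if self.is::<T>() {
            // SAFETY: `is` just confirmed the concrete type behind `self` is `T`.
            Some(unsafe { &*(self as *const Self as *const T) })
        } else {
            None
        }
    }
}
```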

LateNiteMartyParty · June 8, 2020, 10:48pm

Wow, I didn't have any idea that it was possible to have extra bounds on Self for trait methods. I thought those bounds were lifted to the top of the item.

As to the second question, an enum is what we were using before, so it's certainly possible to use. The issue is that while there is a limited number of possible expressions, there are a lot of them, and this provides a convenient way to add and extend functionality without having to add variants to one massive enum.

If this was a performance bottleneck, I don't think that this would be a reasonable change. However, the ability for different people to add different operators without interfering with other people's work is a win for us.

Another major reason is that, while I can't get into what TS is, it would be very useful for us to add variants to the Expr that only apply to one TS (or have different impls per concrete TS). There are other ways to do this, but they get really messy.

Thanks for your help Yato, you're amazing as always.

JeffBurdges · June 8, 2020, 11:10pm

In release builds, I'd expect dtolnay's TypeId to satisfy TypeId::of::<T>() == TypeId::of::<S>() for all T and S, because any inequality represents an optimization failure. We have enough reusable or empty default methods, à la trait Foo { fn foo(&self) {} }, that rustc should optimize them away.

bjorn3 · June 9, 2020, 4:21pm

Such optimization would require LTO.
