Policy idea: Change TypeId's implementation details every other release

Hah, so not a hole, just anymap not using this impl correctly. That should be easy to fix.

Eh, I mean, tbh, I do believe AnyMap should just use a “dummy hasher” that just returns the last 8 bytes pushed into the thing.

Yup; that's (sort of) correct; As you can see in the implementation, the crate assumes that bytes.len() == 8. The usage of debug_assert! is a bit suspect tho; If compiled under release mode and the representation of TypeId is changed then the implementation may become unsound. If it was assert!(...) instead it'd be fully safe (but still error prone). Tho even if we change TypeId to u128 underneath it should still be OK because it will just read the first 64 bits and ignore the rest. The only soundness hole would arise if we suddenly made bytes.len() < 8, I think (which I can't imagine us doing...).

Can you imagine us making bytes.len() == size_of::<usize>()? (4 on windows, 8 everywhere else)

Would you be willing to elaborate on the rationale for this particular change?

As for making it fast again: I have an idea, but I need to figure out if Windows can at least merge the same global together across DLLs, like it can with errno...

Because the core of it need not be fast! And the goal is to have 0 collisions.

errno is part of the C runtime which is usually statically linked per DLL anyway.

Windows can’t merge globals by name across DLLs, but if you declare a non-generic static in a crate, you can rely on that having the same address in every dependent crate, since all users will reference it rather than instantiating their own copy. If you’re planning to make some sort of global lookup table to merge duplicate type IDs, that should be doable.

Yep, FastAny would be part of std, and it would have two impls:

  • On windows, it uses a non-generic static that is shared, allowing FastTypeId to work properly.
  • On everything else, we don’t use that.

FastAny would involve a few language features that don’t exist yet, but that shouldn’t stop us from fixing Any first.

Let’s not focus on FastAny for now, please?

If you have two DLLs which both statically link the CRT, they will both have their own independent versions of errno. Similarly if you have two DLLs which both dynamically link to the CRT but link to different versions of the CRT, they will still both have their own independent versions of errno. There's no merging of globals done by Windows.

1 Like

So if you depend on a DLL that makes use of errno, you don’t have access to errno?

errno is provided by the CRT, so if you depend on a DLL which makes use of errno and you want to be able to access the same errno as that DLL, you need to make sure both you and the DLL dynamically link to the same CRT.

Hmm. So just ship a triple-R (Rust Runtime Redistributable) on Windows? This can be per-Rust-version, since we already guarantee TypeId incompatibility across Rust versions.

Rust guarantees that inside a given Rust dependency tree there will be precisely one copy of any given Rust crate (two different versions of a crate are considered different crates), so Rust does not currently have any issues with ensuring there is only one copy of a given static. As long as it is known where a given static exists, all compilation units can ensure they link against the static from that known location, and Rust does that just fine. There's no need to get some sort of Rust runtime redistributable.

Okay, so a slow TypeId is not gonna be an issue.

Can we get a slow TypeId?

This is not a forum for "Can we get X?". If you want something to happen, do the work, come up with a design, and spend the time writing a detailed RFC, blog post or comment in the following issue: Collisions in type_id ¡ Issue #10389 ¡ rust-lang/rust ¡ GitHub.

14 Likes

On the use of debug_assert!: https://github.com/chris-morgan/anymap/pull/32 resolved that a week before this discussion, removing all unsafe code there, though not the underlying assumptions; I just hadn’t merged it.

For the rest: I stand by the principle of the TypeIdHasher. This is the dummy hasher, it just happens to be assuming that there are exactly eight bytes rather than merely taking the last eight bytes.

Performance is the name of the game here; I wrote it the way I did under the belief that no substantial change to TypeId’s internals was likely. If it turns out to be likely to change, well then, we’ll change it in anymap and any similar code.

But so far I’m with Centril; until we get compelling rationale for change, I’m happy leaving the assumption in.

(Incidentally, it could be more robust if it did transmute to u64: then if the size changed it’d fail to compile, instead of merely panicking the first time you use it. But as it is, changing the internal representation won’t cause problems if it still hashes to eight bytes, which seems most likely.)

My proposal involves adding as much metadata as possible, as strings, into the TypeId.

Right now my priority is to fix accidental collisions. We can make it fast later.

Just in general: if there’s some situation where the internal implementation of a thing should not be depended on by consumers, but for some technical reason it can’t be completely hidden and so might be depended on in an undesirable way that could cause surprising breakage when the internal implementation changes, then changing that implementation and seeing what breaks is a perfectly reasonable thing to do - in a test case, not a release.

std::mem::size_of::<usize>() evaluates to the size of a pointer, which is always 8 on 64-bit and always 4 on 32-bit.

If it was actually determined by operating system like that, it would be a huge, obvious, ecosystem-wide problem.

1 Like