After reading a blog post about hashing in Rust I was thinking how the situation could be improved. My first thought was that if Rust had an equivalent of C++’s has_unique_object_representations then derive Hash could use it to generate potentially faster implementations for types that do not have padding. Something like this code:
Yes, that would need to be implemented by means of compiler magic.
It looks like the zerocopy crate has implemented its own mechanism for detecting whether it is safe to interpret object as byte slice: zerocopy::IntoBytes. Just a small subset of it would be needed for the purpose of derive Hash, but it’s cool that to know that it would also be useful for serialization.
I did not think about interior mutability. So there would need to be two traits automatically implemented by the compiler:
I know next to nothing about Rust’s internals, so I have no idea whether it would be difficult to implement it or not but it definitely seems possible considering zerocopy has it in a form of an automatically derived unsafe trait.
This would presumably be good to have for Eq too? And for Ord, for that matter, except for uff, little-endianess kind of ruins it, unless the fields are rearranged from least to most significant too.
Another idea: if there are padding bytes, AND them to zero. This would require LLVM magic (freeze?) to strictly avoid reading uninit memory even though the program would never see the uninit bytes in practice.
Being able to SIMDify Eq (and maybe Ord) would be super awesome. On x86-64s without AVX-512 masked loads it could only be easily done for vector-width types, unfortunately. Maybe other architectures have better support for loading arbitrary-width vectors?
Ths idea has been floated around a few times, but one of the issues is that the compiler doesn't actually know anything about padding or pointers at the time it is expanding this derive macro. This would be useful for other things (see "perfect derive"), but it is a very hard compiler change.
It could be maybe done as a specialized macro with an unsafe trait, I wanted to do something like this for rustc's StableHash trait.
Could you share the code you used for benchmarking?
I want to see if I can come up with a macro that would expand into these optimized implementations by leveraging bytemuck and/or zerocopy and publish it as a crate.
I imagined it being implemented so that derive(Hash) expands into code that somehow dispatches at compile time into an appropriate implementation An equivalent of this C++ code
A quite ugly but functional (and soon maybe sound) way could be something like this:
Have a PerfectHashAligned (bikesheddable ofc) marker trait that is implemented by the compiler. (Alternatively this could also be Align<A = 0> or a bytemuck Uninitialized equivalent etc.)
Then for every single derive we do
impl DerivenHash for .. {
// normal impl goes here
}
And then have a specialized
impl<T: DerivenHash> Hash for T {
default fn hash() {
<T as DerivenHash>::hash()
}
}
impl<T: DerivenHash + PerfectHashable> Hash for T {
fn hash() { /* .. */ }
}
This is a bit verbose and goes through some indirection and is currently unsound when lifetimes are around, but I am working on at least fixing the soundness holes of specialization when used in such context as above.
The existing Hash impls for pointers do only hash the address and metadata; if your type contains pointers and wants deep hashing, it has to impl Hash manually. References are deep-hashed, however; the Rust equivalent to has_unique_object_representations would have to return false for types containing references.
I've added another optimization: instead of always using state.write() which is a slow codepath that needs to be able to handle data of any size, I dispatch to the appropriate fixed-size impl such as write_u64. Structs that don't align neatly with a specific integer size are padded to the larger one.
To avoid a performance cliff for structs above 128 bits (maximum fixed-size write is u128) I turn structs of up to 512 bits into a fixed amount of u128 writes with the last one being padded.
You can clone the repo and run cargo bench, I think the benchmarks speak for themselves.
I'll formally announce this crate soon, probably tomorrow.