Surprising interaction between zero sized structs and Hash

I was working on hashing a set of structs that all implement the same trait and some are zero-sized and some not. I have a sequence of them and am using the hash to represent the cumulative set of them up to that point in the sequence (for caching type uses mostly). I was then very surprised when I found that that hashing a ZST is actually a no-op. Here’s a minimal example:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
struct Empty {
}

fn main() {
    let mut s = DefaultHasher::new();
    1.hash(&mut s);
    println!("Hash is {}", s.finish());
    let e = Empty{};
    e.hash(&mut s);
    println!("Hash is {}", s.finish());
}

It returns the surprising result:

Hash is 1742378985846435984
Hash is 1742378985846435984

I’d say this breaks the assumption of what .hash() does in that you’ve passed a new value into it and the hash hasn’t changed. It’s definitely a corner case though and having it be a no-op should definitely help performance and be fine in almost all cases.

For my specific use case this was quite easy to work around. My trait includes a .name() that returns a string so I hashed that first before hashing the struct. But I thought it was worth it to discuss first

I don't think there is such an assumption. The only requirement of Hash is a == b ⇒ hash(a) == hash(b). Since a concrete ZST only has one value, making its hash() do nothing won't break this requirement.

Perhaps you should not use Hash if you need anything stronger.

I at least assumed that of .hash(), that it's never a nop. It seems reasonable to think this but maybe not.

Here's a list of things that look strange to me with the current Hash:

  • Empty{} and OtherEmpty{} hash to the same value
  • Foo("somestring") and Bar("somestring") hash to the same value
  • Hashing any number of Empty{} structs in sequence returns the same value

In rust most times these won't bite you because you can't create a hash table of different structs and so the fact that different types hash the same is less important. But it still seems inconsistent given the signature of .hash(&mut Hasher) that allows you to easily hash a sequence of different types. It's at least surprising that in the ZST case the result of the hash will not change no matter how many different types you use and no matter how many different calls you make.

(123_u8, 45_u8) and 11643_u16 both hash to the same value.

You can't compare the hash if they are different types.

If hash of a ZST was not a nop, performance of hashing a struct with zero-sized markers would decrease with the # of markers, which would suck.

5 Likes

It sounds like you were expecting Hash to hash a value's type in addition to the actual value. This is what I'd expect to happen in a dynamically typed language where a value "carries around its type" at runtime, but in Rust I would've expected and am not surprised to learn that "types aren't hashed", or as kenny put it:

That said, "type hashing" seems like a reasonable thing to do as long as it's kept separate from regular hashing. Maybe someone could write a crate that does impl<T: Hash + Any> TypeHash for T using TypeId::of::<T>().

1 Like

Actually I only need something weaker, which is that when .hash() is called on a ZST the hash value changes. Just hashing a fixed value for all ZSTs would be fine by me, but I see that for most use cases this doesn't matter and the performance implications are potentially relevant. The workaround was extremely simple so I only opened the discussion because the result was strange to me at first.

Changes from what to what? I'm afraid I can't tell how this is different from "hashing the type" (or perhaps "hashing the ZST-ness of a type").

Note that this is always possible, even for non-ZSTs, because of the limited size of the hash. I'm sure there's some value that just happens to result in the hash not changing.

Sure, but good luck finding that one...

It's right there in the next sentence:

Just turning the nop into 0.hash(&mut hasher); would work for me.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.