I was recently bit by the fact that f32/f64 don’t implement Hash, making some of my structs fail to #[derive(Hash)]
. I’m writing an image processing pipeline that uses some f32 values as settings and I need a hash as an aggregate identifier of all the settings. At least for this use case (and I suspect many others) just having the f32/f64 bytes be hashed would be fine. I’ve searched around and couldn’t find a discussion explaining why this decision was taken. Can anyone point me towards one or explain why not implementing f32/f64 was the decision?
Not an inherent reason not to implement, but there’s at least one open design question here: Do all NaNs have the same hash value, or do NaNs with different payloads have different hash values? (Or should it be a run time error, or should the hash value be random, or possibly another alternative.)
When is a Hash
implementation useful without an accompanying implementation of Eq
?
You could use the ordered_float
crate to wrap around those floats to be hashed. If an explicit wrapper is undesirable, there is derivative
crate allowing you to customize the hash function of a field (remember to alter Eq
as well).
Fun fact: NaN is not the only issue. Java implemented hashCode()
wrongly making -0 and +0 produce different values, forcing Float.equals()
to have a very strange semantics. You will get the same problem if you implement the hash by bit-casting the f32 to u32 without checking for ±0.
| Language | ±0 | NaNs |
|----------------------|-----------|----------------|
| Java | Different | Same |
| Python | Same | Same |
| Ruby | Same | Different |
| Swift | Same | Different |
| C++ (libc++) | Same | Different |
| Rust (ordered_float) | Same | Same |
@hanna-kruppe @sfackler @kennytm For my use case all I need is Hash, not Eq and I’m perfectly happy with different NaNs returning different hash values. I’m stuffing a struct with a bunch of values and all I want is a hash of all those values that’s guaranteed to be different if the values are different. The use case is mostly for caching and I don’t care about false inequality only false equality.
That being said I’m sure rust would want a more complete implementation. Wouldn’t it be simple to just normalize NaN, Inf, and zero so that when they are passed to the hash they are always the same value? Basically do a few checks to replace different NaNs with a singe value and then pass the u32 to the hash as normal.
@kennytm ordered_float is the workaround I’ll be using but it makes for very ugly code.
That is one option, yes. It's far from obvious that it's the right choice, though.
What are the downsides? In what case is that not what you'd want?
It is incompatible with any use case where the values you’re combining are meaningfully different. This can occur trivially with +/- infinity, it can occur when you use bitwise equality, or when you’re encoding useful information in the NaN tag, and there are probably other examples as well.
I’d say that any time you can detect that the values are different through the normal f32/f64 API the hashing should be different as well. And any time that you can only detect differences by doing a transmute to u32 or similar it’s fine if the hashing is the same.
By these rules and reading the docs we should have different hashes for +Inf, -Inf, +0, -0, NaN.
Right, but unfortunately, HashMap uses the Eq
trait, which is derived from PartialEq
used for overloading ==
and !=
, so we can't have it both ways at the same time.
You can do
mod helper {
pub type f32_helper = f32;
pub type f64_helper = f64;
}
#[allow(non_camel_case_types)]
type f32 = ordered_float::OrderedFloat<helper::f32_helper>;
#[allow(non_camel_case_types)]
type f64 = ordered_float::OrderedFloat<helper::f64_helper>;
to shadow the f32
and f64
typename, so your code looks just like it would with regular f32
or f64
Wrapping and unwrapping a newtype are the ugly operations on it though.
Not sure what you mean. If you can detect that the values are different with the API Eq
should also return a difference.
Indeed, I've had to write a bunch of boilerplate code to make the wrapping and unwrapping palatable:
So should f32::NAN == f32::NAN
and 0. != -0.
? That would break backwards compatibility in a pretty fundamental way, as well as disagreeing with ~every other language in existence.
@sfackler If NAN!=NAN
you’re going to have a bad time using f32 as keys in a hash table. But I see there are also math reasons for that to be like that. Personally I don’t care for Eq
, my use case only requires Hash
.
I think that a hash works the opposite ways. If the values are the same, you have the guarantee that the hash is the same. But two different values can have the same hash (this a collision).
Obviously no hash can guarantee no collisions (pigeon-hole principle and all) but if you start making +0 and -0 hash to the same value on purpose and then that makes a difference mathematically down the line I'll have a problem of broken caching. It increasingly seems to me that at least for my usage just hashing the underlying bytes is ideal.
I also needed a larger datastructure that might contain floats (though in practice rarely used) to implement Hash
in my aterm crate. What I did was implement Eq
/Hash
based on the underlying bytes and I documented that you shouldn’t put floats in there if you’re going to use the Hash
instance. I went so far as to call the module bad_idea to discourage use of it. It was the fastest way to “fix” problem.
This is pretty much my use case. It seems to me that the reason for not having hashing on f32/f64 is that someone might use those values as keys in a hash table. That will always be broken if we want to keep things like NAN != NAN
. But there's another set of use cases that just needs hashing as a cheap way to do aggregate equality comparison for things like caching and other cases where false-inequality is not really an issue. So maybe the solution is to have a way to implement Hash
while at the same time disallowing floats as HashMap
keys?