Attempt at a comprehensive summary
This is an attempt to collect in one place, for later reference, what was discussed on the topic of hashing floats.
Base Facts
Some facts about floats, and about how Rust is currently set up, are worth clarifying:
- Equality on floats is more complex than on most types because, for mathematical reasons, you want `NAN != NAN` and `+0 == -0` to be true. This is why floats in Rust implement `PartialEq` but not `Eq`: at least the reflexive property (`a == a`) is broken by the `NAN` case.
- There is a different notion of equality that’s generally useful: if `a == b`, it should not be possible to write a function for which `myfunc(a) != myfunc(b)`. This is useful for caching, and float equality in Rust doesn’t guarantee it, because `+0 == -0` and yet `.is_sign_negative()` and friends let you tell the two apart. Note that this property isn’t necessarily exclusive to floats. If someone ever ports Rust to an architecture that represents integers as sign and magnitude, the same `+0 == -0` case would exist.
- The output of `Hash` cannot be relied upon to be stable. The same version of Rust can return different values on different architectures. This is not a property of the `Hasher` you’re using but of the way `Hash` happens to be implemented for the type you’re using (e.g., the current implementation of `Hash` for slices of integers returns different values on big-endian and little-endian architectures).
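The first two facts are easy to check directly. Here is a minimal sketch; `reciprocal` is a made-up stand-in for "some function you might want to cache", chosen only because it can see the sign of zero:

```rust
/// Hypothetical stand-in for a function whose result depends on the sign of
/// zero. Any function that inspects the sign bit would behave similarly.
fn reciprocal(x: f64) -> f64 {
    1.0 / x
}

/// Fact #1: `==` on floats is not reflexive, because NaN never equals itself.
fn is_reflexive_for(x: f64) -> bool {
    x == x
}

/// Fact #2: two floats can compare equal and still be distinguishable,
/// which is exactly what breaks caching keyed on `==`.
fn equal_but_distinguishable(a: f64, b: f64) -> bool {
    a == b && reciprocal(a) != reciprocal(b)
}
```

With these definitions, `is_reflexive_for(f64::NAN)` is false, and `equal_but_distinguishable(0.0, -0.0)` is true because the two reciprocals are positive and negative infinity.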
If you want to use floats as HashMap keys
In general floats make a poor choice of map key because of the way equality is handled (see facts #1 and #2), so the best answer is: just don’t do it. If you have a really good reason for it (e.g., caching the output of a really expensive `func(f32)`), here’s what you’d probably need to do:
- Implement a separate equality operation that treats `+0 != -0` and `NAN == NAN`. The easiest way to do that is probably to just transmute `f32` to `u32` and `f64` to `u64` (the `to_bits` methods do this without `unsafe`) and compare those, but you may want to do something fancier, like making all possible `NAN` values equal.
- You can then implement `Hash` by just hashing the underlying bytes, after also normalizing `NAN` values if you did that for equality.
To implement this in the Rust standard library, a new equality trait would need to be created, so that for math `NAN != NAN` and for hashing `NAN == NAN`. Since this is a corner case, the easiest solution is just a wrapper type around floats, like the ordered-float crate provides, for the few situations where you want it.
If you want to have a content hash of a struct that includes floats
In some situations hashing is used to get a unique identifier for some content, calculated from the content itself. For this `Hash` is not workable, as it doesn’t guarantee that the results are stable between architectures (see fact #3). Here your best solution is to use serde’s `Serialize` on your struct and then implement a `Serializer` that just passes the content on to a cryptographic hash. This can almost surely be done generically, so there will probably be a turn-key solution in the near future (see the discussion here). There are also fancier solutions for this in the works, like the objecthash crate.
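A full serde `Serializer` is too long to show here, but the underlying idea can be sketched with the standard library alone: write every field in a fixed byte order, then hash those bytes. Everything named below (`StableBytes`, `Measurement`, `content_hash`) is hypothetical. FNV-1a is used only as a placeholder and is not cryptographic, so a real content hash would swap in something like SHA-256 from a crate.

```rust
/// Hypothetical trait standing in for a serde `Serializer`: append a
/// canonical, architecture-independent encoding of `self` to `out`.
trait StableBytes {
    fn write_stable(&self, out: &mut Vec<u8>);
}

impl StableBytes for u32 {
    fn write_stable(&self, out: &mut Vec<u8>) {
        // Always little-endian, regardless of the host architecture.
        out.extend_from_slice(&self.to_le_bytes());
    }
}

impl StableBytes for f64 {
    fn write_stable(&self, out: &mut Vec<u8>) {
        // Raw IEEE 754 bits, little-endian. NaN payloads and the sign of
        // zero are preserved; normalize first if that is not wanted.
        out.extend_from_slice(&self.to_bits().to_le_bytes());
    }
}

/// Hypothetical example struct that includes a float.
struct Measurement {
    sensor_id: u32,
    value: f64,
}

impl StableBytes for Measurement {
    fn write_stable(&self, out: &mut Vec<u8>) {
        // Field order is part of the format and must never change.
        self.sensor_id.write_stable(out);
        self.value.write_stable(out);
    }
}

/// 64-bit FNV-1a. NOT cryptographic; a placeholder for a real digest.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Content hash: canonical bytes first, then the digest over those bytes.
fn content_hash<T: StableBytes>(value: &T) -> u64 {
    let mut bytes = Vec::new();
    value.write_stable(&mut bytes);
    fnv1a_64(&bytes)
}
```

Because the bytes are produced with explicit `to_le_bytes` calls rather than through `Hash`, the result does not depend on the endianness of the machine that computes it.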