F32/f64 should implement Hash

pedrocr · June 25, 2017, 6:28pm

I was recently bitten by the fact that f32/f64 don’t implement Hash, which makes some of my structs fail to #[derive(Hash)]. I’m writing an image processing pipeline that uses some f32 values as settings, and I need a hash as an aggregate identifier of all the settings. At least for this use case (and I suspect many others) just hashing the f32/f64 bytes would be fine. I’ve searched around and couldn’t find a discussion explaining why this decision was taken. Can anyone point me towards one, or explain why Hash was left unimplemented for f32/f64?

hanna-kruppe · June 25, 2017, 7:06pm

Not an inherent reason not to implement, but there’s at least one open design question here: Do all NaNs have the same hash value, or do NaNs with different payloads have different hash values? (Or should it be a run time error, or should the hash value be random, or possibly another alternative.)

sfackler · June 25, 2017, 7:22pm

When is a Hash implementation useful without an accompanying implementation of Eq?

kennytm · June 25, 2017, 8:17pm

You could use the ordered_float crate to wrap the floats that need to be hashed. If an explicit wrapper is undesirable, there is the derivative crate, which lets you customize the hash function of a field (remember to alter Eq as well).

Fun fact: NaN is not the only issue. Java implemented hashCode() wrongly, making -0 and +0 produce different values, which forced Float.equals() to have very strange semantics. You will get the same problem if you implement the hash by bit-casting the f32 to u32 without checking for ±0.

| Language             | ±0        | NaNs           |
|----------------------|-----------|----------------|
| Java                 | Different | Same           |
| Python               | Same      | Same           |
| Ruby                 | Same      | Different      |
| Swift                | Same      | Different      |
| C++ (libc++)         | Same      | Different      |
| Rust (ordered_float) | Same      | Same           |
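
As a minimal sketch of the ±0 pitfall described above, the following compares what `==` says with what the raw bit patterns say. The helper names `raw_bits` and `eq_and_bits_disagree` are made up for illustration and are not part of any library.

```rust
/// Bit pattern that a naive "hash the bytes" implementation would feed to the hasher.
fn raw_bits(x: f32) -> u32 {
    x.to_bits()
}

/// True when `==` and the raw bit patterns disagree about whether
/// two floats are "the same value".
fn eq_and_bits_disagree(a: f32, b: f32) -> bool {
    (a == b) != (raw_bits(a) == raw_bits(b))
}
```

With these helpers, `eq_and_bits_disagree(0.0, -0.0)` is true because the two zeros compare equal but differ in the sign bit, and `eq_and_bits_disagree(f32::NAN, f32::NAN)` is true for the opposite reason: identical bits, but `==` returns false.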

pedrocr · June 25, 2017, 8:31pm

@hanna-kruppe @sfackler @kennytm For my use case all I need is Hash, not Eq, and I’m perfectly happy with different NaNs returning different hash values. I’m stuffing a struct with a bunch of values and all I want is a hash of all those values that’s guaranteed to be different if the values are different. The use case is mostly for caching, and I don’t care about false inequality, only false equality.

That being said, I’m sure Rust would want a more complete implementation. Wouldn’t it be simple to just normalize NaN, Inf, and zero so that when they are passed to the hash they are always the same value? Basically, do a few checks to replace different NaNs with a single value and then pass the u32 to the hash as normal.
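
Roughly what I have in mind, as a sketch only: `canonical_bits` and `hash_f32` are hypothetical names, and the choice of which NaN and which zero to treat as canonical is an assumption. The infinities are left alone here because +Inf and -Inf each have exactly one bit pattern already.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a float to the bit pattern that gets hashed (hypothetical helper).
fn canonical_bits(x: f32) -> u32 {
    if x.is_nan() {
        // Collapse every NaN payload into one canonical NaN.
        f32::NAN.to_bits()
    } else if x == 0.0 {
        // `-0.0 == 0.0` is true, so this branch folds -0.0 into +0.0.
        0.0f32.to_bits()
    } else {
        // Every other value, including both infinities, has a unique encoding.
        x.to_bits()
    }
}

/// Hash a single float through its canonical bit pattern.
fn hash_f32(x: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    canonical_bits(x).hash(&mut hasher);
    hasher.finish()
}
```

With this, `hash_f32(0.0)` and `hash_f32(-0.0)` agree, and so do any two NaNs regardless of payload.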

@kennytm ordered_float is the workaround I’ll be using but it makes for very ugly code.

hanna-kruppe · June 25, 2017, 8:51pm

That is one option, yes. It's far from obvious that it's the right choice, though.

pedrocr · June 25, 2017, 9:26pm

What are the downsides? In what case is that not what you'd want?

hanna-kruppe · June 25, 2017, 9:47pm

It is incompatible with any use case where the values you’re combining are meaningfully different. This can occur trivially with +/- infinity, it can occur when you use bitwise equality, or when you’re encoding useful information in the NaN tag, and there are probably other examples as well.

pedrocr · June 25, 2017, 10:02pm

I’d say that any time you can detect that the values are different through the normal f32/f64 API the hashing should be different as well. And any time that you can only detect differences by doing a transmute to u32 or similar it’s fine if the hashing is the same.

By these rules and reading the docs we should have different hashes for +Inf, -Inf, +0, -0, NaN.

vadimcn · June 26, 2017, 2:57am

Right, but unfortunately HashMap uses the Eq trait, which builds on PartialEq, the same trait that overloads == and !=, so we can't have it both ways at the same time.

oli-obk · June 26, 2017, 7:57am

You can do

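    // Capture the primitive types under other names before shadowing them.
    #[allow(non_camel_case_types)]
    mod helper {
        pub type f32_helper = f32;
        pub type f64_helper = f64;
    }
    // From here on, `f32` and `f64` in this scope name the wrapper types.
    #[allow(non_camel_case_types)]
    type f32 = ordered_float::OrderedFloat<helper::f32_helper>;
    #[allow(non_camel_case_types)]
    type f64 = ordered_float::OrderedFloat<helper::f64_helper>;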

to shadow the f32 and f64 type names, so your code looks just like it would with regular f32 or f64.

eddyb · June 26, 2017, 9:04am

Wrapping and unwrapping a newtype are the ugly operations on it though.

pedrocr · June 26, 2017, 10:26am

Not sure what you mean. If you can detect that the values are different with the API, Eq should also report a difference.

pedrocr · June 26, 2017, 10:28am

Indeed, I've had to write a bunch of boilerplate code to make the wrapping and unwrapping palatable:

https://github.com/pedrocr/rawloader/blob/2ccb92a3161009276f8a9e5ca0ef25e6d01d72f6/src/imageops/colorspaces.rs#L6-L20

    use ordered_float::OrderedFloat;

    // Unwrap every element of a 3x4 matrix back into a plain f32.
    fn from_ordered(m: [[OrderedFloat<f32>; 4]; 3]) -> [[f32; 4]; 3] {
        [
            [m[0][0].into(), m[0][1].into(), m[0][2].into(), m[0][3].into()],
            [m[1][0].into(), m[1][1].into(), m[1][2].into(), m[1][3].into()],
            [m[2][0].into(), m[2][1].into(), m[2][2].into(), m[2][3].into()],
        ]
    }

    // Wrap every element of a 3x4 matrix so the whole matrix can be hashed.
    fn to_ordered(m: [[f32; 4]; 3]) -> [[OrderedFloat<f32>; 4]; 3] {
        [
            [OrderedFloat(m[0][0]), OrderedFloat(m[0][1]), OrderedFloat(m[0][2]), OrderedFloat(m[0][3])],
            [OrderedFloat(m[1][0]), OrderedFloat(m[1][1]), OrderedFloat(m[1][2]), OrderedFloat(m[1][3])],
            [OrderedFloat(m[2][0]), OrderedFloat(m[2][1]), OrderedFloat(m[2][2]), OrderedFloat(m[2][3])],
        ]
    }
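
On newer Rust versions (array `map` was stabilized in 1.55, well after this thread) the element-by-element conversion above could probably be shortened. The sketch below uses a local stand-in newtype, `Wrapped`, instead of `ordered_float::OrderedFloat` so that it compiles without the crate; `Wrapped`, `to_wrapped`, and `from_wrapped` are hypothetical names.

```rust
/// Stand-in for a float wrapper such as `OrderedFloat<f32>` (assumption:
/// the real wrapper is a tuple struct around the float, like this one).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Wrapped(f32);

/// Wrap every element of a 3x4 matrix.
fn to_wrapped(m: [[f32; 4]; 3]) -> [[Wrapped; 4]; 3] {
    // The tuple-struct constructor is used directly as the mapping function.
    m.map(|row| row.map(Wrapped))
}

/// Unwrap every element of a 3x4 matrix.
fn from_wrapped(m: [[Wrapped; 4]; 3]) -> [[f32; 4]; 3] {
    m.map(|row| row.map(|w| w.0))
}
```

A round trip such as `from_wrapped(to_wrapped(m))` gives back the original matrix, and the functions no longer need to spell out each index.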

sfackler · June 26, 2017, 3:39pm

So should f32::NAN == f32::NAN and 0. != -0.? That would break backwards compatibility in a pretty fundamental way, as well as disagreeing with ~every other language in existence.

pedrocr · June 26, 2017, 6:04pm

@sfackler If NAN != NAN you’re going to have a bad time using f32 as keys in a hash table. But I see there are also mathematical reasons for it to be that way. Personally I don’t care about Eq; my use case only requires Hash.

dzamlo · June 26, 2017, 6:32pm

I think that a hash works the opposite way. If the values are the same, you have the guarantee that the hash is the same. But two different values can have the same hash (this is a collision).

pedrocr · June 26, 2017, 6:38pm

Obviously no hash can guarantee no collisions (pigeon-hole principle and all) but if you start making +0 and -0 hash to the same value on purpose and then that makes a difference mathematically down the line I'll have a problem of broken caching. It increasingly seems to me that at least for my usage just hashing the underlying bytes is ideal.
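
For illustration, hashing the underlying bytes for a cache key might look like the sketch below. The `Settings` struct, its fields, and `cache_key` are invented for this example and are not taken from my actual code; only `f32::to_bits` and the std hashing traits are real API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical pipeline settings; the field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct Settings {
    gamma: f32,
    exposure: f32,
    rotation: u8,
}

impl Hash for Settings {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the raw bit patterns to the hasher, so +0.0 / -0.0 and
        // NaNs with different payloads are all treated as distinct inputs.
        self.gamma.to_bits().hash(state);
        self.exposure.to_bits().hash(state);
        self.rotation.hash(state);
    }
}

/// Aggregate identifier for a set of settings.
fn cache_key(settings: &Settings) -> u64 {
    let mut hasher = DefaultHasher::new();
    settings.hash(&mut hasher);
    hasher.finish()
}
```

Note that this implements only Hash and deliberately not Eq, so the struct cannot be used as a HashMap key; it can only produce a key. Two settings that differ only in the sign of a zero feed different bits to the hasher, so short of a collision the cache treats them as different, which is the conservative behaviour I want.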

Apanatshka · June 27, 2017, 2:33pm

I also needed a larger data structure that might contain floats (though in practice they are rarely used) to implement Hash in my aterm crate. What I did was implement Eq/Hash based on the underlying bytes, and I documented that you shouldn’t put floats in there if you’re going to use the Hash instance. I went so far as to call the module bad_idea to discourage use of it. It was the fastest way to “fix” the problem.
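
A rough sketch of that kind of bitwise Eq/Hash follows. This is not the code from the aterm crate; `BitwiseF32` and `distinct_count` are hypothetical names used only to show the idea.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Float wrapper whose equality and hash both look only at the bits.
#[derive(Debug, Clone, Copy)]
struct BitwiseF32(f32);

impl PartialEq for BitwiseF32 {
    fn eq(&self, other: &Self) -> bool {
        // Bitwise comparison: NaN equals an identical NaN, and +0.0 != -0.0.
        self.0.to_bits() == other.0.to_bits()
    }
}

// Bitwise equality is reflexive, so implementing Eq is sound here.
impl Eq for BitwiseF32 {}

impl Hash for BitwiseF32 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly what `eq` compares, so equal values hash equally.
        self.0.to_bits().hash(state);
    }
}

/// Count how many bitwise-distinct floats a slice contains.
fn distinct_count(values: &[f32]) -> usize {
    values
        .iter()
        .map(|&v| BitwiseF32(v))
        .collect::<HashSet<_>>()
        .len()
}
```

Because Eq and Hash agree with each other, the wrapper works as a set or map key, but its notion of equality differs from `==` on plain floats: `distinct_count(&[0.0, -0.0])` is 2, while two copies of the same NaN count as one value.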

pedrocr · June 28, 2017, 11:45am

This is pretty much my use case. It seems to me that the reason for not having hashing on f32/f64 is that someone might use those values as keys in a hash table. That will always be broken if we want to keep things like NAN != NAN. But there's another set of use cases that just needs hashing as a cheap way to do aggregate equality comparison for things like caching and other cases where false-inequality is not really an issue. So maybe the solution is to have a way to implement Hash while at the same time disallowing floats as HashMap keys?
