Generalize Hasher to different outputs

tczajka · October 1, 2021, 5:47pm

Hasher::finish returns u64. It would be more useful to have this output be more generic.

For instance, you might want to hash something with a cryptographic hasher, returning 256 bits. As long as Hash for your type provides different outputs for different inputs (not necessarily true today, but see this issue), using such a Hasher<Output=u256> would give a cryptographic check that some value is what is expected.

So you would have something like:

trait Hasher {
    type Output;
    fn finish(&self) -> Self::Output;
    ...
}

Not sure if this can be changed in a backward-compatible way though?

kpreid · October 1, 2021, 6:37pm

If the goal is purely to use a specific hash algorithm on a specific value, then modifying Hasher is not strictly necessary: you can let finish() return a truncated value (or even unimplemented!()), and define a different method on the concrete Hasher implementation to return the complete value. That is, Hasher provides the example

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn calculate_hash<T: Hash>(t: &T) -> u64 {
    let mut s = DefaultHasher::new();
    t.hash(&mut s);
    s.finish()
}

which can be rewritten as

fn calculate_hash<T: Hash>(t: &T) -> [u8; 32] {
    let mut s = My256Hasher::new();
    t.hash(&mut s);
    s.finish_256()
}

where finish_256 is an inherent method of My256Hasher. Of course, this is an inelegant workaround, but I thought it worth mentioning given the potential difficulty of standard library changes.
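To make the workaround concrete, here is a minimal sketch with a toy 128-bit hasher. `Wide128Hasher`, `finish_128`, and `calculate_hash_128` are hypothetical names made up for this illustration, and the mixing step is only FNV-1a-style (the constants are the 128-bit FNV parameters as I recall them; any odd multiplier would do for the demonstration). It is not cryptographic.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical toy hasher with 128 bits of state. Illustration only.
pub struct Wide128Hasher {
    state: u128,
}

impl Wide128Hasher {
    pub fn new() -> Self {
        // Assumed starting value (FNV-1a-style offset basis).
        Wide128Hasher { state: 0x6c62272e07bb014262b821756295c58d }
    }

    /// Inherent method, not part of the Hasher trait: returns the full state.
    pub fn finish_128(&self) -> u128 {
        self.state
    }
}

impl Hasher for Wide128Hasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // XOR the byte in, then multiply by an odd constant.
            self.state ^= b as u128;
            self.state = self.state.wrapping_mul(0x0000000001000000000000000000013b);
        }
    }

    /// The trait method can only return 64 bits, so it truncates.
    fn finish(&self) -> u64 {
        self.state as u64
    }
}

/// Same shape as calculate_hash, but returning the wide value.
pub fn calculate_hash_128<T: Hash>(t: &T) -> u128 {
    let mut s = Wide128Hasher::new();
    t.hash(&mut s);
    s.finish_128()
}
```

Any existing `Hash` implementation works unchanged with this hasher; only the caller needs to know about the concrete type and its extra method.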

(Unfortunately, this abuse of the trait is observable by the type implementing Hash, because it's allowed to call finish() on the Hasher it is given, since finish() takes &self rather than self. If only Hasher were a write-only trait, we wouldn't have that problem or the original one…)
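A small sketch of how a `Hash` implementation can observe `finish()`. The type `Nosy` is hypothetical and exists only to show that the call compiles and affects the result; with a hasher whose `finish()` is `unimplemented!()`, the same impl would panic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical type whose Hash impl peeks at the hasher's running output.
pub struct Nosy(pub u32);

impl Hash for Nosy {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u32(self.0);
        // Legal because `finish` takes `&self`: read the intermediate hash...
        let peek = state.finish();
        // ...and feed it back into the same hasher.
        state.write_u64(peek);
    }
}
```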

scottmcm · October 1, 2021, 6:54pm

It would almost work to do trait Hasher = CustomHasher<Output = u64>;, but implementing traits through trait aliases isn't supported, so that would be incompatible.

The other option would be associated type defaults, but I don't think those work quite the way that would be needed here either?

kornel · October 1, 2021, 7:17pm

libstd's Hasher is meant for HashMap hashing, not for cryptography.

I think this is risky, because for a hash map there's no harm in implementing the Hash trait poorly (e.g. skipping some fields of a struct, or even not hashing anything at all), or inconsistently across platforms or crate versions.

But for cryptographic purposes skipping hashing of a field may be a disastrous security vulnerability. Difference in endianness or the size of usize may cause data verification problems.

So I think it's good that Hasher is unsuitable for cryptography.

There's Digest for that. I combine it with bincode for hashing structs.
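To illustrate the endianness and usize concerns without pulling in external crates, here is a std-only sketch of an explicit, platform-independent byte encoding. It is a hand-rolled stand-in for what a serializer such as bincode would do, not that crate's actual format; `Record` and `canonical_bytes` are hypothetical names. The resulting bytes could then be fed to any digest implementation.

```rust
/// Hypothetical record used only for illustration.
pub struct Record {
    pub id: u32,
    pub len: usize,
    pub name: String,
}

impl Record {
    /// Encode every field explicitly, with fixed widths and little-endian order,
    /// so the bytes are the same on every platform.
    pub fn canonical_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.id.to_le_bytes());
        // Widen usize to u64 so 32-bit and 64-bit targets agree.
        out.extend_from_slice(&(self.len as u64).to_le_bytes());
        // Length prefix keeps variable-length fields unambiguous.
        out.extend_from_slice(&(self.name.len() as u64).to_le_bytes());
        out.extend_from_slice(self.name.as_bytes());
        out
    }
}
```

Because every field is written by hand, skipping one is a visible decision in the code rather than a silent property of a derived `Hash` impl.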

tczajka · October 1, 2021, 7:18pm

Since implementations of Hash should only write data, ideally Hash wouldn't write to a Hasher at all. Instead, implementations would serialize their data to some sort of byte-stream trait (similar to std::io::Write), and Hasher<Output> would derive from that trait.
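One way this split could look, as a rough sketch. None of these traits exist in std; `ByteSink`, `OutputHasher`, `WriteOnlyHash`, and `Sum32` are hypothetical names, and the byte-sum "hasher" is deliberately trivial.

```rust
/// Hypothetical write-only byte stream: the only thing a hashable type sees.
pub trait ByteSink {
    fn write(&mut self, bytes: &[u8]);
}

/// Hypothetical hasher trait layered on top, with a generic output.
pub trait OutputHasher: ByteSink {
    type Output;
    /// Takes `self` by value, so the result cannot be read mid-stream.
    fn finish(self) -> Self::Output;
}

/// Hypothetical replacement for Hash that can only write, never read.
pub trait WriteOnlyHash {
    fn feed<S: ByteSink>(&self, sink: &mut S);
}

impl WriteOnlyHash for u32 {
    fn feed<S: ByteSink>(&self, sink: &mut S) {
        sink.write(&self.to_le_bytes());
    }
}

/// Toy hasher: a wrapping sum of all bytes, returned as u32.
pub struct Sum32(pub u32);

impl ByteSink for Sum32 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_add(b as u32);
        }
    }
}

impl OutputHasher for Sum32 {
    type Output = u32;
    fn finish(self) -> u32 {
        self.0
    }
}
```

Since `feed` is only given a `ByteSink` bound, it has no way to call `finish`, which removes the observability problem mentioned earlier in the thread.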

tczajka · October 1, 2021, 8:16pm

I have use cases other than cryptographic hashes:

32-bit hashes may be sufficient for many purposes and faster to calculate (on 32-bit platforms, and even on 64-bit platforms with some hashing algorithms)
I have used 96-bit hashes to reduce the probability of any collisions at all, so that I can store these hashes rather than full keys and be confident this scheme will extremely rarely fail. With 64-bit hashes you run into significant collision probabilities once you have billions of keys.
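The collision claim can be checked with the usual birthday approximation, p ≈ 1 − exp(−n² / 2^(b+1)) for n keys and b-bit hashes, assuming uniformly distributed hash values. The function name below is made up for this sketch.

```rust
/// Approximate probability of at least one collision among `n` uniformly
/// distributed `bits`-bit hashes (birthday approximation).
pub fn collision_probability(n: f64, bits: i32) -> f64 {
    let x = n * n / 2.0_f64.powi(bits + 1);
    // 1 - e^(-x), computed via exp_m1 so tiny probabilities keep precision.
    -(-x).exp_m1()
}
```

With n = 2^32 keys (about 4.3 billion), this gives roughly 39% for 64-bit hashes and about 1.2e-10 for 96-bit hashes.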

quinedot · October 1, 2021, 9:55pm

Making that a hard requirement would be, in and of itself, a backwards-incompatible change.

You can't change the return type of finish backwards-compatibly, so it would have to be a new method. The method would have to have a default implementation so as to not break current implementers; therefore it would need to return an Option or have a panicking default implementation or be restricted by bounds, or similar. Associated type defaults aren't stable, so (currently) it would have to be a generic parameter with a default.

Putting all that together, you get something like

trait Hasher<Supplemental=std::convert::Infallible> {
    fn supplemental(&self) -> Option<Supplemental> { None }
    // or maybe
    fn supplemental(&self) -> Supplemental {
        panic!("This type did not implement Hasher::supplemental")
    }
    // ... everything else there today stays the same ...
}

On the arguably-plus-side, Supplemental doesn't override u64 and you can implement more than one type of Supplemental (maybe I can give you 32 bits or 256 bits).
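A compilable variant of the Option-returning form, written as a separate trait so it can be tried outside std. `SupplementalHasher` and `Toy` are hypothetical names, and the rotate-and-XOR mixing is a placeholder, not a real hash function.

```rust
use std::convert::Infallible;
use std::hash::Hasher;

/// Hypothetical extension trait: a generic parameter with a default,
/// and a default method so existing implementers need no changes.
pub trait SupplementalHasher<S = Infallible>: Hasher {
    fn supplemental(&self) -> Option<S> {
        None
    }
}

/// Toy hasher with 128 bits of state. Illustration only.
pub struct Toy(pub u128);

impl Hasher for Toy {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ (b as u128);
        }
    }
    fn finish(&self) -> u64 {
        self.0 as u64
    }
}

// Opting in with the default parameter: nothing extra is offered.
impl SupplementalHasher for Toy {}

// One hasher can offer several supplemental output types.
impl SupplementalHasher<u32> for Toy {
    fn supplemental(&self) -> Option<u32> {
        Some(self.0 as u32)
    }
}

impl SupplementalHasher<u128> for Toy {
    fn supplemental(&self) -> Option<u128> {
        Some(self.0)
    }
}
```

Callers pick the output with fully qualified syntax, for example `<Toy as SupplementalHasher<u128>>::supplemental(&h)`.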

Alternatively, if we look at extending Hasher to larger sizes specifically:

trait Hasher<Output=u64> where u64: Into<Output> {
    // New implementers are expected to override this
    fn output(&self) -> Output {
        self.finish().into()
    }
}

For smaller sizes, returning u64 is sufficient today (though not ideal). Alternatively, a new trait that also allowed for smaller sizes could take the place of the Into bound.
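The Into-bounded form can also be tried as a standalone trait. In this sketch `WideHasher` is a hypothetical name, and both impls rely on the default `output`, so the u128 version only widens the existing 64-bit result; a real wide hasher would override it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical extension trait with a defaulted output parameter.
pub trait WideHasher<Output = u64>: Hasher
where
    u64: Into<Output>,
{
    /// New implementers would be expected to override this.
    fn output(&self) -> Output {
        self.finish().into()
    }
}

// Existing hashers can opt in without writing any code.
impl WideHasher for DefaultHasher {}
// u64: Into<u128> holds, so a wider output type is accepted as well.
impl WideHasher<u128> for DefaultHasher {}
```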

Collisions being panic-inducing is another use-case outside of Hasher's intended use.

All in all, I'm not sure Hasher is the place for what you want; std's Hasher need not tackle every use-case of hashing. Perhaps you really need a custom sub-trait that requires the guarantees and API you desire.

CAD97 · October 1, 2021, 11:27pm

Because Hash::hash takes a generic &mut H where H: Hasher, even if Hasher is generalized, Hash would still require Output = u64. Additionally, an associated type might be the wrong way to generalize Hasher anyway; hashers like HighwayHash can output 64, 128, or 256 bits of entropy from the same hashing process. (IIUC, for a strong hash, you can XOR equal-sized sections of the hash together to get a smaller hash that is similarly strong. Highway does something smarter, though, and I am not a cryptographer.) That calls for a type input (a parameter), not a type output (an associated type).
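The XOR-folding idea from the parenthetical, as a minimal sketch. The function names are made up here, and this is the naive fold only, not what HighwayHash actually does.

```rust
/// Fold a 128-bit hash down to 64 bits by XORing its two halves.
pub fn xor_fold_128_to_64(wide: u128) -> u64 {
    let low = wide as u64;
    let high = (wide >> 64) as u64;
    low ^ high
}

/// Same idea one step further: 64 bits down to 32.
pub fn xor_fold_64_to_32(wide: u64) -> u32 {
    (wide as u32) ^ ((wide >> 32) as u32)
}
```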

Ultimately, I think the way to handle larger hash outputs in the shorter term is what HighwayHash does: just have intrinsic methods for pulling out more of the calculated entropy.

In the longer term, perhaps (a const generic future form of) Digest is the way to go, I don't know!

system · December 30, 2021, 11:27pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
