Make Hasher portable? (mini RFC)

mcy · July 13, 2018, 7:22pm

While I understand that “the compiler does it” is often used as a standard for the “correct” way of doing things in Rust, presumably it’s only like this because this issue was never brought up? It’s already wide-spread to make Hasher platform independent by overriding default trait methods, as mentioned above, but I think the case can be made that this is a bad status-quo (where it’s not clear that a random impl Hasher does not guarantee platform-independence) that should at least be documented. Certainly, it would be nice to be able to say in the trait bound “I want this Hasher to be platform-independent” by saying H: Digest instead of H: Hasher or what-have-you.

After all, in the Before Time, the compiler was full of @Ts.

michaelwoerister · July 13, 2018, 8:55pm

I was just giving a counter-example to the statement saying that the Hasher trait was “useless for anything needing portable output”. I’m not saying that the documentation should not mention any pitfalls. But unless you take care of using a platform-independent hash algorithm implementation too, converting input parameters to little-endian won’t be enough. It might even lead to people wrongly assuming that they can forgo thinking about platform-specific concerns.

Eh2406 · July 13, 2018, 9:48pm

In cargo we also try to use hash for run to run identification, and it is annoying. That is on the same computer, but still; I personally would love to see a split between hash/digest trates, so we can only do the slow “make it consistent” when we are making a digest and use the fast tricks when inserting in a hashmap.

mcy · July 13, 2018, 10:48pm

Yeah, that makes sense!

This makes me think that Digest should be an unsafe trait, to express the contract that it's a platform-specific hasher.

Soni · July 13, 2018, 11:12pm

uh what about compile-time constant hashmaps? those need portable, or at least deterministic, hashes, no?

dhardy · July 14, 2018, 2:28pm

I don't like this terminology. There is for example password hashing, which definitely needs to be portable.

Do we have such a thing in Rust? If we did, they would indeed need to be portable. But there's another issue: HashMap uses a hasher with random key, so for compile time usage the key would need to be fixed. For constants maps that wouldn't matter, but for mutable ones it might make them vulnerable to DoS via hash collisions.

I still think just making Hasher portable by default is the nicest option.

Eh2406 · July 14, 2018, 7:36pm

Indeed there is strong precedent for using hash to mean both things, that is why the hash trait is expected to do all things for all needs. If we decide to be more specific we will be braking with the existing terminology.

mcy · July 14, 2018, 8:49pm

A const constructor for HashMap doesn't make a lot of sense anyways, since HashMap is a glorified Vec. What it sounds like you want is an analogue of [T; n] for HashMap, which, in a compile-time setting, would be immutable: const TABLE: ConstMap<K, V, n> = const_map! { .. };. The correct thing to use for the mutable case is a static mut and maybe a Once or whatever.

Java actually follows the convention I describe: hashCode produces an ephemeral hash, while somewhere deep in javax.crypto you can dig up the Digest class, which supports SHA1 and friends. I think "digest" is prevalent enough as "platform-safe hash" that no one should be confused. And, let's be honest: if you're doing something actually dangerous like dealing with passwords, you should be double-checking that your chosen hash does what you want.

kornel · July 15, 2018, 12:06am

Does the trait guarantee order in which struct fields are hashed? Is it the source order or optimized memory layout order?

Soni · July 15, 2018, 12:45am

I’m talking about the Rust equivalent of “string switches” (as seen in Java).

mcy · July 15, 2018, 12:52am

That's entirely dependent on the struct's Hash implmentation; I don't think #[derive(Hash)] makes any such assurance.

This already exists:

match foo {
    "foo" => ..
}

IIRC this just does the obvious O(n) comparison. Using a hashmap for such a purpose is a bad idea: hashing will usually be slower than checking by equality anyways, and if your match is so big that a hashmap actually nets you a performance gain... your match is too big. A const HashMap is a good idea, but it should definitely not be generated by a match.

IIRC Java doesn't use a hashmap to switch on strings but compiles it to a sequence of invokevirtuals, ifeqs, and gotos to emulate fall-through. It does do this for switching on enums, because the ordinal (a.k.a discriminant) of an enum is not fixed at compile time, because Java is linked at runtime. (Mind, I just ran example code through a disassembler so I could be completely wrong!)

Soni · July 15, 2018, 2:30am

Java didn’t always specify the String hashCode. But then Java 7 happened.

Because Java’s switch (s) {} (where s is a String) is a hashtable.

mcy · July 15, 2018, 4:14am

Fair, I actually wasn’t aware of that. I still don’t think this an appropriate optimization for Rust; Java has the advantage that a lot of strings are interned, and that Java’s String is really a (&'gc [u16], i32), since the result of the hashcode is cached. It sounds to me like what you want is best accomplished by a macro.

Centril · July 15, 2018, 5:10am

Aside: But being able to construct Vecs and push to them inside a const fn does make sense so that you can build more interesting algorithms and execute them at compile. By the same rationale, HashMaps made and used at compile time also is useful.

mcy · July 15, 2018, 5:34am

Oh yes, this is definitely something I spent a long time thinking about while walking home the other day. I was actually wondering if we can somehow make Box<T> and friends abstract over “owned pointer to heap OR data segment”, probably by using the pointers in an invalid memory page. I’d really like to be able to say const XS: Vec<i32> = vec![1, 2, 3]; or something like that.

JeffBurdges · August 5, 2018, 7:51pm

I dislike the byteorder crate and wish the type system could handle this better, but…

I think the distinction between digests and hashes must be maintained because Hash must be fast while Digest can afford more time.

Imagine I’ve a point on an elliptic curve in protective coordinates. What does hashing it mean? A Digest should require an expensive conversion to affine coordinates, ala serialization, but should a Hash really convert to affine coordinates or should we do something faster? Or maybe Cell<ProjectivePoint>, RefCell<ProjectivePoint>, etc. should satisfy Hash but normalize the point? Or maybe the developer should be forced to batch normalize the points first by only having affine points satisfy Hash?

I do think some wrapper types could smooth over the rough edges though:

A pub struct DigestToHash<T: Digest>(pub T) could provide Hash when cryptography crates forget one, but again cryptography types should consider performance issues for Hash.
A SerdeDigest wrapper could provide a Digest for data types that forget one.

JeffBurdges · August 17, 2018, 11:01pm

I suppose Hash makes no sense for ProjectivePoint because anything you’d do is too slow. Eq works, but Hash should be compatible with Eq and fast.

system · March 25, 2019, 8:30am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Generalize Hasher to different outputs libs	8	953	December 30, 2021
Low-latency hashing libs	11	1318	December 18, 2024
Redoing hashing traits libs	3	361	April 8, 2025
Unstable hash architecture ideas (deprecated)	13	3343	March 25, 2019
Global default hash function override for HashMap and HashSet libs	26	1204	August 20, 2024

Make Hasher portable? (mini RFC)

Related topics