Redoing hashing traits

The current Hash and Hasher traits have some flaws:

  • Hasher isn't really usable for order-independent hashing, as is needed for impl Hash for HashSet and for other types that act like sets or bags and don't have a good way to order their elements. This can be worked around with an extension trait, but such a trait isn't usable through Hash.
  • Hash isn't usable with unsized Hashers, due to the oversight of a missing H: ?Sized bound, which can't be fixed now because of stability guarantees.

Therefore, I think it would be useful to add new hashing traits (names up for debate), possibly add blanket impls of the old traits in terms of the new ones (only possibly, because that depends on for<T> syntax), and deprecate the old traits:

```rust
pub trait NewHash<H: ?Sized + Hasher> {
    fn hash(&self, h: &mut H);
}

pub trait UnordHasher: Hasher + Clone {
    /// Merges the state of `other` into `self` in an order-independent way,
    /// such that for any permutation of `hashers`,
    /// `hashers.into_iter().reduce(|mut a, b| { a.join(b); a })` returns the same value.
    fn join(&mut self, other: Self);
}

// Note: the `for<H: Hasher>` bound below is hypothetical syntax;
// this blanket impl only works if some form of `for<T>` bounds lands.
impl<T: ?Sized> Hash for T
where
    for<H: Hasher> T: NewHash<H>,
{
    fn hash<H: Hasher>(&self, h: &mut H) {
        NewHash::hash(self, h);
    }
}
```

Then HashSet<T> (other generics omitted for brevity) could implement hashing like:

```rust
impl<T: NewHash<H>, H: UnordHasher> NewHash<H> for HashSet<T> {
    fn hash(&self, h: &mut H) {
        self.len().hash(h);
        // Fork once so every element starts from the same base state.
        let fork_point = h.clone();
        for element in self {
            let mut child_hasher = fork_point.clone();
            element.hash(&mut child_hasher);
            // Order-independent merge, so iteration order doesn't matter.
            h.join(child_hasher);
        }
    }
}
```
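As a concrete check of the pattern above, here is a minimal sketch with a hypothetical XOR-merging hasher standing in for a real UnordHasher implementation (XorJoinHasher and its internals are illustrative, not proposed API); the loop mirrors the impl body and yields the same value for any iteration order:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for an UnordHasher: clones copy the stream
// state, and `join` XOR-folds a child's output. Illustration only;
// XOR merging is not collision resistant.
#[derive(Clone, Default)]
struct XorJoinHasher {
    inner: DefaultHasher,
    acc: u64,
}

impl Hasher for XorJoinHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.inner.write(bytes);
    }
    fn finish(&self) -> u64 {
        self.inner.finish() ^ self.acc
    }
}

impl XorJoinHasher {
    fn join(&mut self, other: Self) {
        self.acc ^= other.finish();
    }
}

// Mirrors the HashSet impl body: fork from one point, hash each
// element into its own child, then merge order-independently.
fn hash_unordered<T: Hash>(items: &[T], h: &mut XorJoinHasher) {
    items.len().hash(h);
    let fork_point = h.clone();
    for element in items {
        let mut child = fork_point.clone();
        element.hash(&mut child);
        h.join(child);
    }
}
```

Hashing the same elements in two different orders then produces identical results, which is exactly the property HashSet needs.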

Maybe a better interface is:

```rust
trait UnordHasher: Hasher {
    type Fork: UnordHasher;
    fn fork(&self) -> Self::Fork;
    /// Merges the state of `fork` into `self` in an order-independent way,
    /// such that for any permutation of `hashers`,
    /// `for h in hashers { self.join(h); }` leaves `self` in the same state.
    fn join(&mut self, fork: Self::Fork);
}
```

This allows forked hashers to start in a different state than self's current state. E.g. with a SHA256 hasher, you might want each fork to be its own independent SHA256 instance rather than a duplicate of the internal state, which SHA256 wasn't designed for.
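A toy model of that idea, with an XOR join standing in for a real merge (ToyForkHasher is hypothetical and for illustration only):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// The proposed fork-based interface.
trait UnordHasher: Hasher {
    type Fork: UnordHasher;
    fn fork(&self) -> Self::Fork;
    fn join(&mut self, fork: Self::Fork);
}

// Toy impl: forks are fresh, independent hashers rather than clones of
// self's internal state (as a SHA256-based hasher might want), and
// `join` XOR-folds the fork's output (not collision resistant).
#[derive(Default)]
struct ToyForkHasher {
    inner: DefaultHasher,
    acc: u64,
}

impl Hasher for ToyForkHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.inner.write(bytes);
    }
    fn finish(&self) -> u64 {
        self.inner.finish() ^ self.acc
    }
}

impl UnordHasher for ToyForkHasher {
    type Fork = Self;
    fn fork(&self) -> Self {
        // Fresh state: the fork does not inherit self's stream.
        ToyForkHasher::default()
    }
    fn join(&mut self, fork: Self) {
        self.acc ^= fork.finish();
    }
}
```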

Is this hasher meant to be usable for stable hashes? Stable in the sense that it's the same across program runs (if the same seed is provided), platforms, and Rust versions.

Is this hasher meant to be usable in cryptographically secure signatures? It's not cryptographically secure to just XOR hashes of elements in a HashSet: the ability to control some (but not all) bits of each element's hash is enough for an attacker to make the whole set hash to any value they want.

Yes, if the NewHash implementations provide that property. I wasn't planning on that, but we could definitely do that.

Yes, if the hasher provides that property (and, if desired, a custom finish for output wider than u64). UnordHasher specifically delineates the spots where it merges states in an order-independent way: a sequence of join calls on one hasher with no write* calls in between. So the hasher can always implement a valid cryptographically secure hashing algorithm, e.g. by storing a list of hashes to merge and sorting them before writing them (assuming that's secure). For example, if a hypothetical SHA256MerkleTree is implemented that way, then:

```rust
fn f(h: &mut SHA256MerkleTree, v: HashSet<Foo>) {
    v.hash(h);
}
```

could be equivalent to:

```rust
fn f(h: &mut SHA256MerkleTree, v: HashSet<Foo>) {
    let mut hashes: Vec<[u8; 32]> = v.iter().map(|i| {
        let mut h2 = SHA256MerkleTree::default();
        i.hash(&mut h2);
        h2.finish_256bit()
    }).collect();
    hashes.sort();
    hashes.hash(h);
}
```
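The same sort-before-merging strategy can be checked with a self-contained sketch, using std's DefaultHasher and a 64-bit digest in place of the hypothetical SHA256MerkleTree and finish_256bit:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hash each element with its own fresh hasher, sort the digests, then
// feed the sorted list into the outer hasher. The result is independent
// of the set's (randomized) iteration order.
fn hash_set_sorted<T: Hash>(set: &HashSet<T>, h: &mut DefaultHasher) {
    let mut hashes: Vec<u64> = set
        .iter()
        .map(|i| {
            let mut h2 = DefaultHasher::new();
            i.hash(&mut h2);
            h2.finish()
        })
        .collect();
    hashes.sort_unstable();
    hashes.hash(h);
}
```

Two sets with the same elements hash identically even though their internal iteration orders (and insertion orders) differ.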