[Pre-RFC] Hash for ZST slices Hash not to iterate

Summary

  • Problem: Current implementation of Hash for ZST slices iterates over items. That times out for very large ones (usize::MAX).
  • Suggested change: Hash::hash_slice(...) not to iterate (and not to call hash on its items) when the type size is zero. I'm willing to create an RFC if needed, and a pull request with some tests (though any regressions could mean compilation timeouts). Zero cost: The added if-else should optimize away.

Example

use core::hash::{BuildHasher, Hash, Hasher};
use std::hash::RandomState;

const MAX_ZST_ARRAY: [(); usize::MAX] = [(); usize::MAX];

fn main() {
    let hasher_builder = RandomState::new();
    let mut hasher = hasher_builder.build_hasher();

    MAX_ZST_ARRAY.hash(&mut hasher);
    let hash = hasher.finish();
    println!("hash: {hash}");
}

Current implementation

hash for [T] calls

  • state.write_length_prefix(self.len());
  • Hash::hash_slice(self, state)

Since ZST's don't carry any data, it's fair to assume that they don't write anything in their Hash::hash - like hash(...) for (). Well behaving ZST's could write some constants, but the hash would still have the same zero entropy/quality. They could write some thread-specific (Thread-local-based) constant values (if the ZST is !Send), and the current Hash implementation for slices does honor that - but then all their instances (in the same thread) are equal anyway, so the entropy/quality within the thread that they're bound/limited to is zero again.

Backwards-compatible

Hash::hash_slice(...) wisely says that "implementation is also explicitly left unspecified".

hash for () does have #[inline], but it's not optimized away even in release (cargo run --release on x86_64 Linux, both stable 1.89.0 and 1.91.0-nightly).

Motivation

Using usize::MAX (or comparable) ZST slice/array is "crazy", but it's safe code with well defined behavior. It allows a simple Hash-DoS-free way of custom signalling from the Hash implementation back to a custom hasher. And it still works with other hashers, too.(This specific use-case could use Hasher:: write_length_prefix, but that is behind hasher_prefixfree_extras feature - nightly.)

This also show the power of Rust types, and ZST's/non-obvious-types, not just in compile time (when they do shine and are heavily used already), but in runtime, too.

2 Likes

How realistic is hashing an array of ZSTs, more so that big? DOS is possible only if your program does that. I don't think the standard library should optimize for this case.

IMO it sounds more reasonable to fix this than special casing hashing behavior. It would also potentially benefit other kind of noop loops over ZST slices than just hashing.

13 Likes