[Pre-RFC] Abandonning Morals In The Name Of Performance: The Raw Entry API


#41

The question is whether using these functions can lead to other safe code invoking undefined behavior. For example, Vec::set_len does not invoke undefined behavior, but a consequence of using it incorrectly is that calling some other safe Rust methods might invoke undefined behavior. That is, Vec::set_len allows you to break Vec's invariants in such a way that it could lead to undefined behavior in safe Rust later on.


#42

My understanding is they can’t. HashMap is currently implemented entirely using safe code* by making it a wrapper around RawTable (which is basically "just a tricked out Vec<Option<(u64, K, V)>>"). The changes we’ve been discussing would presumably all be at the HashMap level and thus be similarly safe.

*except for the the experimental implementation of collection_placement


#43

@fintelia does this mean that none of the proposed changes can result in undefined behavior being invoked, and that the only consequences of breaking the invariants of these collections is potential logic errors?


#44

Yes, the proposed changes can introduce potential logic errors but cannot trigger undefined behavior. In fact, the potential logic errors are the same ones that are already possible if the key’s Hash implementation does not respect the required invariants. (e.g. you’ll see some very strange behavior if keys that are equal can produce hashes that aren’t.)


#45

Then that’s great news, they don’t even need to be unsafe. A doc warning would be enough.


#46

@Gankro do you think there’s any chance we might see this kind of lower-level API surface? We’re currently relying on a painful-to-maintain fork of HashMap in rahashmap to get randomized eviction from a map for use in a cache…


#47

I’ve had https://github.com/rust-lang/rust/pull/50821 up for a while, but i’ve been waiting on more libs team feedback to finish it.


#48

That’s awesome! Looks like @alexcrichton followed up on it with a question about the safety of the API?


#49

They closed it just now saying that they were waiting for you? Seems like there’s been some miscommunication here?


#50

In the rust-lang/rust repo the triage team will close any PR where the author is unresponsive for more than two weeks. This doesn’t mean the PR won’t be accepted, it’s just a way for us to keep the list of open PRs clean. If the author wants to continue working on that PR we’ll be happy to reopen it!


#51

I’d honestly like to see entry_ref, which can just call to_owned() if needed. I think it’s worth having that via a safe method, rather than just going straight to raw_entry.


#52

There might be more value in exploring generic versions of the wrapper types you want to avoid than in merely fixing the entry API.

First, we should consider a Cow-centric entry API like:

impl<K, V, S> HashMap<K, V, S> where K: Eq + Hash, S: BuildHasher
{
    pub fn entry_cow<'a,Q>(&'a mut self, k: Cow<'a,Q>) -> Entry<'a,K, V>
    where Q: ?Sized + Hash + Eq + ToOwned<Owned=K> { .. }

    pub fn entry(&mut self, key: K) -> Entry<K, V> where K: Clone {
        self.entry_cow(Cow::Owned(key))
    }

We’d use Cow internally to VacantEntry too of course.

pub struct VacantEntry<'a, K: 'a, V: 'a> {
    hash: SafeHash,
    key: Cow<'a,K>,
    elem: VacantEntryState<K, V, &'a mut RawTable<K, V>>,
}

Is the only issue here the K: Clone restriction implicit in Cow? If you make an Entry using a borrowed key then sometimes you must either make the owned version or return an error, but the error case no longer requires entry. If you need to make an owned version, then basically you do have functionality like ToOwned, but the problem is that doing ToOwned temporarily might require two wrapper types and six trait impls, yes?

We could define another Cow-like type that specifies the function explicit:

pub enum OneStep<A,B,F: fn(A) -> B> {
    First(pub A),
    Second(pub B)
}
pub fn one_step<F: F: fn(A) -> B>(a: B,f: F) -> OneStep<A,B,f> { OneStep(a) }

We have a canonical mapping that makes .into() work:

impl<'a,B: 'a + ToOwned + ?Sized> From<Cow<'a,B>> for OneStep<&'a B, <B as ToOwned>::Owned,<B as ToOwned>::to_owned> { .. }

and

    pub fn entry_borrowed<'a,Q,F: fn(&'a Q) -> K>(&'a mut self, k: OneStep<&'a Q,K,F>) -> Entry<'a,K, V>
    where Q: ?Sized + Hash + Eq { .. }

    pub fn entry_ref<'a,Q>(&'a mut self, k: &Q) -> Entry<'a,K, V>
    where Q: ?Sized + Hash + Eq + ToOwned<Owned=K> {
        self.entry_borrowed(Cow::Borrowed(k).into())
    }

    pub fn entry(&mut self, key: K) -> Entry<K, V> {
        fn do_nothing(k: K) { k }
        self.entry_borrowed(one_step(k,do_nothing))
    }

There are existing traits like std::ops::Generator<Yield=&'a Q,Return=K> that kinda work too, albeit imperfectly.

I dislike users having control over the hash so directly too, not because they make mistakes, but because it’ll increase their stress level. I’m wondering if some simpler hashing interface plus another state transition might simplify this, ala

pub trait SimpleHasher64<BuildHasher> {
    fn hashed(&self) -> u64;
}
pub struct PreHashed64<T,B: BuildHasher = RandomState>(T, Option<u64>)
impl<T,B: BuildHasher> SimpleHasher<B> for PreHashed64<T,B> { .. }

In any case, these issues both look like very simple one-step state machines provide the right abstraction, even if we want the state machine to always compile down to nothing.