[Pre-RFC] Abandonning Morals In The Name Of Performance: The Raw Entry API

gnzlbg · March 29, 2018, 10:53pm

The question is whether using these functions can lead to other safe code invoking undefined behavior. For example, Vec::set_len does not invoke undefined behavior, but a consequence of using it incorrectly is that calling some other safe Rust methods might invoke undefined behavior. That is, Vec::set_len allows you to break Vec's invariants in such a way that it could lead to undefined behavior in safe Rust later on.

fintelia · March 29, 2018, 11:28pm

My understanding is they can’t. HashMap is currently implemented entirely using safe code* by making it a wrapper around RawTable (which is basically "just a tricked out Vec<Option<(u64, K, V)>>"). The changes we’ve been discussing would presumably all be at the HashMap level and thus be similarly safe.

*except for the the experimental implementation of collection_placement

gnzlbg · April 3, 2018, 1:32pm

@fintelia does this mean that none of the proposed changes can result in undefined behavior being invoked, and that the only consequences of breaking the invariants of these collections is potential logic errors?

fintelia · April 6, 2018, 5:09pm

Yes, the proposed changes can introduce potential logic errors but cannot trigger undefined behavior. In fact, the potential logic errors are the same ones that are already possible if the key's Hash implementation does not respect the required invariants. (e.g. you'll see some very strange behavior if keys that are equal can produce hashes that aren't.)

gnzlbg · April 7, 2018, 8:59am

Then that’s great news, they don’t even need to be unsafe. A doc warning would be enough.

jonhoo · June 4, 2018, 5:17pm

@Gankra do you think there’s any chance we might see this kind of lower-level API surface? We’re currently relying on a painful-to-maintain fork of HashMap in rahashmap to get randomized eviction from a map for use in a cache…

Gankra · June 4, 2018, 5:41pm

I’ve had https://github.com/rust-lang/rust/pull/50821 up for a while, but i’ve been waiting on more libs team feedback to finish it.

jonhoo · June 4, 2018, 5:58pm

That’s awesome! Looks like @alexcrichton followed up on it with a question about the safety of the API?

jonhoo · June 13, 2018, 12:59pm

They closed it just now saying that they were waiting for you? Seems like there’s been some miscommunication here?

pietroalbini · June 13, 2018, 1:09pm

In the rust-lang/rust repo the triage team will close any PR where the author is unresponsive for more than two weeks. This doesn’t mean the PR won’t be accepted, it’s just a way for us to keep the list of open PRs clean. If the author wants to continue working on that PR we’ll be happy to reopen it!

josh · September 7, 2018, 8:55pm

I’d honestly like to see entry_ref, which can just call to_owned() if needed. I think it’s worth having that via a safe method, rather than just going straight to raw_entry.

JeffBurdges · September 8, 2018, 9:52am

There might be more value in exploring generic versions of the wrapper types you want to avoid than in merely fixing the entry API.

First, we should consider a Cow-centric entry API like:

impl<K, V, S> HashMap<K, V, S> where K: Eq + Hash, S: BuildHasher
{
    pub fn entry_cow<'a,Q>(&'a mut self, k: Cow<'a,Q>) -> Entry<'a,K, V>
    where Q: ?Sized + Hash + Eq + ToOwned<Owned=K> { .. }

    pub fn entry(&mut self, key: K) -> Entry<K, V> where K: Clone {
        self.entry_cow(Cow::Owned(key))
    }

We’d use Cow internally to VacantEntry too of course.

pub struct VacantEntry<'a, K: 'a, V: 'a> {
    hash: SafeHash,
    key: Cow<'a,K>,
    elem: VacantEntryState<K, V, &'a mut RawTable<K, V>>,
}

Is the only issue here the K: Clone restriction implicit in Cow? If you make an Entry using a borrowed key then sometimes you must either make the owned version or return an error, but the error case no longer requires entry. If you need to make an owned version, then basically you do have functionality like ToOwned, but the problem is that doing ToOwned temporarily might require two wrapper types and six trait impls, yes?

We could define another Cow-like type that specifies the function explicit:

pub enum OneStep<A,B,F: fn(A) -> B> {
    First(pub A),
    Second(pub B)
}
pub fn one_step<F: F: fn(A) -> B>(a: B,f: F) -> OneStep<A,B,f> { OneStep(a) }

We have a canonical mapping that makes .into() work:

impl<'a,B: 'a + ToOwned + ?Sized> From<Cow<'a,B>> for OneStep<&'a B, <B as ToOwned>::Owned,<B as ToOwned>::to_owned> { .. }

and

    pub fn entry_borrowed<'a,Q,F: fn(&'a Q) -> K>(&'a mut self, k: OneStep<&'a Q,K,F>) -> Entry<'a,K, V>
    where Q: ?Sized + Hash + Eq { .. }

    pub fn entry_ref<'a,Q>(&'a mut self, k: &Q) -> Entry<'a,K, V>
    where Q: ?Sized + Hash + Eq + ToOwned<Owned=K> {
        self.entry_borrowed(Cow::Borrowed(k).into())
    }

    pub fn entry(&mut self, key: K) -> Entry<K, V> {
        fn do_nothing(k: K) { k }
        self.entry_borrowed(one_step(k,do_nothing))
    }

There are existing traits like std::ops::Generator<Yield=&'a Q,Return=K> that kinda work too, albeit imperfectly.

I dislike users having control over the hash so directly too, not because they make mistakes, but because it’ll increase their stress level. I’m wondering if some simpler hashing interface plus another state transition might simplify this, ala

pub trait SimpleHasher64<BuildHasher> {
    fn hashed(&self) -> u64;
}
pub struct PreHashed64<T,B: BuildHasher = RandomState>(T, Option<u64>)
impl<T,B: BuildHasher> SimpleHasher<B> for PreHashed64<T,B> { .. }

In any case, these issues both look like very simple one-step state machines provide the right abstraction, even if we want the state machine to always compile down to nothing.

system · March 25, 2019, 8:29am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Entry API requires eager allocation of key, would be nice to have a lazy version libs	7	949	February 18, 2024
Help harden HashMap in libstd! libs	27	4558	March 25, 2019
Pre-RFC: bless the `type FastHashMap` pattern	92	4542	March 25, 2019
Help wanted: fast hash maps in std	29	6341	March 25, 2019
Head-Desking on Entry API 4.0 libs	8	3023	March 25, 2019

[Pre-RFC] Abandonning Morals In The Name Of Performance: The Raw Entry API

Related topics