Head-Desking on Entry API 4.0

Gankra · May 28, 2015, 6:36pm

The Entry API provides a way for a user of a map to hook into the search/insertion/removal logic and avoid unneccessary work. Specifically, it enables you to perform conditional logic based on whether the key is already present in the map or not. Generally, this is done by capturing the state of the insertion algorithm after it identifies whether the given key is present or not.

However there are two competing initialization scenarios for a consumer of this API:

I have a by-value key ready, and I don’t need it anymore. I would like to feed it directly into the Entry to avoid any further clones.
I have a by-ref key ready, and I don’t want to clone it unless an insertion is actually necessary.

Respectively, these APIs want:

by-val: fn entry(key: K) -> Entry<K, V>
by-ref: fn entry<Q: ?Sized>(query: &Q) -> Entry<Q, K, V> where K: Borrow<Q>

And each is suboptimal for the other’s usecase.

If I have a by-val key and the by-ref API, I’m forced to pass in the key by-ref, causing the API to needlessly clone the key if an insertion occurs.
If I have a by-ref key and the by-val API, I’m forced to pass in the key by-val, causing me to needlessly clone the key if an insertion doesn’t occur.

Initially, the API was by-val simply because it pre-existed std::borrow. After std::borrow came to be, it was decided that the by-val-must-clone-on-insert API was the “least bad” one from a performance stand-point, on the premise that anyone using the Entry API expects lots of duplicate keys. However it was discovered that certain Map instances couldn’t work with the by-ref API, and the change was reverted.

This was not-great but generally hand-waved as ok because there was nothing preventing a hypothetical unification of the interfaces with the right traits and blanket impls.

Entry API 4.0 is supposed to be that API:

The basic idea is simple: accept IntoCow, which generalizes over owned and borrowed data, with the ability to upgrade borrowed data to Owned. Unfortunately, this doesn’t work, because IntoCow lacks the necessary blanket impls (and it’s not clear that this would be possible with coherence).

Instead, I considered adding a new method, entry_lazy which takes borrowed data. We then unify these two APIs through Cow, providing a uniform API and implementation with two trivial entry-points. The actual changes to the implementation are completely trivial: just add call borrow in set up and into_owned in VacantEntry::insert.

Unfortunately this has a negative side-effect: the value in the entry method needs to be feed-able to a Cow. This means we need to be able to declare a type for the value to Borrow to. Naively this is fairly easy: K is Borrow<K> for all K. However Cow also requires that the Borrow is ToOwned<Owned=K>, which only is true when K: Clone. The end result is that we have to specify that K is Clone, even though this is specifically the entry method which will never call Clone.

In a fever dream I produced a glorious hack to solve this. Just add this to the std::borrow:

    // A type that cannot be realized, for making Cows that are 
    // statically guaranteed to be Owned
    pub struct FakeBorrow<T>(PhantomData<T>);
   
    impl<T> Borrow<FakeBorrow<T>> for T { 
        fn borrow(&self) -> &FakeBorrow<T> { panic!("Noooooo") }
    }

    impl<T> ToOwned for FakeBorrow<T> { 
        type Owned=T; 
        fn to_owned(self) -> T { panic!("Nooooooo") }
    }

Now entry just makes a Cow<'static, FakeBorrow<K>>, and everyone’s happy. Right???

Unfortunately this is totally busted as soon as the code tries to call moo.borrow() to search the structure. The exact Entry API as written has this as a non-problem, though! Simply directly match on the Cow enum an instead of borrowing in the Owned case, just take a reference to it! This creates some nasty logic and monomorphization duplication as the Borrow case works with a different value from the Owned case, but in principle it should work (haven’t tested this design out fully).

It may have a bad interaction with future changes to the Entry API that expose the key argument for richer delayed computation of the value, though. Also obviously FakeBorrow is a big huge hack.

Alternatively we can take entry_lazy to its logical extreme and just make a duplicate parallel Entry API with all the same interface but different types.

I dunno, I’ve been bashing my head against this too much.

aturon · May 28, 2015, 6:46pm

To be clear, the thought was that we'd be able to use IntoCow at some later date, with specialization in hand. It's worth re-checking whether this is the case. But OTOH, if you are wanting to land these changes in the near term, the IntoCow idea probably isn't going to work very well.

Couple things. First, I don't think there's a strong reason that the Cow type itself needs this constraint, and it's backcompat to remove it (from the enum). That should help.

Second, more minor: ToOwned doesn't require Clone; it's just that there's a blanket impl for Clone data.

aturon · May 28, 2015, 6:48pm

I wonder if FakeBorrow could be defined differently: the problem isn't with borrowing, it's with to_owned. So maybe FakeBorrow can wrap a real borrow, and provide the borrow method, but panic on to_owned?

seanmonstar · May 28, 2015, 7:30pm

Teeny correction: IntoCow is now Into<Cow>.

Gankra · May 28, 2015, 8:12pm

Sadly(?) IntoCow lives to this day.

seanmonstar · May 28, 2015, 9:30pm

I… what. There’s also Into<Cow> implementations for str and [T] and so forth. And those are stable.

aturon · May 28, 2015, 9:58pm

OK, so that was wrong -- I had forgotten that the motivation was in part to leverage the associated type in ToOwned to make Cow only require the single borrowed type parameter.

Thinking about this a bit more, I am wondering whether the following strategy might work for internal sharing of code, which would work over some type KQ: MaybeOwned<Q, K>:

struct Borrowed<'a, B: 'a>(&'a B);
struct Owned<O>(O);

trait MaybeOwned<B: ?Sized, O> {
    fn as_borrowed(&self) -> &B;
    fn into_owned(self) -> O;
}

impl<'a, B> MaybeOwned<B, B::Owned> for Borrowed<'a, B> where B: ToOwned {
    fn as_borrowed(&self) -> &B {
        self.0
    }

    fn into_owned(self) -> B::Owned {
        self.0.to_owned()
    }
}

impl<B, O> MaybeOwned<B, O> for Owned<O> where O: Borrow<B> {
    fn as_borrowed(&self) -> &B {
        self.0.borrow()
    }

    fn into_owned(self) -> O {
        self.0
    }
}

EDIT: to further clarify, each version of the entry API would invoke the code with the key/query value wrapped in one of the two newtypes.

Gankra · June 3, 2015, 5:45pm

Sorry it took me a while to respond to this. Reading over it again, I’m not sure how this solves the problem?

system · March 25, 2019, 8:24am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Entry API requires eager allocation of key, would be nice to have a lazy version libs	7	786	February 18, 2024
Thoughts about the Entry API language design	8	1395	March 25, 2019
pre-RFC: `and_then` method on `std::collections::hash_map::Entry` libs	4	1283	March 25, 2019
`Entry::or_insert_with_key` libs	1	675	November 25, 2019
[Pre-RFC] HashMap/BTreeMap Entry method or_insert_with_key libs	5	817	July 9, 2020

Head-Desking on Entry API 4.0

Related topics