The question is whether using these functions can lead to other safe code invoking undefined behavior. For example, Vec::set_len
does not invoke undefined behavior, but a consequence of using it incorrectly is that calling some other safe Rust methods might invoke undefined behavior. That is, Vec::set_len
allows you to break Vec
's invariants in such a way that it could lead to undefined behavior in safe Rust later on.
My understanding is they canāt. HashMap
is currently implemented entirely using safe code* by making it a wrapper around RawTable
(which is basically "just a tricked out Vec<Option<(u64, K, V)>>
"). The changes weāve been discussing would presumably all be at the HashMap level and thus be similarly safe.
*except for the the experimental implementation of collection_placement
@fintelia does this mean that none of the proposed changes can result in undefined behavior being invoked, and that the only consequences of breaking the invariants of these collections is potential logic errors?
Yes, the proposed changes can introduce potential logic errors but cannot trigger undefined behavior. In fact, the potential logic errors are the same ones that are already possible if the key's Hash
implementation does not respect the required invariants. (e.g. you'll see some very strange behavior if keys that are equal can produce hashes that aren't.)
Then thatās great news, they donāt even need to be unsafe
. A doc warning would be enough.
@Gankra do you think thereās any chance we might see this kind of lower-level API surface? Weāre currently relying on a painful-to-maintain fork of HashMap
in rahashmap to get randomized eviction from a map for use in a cacheā¦
Iāve had https://github.com/rust-lang/rust/pull/50821 up for a while, but iāve been waiting on more libs team feedback to finish it.
Thatās awesome! Looks like @alexcrichton followed up on it with a question about the safety of the API?
They closed it just now saying that they were waiting for you? Seems like thereās been some miscommunication here?
In the rust-lang/rust repo the triage team will close any PR where the author is unresponsive for more than two weeks. This doesnāt mean the PR wonāt be accepted, itās just a way for us to keep the list of open PRs clean. If the author wants to continue working on that PR weāll be happy to reopen it!
Iād honestly like to see entry_ref
, which can just call to_owned()
if needed. I think itās worth having that via a safe method, rather than just going straight to raw_entry
.
There might be more value in exploring generic versions of the wrapper types you want to avoid than in merely fixing the entry API.
First, we should consider a Cow
-centric entry API like:
impl<K, V, S> HashMap<K, V, S> where K: Eq + Hash, S: BuildHasher
{
pub fn entry_cow<'a,Q>(&'a mut self, k: Cow<'a,Q>) -> Entry<'a,K, V>
where Q: ?Sized + Hash + Eq + ToOwned<Owned=K> { .. }
pub fn entry(&mut self, key: K) -> Entry<K, V> where K: Clone {
self.entry_cow(Cow::Owned(key))
}
Weād use Cow
internally to VacantEntry
too of course.
pub struct VacantEntry<'a, K: 'a, V: 'a> {
hash: SafeHash,
key: Cow<'a,K>,
elem: VacantEntryState<K, V, &'a mut RawTable<K, V>>,
}
Is the only issue here the K: Clone
restriction implicit in Cow
? If you make an Entry
using a borrowed key then sometimes you must either make the owned version or return an error, but the error case no longer requires entry. If you need to make an owned version, then basically you do have functionality like ToOwned
, but the problem is that doing ToOwned
temporarily might require two wrapper types and six trait impls, yes?
We could define another Cow
-like type that specifies the function explicit:
pub enum OneStep<A,B,F: fn(A) -> B> {
First(pub A),
Second(pub B)
}
pub fn one_step<F: F: fn(A) -> B>(a: B,f: F) -> OneStep<A,B,f> { OneStep(a) }
We have a canonical mapping that makes .into()
work:
impl<'a,B: 'a + ToOwned + ?Sized> From<Cow<'a,B>> for OneStep<&'a B, <B as ToOwned>::Owned,<B as ToOwned>::to_owned> { .. }
and
pub fn entry_borrowed<'a,Q,F: fn(&'a Q) -> K>(&'a mut self, k: OneStep<&'a Q,K,F>) -> Entry<'a,K, V>
where Q: ?Sized + Hash + Eq { .. }
pub fn entry_ref<'a,Q>(&'a mut self, k: &Q) -> Entry<'a,K, V>
where Q: ?Sized + Hash + Eq + ToOwned<Owned=K> {
self.entry_borrowed(Cow::Borrowed(k).into())
}
pub fn entry(&mut self, key: K) -> Entry<K, V> {
fn do_nothing(k: K) { k }
self.entry_borrowed(one_step(k,do_nothing))
}
There are existing traits like std::ops::Generator<Yield=&'a Q,Return=K>
that kinda work too, albeit imperfectly.
I dislike users having control over the hash so directly too, not because they make mistakes, but because itāll increase their stress level. Iām wondering if some simpler hashing interface plus another state transition might simplify this, ala
pub trait SimpleHasher64<BuildHasher> {
fn hashed(&self) -> u64;
}
pub struct PreHashed64<T,B: BuildHasher = RandomState>(T, Option<u64>)
impl<T,B: BuildHasher> SimpleHasher<B> for PreHashed64<T,B> { .. }
In any case, these issues both look like very simple one-step state machines provide the right abstraction, even if we want the state machine to always compile down to nothing.
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.