What would be your opinion on including an associated list in std::collections?

HjVT · December 8, 2023, 9:55pm

I've had a discussion over at the Rust community Discord about what could theoretically be the Dictionary type, and realized that one of the options, which is probably the best for very small data sizes and has the advantage of only requiring ~~PartialEq~~ Eq for keys, is not included in the standard library.

pitaj · December 8, 2023, 10:17pm

What do you mean by "associated list", what would the API look like, and what are the benefits over the existing collections?

HjVT · December 8, 2023, 10:18pm

Basically a wrapper over Vec<(K, V)>, with a map-like API for insertion and removal. The benefit over both is requiring neither Hash nor ~~Eq~~ Cmp, and the lack of indirection to the data. The tradeoff is that lookup and insertion are O(n) in size.
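For readers who want to see the shape being described, here is a minimal sketch of such a wrapper. The type name `AssocList` and its method names are hypothetical and chosen only for illustration; the sketch assumes keys are `Eq`, as the thread later settles on, and makes no claim about what a std API would look like.

```rust
/// Hypothetical association list: a thin wrapper over `Vec<(K, V)>`.
/// The name and methods are illustrative, not an existing std or crate API.
#[derive(Debug, Default)]
pub struct AssocList<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: Eq, V> AssocList<K, V> {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// O(n): scans for the key; replaces and returns the old value if found.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    /// O(n): linear scan using only `==` on keys.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    /// O(n) search, then O(1) removal; does not preserve insertion order.
    pub fn remove(&mut self, key: &K) -> Option<V> {
        let idx = self.entries.iter().position(|(k, _)| k == key)?;
        Some(self.entries.swap_remove(idx).1)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Using `swap_remove` keeps removal cheap at the cost of ordering; a version that preserves insertion order would use `Vec::remove` instead.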

pitaj · December 8, 2023, 11:17pm

How would it work without Eq? Do you mean Ord?

ExpHP · December 8, 2023, 11:18pm

Why would it only require PartialEq? Wouldn't that allow it to e.g. fill up with many NaN keys?
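The NaN concern can be illustrated with a small sketch. The helper `insert_partial_eq` is hypothetical; it does an insert-or-replace on a plain `Vec<(K, V)>` using only `PartialEq`, which is roughly what a `PartialEq`-only map would have to do.

```rust
/// Hypothetical helper: insert-or-replace that relies only on `PartialEq`.
pub fn insert_partial_eq<K: PartialEq, V>(entries: &mut Vec<(K, V)>, key: K, value: V) {
    if let Some(i) = entries.iter().position(|(k, _)| *k == key) {
        // Key already present: overwrite the value in place.
        entries[i].1 = value;
    } else {
        // No key compared equal, so append a new entry.
        entries.push((key, value));
    }
}

/// Returns how many entries exist after inserting the same NaN key `n` times.
pub fn nan_entries(n: usize) -> usize {
    let mut entries: Vec<(f64, usize)> = Vec::new();
    for i in 0..n {
        // NaN != NaN, so the lookup never finds the earlier NaN entries.
        insert_partial_eq(&mut entries, f64::NAN, i);
    }
    entries.len()
}
```

Because `f64::NAN == f64::NAN` is false, every NaN insert appends a new entry that can never be looked up again, which is the behaviour an `Eq` bound rules out.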

scottmcm · December 9, 2023, 12:13am

There's lots of possible spots in the trade-off spectrum, but not all of them need to be in alloc.

I think a FlatMap<K, V> being in a crate -- the same way a SmallVec<T> is just in a crate -- is fine.

Especially since .iter().find(…) works pretty well even without a wrapper.

tgross35 · December 9, 2023, 12:24am

I think the term VecMap is a little better because “dictionaries” in a lot of languages are our HashMap.

There is a crate that provides this: vecmap

I have definitely used this in the past, more lightweight than a HashMap and less indirection than a BTree. It wouldn’t be bad to have something easy to drop into place.

That being said, the implementation is so basic I’m not sure it merits a place in std. For Ord things, binary_search_by_key already gives you the position to insert if lookup fails. For !Ord things, .iter().find(…) works. It’s trivial to write a wrapper of a few lines of code around either of these things to suit your application’s needs, without std needing to be opinionated about to Ord or not to Ord.
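A rough sketch of the two patterns mentioned here, with hypothetical helper names `sorted_insert` and `linear_get`. The sorted variant uses `binary_search_by`, a close sibling of `binary_search_by_key`, since it compares against a borrowed key without needing to copy or clone it.

```rust
/// Hypothetical helper for `Ord` keys: keeps `entries` sorted by key.
/// Returns the previous value if the key was already present.
pub fn sorted_insert<K: Ord, V>(entries: &mut Vec<(K, V)>, key: K, value: V) -> Option<V> {
    match entries.binary_search_by(|(k, _)| k.cmp(&key)) {
        // Found: replace the value at that index.
        Ok(i) => Some(std::mem::replace(&mut entries[i].1, value)),
        // Not found: `i` is the position that keeps the vector sorted.
        Err(i) => {
            entries.insert(i, (key, value));
            None
        }
    }
}

/// Hypothetical helper for keys that are only comparable with `==`.
pub fn linear_get<'a, K: PartialEq, V>(entries: &'a [(K, V)], key: &K) -> Option<&'a V> {
    entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
}
```

The sorted variant gives O(log n) lookup but still pays O(n) to shift elements on insertion; the linear variant is O(n) for both.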

Maybe an example of these patterns in the book somewhere is a better fit than a new data type? Or maybe we could add methods to an impl Vec<(K, V)> without adding a new type, such as get_or_insert.
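Outside of std, the "methods on Vec<(K, V)>" idea can only be approximated with an extension trait. The following is a sketch under that assumption; the trait name `AssocVecExt` and the exact `get_or_insert` signature are hypothetical and are not taken from std or from any particular crate.

```rust
/// Hypothetical extension trait; std has no such method on `Vec<(K, V)>`.
pub trait AssocVecExt<K, V> {
    /// Returns a mutable reference to the value for `key`,
    /// inserting `default` first if the key is absent.
    fn get_or_insert(&mut self, key: K, default: V) -> &mut V;
}

impl<K: Eq, V> AssocVecExt<K, V> for Vec<(K, V)> {
    fn get_or_insert(&mut self, key: K, default: V) -> &mut V {
        // Linear scan for an existing entry.
        let idx = match self.iter().position(|(k, _)| *k == key) {
            Some(i) => i,
            None => {
                self.push((key, default));
                self.len() - 1
            }
        };
        &mut self[idx].1
    }
}
```

With the trait in scope, `*counts.get_or_insert("hits", 0) += 1;` works directly on a `Vec<(&str, u32)>`.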

HjVT · December 9, 2023, 12:35am

As prior art for adding methods to Vec, there's assoc

epage · December 9, 2023, 3:36am

For another example implementation to demonstrate different trade offs: https://github.com/clap-rs/clap/blob/master/clap_builder/src/util/flat_map.rs

This is a (Vec<K>, Vec<V>)

Better cache locality for key lookups
Smaller binary size because you are more likely to share monomorphized Vec code
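To make the layout difference concrete, here is a minimal sketch of the split-storage idea. It is not the clap implementation linked above; the type name `SplitMap` and its methods are hypothetical, and only the `(Vec<K>, Vec<V>)` layout is taken from the post.

```rust
/// Hypothetical split-storage map: keys and values live in parallel vectors,
/// so a key scan touches only key memory.
pub struct SplitMap<K, V> {
    keys: Vec<K>,
    values: Vec<V>,
}

impl<K: Eq, V> SplitMap<K, V> {
    pub fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    /// O(n) scan over `keys` only; `values[i]` always belongs to `keys[i]`.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        match self.keys.iter().position(|k| *k == key) {
            Some(i) => Some(std::mem::replace(&mut self.values[i], value)),
            None => {
                self.keys.push(key);
                self.values.push(value);
                None
            }
        }
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        let i = self.keys.iter().position(|k| k == key)?;
        self.values.get(i)
    }
}
```

The invariant to maintain is that both vectors always have the same length and matching indices.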

dhardy · December 10, 2023, 6:38pm

There's also linear-map which (according to Cargo) is more popular, but old and unmaintained (thus e.g. fn Entry::or_default is missing).

Something I have used, if a little niche. However, it's simple and a decent choice of "map" type when the length is known to be very small, so... either including or not including seems fine IMO.

toc · December 11, 2023, 2:05pm

And on another axis this gets very close to various DataFrame/Series implementations.

zackw · December 12, 2023, 6:41pm

Nitpick: the term for this in Lisp is "association list" or "pairlist", not "associated list". And I think "associated list" sounds too much like "associated type" for clarity.

"The key type only needs to be Eq, not Ord or Hash" seems like a solid reason to add this to the stdlib: if you know you need a map where the keys only have to be Eq, but you don't know what this is usually called, and you have to hunt for something in crates.io, it could be quite hard to find. I don't know if it's a solid enough reason, especially considering the Vec<(K,V)> vs (Vec<K>, Vec<V>) tradeoffs.

pitaj · December 12, 2023, 7:39pm

I will note that it's also possible to make a variant which uses a single allocation while holding keys and values in a more optimal way. Essentially Box<([K; n], [V; n])>

HjVT · December 13, 2023, 1:14am

I'm not too sure it's actually more optimal. Having worked on and benchmarked a similarly laid out container, my gut tells me it will be within the margin of error of the more naive two-Vec approach, while the complexity of bookkeeping for such an allocation is not insignificant.

pitaj · December 13, 2023, 4:55am

What bookkeeping? All you have to do is store two pointers, one capacity, and one length which is practically the same as the (Vec<K>, Vec<V>) case. All of the tricky stuff you only have to deal with on reallocation.

That said, I don't expect it to be much faster unless you keep the whole structure within a single cache line or something.

HjVT · December 13, 2023, 5:24am

Capacity is a bit tricky to decide. Also keys and values could easily have different alignment requirements.

scottmcm · December 13, 2023, 5:46am

It's not that big of a deal. You can do

use std::alloc::{Layout, LayoutError};
pub fn layout<K, V>(cap: usize) -> Result<(Layout, usize), LayoutError> {
    let a = Layout::array::<K>(cap)?;
    a.extend(Layout::array::<V>(cap)?)
}

to get the allocation size & alignment needed and the offset to the array of values.
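As a worked example of what that function returns, the sketch below repeats it and adds a hypothetical helper, `describe`, that unpacks the result. The concrete types `u8` and `u64` are chosen only to show how differing alignments affect the offset.

```rust
use std::alloc::{Layout, LayoutError};

// Same function as in the post above, repeated so this sketch is self-contained.
pub fn layout<K, V>(cap: usize) -> Result<(Layout, usize), LayoutError> {
    let a = Layout::array::<K>(cap)?;
    a.extend(Layout::array::<V>(cap)?)
}

/// Hypothetical helper: (allocation size, alignment, byte offset of the value array).
pub fn describe<K, V>(cap: usize) -> (usize, usize, usize) {
    let (combined, values_offset) = layout::<K, V>(cap).expect("layout overflow");
    (combined.size(), combined.align(), values_offset)
}
```

For three `u8` keys followed by three `u64` values, the keys occupy 3 bytes, the value array is padded up to offset 8, and the allocation is 32 bytes with alignment 8. With the types swapped, the values start at offset 24 and the size is 27 bytes: `Layout::extend` adds no trailing padding, so `pad_to_align` would be needed if a size rounded up to the alignment is wanted.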

HjVT · December 13, 2023, 5:59am

But this ignores possible over-allocation.

pitaj · December 13, 2023, 4:14pm

Why? The capacity can just be treated like length. Same for both component arrays.

What do you mean by over-allocation? Rust allocators don't return the size of the actual allocation, so all you can assume is that it's at least the size you asked for.

HjVT · December 14, 2023, 1:45am

The unstable Allocator trait does in fact return the length.
