BijectionMap (Pre-RFC)

Summary

This RFC suggests a new data structure for Rust, to be implemented later: A Bi-Directional HashMap, or BijectionMap<K, V, S> where both the key and value are hashable and can be looked up. Additionaly, a TrijectionMap is also possible to implement lookups of three values.

Motivation

Many times, people would wish for a simple reverse-lookup function to convert their enums to different values, perhaps a String to their enum values and vice versa. In the current implementation, you would require two hashmaps:

use std::collections::HashMap;

pub enum MyEnum {
    VariantA,
    VariantB,
    VariantC,
    VariantD,
    // and so on...
}

const STRING_TO_MYENUM_MAP: HashMap<&str, MyEnum> = HashMap::from([("a", MyEnum::VariantA), ("b", MyEnum::VariantB), ("c", MyEnum::VariantC), ("d", MyEnum::VariantD)] /* and so on... */);
const MYENUM_TO_STRING_MAP: HashMap<MyEnum, &str> = HashMap::from([(MyEnum::VariantA, "a"), (MyEnum::VariantB, "b"), (MyEnum::VariantC, "c"), (MyEnum::VariantD, "d")] /* and so on...*/);

This is too cumbersome for many people to use. True, they could implement their own method, but could there be a better solution using just one map? They could define it

use std::collections::BijectionMap;

pub enum MyEnum {
    VariantA,
    VariantB,
    VariantC,
    VariantD,
}

const STRING_CONV_MYENUM_MAP: BijectionMap<&str, MyEnum> = BijectionMap::from([("a", MyEnum::VariantA), ("b", MyEnum::VariantB), ("c", MyEnum::VariantC), ("d", MyEnum::VariantD)] /* and so on... */);

and use it here:

let from_string = STRING_CONV_MYENUM_MAP.get_value("a");
let to_string = STRING_CONV_MYENUM_MAP.get_key(MyEnum::VariantA);

A TrijectionMap<V, W, X> can be used much the same way, except for three-way converters.

Guide-Level Explanation

A BijectionMap is a Rust collection that maps keys and values together. Unlike a HashMap, you can look up a BijectionMap both ways, by key or by value.

Say you're developing a program and you need to check usernames and match them up with user emails and vice versa. You can, obviously, use two hashmaps and develop a reverser function. But why do two if you can use just one map?

let USER_BIDI_MAP = BijectionMap::new::<&str, &str<()
for user in users {
    USER_BIDI_MAP.insert(user.username, user.email);
}
let random_user_who_logged_in = "iamaverysmartperson";
let their_email USER_BIDI_MAP.get(random_user_who_logged_in);

It can also be used to convert enums to other values:

pub enum IpResponse<Value, Error> {
   Success(Value),
   Info(Value),
   Err(Error)
}

// let statuscode_equals_ipresponse = blah blah blah
// This can help us get a status code, convert it to an enum, use it in functions, and send it to a user downstream for example

A TrijectionMap works the same way, but it can access three values at one time!

let tmp_map = TrijectionMap::new::<String, i64, MyEnum>()
tmp_map.insert("Hello", 932, MyEnum::VariantA)
// Many, many insertions later:
println!(tmp_map.get_value2_from_value1("Hello"));

Reference-Level Explanation

BijectionMap<K, V, S> will have both Keys and Values implement Eq and Hash, but the API is quite similar to HashMap:

let ikea_bijection_map = BijectionMap::new::<&str, u32>();
ikea_bijection_map.insert("SVALLJARD", 8321789228129);
ikea_bijection_map.insert("STENKOL", 182317198732918);
ikea_bijection_map.insert("VALLHEIOR", 92131831298718);
ikea_bijection_map.insert("HAREDROR", 92311273981273);

// this user wants WELTAEK. Now which one is that?
user.get_key("samanheik288").cart.push_back(ikea_bijection_map.get_key("WELTAEK"));

// one of the delivery agents gave us this product number and this username. What does it match to?
user.get_key("heldros8492").cart.push_back(ikea_bijection_map.get_value(923179812371));

// ATTENTION! SAMANKEIV is no longer being sold!
ikea_bijection_map.remove_key("SAMANKEIV");

// We ran out of 9328193103930 and IKEA will stop selling it here. What do we do?
ikea_bijection_map.remove_value(9328193103930);

// ERWAK is a great product... but is the name being used?
ikea_bijection_map.entry_key("ERWAK").or_insert(21389181329871);

Same for TrijectionMap:

let id_name_status_map = TrijectionMap::new::<i64, String, u32>();
id_name_status_map.insert(212019, "HiRandomPersonHere!", 1);
id_name_status_map.insert(238239, "SanoMane290", 2);
id_name_status_map.insert(100029, "DoOrDoNotThereIsNoTry", 3);
id_name_status_map.insert(190000, "SoUncivilized1234", 4);
// Later...
id_name_status_map.get_value_1_from_value_2("SoUncivilized1234");
id_name_status_map.remove_value_1(100029);
id_name_status_map.entry_value_2("ErwakBandez").or_insert_val1and3(12983, 389)

Drawbacks

A drawback to this usage is that all entries have to be one-to-one, so all usecases must be evaluated.

Rationale and Alternatives

An alternative is to implement two or three hashmaps. The BijectionMap reduces memory size and typing, and is easier to understand in terms of being explicit.

Prior art

bimap

Unresolved questions

None? This is quite similar to a HashMap.

Future possibilities

If variadic generics are ever stabilised, a new MultidirectionalMap may be created with variadic generics for the types, to be able to convert between a theoretically infinite number of types.

2 Likes

I encountered a scenario where I needed to search for two different types of values from another type value, likes this:

let tmp_map1<String, (i64, MyEnum)> = HashMap::new();
let tmp_map2<i64, (String, MyEnum)> = HashMap::new();
let tmp_map3<MyEnum, (i64, String)> = HashMap::new();

// get String for MyEnum
let s = tmp_map3.get(MyEnum::XX1).map(|t| t.1);
// get MyEnum for i64
let e = tmp_map2.get(114514).map(|t| t.1);

This RFC may consider supporting multiple generic types to simplely this example?

For that, I think we can create a new type that supports variadic type arguments - I got the idea of a TrijectionMap, but that will only solve this issue for maps with 3 elements. What do you think?

Yep. The hash functions will cause some overhead and reduce performance, but I suppose that's fine. (am I getting you right?)

Prior art: Are there any crates that provide this already?

Popularity: Why should this be in the standard library? It seems pretty niche.

12 Likes

@pitaj I wouldn't call this "niche". Converting strings to enums and vice versa is a pretty big usecase, and bi-directional hashmaps are very good for large tables where it may be required to lookup both keys and values. I gave an example of a crate that is prior art, bijection_map.

For enum -> string, EnumMap is better than HashMap.

1 Like

But this is not only enum maps. This can also be used in string to id conversions and lookup tables of many forms.

Is there a known, performant implementation available anywhere in any language? I've never heard of such a data-structure. The only reasonable implementation I can think of off the top of my head is 2 maps in opposite directions under the hood.

1 Like

For large elements it would be better to share the objects in some way between the two hash maps rather than clone them.

I don't think this is a good addition to std though because you could reasonably use various other data structures rather than HashMaps for each of the two directions depending on the use case.

5 Likes

Boost has Chapter 1. Boost.Bimap but I don't know if it is performant. According to Bimap and Boost it is built on top of Boost Multiindex, which is from memory is a horribly complex mess of templates that let's you define a multi-column database with some of the columns being indexed (and I don't know how performant it is).

I have had to solve this sort of thing before (not in Rust though), and the approach I took was to maintain two maps, one for each direction. If I remember correctly, I was doing this in Python, and it was a 1-to-N relation, so I had a map to a list in one direction. While the proposed solution here would only allow 1-to-1 mapping.

That said, this is a very niche data structure to me. The 1-to-1 case being a niche of a niche.

EDIT: Multiindex apparently also exists for Rust: MultiIndexMap — data structures in Rust // Lib.rs (could be relevant for prior art).

2 Likes

Yeah one can probably do some indirection like

(typeid A, hashA) -> synthetic key K
(typeid B, hashB) -> synthetic key K
synthetic key K -> (A, B)

And then structural operations on the map would first do a forward lookup to get the value and then a reverse to update all the keys.

This is essentially a database with indices. This general structure would also allow partial mapping where only a subset is bijective and the remainder only exists as A or B (what would yield NULLs in an outer join).

2 Likes

Such a map isn't really possible to construct efficiently with just HashMap's API, but it's possible with the HashTable provided by hashbrown.

I agree that this doesn't really belong in std, though; many (if not most) use cases will want different table structure for different entry types.

There's also the interesting question of at what point programs should switch from an almost-database to a proper database. I think that's sooner than most devs start reaching for a proper database.

2 Likes

A map like that was announced on r/rust just the other day: GitHub - oxidecomputer/iddqd: Maps where keys are borrowed from values. is a crate for maps where the keys borrow from the data, and they have bijective and even trijective maps in there (as well a standard maps).

Caveat: I haven't tested it yet.

1 Like

In the past, I did a fair amount of company internal dev tooling (written in python) for wrangling build systems / C++ files etc. Bidirectional 1-to-N maps was something I ran into a lot while doing that. I don't think this database suggestion is appropriate there: the data structures are ephermal, they should be as fast as possible, you don't care about ACID properties, etc.

To me, the use cases where it makes sense to go for a full database with support for multiple indexed columns for data that isn't persistent seems fairly niche. But this may of course depend on what background you have.

1 Like

There are of in-memory, in-process databases.

you don't care about ACID properties

in multi-threaded programs the ACI part can become tricky. RWLock or DashMap will do for a while, but even with those you have to be careful about deadlocks once you start using multiple maps.

To me, the use cases where it makes sense to go for a full database with support for multiple indexed columns for data that isn't persistent seems fairly niche.

It's a continuum. You can give up immediate durability but still snapshot the data every now and then. Or perhaps it can be regenerated after a crash. Or maybe it's a low-latency system and if it goes down the data will be obsolete anyway.

1 Like