Mentioning about the memory consumption of different collections

I noticed that BTree takes up roughly about 3 times as much ram as Vec and somewhere around twice as much as Hashmap.

I wrote a little code to check how much ram does Vec, Btree, HashMap (with default hasher), VecDeque, LinkedList uses.

Couldn't really reproduce what exactly happened, but it shows that btree/hashmap consumes more than twice as much ram when compared to vec/vecdeque/linkedlist.

It took me some time to figure out why my memory usage is so high. I really think that it helps people if we mention about the ram usage of each collections on the docs.

What do you guys think? Let me know if I'm getting something wrong!

1 Like

I think the benchmark would be more informative if you e.g. plotted the allocated memory as a function of insertions, instead of just considering the state after one specific number of elements.

Looking at the numbers you got, my guess is that Vec and VecDeque ended up with minimal overhead since they use a doubling growth strategy and 1M happens to be just under 2^20. On the other hand, HashMap will have to reallocate before it has filled all buckets to preserve performance (at a load factor of 0.875 IIRC). Because its implementation restricts the table size to be a power of two (for good reasons), it ends up needing 2^21 buckets for 1M elements.

I'm not familiar with the implementation of BTreeMap, but it being an ordered tree, the insertion order will make a difference in what exactly ends up happening. Your test inserts the keys in sorted order, so I'd be wary of drawing conclusions from just that.

6 Likes

Memory consumption of collections will strongly depend on many things:

  • The types in the collections. E.g. a hashmap needs to track vacant and occupied slots. If it does it via Option<T>, then the overhead is negligible for large T, double memory consumption for word-sized T, and several times more memory use for T=u8.

  • Usage patterns. Collections will overallocate memory for better performance. Depending on your usage pattern, all of that overallocated memory may happen to be used, or all of it may be empty, if you barely trigger growth. Something like BTreeMap has many internal buffers, making its exact memory consumption quite hard to predict and very dependent on insertion/removal order.

  • Internal details of the collection algorithm. That part is obvious, but it can also vary significantly.

Overall, it is too hard to give any reasonable guarantees, apart from asymptotic memory usage, and it will be unreasonably constraining for the evolution of collections. For this reason specifying the things you propose will be more harmful than useful.

3 Likes

I plotted out the ram usage and number of entries. It seems to grow linearly and vector has the least consumption.

Though, it appears that the memory usage differs depending on many factors

Thank you for your explanation!

After glancing on each collection's source code and try-and-erroring on my project I thought they are predictable (-_-; )

image

image

Are you sure this is measuring correctly? It's strange that all the lines are straight. I'd expect Vec, VecDeque, and HashMap to have chunky graphs since their resizing strategy has exponential backoff.

9 Likes

Thank you for letting me know!

I will investigate.

The different behaviour is because you shrink_to_fit after each test, bypassing the different allocation strategies.

I'm guessing the HashMap::shrink_to_fit only shrinks the allocations, without moving elements to try and compact them better, so it probably still has overhead with extra unused slots in the internal buckets. And something similar for the other collections that don't shrink as well.

3 Likes

If you store n elements in a Vec, then you will need O(n) memory.

If you store n elements in a HashMap, then you will need O(n) memory.

If you store n elements in a BTree, then you will need O(n) memory.

But:

A Vec is basically an array and consumes n*sizeof(T) bytes.

A BTree is a tree and there are pointers. Thus it will consume more memory.

1 Like

Except when it has just expanded and it consumes 2 * n * sizeof(T) bytes. Or when you grow it a bunch and then shrink it a bunch and it consumes 100 * n * sizeof(T) bytes.

It depends on how full the tree is, sometimes the extra space is pointers and sometimes it's in allocated but unused spots for K's.

2 Likes