I think the benchmark would be more informative if you plotted, for example, the allocated memory as a function of the number of insertions, instead of just looking at the state after one specific number of elements.
Looking at the numbers you got, my guess is that Vec and VecDeque ended up with minimal overhead since they use a doubling growth strategy and 1M happens to be just under 2^20. On the other hand, HashMap will have to reallocate before it has filled all buckets to preserve performance (at a load factor of 0.875 IIRC). Because its implementation restricts the table size to be a power of two (for good reasons), it ends up needing 2^21 buckets for 1M elements.
I'm not familiar with the implementation of BTreeMap, but since it's an ordered tree, the insertion order will affect what exactly ends up happening. Your test inserts the keys in sorted order, so I'd be wary of drawing conclusions from that alone.
Memory consumption of collections will strongly depend on many things:
- The types in the collection. E.g. a hash map needs to track vacant and occupied slots. If it does so via Option<T>, the overhead is negligible for large T, doubles memory consumption for word-sized T, and likewise roughly doubles it for T = u8 (Option<u8> is two bytes).
- Usage patterns. Collections over-allocate memory for better performance. Depending on your usage pattern, all of that over-allocated memory may end up used, or nearly all of it may sit empty if you barely trigger growth. Something like BTreeMap has many internal buffers, making its exact memory consumption quite hard to predict and very dependent on insertion/removal order.
- Internal details of the collection algorithm. This part is obvious, but it can also vary significantly between implementations and versions.
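The type-dependent overhead from the first point is easy to check; on a typical 64-bit target (and treating Option<T> occupancy tracking as a simplified model, not what std's HashMap actually does internally):

```rust
use std::mem::size_of;

fn main() {
    // Large T: the tag plus padding is a small fraction of the payload.
    println!("Option<[u64; 64]>: {} bytes vs {} payload",
             size_of::<Option<[u64; 64]>>(), size_of::<[u64; 64]>());
    // Word-sized T: alignment pads the tag out to a full word, doubling the size.
    println!("Option<u64>: {} bytes vs {} payload",
             size_of::<Option<u64>>(), size_of::<u64>());
    // Tiny T: the tag doubles the size here too.
    println!("Option<u8>: {} bytes vs {} payload",
             size_of::<Option<u8>>(), size_of::<u8>());
    // Niche optimization can remove the overhead entirely for some types.
    println!("Option<Box<u8>>: {} bytes", size_of::<Option<Box<u8>>>());
}
```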
Overall, it would be too hard to give any reasonable guarantees beyond asymptotic memory usage, and doing so would unreasonably constrain the evolution of the collections. For this reason, specifying the things you propose would be more harmful than useful.
Are you sure this is measuring correctly? It's strange that all the lines are straight. I'd expect Vec, VecDeque, and HashMap to have chunky, stair-stepped graphs, since their resizing strategies grow capacity geometrically.
The different behaviour is because you call shrink_to_fit after each test, which bypasses the collections' different allocation strategies.
I'm guessing HashMap::shrink_to_fit only shrinks the allocation without moving elements around to compact them better, so it probably still carries overhead from unused slots in its internal buckets. Something similar likely applies to the other collections that don't shrink as completely.