Vec is asymmetric in memory handling

No, SipHash is more complicated: it's a cryptographically secure keyed hash function.

Universal hashing is much simpler and not cryptographically secure, but it's still randomized so that you get a low collision count regardless of the keys. So you can't DoS it by providing a set of bad keys (there are no bad sets of keys).
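
To make "universal" concrete, here's a minimal sketch of one classic universal family (multiply-shift). This is not UMASH, just the simplest illustration of the idea: the collision bound holds over the random choice of the multiplier, not over the keys.

/// Multiply-shift hashing: for a random odd multiplier `a`, any two
/// distinct 64-bit keys collide on the top `m` bits with probability
/// at most 2^(1-m), no matter which keys an attacker picks.
struct MultiplyShift {
    a: u64, // random odd multiplier, drawn once (e.g. at process start)
    m: u32, // number of output bits, 1..=63
}

impl MultiplyShift {
    fn new(random_seed: u64, m: u32) -> Self {
        Self { a: random_seed | 1, m } // force the multiplier to be odd
    }

    fn hash(&self, x: u64) -> u64 {
        self.a.wrapping_mul(x) >> (64 - self.m)
    }
}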

To quote one description of a universal hash function:

What sets UMASH apart from these other non-cryptographic hash functions is its proof of a collision probability bound. In the absence of an adversary that adaptively constructs pathological inputs as it infers more information about the randomly chosen parameters, we know that two distinct inputs of s or fewer bytes will have the same 64-bit hash with probability at most ⌈s/2048⌉⋅2^-56, where the expectation is taken over the random “key” parameters.

However, even in the absence of seed-independent collisions, timing side-channels in a data structure implementation could theoretically leak information about colliding inputs, and iterating over a hash table’s entries to print its contents can divulge even more bits.

Sounds like it would be secure in some use cases but not others. There are plenty of scenarios where an attacker can alternate between "add entry" operations and "get entries" operations that leak the iteration order.

source: UMASH: a fast and universal enough hash - Backtrace Engineering

3 Likes

Yes. But the same is true, to some extent, of SipHash. The attacker will not be able to recover the hashing key, but they can still detect hash-table collisions via a timing attack and use that knowledge to perform a DoS attack in some scenarios.

My understanding is that SipHash doesn't let you escalate from a few known collisions to arbitrarily many collisions without either knowing the key or actively interacting with the system you want to DoS. UMASH, on the other hand, would let you generate arbitrarily many collisions offline once you have a few known ones, and then push them all at once to the target system. Even just the iteration order leaks enough information to find collisions with UMASH.

Quoting UMASH: a fast and universal enough hash - Backtrace Engineering

Obviously, this rules out using UMASH in a MAC, but might also be an issue for, e.g., hash tables where attackers control the keys and can extrapolate the hash values. A timing side-channel may let attackers determine when keys collide; once a set of colliding keys is known, the linear structure of UMASH makes it trivial to create more collisions by combining keys from that set. Worse, iterating over the hash table’s entries can leak the hash values, which would let an attacker slowly extract the parameters.

1 Like

That's exactly right. I wouldn't recommend universal hashing for security-sensitive applications, like permanently storing user data.

However, most uses of HashMap aren't in a setting where an attacker can interactively probe it. Universal hashing works well when you either have no security needs or the processing is offline, without interaction. And it's much better than some arbitrary non-randomized function like x % n or whatever.

I think this is actually a problem with Rust's memory allocator interface. Vec has only a primitive allocator to work with, so it needs to be careful about the costs of memory management, even for operations that could be cheap.

  • The Rust allocator doesn't have methods for growing and shrinking allocations in place through virtual memory. There's realloc, which may copy. (The stable signatures are sketched after this list.)
  • realloc can't be told how much of the data is uninitialized, so if it copies, it copies everything, wasting time on the uninitialized capacity.
  • alloc can't be told to reserve a large address space for growing, so Vec may get its allocations from a pool of fixed-size blocks, which forces it to copy each time it grows.
  • There's no way to cooperate closely with the allocator and query what block sizes it uses, so Vec uses its own independent heuristics for resizing, and uses only the size it requested, not the full size of the chunk the allocator actually gave it.
  • The general rule is that dealloc must be given exactly the size that was allocated, so shrink_to_fit or conversion into Box<[T]> must realloc even if the capacity is only off by a byte and ends up in the same allocator bucket anyway. It also makes querying the allocator's real allocated size problematic: using the real size would change the visible capacity() and cause reallocs in into_boxed_slice().
  • There's nothing like MADV_DONTNEED to mark unused capacity as reclaimable.
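
For reference, here is the entire stable interface Vec effectively has to work with (the signatures of std::alloc::GlobalAlloc; the last two have default implementations in std). Note that realloc returns a possibly different pointer, may copy, and can't be told how many of the bytes are actually initialized:

use std::alloc::Layout;

pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8;
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8;
}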

On systems with 64-bit addresses and virtual memory, unused capacity doesn't need to waste any RAM. Address ranges are plentiful, and only mapped pages cost something. Vec could be more efficient and cheaper if it were able to take advantage of modern memory management.

But it's forced into the malloc/free model, which requires it to be careful and conservative, or risk excessive overhead in some usage scenarios.

The Vec itself is too small (3×usize) and too simple to track how it's used, so it can't adapt to the workload. It can't know when it's a good time to shrink the capacity, and when shrinking would just waste two allocations and two copies for nothing.
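
For concreteness, this is (conceptually) all the state a Vec has to make decisions with:

use std::ptr::NonNull;

// Roughly how std lays it out (via RawVec); three words total.
struct Vec<T> {
    ptr: NonNull<T>, // pointer to the buffer
    cap: usize,      // allocated capacity, in elements
    len: usize,      // number of initialized elements
}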

The Vec's primitive simplicity is generally a good thing: it makes performance predictable, and the cost of storing a Vec fixed and low. The OS and the allocator could take over some of the smarts, but Rust doesn't have an API for that.

6 Likes

Another possible solution would be to develop profiling tools for Vec usage.

Imagine you could run an instrumented build of your program that logs each Vec allocation and reallocation, and how much capacity was wasted.

It could then suggest places where reserve() would help, places where with_capacity() was too large, and places where shrink_to_fit() would help.
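
Nothing like that exists as a polished tool as far as I know, but a crude version is already possible with a wrapping global allocator. A sketch (a real tool would also capture backtraces to attribute events to individual Vecs):

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static REALLOCS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // Every Vec growth that outgrows its current capacity lands here.
        REALLOCS.fetch_add(1, Ordering::Relaxed);
        System.realloc(ptr, layout, new_size)
    }
}

#[global_allocator]
static A: CountingAlloc = CountingAlloc;

fn main() {
    let mut v = Vec::new();
    v.extend(0..10_000); // grows several times; reserve() would avoid that
    println!("reallocs: {}", REALLOCS.load(Ordering::Relaxed));
}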

4 Likes

That trick is also only applicable to multi-page allocations. My experience is that the vast majority of vectors are small (fewer than 10 elements, often just 1-4). This is why SmallVec from the crate of the same name is often such a good optimisation: you can avoid allocations entirely if you can inline even one element.
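
For example (with the smallvec 1.x API):

use smallvec::SmallVec;

fn main() {
    // Up to 4 elements live inline in the SmallVec itself;
    // only the 5th push spills to a heap allocation.
    let mut v: SmallVec<[u32; 4]> = SmallVec::new();
    v.extend([1, 2, 3, 4]);
    assert!(!v.spilled()); // no heap allocation yet
    v.push(5);
    assert!(v.spilled()); // now heap-allocated
}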

Most programs I write that use Vec (CLI tools, as opposed to embedded, where I use heapless etc.) tend to have a few really big Vecs (multi-MB) and thousands or tens of thousands of small ones.

I wouldn't want to waste whole pages on these just in case they grow large enough for virtual-memory trickery to be useful. And for the large Vecs, hugepage backing would actually be more beneficial to me, for TLB pressure.

All of this likely depends on your use case and people all tend to think they are representative of the "common case".

5 Likes

Yeah, it's all deeply in "it depends" territory. If you keep thousands of small Vecs, you may be better off using Box<[T]> instead, and save a usize too. Having to explicitly use SmallVec or another workload-specific type isn't too bad IMHO.
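
That is:

fn main() {
    // Box<[T]> is two words (pointer + length) instead of Vec's three,
    // and by construction carries no unused capacity.
    let v: Vec<u32> = (0..100).collect();
    let b: Box<[u32]> = v.into_boxed_slice(); // sheds excess capacity (may realloc)
    assert_eq!(b.len(), 100);
}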

Cocoa has NSArray, which is internally like a dyn Vec that can replace its own implementation dynamically. If you insert elements a lot, it will change its storage to a linked list of chunks. But such dynamic cleverness is too much for Rust's basic primitive.

I think Vec is at least missing some sort of generic shrink()/garbage_collect()/… function with good asymptotics, so that calling it after every potentially-len-shrinking operation wouldn't make your program accidentally quadratic, the way overuse of shrink_to_fit can.

If it isn't the default, it should at least be easy to manually use Vec as a dynamic array with (only) a constant-factor memory overhead, if you want to.

Given that (as far as I remember the docs) Vec's exact growth strategy isn't guaranteed or fixed, I'm not even sure it's currently possible to do this manually in a future-proof way at all, and in any case it's not trivial to implement.

6 Likes
  • The Rust allocator doesn't have methods for growing and shrinking allocations in place through virtual memory. There's realloc, which may copy.

Just knowing whether realloc is a dumb implementation that always copies, or one that under some conditions operates in place, would already be useful. That and other info could be exposed as associated consts that describe the allocator's behavior in more detail.
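
Something along these lines, say (entirely hypothetical; none of these consts exist on the real trait today):

// Hypothetical additions to the unstable Allocator trait, for illustration only.
pub unsafe trait Allocator {
    /// `true` if `grow` may ever extend an allocation in place;
    /// `false` for an implementation that always allocates-and-copies.
    const MAY_GROW_IN_PLACE: bool;

    /// `true` if `shrink` may release memory without moving the data.
    const MAY_SHRINK_IN_PLACE: bool;

    // ... the existing allocate/deallocate/grow/shrink methods ...
}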

The general rule is that dealloc must be given exactly the size that was allocated, so shrink_to_fit or conversion into Box<[T]> must realloc even if the capacity is only off by a byte

This hasn't been true for some time. If you request size X and the allocator returns Y = X + 5, then you're allowed to deallocate with any size in X..=Y. It's described in the "memory fitting" section of the Allocator docs.

  • There's nothing like MADV_DONTNEED to mark unused capacity as reclaimable.

Does that require cooperation from the allocator? I think you could call madvise directly, though you'd have to do some calculation to avoid overstepping page boundaries.
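
Roughly like this, on Linux, via the libc crate (a sketch; it assumes 4 KiB pages, which real code should query with sysconf, and an allocator that keeps no metadata inside the allocation itself):

// Advise the kernel that whole pages lying entirely within the unused
// capacity of `v` can be reclaimed. MADV_DONTNEED makes the pages read
// as zeros on next access, which is fine here: everything past `len`
// is uninitialized from Vec's point of view anyway.
unsafe fn discard_unused_tail<T>(v: &mut Vec<T>) {
    const PAGE: usize = 4096;
    let start = v.as_mut_ptr().add(v.len()) as usize;
    let end = v.as_mut_ptr().add(v.capacity()) as usize;
    let first = (start + PAGE - 1) & !(PAGE - 1); // round up to a page boundary
    let last = end & !(PAGE - 1); // round down to a page boundary
    if first < last {
        libc::madvise(first as *mut libc::c_void, last - first, libc::MADV_DONTNEED);
    }
}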

  • alloc can't be told to reserve a large address space for growing, so Vec may get its allocations from a pool of fixed-size blocks, which forces it to copy each time it grows.

At least on Linux, with overcommit and the zero-page optimization, you could just do a large with_capacity call. But yeah, on Windows you'd have to do something else to grab address space without causing an OOM.
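
For example (this relies on Linux overcommit; pages are only committed when first written):

fn main() {
    // Reserve 8 GiB of address space up front. With overcommit this
    // costs almost no physical memory until pages are actually touched.
    let mut v: Vec<u8> = Vec::with_capacity(8 << 30);
    v.extend_from_slice(b"hello"); // only the first page gets committed
    assert!(v.capacity() >= 8 << 30);
}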

But having such an allocator API wouldn't really help here, because Vec itself doesn't expose it either. So this would have to be a custom allocator that does it implicitly... which you can already do today.

  • There's no way to cooperate closely with the allocator and query what block sizes it uses, so Vec uses its own independent heuristics for resizing, and uses only the size it requested, not the full size of the chunk the allocator actually gave it.

Well, those usually are powers of two, and if you just use the default strategy you get power-of-two growth, so that probably happens to line up already.

A few years ago someone tried to use the allocation size returned from the allocator, but that was actually a loss for performance.

The Vec's primitive simplicity is generally a good thing: it makes performance predictable, and the cost of storing a Vec fixed and low. The OS and the allocator could take over some of the smarts, but Rust doesn't have an API for that.

Which of the things you mentioned would be essential to achieve that? I'm not quite sure what you're trying to accomplish, or what the API for it would look like on both Vec and Allocator.

1 Like

It's currently impossible to properly do amortized shrinking directly on a Vec without depending on internal implementation details, because shrink_to is not guaranteed to lower the capacity: the argument is just a lower bound on the resulting capacity. So if you simply call shrink_to whenever len < capacity / 4, you may end up calling it every time and run into quadratic complexity. The only way to do this is a wrapper around Vec that keeps extra state (e.g. "last requested capacity"); see the sketch below.

Related discussion/issue
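
A sketch of such a wrapper, relying only on documented behavior (shrink_to's argument being a lower bound); the constants are arbitrary:

// Track the capacity we last *requested* ourselves, since Vec's real
// capacity and growth strategy are unspecified. Between two shrinks at
// least requested_cap / 8 pops must happen, so the copying cost stays
// amortized O(1) per operation.
struct ShrinkingVec<T> {
    inner: Vec<T>,
    requested_cap: usize,
}

impl<T> ShrinkingVec<T> {
    fn new() -> Self {
        Self { inner: Vec::new(), requested_cap: 0 }
    }

    fn push(&mut self, value: T) {
        self.inner.push(value);
        self.requested_cap = self.requested_cap.max(self.inner.len());
    }

    fn pop(&mut self) -> Option<T> {
        let value = self.inner.pop();
        // Shrink only when len falls below a quarter of what we last
        // asked for, and only halve the request, never shrink to len exactly.
        if self.requested_cap >= 8 && self.inner.len() < self.requested_cap / 4 {
            self.requested_cap /= 2;
            self.inner.shrink_to(self.requested_cap);
        }
        value
    }
}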

Also, as a much simpler example: the growth factor isn't documented, AFAIK. Thus, if Vec for some reason moved to a growth strategy of "grow to capacity = len×4" (or, even worse, len×5), then "shrink if capacity > len×4" would really kill amortization in use cases that alternate small growths and shrinks, since a single pop right after a growth would already trigger a shrink.


More closely related to your example: I believe Vec currently has no way to avoid shrink_to_fit calling back into the allocator whenever capacity > len, even if the current capacity already fits the length. That can happen e.g. when the capacity is derived from the actual allocation size a custom allocator returned, including whatever excess the allocator had available when len*size_of::<T>() (or more) bytes were requested. I'm not sure whether Vec actually makes use of such excess capacity yet, but if it does (or will), then e.g. creating a Vec::with_capacity(n), filling in n elements, and converting to a boxed slice might actually call the allocator twice. Of course the second call is expected to be cheap, since the allocator will notice the realloc requests a size that was already available anyway, but it's still unnecessary overhead, and not necessarily an inlineable thing.

Is it possible to add a generic parameter such as this?

pub struct Vec<T, A = Global, G = DefaultVecGrowthStrategy>
where
    A: Allocator,
    G: VecGrowthStrategy,
{
    /* ... */
}

The default growth strategy would be the current one (2x growth, no shrinking); there could be another standard one with auto-shrinking; and users could create their own where it's 1.5x growth rather than 2x, for example. Or there could be one that's parametrized by growth- and shrink-factor constants.
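
As a hypothetical sketch of what such a trait could look like (all names invented here; note that std's actual growth factor is an unspecified implementation detail, so the doubling below is itself an assumption):

pub trait VecGrowthStrategy {
    /// Capacity to allocate when the vector must hold at least
    /// `required` elements (`required` > `current_cap`).
    fn grow(current_cap: usize, required: usize) -> usize;

    /// Capacity to shrink to after a length reduction;
    /// returning `current_cap` means "don't shrink".
    fn shrink(current_cap: usize, len: usize) -> usize;
}

/// Doubling growth, no automatic shrinking.
pub struct DefaultVecGrowthStrategy;

impl VecGrowthStrategy for DefaultVecGrowthStrategy {
    fn grow(current_cap: usize, required: usize) -> usize {
        required.max(current_cap * 2)
    }
    fn shrink(current_cap: usize, _len: usize) -> usize {
        current_cap
    }
}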

That's true of the unstable Allocator trait, but not of the GlobalAlloc trait. Because Allocator is unstable, all stable Rust programs’ allocations currently pass through GlobalAlloc, which does not include the ability to return more memory than requested (the return value is a pointer to the beginning rather than a slice pointer) and doesn't have the “fit” rule, since the information needed to use it is not available.

1 Like

That's only true for custom allocators, AFAIK.

It'd be great to get an ACP for that. I think it absolutely makes sense to have.

Right now we don't have any straightforward thing to tell people who see the "your Vec might be massively overallocated depending on what it came from" note on Vec::from_iter, so we should have a function exactly for this, where we can say "if you're worried about this, call shrink, and if the Vec is more over-allocated than a push loop would have produced, it'll reduce the capacity down to some unspecified reasonable amount".

EDIT: well, I went and made one, Add a "shrink-if-excessive" API to `Vec` · Issue #553 · rust-lang/libs-team · GitHub

3 Likes

std::alloc::Global apparently doesn't even implement GlobalAlloc[1], so you're correct on the surface: Rust's std collections use Global as an Allocator, not as a GlobalAlloc. However, these implementations are merely wrappers around the stable free functions (alloc and friends), so everything goes through the GlobalAlloc interface either way, and System also implements Allocator in terms of GlobalAlloc.

And I think it's fully reasonable to serve the “grow a lot, then shrink to roughly the final size” case, as well as the giant page-sized vector, with a non-Vec type. It's already the case that any functionality that doesn't change the length works on [T] instead of Vec<T>, so the full extent of API assuming Vec is mostly limited to things that use a growable buffer of some sort. (And I think we should encourage using Box<[T]> more when the length is fixed once the value is constructed.)

Vec is a “simple enough” and “good enough” option that won't be non-linearly horrible (per Vec) in 99.9% of applications. A SmallVec that uses inline storage for “small” vectors and a HugeVec that uses page-allocation tricks on platforms that support them would be nice-to-haves, for sure, but I think the costs of only having Vec in std are a bit overstated.

Everything would be possible. Growth strategy isn't covered, but I think it's relevant to point out that the Storage proposal (an alternative to using the Allocator trait directly in collections) handles the SmallVec and HugeVec cases. The short-ish version (laden with contentious choices):

struct Vec<T, S: Store = SingleStore<Global>> {
    ptr: S::Handle,
    len: usize,
    cap: usize,
    store: S,
}

struct SingleStore<A> {
    data: ptr::NonNull<u8>, // !
    alloc: A,
}

unsafe impl Store for SingleStore<impl Allocator> {
    type Handle = (); // !

    fn allocate(&mut self, layout: Layout) -> Result<Self::Handle, AllocError> = try {
        let ptr = self.alloc.allocate(layout)?;
        self.data = ptr.cast(); // NonNull<[u8]> -> NonNull<u8>
        ()
    }

    unsafe fn deallocate(&mut self, (): Self::Handle, layout: Layout) {
        unsafe { self.alloc.deallocate(self.data, layout) }
    }

    /* … grow/shrink etc … */

    unsafe fn resolve(&self, (): Self::Handle, layout: Layout) -> ptr::NonNull<u8> {
        self.data
    }
}

struct SmallStore<T, A> {
    union {
        outline: ptr::NonNull<u8>,
        inline: UnsafeCell<MaybeUninit<T>>,
    },
    alloc: A,
}

unsafe impl<T> Store for SmallStore<T, impl Allocator> {
    type Handle = (); // !

    fn allocate(&mut self, layout: Layout) -> Result<Self::Handle, AllocError> = try {
        if layout.fits_in(Layout::new::<T>()) {
            // nop: fits in the inline storage
        } else {
            let ptr = self.alloc.allocate(layout)?;
            self.outline = ptr.cast(); // NonNull<[u8]> -> NonNull<u8>
        }
    }

    unsafe fn deallocate(&mut self, (): Self::Handle, layout: Layout) {
        if layout.fits_in(Layout::new::<T>()) {
            // nop
        } else {
            unsafe { self.alloc.deallocate(self.outline, layout) }
        }
    }

    unsafe fn resolve(&self, (): Self::Handle, layout: Layout) -> ptr::NonNull<u8> {
        if layout.fits_in(Layout::new::<T>()) {
            unsafe { ptr::NonNull::from(&self.inline).cast() }
        } else {
            unsafe { self.outline }
        }
    }

    /* … and so on … */
}

/* … whatever PageVec would need to do … */

Note to self: Vec::as_ptr's specification requires its storage to be vec-shared-mutable, removing that axis of behavior from the Store trait hierarchy. However, the mut split for de/alloc is still an annoying source of seemingly incidental complexity, and ArrayVec not optimizing out capacity still leaves us needing to duplicate Vec's API, which makes the intrinsic complexity of Store even more difficult to justify.

I may re-champion storage if we can find a way to optimize out length/capacity without bloating the API to a painfully inelegant level again. But even then, the lock to non-meta-laid-out allocations is also quite an annoying limitation… though one necessary for an optimal SmallStore.

This isn't for discussion in this thread, though. Just musing on why this is so difficult to get right.

Very much agree. Maybe it should be reserve_shrink or similar, but there should definitely be a way to ask for heuristic-driven shrinking that works well with the growth heuristics.


  1. This makes it impossible to define the global allocator in terms of the global allocator, which is a good thing. ↩︎

1 Like