SizeOf trait and #[derive(SizeOf)]?

I'd like to open a discussion on having a SizeOf trait that takes into account dynamically allocated memory.

assert_eq!(vec![1u32].size_of(), 28 /* 24 + 4 on a 64-bit target */);

It would return the normal mem::size_of plus the total size of the pointed-to data. The pointed-to data should be measured by capacity rather than len(), because the intent is to get the total resident size of an object. Kernel memory is not included, so a file descriptor's size_of would be just the bytes of the fd itself.
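A minimal sketch of how such a trait might look, assuming it is split into a required heap_size_of method plus a provided size_of method that adds the inline mem::size_of_val bytes. The names SizeOf and heap_size_of and the impls below are hypothetical, not an existing std or crate API. Heap buffers are counted by capacity, as proposed, and a File contributes only its handle bytes.

```rust
use std::fs::File;
use std::mem;

/// Hypothetical trait from the proposal: inline bytes plus owned heap bytes.
trait SizeOf {
    /// Bytes this value owns on the heap, excluding its own inline size.
    fn heap_size_of(&self) -> usize;

    /// Total size: the inline `mem::size_of_val` bytes plus the heap bytes.
    fn size_of(&self) -> usize {
        mem::size_of_val(self) + self.heap_size_of()
    }
}

impl SizeOf for u32 {
    fn heap_size_of(&self) -> usize {
        0 // plain integers own no heap memory
    }
}

impl<T: SizeOf> SizeOf for Vec<T> {
    fn heap_size_of(&self) -> usize {
        // The whole buffer is resident, so count capacity, not len().
        let buffer = self.capacity() * mem::size_of::<T>();
        // Add whatever each initialized element owns on the heap.
        let children: usize = self.iter().map(|item| item.heap_size_of()).sum();
        buffer + children
    }
}

impl SizeOf for File {
    fn heap_size_of(&self) -> usize {
        0 // kernel-side state is not counted; only the handle's inline bytes are
    }
}
```

With this sketch, vec![1u32].size_of() is mem::size_of::<Vec<u32>>() + 4, which is 24 + 4 = 28 on a 64-bit target.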

The use case is debugging big structures in long-running programs to make sure there are no space leaks, as well as general debugging and getting a sense of where the bulk of the data lives. This could be a useful way to monitor an application without needing external tools.

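When it comes to shared memory, perhaps SizeOf::size_of could return two numbers, (owned: u64, shared: u64): the owned bytes and the shared bytes. An Rc<u32> with a reference count of 1 would then return (12, 0), while a shared one would return (8, 4). These figures assume a 64-bit target and count only the 4-byte u32 payload, not the strong and weak reference counts that Rc stores next to it.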
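The (owned, shared) split for a single Rc can be sketched by checking Rc::strong_count: the pointer is always owned, and the payload is owned only when this is the sole strong reference. SplitSize and rc_split_size are hypothetical names; like the figures in the proposal, the sketch counts only size_of::<T>() for the payload and ignores the reference-count header and weak references.

```rust
use std::mem;
use std::rc::Rc;

/// Hypothetical (owned, shared) result from the proposal, in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SplitSize {
    owned: u64,
    shared: u64,
}

/// The pointer itself is always owned; the payload is owned if this is the
/// only strong reference, otherwise it is reported as shared.
/// Counts only size_of::<T>() for the payload (no refcount header, no weak refs).
fn rc_split_size<T>(rc: &Rc<T>) -> SplitSize {
    let pointer = mem::size_of::<Rc<T>>() as u64;
    let payload = mem::size_of::<T>() as u64;
    if Rc::strong_count(rc) == 1 {
        SplitSize { owned: pointer + payload, shared: 0 }
    } else {
        SplitSize { owned: pointer, shared: payload }
    }
}
```

On a 64-bit target an Rc<u32> pointer is 8 bytes, which reproduces the (12, 0) and (8, 4) figures.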


Related: heapsize, deepsize


Also the malloc_size_of crate, which is now used by Servo and Firefox in place of the heapsize crate.


For anyone else who is interested, I was wondering why Firefox/Servo had stopped using heapsize. I found the following text in the initial commit (by @nnethercote) introducing the malloc_size_of crate:

//! A crate for measuring the heap usage of data structures in a way that
//! integrates with Firefox's memory reporting, particularly the use of
//! mozjemalloc and DMD.
//! This crate has a lot of overlap with the existing `heapsize` crate, and may
//! one day be merged into it. But for now, `heapsize` has the following
//! major shortcomings.
//! - It basically assumes that the `HeapSizeOf` trait can be used for every
//!   type, which is not true. Sometimes more than a single size measurement
//!   needs to be returned for a type, and sometimes additional synchronization
//!   arguments (such as lock guards) need to be passed in.
//! - It has no proper way of measuring some common types, such as `HashSet`
//!   and `HashMap`, that don't expose internal pointers.
//! - It has no proper way of handling values with multiple referents, such as
//!   `Rc` and `Arc`.
//! This crate solves those problems.
//! - It provides traits for both "shallow" and "deep" measurement, which gives
//!   more flexibility in the cases where the traits can't be used.
//! - It allows for measuring blocks even when only an interior pointer can be
//!   obtained for heap allocations, e.g. `HashSet` and `HashMap`. (This relies
//!   on the heap allocator having suitable support, which mozjemalloc has.)
//! - It allows handling of types like `Rc` and `Arc` by providing special
//!   traits that are different to the ones for non-graph structures.
//! Suggested uses are as follows.
//! - When possible, use the `MallocSizeOf` trait. (Deriving support is
//!   provided by the `malloc_size_of_derive` crate.)
//! - If you need an additional synchronization argument, provide a function
//!   that is like the standard trait method, but with the extra argument.
//! - If you need multiple measurements for a type, provide a function named
//!   `add_size_of_children` that takes a mutable reference to a struct that
//!   contains the multiple measurement fields.
//! - When deep measurement (via `MallocSizeOf`) cannot be implemented for a
//!   type, shallow measurement (via `MallocShallowSizeOf`) in combination with
//!   iteration can be a useful substitute.
//! - `Rc` and `Arc` are always tricky, which is why `MallocSizeOf` is not (and
//!   should not be) implemented for them.
//! - If an `Rc` or `Arc` is known to be a "primary" reference and can always
//!   be measured, it should be measured via the `MallocUnconditionalSizeOf`
//!   trait.
//! - If an `Rc` or `Arc` should be measured only if it hasn't been seen
//!   before, it should be measured via the `MallocConditionalSizeOf` trait.
//! - Using universal function call syntax is a good idea when measuring boxed
//!   fields in structs, because it makes it clear that the Box is being
//!   measured as well as the thing it points to. E.g.

This doesn't seem very useful. If you have two Rc<u32>s and they both return (8, 4), how do you know if it's the same 4 bytes?
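One way to tell whether two Rc<u32>s share the same 4 bytes is to thread a set of already-seen allocation addresses through the walk and count each pointee only the first time it is reached. This is the idea behind the "measure only if it hasn't been seen before" rule in the malloc_size_of comment quoted above. The sketch below is not that crate's API: SeenSet and count_once are hypothetical names, and only the payload size_of::<T>() is counted.

```rust
use std::collections::HashSet;
use std::mem;
use std::rc::Rc;

/// Hypothetical measurement context that remembers which Rc allocations were counted.
/// Addresses stay meaningful because the Rcs are alive for the duration of one walk.
struct SeenSet {
    seen: HashSet<*const ()>,
}

impl SeenSet {
    fn new() -> Self {
        SeenSet { seen: HashSet::new() }
    }

    /// Returns the payload size the first time an allocation is seen, 0 afterwards.
    fn count_once<T>(&mut self, rc: &Rc<T>) -> usize {
        let addr = Rc::as_ptr(rc) as *const ();
        if self.seen.insert(addr) {
            mem::size_of::<T>()
        } else {
            0
        }
    }
}
```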

I've wanted something similar, but I think you'd actually need to interact with the allocator side of things, and make a trait plus an allocator that interfaces with it so that you can properly identify who is allocating what. I don't think there's any way for a trait to do that alone.
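The allocator side can be sketched as a wrapper around std::alloc::System that keeps a running total of live heap bytes. This is a minimal sketch, assuming a single global counter is enough. CountingAlloc, LIVE_BYTES and live_bytes are hypothetical names, and attributing bytes to particular owners (the "who is allocating what" part) would need considerably more machinery.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator that forwards to System and tracks live heap bytes.
struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // The default realloc and alloc_zeroed go through alloc/dealloc above,
    // so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Current number of live heap bytes requested through the global allocator.
fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```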

I'm not sure it's feasible to know whether it's the same 4 bytes, but if we want to, we could sort all pointers (or spans) encountered during the recursive walk of SizeOf::size_of and remove any overlapping pointees from the shared byte count.
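The sort-and-merge idea can be sketched as a function over (address, length) spans collected during the walk. It sorts by start address, merges spans that overlap, and sums the merged lengths, so a pointee reached through several Rcs is counted once. The name distinct_bytes and the (usize, usize) span representation are assumptions for illustration. In a real walk the addresses could come from something like Rc::as_ptr.

```rust
/// Given (start address, length) spans, returns the number of distinct bytes covered,
/// counting repeated or overlapping spans only once.
fn distinct_bytes(mut spans: Vec<(usize, usize)>) -> usize {
    spans.sort_unstable_by_key(|&(start, _)| start);
    let mut total = 0;
    // (start, end) of the run of overlapping spans currently being merged.
    let mut run: Option<(usize, usize)> = None;
    for (start, len) in spans {
        let end = start + len;
        run = match run {
            // Overlaps (or touches) the current run: extend it.
            Some((run_start, run_end)) if start <= run_end => Some((run_start, run_end.max(end))),
            // Disjoint: close the current run and start a new one.
            Some((run_start, run_end)) => {
                total += run_end - run_start;
                Some((start, end))
            }
            None => Some((start, end)),
        };
    }
    if let Some((run_start, run_end)) = run {
        total += run_end - run_start;
    }
    total
}
```

The sort makes this O(n log n) in the number of spans, which is part of the cost of doing it on every measurement.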

Not sure if that's desirable because it seems a bit heavy.

This kind of thing is really hard to make fully general, for the reasons outlined in the big comment above. On any moderately complex program you'll have all sorts of tricky edge cases where you probably need some domain-specific knowledge to understand what the best thing to do is.
