I was chatting with some coworkers who work in an extremely size-constrained environment. Every dyn T
object in a binary produces at least three words of vtable: size, alignment, destructor. However, if you only exclusively work with &mut dyn T
, these three words are very wasteful, and they pile up: three words per type-to-dyn per trait. This waste almost never matters, but it piles up in regimes where the binary is on the order of a few kilobytes in size. This gets much worse when &dyn Fn
and friends get involved.
I've been thinking of a potential -Ccompress-vtables
flag that sacrifices performance (and, maybe, changes language semantics...) for trying to get rid of as many of these words as possible.
The "obvious" implementation is to remove all three words and make size_of_val
, align_of_val
and drop_in_place
panic/fail to link/fail to monomorphize when T = dyn
. I don't think I need to explain why this is a bad idea. =)
Instead, we make the following observations:
- Alignment can be stored as its logarithm.
- In this size regime, destructors are actually kinda rare.
- Gigantic types are rare.
One implementation I thought of was to pack all three words into one like so (for a 32-bit system; 64-bit is analogous, but less interesting). This word is called the "header".
- The top bit of the word is set if
drop_in_place
is not a no-op. - The next five bits are the log of the alignment (which cannot be greater than 31)[1].
- The remaining 26 low bits are the size.
The actual pointer to the destructor is now at the end of the vtable, if there is one. The destructor for a dyn
is thus:
/// The virtual call dominates over having to do a branch.
if (vtable[0] & 0x8000_0000 != 0) {
vtable[1 + num_methods](self);
}
If there is no destructor, we save on the alignment and destructor words. 2/3 is a pretty solid deal all things considered. There are two additional caveats:
- Sizes of
dyn
objects are restricted to 60MB if the flag is turned on. This seems like a very hard restriction to run into if you care about vtable compression. We can just ban all types of such size if we want to avoid post-mono diagnostics (only with this flag turned on, ofc). -
size_of_val
incurs an extraand
instruction. -
align_of_val
is
fn align_of_val<dyn Trait>(x: &dyn Trait) {
// The precise choice of bit trick depends on what `1 << 32` is...
(1 << ((x.vtable[0] >> 26 & 31) + 1)) - 1
}
These functions aren't called all that often, and the memory lookup will dominate performance either way.
Does this seem like an interesting opt-in optimization to try to implement? I might go and try to wire this into rustc at some point for fun, but I figured I'd pitch it here, too.
[1] Some ABIs might not allow such large alignments anyways... I suspect 2 bits is enough for some 32-bit ABIs, but that's not a big deal.