Additional enum tag optimizations?

#1

I’m aware of the nullable pointer optimization for enums. But what about enums that need a discriminant yet still have a pointer at the beginning of each variant?

If I remember correctly, x86_64 does not use all the bits in its pointers (any other ISAs?). I don’t know if any OS uses these bits to store information, but perhaps if we wanted to be clever, we could have the compiler use those bits for the discriminant?

I understand the fundamental tradeoff, of course; you save 8 bytes at the beginning of the enum but you spend a couple extra cycles shifting and masking to actually get the discriminant every time you match on it. But in cases where the space savings are preferable, could this be a valuable optimization?

#2

The problem with that approach is that you will need to do this masking for all accesses of pointer types. Consider what happens if you take a reference to one of the enum fields. Now you have a &mut Box<T> but you still need to use a mask when dereferencing that reference.

#3

See also the prior discussion about enum representations. There are tons of possibilities even ignoring the pointer alignments representations.

#4

How so? If the ISA ignores those bits on deref, there’s no need to mask them out, right?

Edit:

the AMD specification requires that bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception.

I see what you mean now. All right, forget that idea.

#5

the AMD specification requires that bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception.

I see what you mean now. All right, forget that idea.

Correct me if I’m wrong but that’s not about alignment, alignment is about least-significant bits. This quote is about most-significant bits.

GHC is doing something like this. Evaluated objects are having non-zero first (least significant) 3 (or 2 on 32-bit systems) bits. If number of alternatives is smaller than or equal to 2^3 - 1 (or 2^2 - 1 on 32bit), the least significant 3 (2 in 32bit) bits also give the tag. So you can branch (in a case expression) without actually dereferencing the pointer.

#6

Indeed, I believe LLVM (and many other compilers) uses the LSB to indicate a C++ pointer is to a virtual method - so I imagine masking them out before dereferencing isn’t that expensive…

closed #7

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.