Additional enum tag optimizations?

I’m aware of the nullable pointer optimization for enums. But what about enums that need a discriminant yet still have a pointer at the beginning of each variant?

If I remember correctly, x86_64 does not use all the bits in its pointers (any other ISAs?). I don’t know if any OS uses these bits to store information, but perhaps if we wanted to be clever, we could have the compiler use those bits for the discriminant?

I understand the fundamental tradeoff, of course; you save 8 bytes at the beginning of the enum but you spend a couple extra cycles shifting and masking to actually get the discriminant every time you match on it. But in cases where the space savings are preferable, could this be a valuable optimization?
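As a rough illustration of the space cost in question, the sketch below compares a bare `Box`, an `Option<Box>` (which benefits from the nullable pointer optimization), and an enum with a pointer in every variant. `Node` and `sizes` are made-up names for this example. The layout of a default-representation enum is unspecified, so the 16 bytes that current rustc typically reports for `Node` on 64-bit targets is an observation, not a guarantee.

```rust
use std::mem::size_of;

// Hypothetical enum for illustration: every variant starts with a pointer,
// but a discriminant is still needed to tell the variants apart.
#[allow(dead_code)]
pub enum Node {
    Leaf(Box<u64>),
    Branch(Box<[u64; 4]>),
}

/// Returns the sizes in bytes of (Box<u64>, Option<Box<u64>>, Node).
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Box<u64>>(),
        // The null value encodes `None`, so no extra tag word is needed.
        size_of::<Option<Box<u64>>>(),
        // No variant can use null as its marker here, so a separate
        // discriminant is usually stored next to the pointer.
        size_of::<Node>(),
    )
}
```

Printing `sizes()` on the target of interest shows how much a packed discriminant could save there.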

The problem with that approach is that you will need to do this masking for all accesses of pointer types. Consider what happens if you take a reference to one of the enum fields. Now you have a &mut Box<T> but you still need to use a mask when dereferencing that reference.

See also the prior discussion about enum representations. There are tons of possibilities, even ignoring representations that exploit pointer alignment.

How so? If the ISA ignores those bits on deref, there's no need to mask them out, right?

Edit:

the AMD specification requires that bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception.

I see what you mean now. All right, forget that idea.
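The quoted rule can be sketched in a few lines, assuming 48-bit virtual addresses (CPUs with 5-level paging use 57 bits instead). The function names below are hypothetical; the point is only that a tag stored in the upper 16 bits makes the address non-canonical, so it has to be stripped by sign extension before every dereference.

```rust
/// True if bits 48..=63 of `addr` are copies of bit 47
/// (assumes 48-bit virtual addressing).
pub fn is_canonical_48(addr: u64) -> bool {
    sign_extend_48(addr) == addr
}

/// Rebuilds a canonical address by copying bit 47 into bits 48..=63.
pub fn sign_extend_48(addr: u64) -> u64 {
    // Shift the upper 16 bits out, then shift back arithmetically so the
    // old bit 47 is replicated into the vacated positions.
    (((addr << 16) as i64) >> 16) as u64
}

/// Stores a 16-bit tag in the upper bits of an address (hypothetical scheme).
pub fn tag_high_bits(addr: u64, tag: u16) -> u64 {
    (addr & 0x0000_FFFF_FFFF_FFFF) | ((tag as u64) << 48)
}
```

With this scheme, `sign_extend_48(tag_high_bits(addr, tag))` recovers any canonical `addr`, but that is exactly the extra masking step on every access that the earlier reply warned about.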

Correct me if I'm wrong, but that's not about alignment: alignment is about the least-significant bits. This quote is about the most-significant bits.

GHC does something like this. Pointers to evaluated objects have non-zero low bits: the 3 least significant bits on 64-bit systems (2 on 32-bit). If the number of alternatives is at most 2^3 - 1 (2^2 - 1 on 32-bit), those bits also encode the constructor tag, so a case expression can branch without actually dereferencing the pointer.
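A hand-written Rust analogue of low-bit tagging might look like the sketch below. It is not how GHC or rustc implement anything; `Payload` and `Tagged` are hypothetical types, and the payload is forced to 8-byte alignment so that the 3 low bits of its address are known to be zero and free to hold a tag.

```rust
/// Payload with a guaranteed 8-byte alignment, so its address always
/// has the 3 least significant bits clear.
#[repr(align(8))]
pub struct Payload(pub u64);

const TAG_MASK: usize = 0b111;

/// An owning pointer to a `Payload` with a 3-bit tag packed into the low bits.
pub struct Tagged {
    raw: *mut u8,
}

impl Tagged {
    pub fn new(value: Box<Payload>, tag: usize) -> Tagged {
        assert!(tag <= TAG_MASK, "tag must fit in 3 bits");
        let ptr = Box::into_raw(value) as *mut u8;
        assert_eq!(ptr as usize & TAG_MASK, 0);
        // Offsetting by the tag sets the (previously zero) low bits.
        Tagged { raw: ptr.wrapping_add(tag) }
    }

    /// Reads the tag without touching the pointed-to memory.
    pub fn tag(&self) -> usize {
        self.raw as usize & TAG_MASK
    }

    /// Removes the tag to recover the real pointer.
    fn untagged(&self) -> *mut Payload {
        self.raw.wrapping_sub(self.tag()) as *mut Payload
    }

    pub fn get(&self) -> &Payload {
        // Safety: `untagged` is the pointer obtained from `Box::into_raw`,
        // and it stays valid until `drop` runs.
        unsafe { &*self.untagged() }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        // Safety: reconstructs the original Box exactly once.
        unsafe { drop(Box::from_raw(self.untagged())) }
    }
}
```

For example, `Tagged::new(Box::new(Payload(42)), 5)` occupies one pointer-sized word, `tag()` returns 5 without a memory access, and `get()` pays one subtraction before the load.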

Indeed, I believe LLVM (and many other compilers) use the LSB of a C++ pointer-to-member-function to indicate that it refers to a virtual method, so I imagine masking those bits out before dereferencing isn't that expensive…
