Many processors (including many widely used ones) support an operation that divides a number by 2, rounding towards negative infinity, and also returning whether it was odd or even. In Rust, this would look something like this (implemented as a method on each integer type):
fn floor_halve_with_remainder(self) -> (Self, bool) {
(self.div_euclid(2), (self & 1) == 1)
}
(Although I'm not planning to use this on negative numbers, the floor-halving behaviour, rather than rounding towards zero, is important to avoid needing a special case for them that would lose performance.)
Although this could be added as a user-facing function and might be useful on occasion, I'm primarily hoping for it to be added as an intrinsic that the code-generation backends can special-case, so that it could be used internally in the compiler as a building block for lowering.
Motivation
The motivation for this is based around looking at possible enum niche optimizations. In current Rust, an enum like this is as large as 3 pointers:
enum Example {
One(&'static u16, &'static u16),
Two(&'static u16),
Three(&'static u16),
Four(&'static u16),
Five(&'static u16),
}
Clearly this is possible to fit into a smaller space by exploiting the fact that references to u16 are always stored as even numbers in memory, but most approaches I've seen for that lose performance due to having bad code generation for match statements – although the memory usage can be lower (depending on whether or not the memory allocator needs to round up the size), the code generated when using the enum is bad enough that the code ends up slower regardless.
However, if you have an efficient floor-halve-with-remainder operation, it leads to an efficient encoding of this sort of enum: the large case (two references) is stored as-is, and the small cases are stored using consecutive odd numbers in place of the pointer (in this case, 1, 3, 5, and 7). To decode the enum, you do a floor-halve-with-remainder on the pointer that gets replaced by the discriminant: if you have no remainder you have the large option, if you do have a remainder you have a small option, and the result of the division is a number in a consecutive small range so you can index directly into a jump table with it.
If the pointer you're using the niches of is larger than u16, there are alternative possible arrangements – but this one is still extremely efficient given the intrinsic, and (marginally) beats more obvious arrangements like using 0/1/2/3 or 1/2/3/4 as the value to put in the niches of the small options (it also has the advantage of mixing well with the existing niche optimization, meaning you can put an Option inside or outside without losing performance).
Questions
Does something like this exist in LLVM already, so that it can be lowered to, or would it need to be added to LLVM first? (And is there an efficient way to do this in Rust already? The implementation I gave above calculates both halves separately, even when you feed the resulting boolean into an if statement, rather than using the processor instruction directly.)
Also, is an ACP the appropriate tool for adding something like this? Requests for new compiler intrinsics are probably quite rare, so I'm not sure what the appropriate process for them is.