I'm almost certain that I have seen this discussed on IRLO before, but I'm unable to find the previous thread(s).
Rust has "niche value optimization" for certain situations, such as Option<&T> and NonZeroX. Currently, these are implemented at the language level. I'd like to see this moved to a lib implementation, such that any user can declare forbidden values for their types.
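For readers who have not seen the optimization in action, here is a minimal sketch of its effect as observed through size_of. The helper function names are mine and exist only for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// `Option<&T>` uses the null pointer as its `None` niche,
/// so it is the same size as a plain reference.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// `Option<NonZeroU8>` uses the forbidden value 0 as its `None` niche,
/// so it still fits in a single byte.
pub fn option_nonzero_is_one_byte() -> bool {
    size_of::<Option<NonZeroU8>>() == 1
}

/// A plain `u8` has no forbidden values, so in practice `Option<u8>`
/// needs extra space for a discriminant.
pub fn option_u8_size() -> usize {
    size_of::<Option<u8>>()
}
```

The first two results are documented guarantees of the standard library; the size of Option<u8> is what current compilers produce rather than a promise.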
A trait could be defined as the following, ensuring that the user writing the code is aware that it can lead to undefined behavior (due to it being unsafe) and ensuring that the compiler is able to evaluate it lazily at compile-time.
    pub unsafe trait ForbiddenValues {
        const fn forbidden_values() -> impl FusedIterator<Item = Self>;
    }
This trait would rely on a large number of unstable features, including GATs and the not-yet-RFC-accepted const fn in traits.
For obvious reasons, this method cannot be called by anyone. Ever. This is consistent with Drop, whose drop method cannot be called explicitly either. Presumably a similar language-level mechanism could be used to prevent calls to the new trait's method. The trait would exist solely to aid the compiler.
I am not sure what mechanism Rust actually uses to perform niche value optimization, but I presume that the compiler would be able to use an arbitrary byte sequence (which is fundamentally what any type is, of course) for the optimization. The byte sequence would be knowable at compile-time given that the method is a const fn.
The proposed trait would replace the existing #[rustc_layout_scalar_valid_range_start(1)] attributes on NonZeroU8 and similar. In its place could be the following trivial trait impl.
    unsafe impl ForbiddenValues for NonZeroU8 {
        const fn forbidden_values() -> impl FusedIterator<Item = Self> {
            yield NonZeroU8(0);
        }
    }
This example obviously makes some assumptions about the future of the Generator trait and its interaction with Iterator, but it could be implemented via a ZST that implements Iterator: I'm just representing it this way for simplicity and potential future viability.
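To make the shape of the idea concrete without generators or const fn in traits, here is a rough approximation that compiles on stable Rust. It is only a sketch under several assumptions of mine: the trait name ForbiddenBitPatterns, the associated type Repr, and the stand-in type MyNonZeroU8 are all hypothetical, the method is an ordinary fn rather than a const fn, and forbidden values are reported as raw bit patterns instead of Self, because a value like NonZeroU8(0) cannot be constructed in safe code today.

```rust
use std::iter::FusedIterator;

/// Hypothetical stable-Rust stand-in for the proposed trait.
/// Assumption: forbidden values are described by their raw
/// representation (`Repr`) rather than by `Self`.
pub unsafe trait ForbiddenBitPatterns {
    type Repr;

    fn forbidden_bit_patterns() -> impl FusedIterator<Item = Self::Repr>;
}

/// Hypothetical user-defined type standing in for `NonZeroU8`.
pub struct MyNonZeroU8(u8);

impl MyNonZeroU8 {
    /// Returns `None` for the single forbidden value, 0.
    pub fn new(value: u8) -> Option<Self> {
        if value == 0 { None } else { Some(MyNonZeroU8(value)) }
    }

    pub fn get(&self) -> u8 {
        self.0
    }
}

// SAFETY (by the sketch's own contract): `new` never stores 0,
// so 0 is never a valid bit pattern for this type.
unsafe impl ForbiddenBitPatterns for MyNonZeroU8 {
    type Repr = u8;

    fn forbidden_bit_patterns() -> impl FusedIterator<Item = u8> {
        // `std::iter::Once` implements `FusedIterator`.
        std::iter::once(0)
    }
}
```

Nothing here influences layout, of course; it only shows that the "list of forbidden values" part can already be written down as an iterator.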
As to real-world usage, I have the UtcOffset struct in the time crate. Its definition is incredibly simple.
    pub struct UtcOffset {
        hours: i8,
        minutes: i8,
        seconds: i8,
    }
There are invariants that are already enforced and assumed to be valid internally.
- hours is in -23..=23
- minutes is in -59..=59
- seconds is in -59..=59
- The signs of all nonzero field values must match.
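As a sketch of what checking all four invariants looks like, here is a small const fn that works on stable Rust. The function name is_valid_offset and its three-argument signature are hypothetical; the time crate's own validation may be structured differently.

```rust
/// Hypothetical checker for the four invariants listed above,
/// taking the three fields as plain `i8` values.
pub const fn is_valid_offset(hours: i8, minutes: i8, seconds: i8) -> bool {
    // First three invariants: each field stays within its range.
    let in_range = hours >= -23
        && hours <= 23
        && minutes >= -59
        && minutes <= 59
        && seconds >= -59
        && seconds <= 59;

    // Fourth invariant: nonzero fields must all share the same sign,
    // so a positive and a negative field may not coexist.
    let any_positive = hours > 0 || minutes > 0 || seconds > 0;
    let any_negative = hours < 0 || minutes < 0 || seconds < 0;

    in_range && !(any_positive && any_negative)
}
```

Because it is a const fn built only from comparisons, it can already be evaluated at compile time, for example in a const item.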
I suspect most users will find it simpler to filter the valid values out of the full set of bit patterns than to generate the invalid values directly. This would be simple to implement, even for a situation like this.
    unsafe impl ForbiddenValues for UtcOffset {
        const fn forbidden_values() -> impl FusedIterator<Item = Self> {
            const fn is_valid(offset: &UtcOffset) -> bool {
                (-23..=23).contains(&offset.hours)
                    && (-59..=59).contains(&offset.minutes)
                    && (-59..=59).contains(&offset.seconds)
            }

            // Visit every combination of the three fields (the full
            // cartesian product) and yield the ones that are not valid.
            for hours in i8::MIN..=i8::MAX {
                for minutes in i8::MIN..=i8::MAX {
                    for seconds in i8::MIN..=i8::MAX {
                        let value = UtcOffset { hours, minutes, seconds };
                        if !is_valid(&value) {
                            yield value;
                        }
                    }
                }
            }
        }
    }
Note that this implementation doesn't even consider the final bullet above; it really doesn't need to. With just the first three bullets, there are already over sixteen million niche values (256 × 256 × 256 bit patterns minus the 47 × 119 × 119 valid combinations). No reasonable program needs that many, of course. If one does, it will probably be fine dealing with the slightly larger struct size.
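If anyone wants to check the count of excluded bit patterns, a brute-force sketch in ordinary stable Rust is enough. The function name count_forbidden is hypothetical, and only the first three invariants are applied, matching the impl above.

```rust
/// Counts the (hours, minutes, seconds) bit patterns that violate
/// at least one of the three range invariants.
pub fn count_forbidden() -> u32 {
    let mut count = 0u32;
    // Walk all 256 * 256 * 256 combinations of the three `i8` fields.
    for hours in i8::MIN..=i8::MAX {
        for minutes in i8::MIN..=i8::MAX {
            for seconds in i8::MIN..=i8::MAX {
                let valid = (-23..=23).contains(&hours)
                    && (-59..=59).contains(&minutes)
                    && (-59..=59).contains(&seconds);
                if !valid {
                    count += 1;
                }
            }
        }
    }
    count
}
```

The loop runs about 16.8 million iterations, which finishes quickly even in a debug build.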
Does this design seem feasible? What issues might come up?