Feature request: #[repr(bool)]

There are values which are fundamentally boolean in nature but are easy to misinterpret. For example, with the Miller-Rabin probabilistic primality test, I might want a type that more clearly explains what each result means:

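#[repr(bool)] // proposed attribute; not valid Rust today
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub enum Primality {
    DefinitelyComposite = false, // proposed: bool-valued discriminants
    ProbablyPrime = true,
}

impl Primality {
    pub(crate) fn new(value: bool) -> Primality {
        // Sound under the proposal, since Primality would share bool's layout.
        unsafe { core::mem::transmute::<bool, Primality>(value) }
    }
    
    #[inline(always)]
    pub fn is_definitely_composite(&self) -> bool {
        !(*self as bool) // proposed: `as bool` casts on repr(bool) enums
    }
    
    #[inline(always)]
    pub fn is_probably_prime(&self) -> bool {
        *self as bool
    }
}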

If the function just returns true or false, it's easy for both authors and readers to forget that the two results are not "definitely prime" and "definitely composite".

In this case, constants such as const PROBABLY_PRIME: bool = true don't fix the problem: a programmer who sees -> bool on the function will very likely treat the result as a regular bool and never use the constants.

This has several benefits:

  1. The type is very explicit about what its values mean
  2. That explicitness will generally be reflected in the code programmers write
  3. #[repr(bool)] ensures zero-work conversions to bool
  4. #[repr(bool)] allows you to write Variant1 = true instead of Variant1 = 1 as with #[repr(u8)], which better communicates meaning
  5. #[repr(bool)] enforces a maximum of two variants
  6. #[repr(bool)] allows explicitly using value as bool if desired, instead of something more complicated to understand like value as u8 != 0 with #[repr(u8)]
  7. If an architecture ever comes along where bool is more efficiently represented as something other than 0_u8 and 1_u8 (perhaps its machine language requires u8::MAX for true), programs that use #[repr(bool)] will adapt automatically and will already be better optimized than code like value as u8 != 0, which might require a comparison instead of no work at all
  8. If the programmer knows how #[repr(u8)] and so on work, #[repr(bool)] has an obvious interpretation
  9. There are no backward compatibility issues with this
  10. Works nicely with FFI by allowing us to "wrap" a C boolean in an enum and vice versa
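For comparison, here is a minimal sketch of the closest thing available in today's Rust, assuming a #[repr(u8)] enum with discriminants 0 and 1 as a stand-in for the proposed #[repr(bool)]. Because bool is guaranteed to be a single byte holding 0 or 1, transmuting in either direction is sound and does no work, but it needs unsafe code that the proposal would make unnecessary. The from_bool and to_bool helpers are hypothetical names used only for illustration.

```rust
use core::mem::transmute;

// Stand-in for the proposed #[repr(bool)]: one byte whose only valid values
// are 0 and 1, which is exactly the set of valid bit patterns for `bool`.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Primality {
    DefinitelyComposite = 0,
    ProbablyPrime = 1,
}

impl Primality {
    /// Hypothetical helper: zero-work conversion from `bool`.
    pub fn from_bool(value: bool) -> Primality {
        // SAFETY: a `bool` is 0 or 1 in a single byte, and both are valid
        // discriminants of this `#[repr(u8)]` enum. `transmute` also refuses
        // to compile if the two sizes ever differ.
        unsafe { transmute::<bool, Primality>(value) }
    }

    /// Hypothetical helper: zero-work conversion back to `bool`.
    pub fn to_bool(self) -> bool {
        // SAFETY: the only discriminants are 0 and 1, both valid `bool` values.
        unsafe { transmute::<Primality, bool>(self) }
    }
}
```

The unsafe blocks are the price of doing this by hand: nothing stops a later edit from adding a third variant with discriminant 2, which would make from_bool still sound but to_bool undefined behavior. Point 5 above (a compiler-enforced maximum of two variants) is what would close that gap.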

What are your opinions on this feature request?


It's probably best to have a separate explicit list of just what is different from using #[repr(u8)].

I think it'd just be

  • compiler-enforced variant count ≤ 2
  • variant assignment = true instead of = true as u8
  • theoretical platforms where sizeof(_Bool) is not sizeof(uint8_t)
  • theoretical platforms where transmute(true) != true as u8
    • Rust defines the as cast to portably produce 0/1, independent of the byte representation
  • as bool support on the enum
    • note that as casts are generally disliked and From/Into should be derived/used instead
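A minimal sketch of the #[repr(u8)] side of this comparison in current Rust, reusing the Primality example: the discriminants can already be spelled = false as u8 and = true as u8, and From impls take the place of an as bool cast. This only illustrates the bullets above; it is not part of the proposal.

```rust
// `false as u8` and `true as u8` are constant expressions, so they are
// accepted as discriminants; Rust defines them to be 0 and 1.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Primality {
    DefinitelyComposite = false as u8,
    ProbablyPrime = true as u8,
}

// Safe conversions in place of an `as bool` cast on the enum.
impl From<bool> for Primality {
    fn from(value: bool) -> Self {
        if value {
            Primality::ProbablyPrime
        } else {
            Primality::DefinitelyComposite
        }
    }
}

impl From<Primality> for bool {
    fn from(p: Primality) -> Self {
        p == Primality::ProbablyPrime
    }
}
```

With these impls a caller writes bool::from(result) or result.into() instead of result as bool.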

Could you provide more use cases? I think that as long as a value has only two variants, it's okay to represent it directly as a bool; you just need a function called "is_xxx".

In your specific case, the reason why you want to have a standalone enum is that there is actually a third case (which you didn't include): definitely prime. So this enum should actually contain three variants.

That's incorrect. The probabilistic primality test is called that because it's probabilistic, not deterministic. There is no third variant for the result it returns: it can tell you either that a number is definitely composite or that it is probably prime, and it can provide no further information.

Misunderstandings like this are a primary reason why this is useful.
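To make the two outcomes concrete, here is a minimal sketch of the Miller-Rabin test for u64 inputs that returns the two-variant enum instead of a bool. The function name, the witness-list parameter, and the restriction to odd n > 3 are assumptions made to keep the sketch short. A witness can prove n composite, but a witness that fails to do so only leaves the answer at ProbablyPrime, never at "definitely prime".

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Primality {
    DefinitelyComposite,
    ProbablyPrime,
}

// (a * b) mod m without overflow, by widening to u128.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

// base^exp mod m by square-and-multiply.
fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut result = 1 % m;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = mul_mod(result, base, m);
        }
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    result
}

/// Hypothetical API: runs one Miller-Rabin round per witness.
/// Only handles odd n > 3; witnesses outside 2..=n-2 are skipped.
pub fn miller_rabin(n: u64, witnesses: &[u64]) -> Primality {
    assert!(n > 3 && n % 2 == 1, "sketch only handles odd n > 3");

    // Write n - 1 = d * 2^s with d odd.
    let mut d = n - 1;
    let mut s = 0;
    while d % 2 == 0 {
        d /= 2;
        s += 1;
    }

    'witness: for &a in witnesses {
        if a < 2 || a > n - 2 {
            continue;
        }
        let mut x = pow_mod(a, d, n);
        if x == 1 || x == n - 1 {
            continue; // this witness proves nothing
        }
        for _ in 1..s {
            x = mul_mod(x, x, n);
            if x == n - 1 {
                continue 'witness;
            }
        }
        // `a` is a witness to compositeness: this answer is certain.
        return Primality::DefinitelyComposite;
    }
    // No witness found: n may still be composite (a "strong liar" case).
    Primality::ProbablyPrime
}
```

The composite 221 = 13 × 17 shows why the names matter: with witness 174 (a strong liar for 221) a single round reports ProbablyPrime, while witness 137 proves compositeness outright.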


Which part is incorrect?

There is a certain range in which integers can be checked deterministically with Miller-Rabin tests. If you don't want to include that branch, people usually name the primality check is_probable_prime.

Yes, it's true for just about all probabilistic algorithms that, for some inputs, the uncertain outputs can correctly be treated as certain. Knowing which inputs those are doesn't change the fact that the algorithm has exactly two possible outputs and that they have the meanings I mentioned.

The is_probable_prime name is not sufficiently clear. What does false mean? Does it mean that it's much more improbable for it to be a prime or does it mean that it's definitely not prime?

If you want to improve the algorithm itself by providing a third variant, that's fine, but there are benefits to skipping that extra work when it can't apply (for example, RSA key generation can't use facts about small integers). An implementation that skips that work is acceptable in those cases, and the enum I gave earlier is useful there.


What's the main goal here — allowing foo as bool for bool-like enums? If so, I definitely don't want this. stdlib has been trying to move away from as because it can be error-prone in many situations. I don't think expanding it is what we want/need.


The main goal is to give a clearer meaning to values that are traditionally represented as bools. A secondary goal is to represent them as bools so that, for example, they can be passed to or received from C via FFI with no problems and no conversion work at all.


The name is not that clear itself, but the documentation could be enough to make it clear. To me, it doesn't seem worth adding a language feature for this.

As far as I understand it, cases like this are exactly what enums are for. Documentation could make it clear that 1 stands for a red light in code that controls an actual stoplight, but that's not what we do.

Using an enum instead makes code more explicit, which helps readers understand what's going on.

It also explicitly reinforces the meaning in the code itself, providing less need to memorize the documentation's exact details.


The std::mem::size_of documentation states that size_of::<bool>() is 1 in Rust, though, so I think repr(bool) would have to be 1 byte as well, regardless of what C says.
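That guarantee can be checked at compile time. A small sketch, using a two-variant #[repr(u8)] enum as a stand-in since #[repr(bool)] does not exist: bool has size and alignment 1, and the stand-in enum has the same layout.

```rust
use core::mem::{align_of, size_of};

// Stand-in for a repr(bool) enum; the variants exist only for the layout check.
#[allow(dead_code)]
#[repr(u8)]
enum Primality {
    DefinitelyComposite = 0,
    ProbablyPrime = 1,
}

// Compile-time checks: the build fails if any of these layouts ever changes.
const _: () = assert!(size_of::<bool>() == 1);
const _: () = assert!(align_of::<bool>() == 1);
const _: () = assert!(size_of::<Primality>() == size_of::<bool>());
const _: () = assert!(align_of::<Primality>() == align_of::<bool>());
```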


maybe all you want is

// Assumed definition: a newtype wrapping a bool
pub struct Primality(bool);

impl Primality {
    pub const ProbablyPrime: Primality = Self(true);
    pub const DefinitelyComposite: Primality = Self(false);
}

with

impl std::fmt::Display for Primality{
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f,"{}", if self.0 {"ProbablyPrime"} else {"DefinitelyComposite"})
    }
}

Not really. That can't, for example, be used easily in match statements. I really do mean an enum.

It appears I was wrong. This is a solution if the #[repr(bool)] suggestion isn't accepted. I can use #[repr(transparent)] on it and then put #[allow(non_upper_case_globals)] on the constants.

This seems to have some issues that enums don't, though. It's harder to write the use line for them, for example. You can't just do use crate_name::Primality::*; like you can with an enum.

It seems a bit like trying to pretend to be an enum because we want it to work like one, but not quite having the real thing.
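A minimal sketch of that newtype workaround, assuming the #[repr(transparent)] and #[allow(non_upper_case_globals)] attributes mentioned above; the describe function is a hypothetical example. Constants can be used as match patterns because the type derives PartialEq and Eq. The match ends with a _ arm instead of relying on the compiler to treat the two constants as exhaustive, which is part of the "not quite an enum" feel, and associated constants can't be glob-imported, so call sites still need the Primality:: prefix.

```rust
// Newtype alternative: same layout as `bool`, with variant-like constants.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Primality(bool);

#[allow(non_upper_case_globals)]
impl Primality {
    pub const ProbablyPrime: Primality = Primality(true);
    pub const DefinitelyComposite: Primality = Primality(false);

    /// Zero-cost access to the wrapped bool.
    pub fn is_probably_prime(self) -> bool {
        self.0
    }
}

/// Hypothetical example of matching on the constants.
pub fn describe(p: Primality) -> &'static str {
    match p {
        Primality::ProbablyPrime => "probably prime",
        // Catch-all arm: the second constant is the only remaining value.
        _ => "definitely composite",
    }
}
```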


In 2018, we decided that bool is ABI- and layout-compatible with C _Bool, and that sizeof _Bool == 1 on all platforms Rust currently targets.

If my memory serves me well, we deferred the decision of what exactly this means for targets where sizeof _Bool != 1. IIRC, the options are that Rust doesn't support those platforms, or that bool is still _Bool with whatever downsides that brings (they apply equally to C, which also mandates that _Bool acts as an integer containing either 0 (false) or 1 (true); given C23's mandate of two's complement representation, this may mean that _Bool is byte-equivalent to an integer of the same size storing 0 or 1).

I'm not seeing the motivating use case here. Why write value as bool when you could write value == Primality::ProbablyPrime? The latter is clearer at the use-site, which is the main point of using an enum in the first place.

You could also easily define an as_bool() method for your enum, although it would probably be better to name it is_probably_prime() (i.e. just implement your definition of is_probably_prime() manually instead of leaning on a new Rust feature). In the former case, repr(bool) would be a minor convenience at best, and seems like it would mainly just tempt the user into a less-readable format.

Unfortunately, this is more complicated than you might think, because different C compilers don't agree on whether a bool is 1 byte or 4 bytes. The unstable core::ffi module doesn't even define a "c_bool" type, presumably for this reason. So you'll always need an explicit conversion (which is probably what you want to do anyway, for the semantics reasons).
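A minimal sketch of such an explicit conversion, assuming a hypothetical C function that reports its answer as a legacy int-style boolean instead of _Bool. Going through c_int with an explicit != 0 check sidesteps the question of how wide the C boolean is. The names primality_from_c and primality_to_c, and the C declaration in the comment, are illustrative only, not an existing API.

```rust
use core::ffi::c_int;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Primality {
    DefinitelyComposite,
    ProbablyPrime,
}

/// Converts the result of a hypothetical C function such as
/// `int mr_test(uint64_t n);`, where 0 means false and any nonzero
/// value means true (C's convention).
pub fn primality_from_c(raw: c_int) -> Primality {
    if raw != 0 {
        Primality::ProbablyPrime
    } else {
        Primality::DefinitelyComposite
    }
}

/// Converts back to an int-style boolean (1 or 0) for passing to C.
pub fn primality_to_c(p: Primality) -> c_int {
    match p {
        Primality::ProbablyPrime => 1,
        Primality::DefinitelyComposite => 0,
    }
}
```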


As far as I am aware, C99 _Bool is still one byte on all targets Rust supports (at least where _Bool exists; it may not on some of the retro targets).

What is common is a pre-C99 BOOL macro that expands to int.

Props to C99 for finally settling on an answer, but I doubt there will ever come a time where Rust FFI is no longer used to interface with old legacy code, so it'll always be in our interest to avoid making features where you can easily miss a possible gotcha.

Ah, you skipped a step there.

  • Rust’s bool always has size/align 1.
  • Rust’s bool always matches C’s _Bool in ABI, not any legacy BOOL (of which there wouldn’t necessarily be one standard one per target anyway).
  • C _Bool does not have size/alignment 1 on all platforms (the one I know of is 32-bit PowerPC Macs, where its size and alignment were 4).
  • Rust does not currently support any such targets.

None of these say anything about targets that don’t have a C99 _Bool, because Rust doesn’t support any such targets. But if it did, bool matching some arbitrary C typedef wouldn’t break anything more than a non-size-1-align-1 _Bool could.


I agree with the original poster that being able to define an enum whose representation is ABI-compatible with _Bool is interesting and useful, and that repr(bool) is a reasonable way to spell that.

EDIT: Because C++11 allows enum foo_t : bool (their equivalent of repr(primitive)), this could be considered a C++ interop feature.


What’s wrong with this?

#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub enum Primality {
    DefinitelyComposite,
    ProbablyPrime,
}

impl Primality {
    pub(crate) fn new(value: bool) -> Primality {
        match value {
            true => Self::ProbablyPrime,
            false => Self::DefinitelyComposite,
        }
    }
    
    #[inline(always)]
    pub fn is_definitely_composite(&self) -> bool {
        // Variants must be qualified: a bare name in a pattern would be
        // a catch-all binding that always matches.
        matches!(self, Self::DefinitelyComposite)
    }
    
    #[inline(always)]
    pub fn is_probably_prime(&self) -> bool {
        matches!(self, Self::ProbablyPrime)
    }
}

Just as readable, doesn’t require any new language features.


One of Rust's selling points is "Empowering everyone to build reliable and efficient software." One of the ways it does that is by using C++'s idea of zero-cost abstractions.

It's more efficient to deal with something that already has the exact same representation as a bool so that there are no conversion costs at all. This fits with the desire for zero-cost.

This is the same reason that, for example, #[repr(u8)] exists: so that the representation of the type and the exact values of each variant can be controlled by the programmer for efficiency reasons (for example, by eliminating conversion costs to and from an "unwrapped" u8 value).

Without #[repr(u8)], you can still use a few match statements to convert back and forth, but it can have a cost if the underlying variants aren't the same bit patterns as their u8 version or, even if those are the same, if the compiler doesn't realize it can eliminate the conversion.