What is the status of language support for converting enums to/from their repr type?

Hi,

tl;dr: I'm trying to understand why Rust has only partial support for C-like enums (= fieldless enums with a primitive representation), and whether there is any possibility for an RFC to close the gap at the language level instead of forcing users to choose among a myriad of crates of varying complexity and quality.

=== Details below ===

The language officially supports primitive representation of fieldless enums (aka C-like enums). They can be defined (and the repr type is part of the public signature of the enum), and the compiler will honor the corresponding size and alignment requirements. Such enums can also be cast as their repr type, which requires special treatment by the compiler because enums are neither primitive types nor trait objects (as the compiler helpfully points out when attempting to cast in the reverse direction):

error[E0605]: non-primitive cast: u8 as MyEnum

an as expression can only be used to convert between primitive types or to coerce to a specific trait object

The one-way casting makes sense, because not every u8 value is a valid representation of some MyEnum variant. But even the lossless direction that is supported lacks the corresponding impl From<MyEnum> for u8.

This is in contrast with e.g. the primitive types that provide impl From for all valid combinations of lossless/widening casts (which is much safer than relying on as which is willing to perform lossy casts, such as truncating the fractional part of a f32 when casting as i32).
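To make the contrast concrete, here is a small sketch using only standard-library integer conversions. It shows where From exists, where only TryFrom exists, and what as does silently in the same situations.

```rust
use std::convert::TryFrom; // already in the prelude since the 2021 edition

fn main() {
    // Widening is lossless, so `From` is implemented.
    let wide: u16 = u16::from(200u8);
    assert_eq!(wide, 200);

    // Narrowing can fail, so only `TryFrom` is implemented.
    assert_eq!(u8::try_from(200u16), Ok(200u8));
    assert!(u8::try_from(300u16).is_err());

    // `as` never fails; it wraps or truncates instead.
    assert_eq!(300u16 as u8, 44); // 300 mod 256
    assert_eq!(3.9f32 as i32, 3); // fractional part dropped
}
```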

The primitive types even provide myriad impl TryFrom for narrowing casts, to cover the values that do allow for lossless conversions. But there is no corresponding impl TryFrom<ReprType> for MyEnum when defining a C-like enum. Sure, those traits can be implemented manually (or with a macro, perhaps provided by one of those many crates)... but why?
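For reference, the manual version looks roughly like the sketch below. It uses the MyEnum from the example later in this post; the InvalidRepr error type is a placeholder made up for illustration, not something std provides.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MyEnum {
    A = 0,
    B = 1,
    C = 3,
    D = 4,
}

/// Hypothetical error type for this sketch; carries the rejected value.
#[derive(Debug, PartialEq, Eq)]
struct InvalidRepr(u8);

// Lossless direction: every variant has a u8 value.
impl From<MyEnum> for u8 {
    fn from(e: MyEnum) -> u8 {
        e as u8
    }
}

// Fallible direction: the variant list has to be repeated by hand.
impl TryFrom<u8> for MyEnum {
    type Error = InvalidRepr;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(MyEnum::A),
            1 => Ok(MyEnum::B),
            3 => Ok(MyEnum::C),
            4 => Ok(MyEnum::D),
            other => Err(InvalidRepr(other)),
        }
    }
}

fn main() {
    assert_eq!(u8::from(MyEnum::C), 3);
    assert_eq!(MyEnum::try_from(0u8), Ok(MyEnum::A));
    assert_eq!(MyEnum::try_from(2u8), Err(InvalidRepr(2)));
}
```

The match arms duplicate the discriminants from the enum definition, so the two can drift apart when a variant is added or renumbered.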

A major reason for defining a C-like enum at all is that one needs to translate between the enum and its ordinal values (i.e. while implementing a binary specification of some kind). If Rust had no concept of a C-like enum in the first place, that would be one thing (sad, but at least self-consistent). But the concept exists, and the language does the work to allow casting C-like enums to their repr type. It just seems to stop short of fully supporting the concept.

There have been lots of proposals and discussions over the years, and many (many) crates exist to emulate this rather basic behavior, but I got lost in the details and was unable to get a good sense of what the true blockers are and whether there is any possibility for a very simple RFC that would allow this:

#[repr(u8)]
enum MyEnum {
    A = 0,
    B = 1,
    C = 3,
    D = 4,
}

fn main() {
    let o = MyEnum::A as u8;        // 0u8 (already works today)
    let o = u8::from(MyEnum::A);    // 0u8
    let e = MyEnum::try_from(0u8);  // Ok(MyEnum::A)
    let e = MyEnum::try_from(2u8);  // Err, invalid repr value
    let o = u16::from(MyEnum::A);   // no such impl, convert to u8 first
    let e = MyEnum::try_from(0u16); // no such impl, convert to u8 first
}

I guess the main technical questions would be:

  1. Should this work for any other enum types?
    • A: NO. Only fieldless enums with a primitive representation have an unambiguous mapping to/from integers of the repr type.
  2. Should the conversion traits be derived unconditionally?
    • A: Yes, this is a basic characteristic of C-like enums; also, complexity goes through the roof if we need to introduce additional annotations to control this behavior.
  3. Wouldn't this be a breaking change?
    • A: Yes, because it would conflict with all existing implementations of those conversions; it would have to ship as part of a language edition bump
  4. What should the ErrorType of that TryFrom be?
    • A: std::num::TryFromIntError already exists for cases when "a checked integral type conversion fails" -- we could probably just use that. Callers can always Result::map_err it to a type they like better.
  5. If there's an impl From<MyEnum> for u8, should there not also be one for u16, u32, etc?
    • A: No, it adds complexity for little benefit; users who need them can easily -- and stably -- assemble them from existing conversions, once the basic conversions exist at all.
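As a rough illustration of answers 4 and 5, the sketch below assumes the two basic u8 conversions exist (hand-written here) and builds the u16 versions on top of them. InvalidRepr, to_u16 and from_u16 are made-up names for this example; map_err is used to fold the std integer error into the caller's preferred type.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MyEnum {
    A = 0,
    B = 1,
    C = 3,
    D = 4,
}

/// Hypothetical error type chosen by the caller.
#[derive(Debug, PartialEq, Eq)]
struct InvalidRepr;

impl From<MyEnum> for u8 {
    fn from(e: MyEnum) -> u8 {
        e as u8
    }
}

impl TryFrom<u8> for MyEnum {
    type Error = InvalidRepr;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(MyEnum::A),
            1 => Ok(MyEnum::B),
            3 => Ok(MyEnum::C),
            4 => Ok(MyEnum::D),
            _ => Err(InvalidRepr),
        }
    }
}

/// Widening: enum -> u8 -> u16, both steps lossless.
fn to_u16(e: MyEnum) -> u16 {
    u16::from(u8::from(e))
}

/// Narrowing: u16 -> u8 -> enum; either step may fail.
fn from_u16(raw: u16) -> Result<MyEnum, InvalidRepr> {
    // Map std's TryFromIntError onto the error type we prefer.
    let byte = u8::try_from(raw).map_err(|_| InvalidRepr)?;
    MyEnum::try_from(byte)
}

fn main() {
    assert_eq!(to_u16(MyEnum::D), 4);
    assert_eq!(from_u16(3), Ok(MyEnum::C));
    assert_eq!(from_u16(2), Err(InvalidRepr));
    // 256 as u8 would wrap to 0 and wrongly yield A; the checked path rejects it.
    assert_eq!(from_u16(256), Err(InvalidRepr));
}
```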

Editions don't solve this problem. Having to support all editions and non-sensible implementations aside,[1] the existing implementations of TryFrom may use a different error type.


  1. which effectively blocks it for From ↩︎

Probably not, as that'd make the following a breaking change:

#[non_exhaustive]
#[repr(u8)]
pub enum MyEnum {
    A = 0,
    B = 1,
}

#[non_exhaustive]
#[repr(u8)]
pub enum MyEnum {
    A = 0,
    B = 1,
    C(usize), // New variant that makes this a non-C-like enum
}

By requiring something like #[derive(TryFromRepr)] you also avoid (3), as it ends up being an opt-in by the crate defining the enum [1]


  1. Though that opt-in may require a breaking change in that crate due to the error type changing. ↩︎


Good point... Would TryFromRepr also imply IntoRepr? Or would that be a second opt-in?

cc @scottmcm, who has been working on a proposal for this.

There is some prior art of Rust bending to support C's constructs with the union type.

I personally would push for extending TryFrom as far as possible. Once all avenues are exhausted, then we could introduce a new trait, e.g. TryFromRepr. Rust already has enough ways to convert between types.

To immediately contradict myself though, another avenue to explore would be to introduce analogues of ToString for types that are encoded as integers, e.g. ToU8, ToU16, etc. That would enable more of those conversions that are currently unsupported to be automatically implemented, i.e. impl From<T> for u16 where T: ToU16.

Aside - how would endianness be handled?

Maybe I had misunderstood, but I thought the intent of #[derive(TryFromRepr)] would be to request an impl TryFrom<ReprType> for EnumType, rather than actually introducing a new trait with that name?

Major downside is, the derive macro's name would not match that of the trait it derives.

It also seems unlikely that we could actually get away with introducing a new trait with a blanket impl, because an impl<T: TryFromRepr> TryFrom<T::Repr> for T would conflict with basically every other blanket impl that already exists or might exist in the future. We already hit that a lot with [Try]Into because of the blanket impl for all T: [Try]From.

I suppose an alternative might be to expand the repr annotation, e.g.

#[repr(u8, TryFrom, Into)]
enum MyEnum { ... }

That would read as an enum "with repr type u8, impl TryFrom<u8>, and impl Into<u8> (via impl From<MyEnum> for u8)", and would only work for fieldless enums.

Or (following C vs C-unwind idea):

#[repr(u8-fieldless)]
enum MyEnum { ... }

Where the -fieldless suffix (actual name TBD) officially designates the enum as C-like with appropriate bidirectional conversions in the form of auto-derived TryFrom<u8> and Into<u8>. Another possible name might borrow "interconvertible" concept from C++, e.g. u8-interconvertible. But it's a very long name, and the rustonomicon already suggests that u8 by itself should already produce an interconvertible type:

Adding a repr(u*)/repr(i*) causes it to be treated exactly like the specified integer type for layout purposes (except that the compiler will still exploit its knowledge of "invalid" values at this type to optimize enum layout

This is all rather "creative" -- and I have no idea if there's any possibility for expanding the repr attribute like that -- but would at least be a relatively clean way to opt in to deriving the type conversion traits. Vaguely similar to how one can "opt in" to packed and align as modifiers for C and Rust?

AFAIK, neither C/C++ nor Rust makes any language level provision for endianness? Somebody implementing a binary specification would either have to designate enum values in a way that reconciles endianness of the target architecture with that of the spec, or perform endianness conversions on the bytes they read/write. Fortunately all the integer types provide const to/from le/be/ne bytes functions, so the necessary tools are available.
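A quick sketch of what that looks like with the std byte-order helpers. The read_u16_be function and the two-byte buffer are invented for the example; the to_*_bytes and from_*_bytes methods are the real integer APIs.

```rust
/// Hypothetical helper: read a big-endian u16 from the start of a buffer.
fn read_u16_be(buf: &[u8]) -> Option<u16> {
    // `get` returns None if the buffer is too short.
    let bytes = [*buf.get(0)?, *buf.get(1)?];
    Some(u16::from_be_bytes(bytes))
}

fn main() {
    let wire = [0x01u8, 0x02];

    // The spec's byte order is applied at the I/O boundary, so the result
    // is the same value on every target architecture.
    assert_eq!(read_u16_be(&wire), Some(0x0102));
    assert_eq!(read_u16_be(&wire[..1]), None);

    // Writing goes through the matching to_*_bytes function.
    assert_eq!(0x0102u16.to_be_bytes(), [0x01, 0x02]);
    assert_eq!(0x0102u16.to_le_bytes(), [0x02, 0x01]);
    assert_eq!(u16::from_le_bytes([0x02, 0x01]), 0x0102);
}
```

For a repr(u8) enum there is only one byte, so byte order only comes into play for wider repr types.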


No, I think that I misunderstood the earlier message. Thank you very much for elaborating on these ideas. Hopefully there will be some way through all of this. It certainly seems as though there's a lot of good intent.

A grab-bag of context:

Allowing From and TryFrom to be derivable doesn't seem incompatible with this plan, but I think the plan is uncertain enough at the moment that it's not obvious exactly what should be done or how things will interact, so things are a bit stalled, and I'm not sure whether anyone is working on firming up that plan (cc @jack).


It wouldn't create a nice API, but it's also worth keeping in mind mem::Discriminant; for enums with an explicit repr, that type could provide [into|from]_inner functions to get at the underlying repr value.
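For context, this is roughly what mem::discriminant gives you on stable today: an opaque value that can be compared (and hashed), with no accessor for the underlying integer. The enum is a cut-down stand-in for the one in the original post.

```rust
use std::mem;

#[repr(u8)]
#[derive(Debug, Clone, Copy)]
enum MyEnum {
    A = 0,
    B = 1,
}

fn main() {
    let a = mem::discriminant(&MyEnum::A);
    let b = mem::discriminant(&MyEnum::B);

    // Discriminant<T> supports equality comparison...
    assert_eq!(a, mem::discriminant(&MyEnum::A));
    assert_ne!(a, b);

    // ...but the integer behind it is not exposed, so for a fieldless
    // enum the `as` cast is still the way to get the repr value.
    assert_eq!(MyEnum::B as u8, 1);
}
```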


Correct, initially I had #[derive(TryFrom)], but it felt confusing as to what it's supposed to implement TryFrom for. I did not intend to suggest adding a new Trait.

Another (more flexible) option could be the following:

#[derive(TryFrom(u8, usize), Into(u8))]
#[repr(u8)]
enum MyEnum {...}

The more I think about it, the more I like it, as it could also be done for other traits, too. And it clearly shows what is derived. Though it does make the derive list longer.
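Until something like that exists, the opt-in can be approximated with a declarative macro. The sketch below is only a stand-in, not the proposed derive syntax: c_like_enum is a made-up macro name, and using the rejected integer itself as the error type is an arbitrary choice for brevity.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: defines a fieldless enum plus both conversions.
macro_rules! c_like_enum {
    (
        #[repr($repr:ident)]
        $(#[$meta:meta])*
        $vis:vis enum $name:ident {
            $($variant:ident = $value:expr),+ $(,)?
        }
    ) => {
        #[repr($repr)]
        $(#[$meta])*
        $vis enum $name {
            $($variant = $value),+
        }

        impl From<$name> for $repr {
            fn from(e: $name) -> $repr {
                e as $repr
            }
        }

        impl TryFrom<$repr> for $name {
            // The rejected value doubles as the error in this sketch.
            type Error = $repr;

            fn try_from(raw: $repr) -> Result<Self, $repr> {
                $(if raw == $value { return Ok($name::$variant); })+
                Err(raw)
            }
        }
    };
}

c_like_enum! {
    #[repr(u8)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum MyEnum {
        A = 0,
        B = 1,
        C = 3,
        D = 4,
    }
}

fn main() {
    assert_eq!(u8::from(MyEnum::D), 4);
    assert_eq!(MyEnum::try_from(3u8), Ok(MyEnum::C));
    assert_eq!(MyEnum::try_from(2u8), Err(2));
}
```

Because the macro only accepts `Variant = value` entries, adding a variant with fields fails to match the pattern, which gives the same "fieldless only" restriction a built-in derive would need to enforce.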

This I'm not sure about. You usually want both directions, but in derive we usually/always have the trait names, as you said in a later comment, which makes it even more confusing what #[derive(TryFrom)] implements. That's why I chose a different name for it (even though it just implements TryFrom). In the back of my mind was also the idea of aliases/groups of traits that are derived (to reduce the need for large derive lists for newtype wrappers): Pre-RFC: `#[derive(From)]` for newtypes - #17 by DragonDev1906


Just to note, there's also the current unstable derive(CoercePointee) which doesn't implement a trait CoercePointee. (Currently, anyway; it impls CoerceUnsized and DispatchFromDyn. I could see CoerceUnsized being renamed to CoercePointee.)