A #[repr] to allow an enum containing enums to use a single discriminant

Suppose I have an enum KindOfThing. It looks like this:

enum KindOfThing {
    Apple,
    Spaghetti,
    Cauliflower,
    Table,
    Chair,
    Bed,
    Desk,
    Other,
}

But I would like it to look like this:

enum KindOfThing {
    Food(KindOfFood),
    Furniture(KindOfFurniture),
    Other,
}

enum KindOfFood {
    Apple,
    Spaghetti,
    Cauliflower,
}

enum KindOfFurniture {
    Table,
    Chair,
    Bed,
    Desk,
}

That way I can pass around KindOfFood and KindOfFurniture and match on them exhaustively without writing any boilerplate like this:

impl KindOfThing {
    fn kind_of_food(&self) -> Option<KindOfFood> {
        match self {
            KindOfThing::Apple => Some(KindOfFood::Apple),
            // ..etc
            _ => None,
        }
    }
}

I need KindOfThing to be a single byte, though. (I imagine some of you have wished for a version of Option that would allow this, for a more concrete example.) But it's impossible for the compiler to optimize the second version down to a single byte, because that will make it impossible to take a reference to the KindOfFood or KindOfFurniture within.

Since #[repr(packed)] structs will soon forbid taking references in safe code, it seems reasonable to create a new repr that enables this optimization, or possibly to allow #[repr(packed)] on enums with this behavior.

Has this been discussed before? Any thoughts?

3 Likes

Option does do this, and is in fact guaranteed to: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c253fe463dadeb133dd5c45ed99bf7cb

So does an enum that looks like Option: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3a8add3b13067b06bd3c2c5ee12d2898

That's not the reason. The compiler just doesn't have the logic to handle niches like that. It could do so today, without any annotation.

Oh, interesting.

I can see how this is possible with Option if non-Option-like enums are made to be nonzero, but I don't see how it's possible in general.

To elaborate, all enums held by KindOfThing must "know about" each other in order to avoid having overlapping discriminants. This isn't possible if they're in different crates.

One version of this could be having copy-only fields (no references allowed) so that more aggressive layout optimizations would be possible as there's always an opportunity to run code when reading or writing it. That would also allow things like structs than can store multiple enum fields in one byte, bitfield-style.

5 Likes

It's possible if they are compiled together, though.

I happen to have written up a pre-RFC covering this feature:

I haven't gotten much feedback about it, and I'm less sure if I'll proceed with the RFC process myself.

Possible, yes, but not necessarily feasible. Doing this fully transparently would require the compiler to scan the entire graph of types to determine which enums are nested in which, probably use heuristics to decide how frequently does this happen and whether some cases may be worth sacrificing.

Unless you're thinking of having a single global counter and by default assigning discriminants by reading and then incrementing it. But that feels like a rather wasteful solution.

In any case, compact fields as described by myself get you most of the way there, are explicit and much cheaper to implement.

5 Likes

Another solution would be to use the following code:

enum KindOfThing {
    Food(KindOfFood),
    Furniture(KindOfFurniture),
    Other,
}

enum KindOfFood {
    Apple,
    Spaghetti,
    Cauliflower,
}
// You could generate this with the [variant_count] crate,
// but then it doesn't compile in Rust Playground
const VARIANT_COUNT_KindOfFood: isize = 3;

enum KindOfFurniture {
    Table = VARIANT_COUNT_KindOfFood,
    Chair,
    Bed,
    Desk,
}

Given that the above enums are guaranteed to have non-overlapping ranges of values, the compiler could theoretically store them in a single byte, similarly to the "manual" version of KindOfThing OP posted.

Doing so might be simpler than adding a new representation to the language with specific rules, and it would be less constraining for the end user.

So far it looks like the compiler only performs that optimization if the root enum has at most one variant with multiple sub-variants (so KindOfThing isn't optimized, but Option<KindOfFood> is). I don't know how hard that would be to extend.

(also, having a built-in version of the variant_count crate wouldn't hurt)

4 Likes

I like your proposal a lot. It's a solution with no new language features that could work well until something like @felix.s's pre-RFC gets stabilized, if that ever happens. It also leaves open the possibility of taking references to the inner values!

[Sorry for posting and deleting, I haven't used Discourse before and it's confusing]

There's an edit button below your posts (the crayon-looking icon).

I know, but I was confused because I'd used the "reply" button to reply to your post but didn't see any visible indication that it was a reply.

For some reason Discourse only displays the "reply" icon if you're replying to a post that isn't the most recent. If I had to guess, it's to clean up the UI in the common case.

What about only allowing shared-discriminant enums with inner enums that have all/some discriminants specified manually, so compiler only have to check for non-overlapness, not guess the non-overlapping set.

1 Like

Quick note: any implementation solving OP's problem would have a drawback, which is that

kindOfThing1 as u8 == kindOfThing2 as u8

is no longer equivalent to

mem::discriminant(kindOfThing1) == mem::discriminant(kindOfThing2)

which probably has performance implications (no jump tables), among other things.

A thing I've found myself wanting, which is an alternative solution to this problem, is something along the lines of "subenums", i.e., an enum made up of variants from another enum:

enum MyEnum {
  Red, Green, Blue(u32),
}

enum MySub {
  use MyEnum::Red,
  use MyEnum::Blue,
}

In this manner, MySub has the same runtime representation as MyEnum, and is in fact just a MyEnum with a narrowing constraint on the variants. This means that you get MySub as MyEnum for free (no subtyping relationship, god save the type checker).

This also avoids the concern about breaking jump tables, and the cross-crate concerns.

5 Likes