Idea: allow C-like enums in match arms over their repr types

I don't have the time or energy to defend this as a fleshed-out (pre-)RFC so my responses will be limited. I'm mostly just posting this here in case someone else sees merit in this concept and wants to see this through. While I have had a number of use-cases for something like this, I've only done cursory research into the design space and I don't care to exhaustively discuss alternatives.

Motivation

When writing low-level protocol-handling code, it's common to have a C-like enum to represent different message or packet types, e.g.:

#[repr(u8)]
enum PacketKind {
    Foo = 0,
    Bar,
    Baz
}

However, it then becomes tedious to use that in parsing code:

pub fn decode_packet(packet: &[u8]) -> Result<Packet, PacketError> {
    if packet.len() < 2 {
        return Err(PacketError::TooShort);
    }

    let kind = match packet[0] {
        0 => PacketKind::Foo,
        1 => PacketKind::Bar,
        2 => PacketKind::Baz,
        other => return Err(PacketError::unexpected_kind(other)),
    };

    match kind {
        PacketKind::Foo => decode_foo(&packet[1..]),
        PacketKind::Bar => decode_bar(&packet[1..]),
        PacketKind::Baz => decode_baz(&packet[1..]),
    }
}

Now, crates exist that let you generate a TryFrom impl with a derive, such as num_enum or enum-utils. And that would let you skip writing the first match.

However, using this in the above code effectively generates a redundant match, as there will be one in the TryFrom impl to convert from u8 to PacketKind, and then one in the code we write to dispatch on the value of PacketKind. Will the optimizer eliminate the redundant match? Probably, but as the Rust community has been learning, it's better to not generate as much redundant code in the first place.

It's also not super easy to discover these crates; even if you search "enum repr" on crates.io, you have to sift through a number of implementations of varying quality and maintenance status: https://crates.io/search?q=enum%20repr

Proposal

Allow using C-like enum variants in match arms for their #[repr] type:

match packet[0] /* : u8 */ {
    PacketKind::Foo => decode_foo(&packet[1..]),
    PacketKind::Bar => decode_bar(&packet[1..]),
    PacketKind::Baz => decode_baz(&packet[1..]),
    other /* : u8 */ => Err(PacketError::unexpected_kind(other)),
}

This would effectively desugar to something like:

match packet[0] /* : u8 */ {
    const { PacketKind::Foo as u8 } => decode_foo(&packet[1..]),
    const { PacketKind::Bar as u8 } => decode_bar(&packet[1..]),
    const { PacketKind::Baz as u8 } => decode_baz(&packet[1..]),
    other /* : u8 */ => Err(PacketError::unexpected_kind(other)),
}

though that's likely not valid syntax in the first place, hopefully you get the idea.

Having discussed this briefly with coworkers, we identified a few edge cases (not an exhaustive list):

  • The catchall arm should be required to handle values that aren't a valid enum variant, unless the enum exhaustively covers the value space of its #[repr] (feasible for 8 and 16 bit enums).
  • Obviously, enums with data wouldn't work with this so they should be rejected.
  • We're not sure how this would interact with enum-variants-as-types but C-like enums aren't very useful with that feature anyway.
  • I think you should be allowed to use multiple enum types in the same match as long as they have the same #[repr], e.g.:
#[repr(u8)]
enum SomeOtherPacketKind {
    FooBar = 4,
    FooBaz,
    BarBaz,
}

match packet[0] {
   PacketKind::Foo => decode_foo(&packet[1..]),
   ...
   SomeOtherPacketKind::FooBar => decode_foo_bar(&packet[1..]),
    other /*: u8 */ => Err(PacketKind::unexpected_kind(other)),
}

though I do concede that this code kind of has a smell to it, so I wouldn't be against having a lint for this. If the enums overlap that should be fine as they can be linted just like other redundant match arms.

  • Binding patterns should be allowed, but if mixing and matching is allowed then obviously you can't do that within a binding pattern because there's no applicable type without anonymous sum types:
match packet[0] /* : u8 */ {
    // Allowed
    kind /* : PacketKind */ @ (PacketKind::Foo | PacketKind::Bar) => decode_foo_or_bar(&packet[1..]), kind),
    // Rejected
    kind /* : ??? */ @ (PacketKind::Foo | SomeOtherPacketKind::FooBar) => (),
    ...
}
1 Like

It absolutely is valid syntax. On nightly. Using unstable features. Rust Playground

2 Likes

Ah neat. I had previously tried

match packet[0] {
    (PacketKind::Foo as u8) => ...,
    ...
}

which is not valid, so I assumed the same for const { }. Still, that's a good bit of boilerplate that can be saved with my proposed sugar.

My first instinct here is that changing the type system for match is too strong a change to enable this.

For example, one could also use a proc macro to generate associated constants such that you can write

match packet[0] /* : u8 */ {
    PacketKind::FOO => decode_foo(&packet[1..]),
    PacketKind::BAR => decode_bar(&packet[1..]),
    PacketKind::BAZ => decode_baz(&packet[1..]),
    other /* : u8 */ => Err(PacketError::unexpected_kind(other)),
}

Or make a quick macro_rules! for something like

mymatch! { (as u8) packet[0] {
    PacketKind::Foo => decode_foo(&packet[1..]),
    PacketKind::Bar => decode_bar(&packet[1..]),
    PacketKind::Baz => decode_baz(&packet[1..]),
    other => Err(PacketError::unexpected_kind(other)),
}

that expands to the const patterns.

Or at least I'd like to see inline const patterns stable to see how much they help before changing the type system to do this.


And you probably know this, but the translation of C's

enum PacketKind { Foo, Bar, Baz }`

is not to a rust enum, but to something more like

struct PacketKind(u8);
impl PacketKind {
    const Foo: Self = Self(0);
    const Bar: Self = Self(1);
    const Baz: Self = Self(2);
}

So if one has already done that, then the match is

match PacketKind(packet[0]) {
    PacketKind::Foo => decode_foo(&packet[1..]),
    PacketKind::Bar => decode_bar(&packet[1..]),
    PacketKind::Baz => decode_baz(&packet[1..]),
    PacketKind(other) => Err(PacketError::unexpected_kind(other)),
}

which is already pretty reasonable.

4 Likes

Macros can help with boilerplate.

// get enum discriminant for expressions and patterns
macro_rules! d {
    ($e:expr) => {
        const { $e as _ }
    };
}
pub fn decode_packet(packet: &[u8]) -> Result<Packet, PacketError> {
    if packet.len() < 2 {
        return Err(PacketError::TooShort);
    }

    match packet[0] /* : u8 */ {
        d!(PacketKind::Foo) => decode_foo(&packet[1..]),
        d!(PacketKind::Bar) => decode_bar(&packet[1..]),
        d!(PacketKind::Baz) => decode_baz(&packet[1..]),
        other /* : u8 */ => Err(PacketError::unexpected_kind(other)),
    }
}

Rust Playground

1 Like

Yes, but a lot of people are hesitant to use macros or specifically avoid them, and it's still inferior to my proposal in UX.

Oh, I’m just contributing with laying out what is possible already, so we can have a fair comparison of what the proposal would need to improve upon, and how significant the improvement is.


Note that this is a single macro usable for all enums and all reprs. If people are supposed to be okay with learning about a whole new language feature, the alternative of learning about a single new macro wouldn’t seem too problematic to me, so I don’t buy “people are hesitant to use macros” as a particularly strong argument.

UX can of course be an argument.

2 Likes

There's an alternative here: deref patterns could be leveraged (if they ever lang in stable) to make the following work

pub fn decode_packet(packet: &[u8]) -> Result<Packet, PacketError> {
    if packet.len() < 2 {
        return Err(PacketError::TooShort);
    }

    match &packet[0] {
        Ok(PacketKind::Foo) => decode_foo(&packet[1..]),
        Ok(PacketKind::Bar) => decode_bar(&packet[1..]),
        Ok(PacketKind::Baz) => decode_baz(&packet[1..]),
        err => return err,
    }
}

as long as there's some impl<'a> Deref<Result<PacketKind, PacketError>> for NewType(&'a [u8]) {}.

You can also use TryInto today, at the cost of forcing you to consume packet and having to make you write

    match packet[0].try_into() {
        Ok(PacketKind::Foo) => decode_foo(&packet[1..]),
        Ok(PacketKind::Bar) => decode_bar(&packet[1..]),
        Ok(PacketKind::Baz) => decode_baz(&packet[1..]),
        err => return err,
    }
2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.