Enums with 'Other' values

Hello! I came up with an idea for this kind of enum:

enum Ethertype {
    Ipv6, // = 0x86DD,
    Others(u16)
}

This kind of enum cannot be marked as #[repr(u16)] in the current version of Rust, so we have to write From and Into implementations to convert between the enum and u16. The enum also takes more memory than a single u16: it has 2^16 + 1 possible values, one more than the 2^16 values a bare u16 can represent.
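For reference, here is a minimal sketch of what that looks like on current Rust. The `size_overhead` helper is a hypothetical name added only for illustration; the exact size of the enum is not guaranteed by the language, only that it cannot fit in two bytes.

```rust
use std::mem;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ethertype {
    Ipv6, // stands for 0x86DD
    Others(u16),
}

impl From<u16> for Ethertype {
    fn from(raw: u16) -> Self {
        match raw {
            0x86DD => Ethertype::Ipv6,
            other => Ethertype::Others(other),
        }
    }
}

impl From<Ethertype> for u16 {
    fn from(typ: Ethertype) -> u16 {
        match typ {
            Ethertype::Ipv6 => 0x86DD,
            Ethertype::Others(raw) => raw,
        }
    }
}

// Hypothetical helper: how many extra bytes the enum needs compared
// to a bare u16. It is always non-zero, because 2^16 + 1 distinct
// values do not fit in 16 bits.
fn size_overhead() -> usize {
    mem::size_of::<Ethertype>() - mem::size_of::<u16>()
}
```

The `Into` direction comes for free from the two `From` impls, so `let raw: u16 = typ.into();` works as well.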

If Rust had a language feature like this:

#[repr(enum_others, u16)]
enum Ethertype {
    Ipv6 = 0x86DD,
    Others(u16)
}

Then we could conveniently write code like this:

/* packet.ether_type: u16 */
let typ = packet.ether_type as Ethertype;
// or
match packet.ether_type as Ethertype {
    Ipv6 => process_ipv6(),
    Others(typ) => println!("unsupported type: {}", typ),
}
// both print '2'
println!("{}", mem::size_of::<Ethertype>());
println!("{}", mem::size_of::<u16>());
// ether_type == Ipv6 => value1 = 0x86DD
// ether_type == Others(other_value) => value1 = other_value
let value1 = ether_type as u16;
// u16_value = 0x86DD => typ = Ethertype::Ipv6
// u16_value = others => typ = Ethertype::Others(u16_value)
let typ = u16_value as Ethertype;

I believe such an enum operation would be useful not only in network packet processing, but also in embedded system development and other areas.

#[repr(enum_others, usize)]
enum RiscvTrap {
    UserSoft = INTERRUPT + 0,
    /* ... */
    Others(usize)
}
// prints '8' on 64-bit systems
println!("{}", mem::size_of::<RiscvTrap>());

Any further ideas, or would this kind of enum be okay for Rust? (Should I write it as a proc macro?) :slight_smile:


It can, it just doesn't mean what you wish it did. (It determines the type of the discriminant.)

I'd love to see a better mechanism for handling enums with a catch-all case that fully populate their discriminant type.

One issue, though: what is Others(0x86DD) as u16? Is (Others(0x86DD) as u16 as Ethertype) == Ipv6? Is Others(0x86DD) == Ipv6? (Likewise for Ord and Hash and similar.)

Any proposal to address this would need a reasonable, consistent set of answers to those questions. I personally think it'd be reasonable to design the type so all the answers were consistent with Others(0x86DD) == Ipv6.
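As a rough sketch of that "consistent" choice on today's enum: comparison and hashing can go through the underlying u16, so that Others(0x86DD) and Ipv6 are indistinguishable to ==, Eq and Hash. The `to_u16` and `from_u16` names are assumptions made up for this example, not part of any proposal.

```rust
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy)]
enum Ethertype {
    Ipv6,
    Others(u16),
}

impl Ethertype {
    // Hypothetical helper: the wire value of either variant.
    fn to_u16(self) -> u16 {
        match self {
            Ethertype::Ipv6 => 0x86DD,
            Ethertype::Others(raw) => raw,
        }
    }

    // Hypothetical helper: always normalises 0x86DD to Ipv6.
    fn from_u16(raw: u16) -> Self {
        match raw {
            0x86DD => Ethertype::Ipv6,
            other => Ethertype::Others(other),
        }
    }
}

// Equality and hashing are defined on the wire value, so
// Others(0x86DD) == Ipv6 and both hash identically.
impl PartialEq for Ethertype {
    fn eq(&self, other: &Self) -> bool {
        self.to_u16() == other.to_u16()
    }
}

impl Eq for Ethertype {}

impl Hash for Ethertype {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.to_u16().hash(state);
    }
}
```

Note that this only fixes the trait-based answers. Pattern matching is still structural, so `matches!(Ethertype::Others(0x86DD), Ethertype::Ipv6)` stays false unless every value is normalised through something like `from_u16` first.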


You can get pretty close to this with a really-a-struct-under-the-hood enum:

#[repr(transparent)]
pub enum Ethertype { Others(u16) }
impl Ethertype {
    pub const IPV6: Self = Self::Others(0x86DD);
}
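A slightly fuller sketch of how that might be used. The derives are needed so the constant can appear as a match pattern; `from_raw` and `describe` are hypothetical names added for illustration.

```rust
use std::mem;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub enum Ethertype {
    Others(u16),
}

impl Ethertype {
    pub const IPV6: Self = Self::Others(0x86DD);

    // Hypothetical constructor: every u16 is a valid Ethertype.
    pub fn from_raw(raw: u16) -> Self {
        Self::Others(raw)
    }
}

// Hypothetical consumer: constants work as patterns, and the
// Others(raw) arm catches everything that has no name.
pub fn describe(typ: Ethertype) -> String {
    match typ {
        Ethertype::IPV6 => "ipv6".to_string(),
        Ethertype::Others(raw) => format!("unsupported type: {}", raw),
    }
}

// repr(transparent) gives the enum the same layout as its u16 field.
pub fn same_size_as_u16() -> bool {
    mem::size_of::<Ethertype>() == mem::size_of::<u16>()
}
```

Because IPV6 is literally Others(0x86DD), there is no second representation of the same wire value to worry about.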

One downside is that it can't be checked for exhaustive matches of those constants. Otherwise I think it's good -- anything else holding that back?


This is a very good way of handling special values on current Rust. I've also just directly used a single element tuple struct for this before, rather than having to name It::Other.

The big question, though, is whether matches!(Ipv6, Other(_)) is true. Obviously, Other(0x86DD) has to be allowed, because we can't express "u16, except for these values" as a surface-level type.

If that's true, then exhaustive match checking doesn't mean anything, because you need to have the wildcard Other arm anyway. In a way, this is similar to #[non_exhaustive].

If it's false, then you can't name previously unnamed variants (see also the io::ErrorKind problem for naming new kinds, where std is (ab)using stability to be able to do so). In a way, it's similar to #[non_exhaustive]...

I don't know how plausible it is in practice, but there's a potential design where this is "just" a non exhaustive enum, but the unspecified discriminants are not niches (validity invariant), but allowed variants a la C++ enum class.

The benefit of modelling this as non exhaustive enums is that then they would benefit from whatever linting non exhaustive enums get to lint for this-version-exhaustive matching. (Though they would have to be non exhaustive locally as well.)
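For comparison, this is roughly what a non-exhaustive enum looks like on current Rust, where unlisted discriminants are invalid rather than allowed. It is only a sketch; `to_raw` and `from_raw` are hypothetical helper names.

```rust
#[non_exhaustive]
#[repr(u16)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ethertype {
    Ipv6 = 0x86DD,
}

// Going from the enum to its discriminant is a plain cast
// (allowed here because we are inside the defining crate).
pub fn to_raw(typ: Ethertype) -> u16 {
    typ as u16
}

// The other direction has to be fallible: a u16 with no named
// variant is not a valid Ethertype, so it cannot be represented.
pub fn from_raw(raw: u16) -> Option<Ethertype> {
    match raw {
        0x86DD => Some(Ethertype::Ipv6),
        _ => None,
    }
}
```

Downstream crates must add a wildcard arm when matching on such an enum, which is the property the "valid unnamed variants" design would want to keep.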

The semantics are the core of the idea here, but to throw out some syntax, I'm still fond of first-class syntax for non-exhaustive enums:

#[repr(u16)]
enum Ethertype {
    Ipv6 = 0x86DD,
    ..
    // other values are INVALID
    // but can be added semver-compatibly
}

So that could be extended for a "valid non exhaustive":

#[repr(u16)]
enum Ethertype {
    Ipv6 = 0x86DD,
    Other = ..
    // other values are VALID
    // and can be named semver-compatibly
}

Ethertype::Other would not be valid in expression position, as it could have many values. Instead, you would create other values of Ethertype via 0u16 as Ethertype. (Or perhaps Ethertype(0); the syntax is just a could-maybe-work idea.)

Ethertype::Other not being a real variant I think makes this a meaningful improvement over just (a proc macro automating) the surface language version possible today, and potentially sidesteps the question of what Other(0x86DD) is and whether matches!(Ipv6, Other(0x86DD)).


Whether this actually works, though, depends on whether other crates can mention Other. If they can, at least one of them will end up with code looking for Other specifically, and thus will break when that value gets a specific name -- as was seen with ErrorKind.

I think I might prefer the "it's just #[repr(transparent)] struct EtherType(u16);" solution, but with a new lint like non_exhaustive_omitted_patterns that suggests you mention all the associated constants in your match.
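A small sketch of why such a lint would help, assuming a second constant IPV4 (0x0800) is added purely for illustration. The `handler` function is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(transparent)]
pub struct EtherType(pub u16);

impl EtherType {
    // IPV4 is an illustrative addition, not part of the original post.
    pub const IPV4: Self = Self(0x0800);
    pub const IPV6: Self = Self(0x86DD);
}

// Hypothetical consumer of the type.
pub fn handler(typ: EtherType) -> &'static str {
    match typ {
        EtherType::IPV6 => "ipv6",
        // IPV4 is never mentioned, and the compiler does not complain:
        // the wildcard arm, which is mandatory here, silently covers it.
        _ => "unsupported",
    }
}
```

A lint in the spirit of non_exhaustive_omitted_patterns could point out that IPV4 is an associated constant of the matched type that no arm mentions.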


If you can't name the pseudo variant, having it as a named pseudo variant doesn't make sense. But I agree that the io::Error::Other issue applies here as well.

Either way, at a surface language level, it "feels" right to me to model this as enum with valid unnamed variants, even if it's a newtype struct, some associated constants, and a special lint at a semantics level.


I think having Result<Ethertype, u16> generated from impl TryFrom<u16> for Ethertype is not really less convenient than the solution OP proposed. You can wrap the u16 with some newtype in either position.

match packet.ether_type.try_into() {
    Ok(Ipv6) => process_ipv6(),
    Err(typ) => println!("unsupported type: {}", typ),
}
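Spelled out in full, a sketch of that approach could look like the following. The `describe` function is a hypothetical stand-in for the packet handling code, and the conversion is named explicitly so the target type does not have to be inferred.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum Ethertype {
    Ipv6 = 0x86DD,
}

impl TryFrom<u16> for Ethertype {
    // Unknown values are handed back unchanged as the error.
    type Error = u16;

    fn try_from(raw: u16) -> Result<Self, Self::Error> {
        match raw {
            0x86DD => Ok(Ethertype::Ipv6),
            other => Err(other),
        }
    }
}

// Hypothetical stand-in for the match in the post above.
pub fn describe(raw: u16) -> String {
    match Ethertype::try_from(raw) {
        Ok(Ethertype::Ipv6) => "ipv6".to_string(),
        Err(typ) => format!("unsupported type: {:#06x}", typ),
    }
}
```

Here the match on the Ok side stays exhaustive over the named variants, so adding a new variant to Ethertype produces a compile error in every match that does not handle it.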