Pre-RFC: Using existing structs and tuple-structs as enum variants

Currently, the way to include existing structs and tuple-structs as variants of an enum is to wrap each one in a tagged variant, as with any other payload. I believe this is sub-optimal for reasons of ergonomics, brevity, and clarity. For example, when constructing or pattern matching on such values, one ends up writing things like Foo::Bar(Bar(...)), where the duplication of Bar is theoretically unnecessary.

For this reason, I propose the following syntax:

struct Foo { ... }
struct Bar(...);
type Baz = Bar;

enum A {
    type Foo,
    type Bar, // specifying `type Bar` again yields an error
    type Baz, // specifying `type Baz` again yields an error
}

// Construct a value of type `A`.
let a1 = A::Foo { ... };
let a2 = A::Bar(...);
let a3 = A::Baz(...);

// Pattern matching a value of type `A` with destructuring.
match a {
    A::Foo { ... } => ...,
    A::Bar(...) => ...,
    A::Baz(...) => ...,
}

// Pattern matching a value of type `A` with binding.
match a {
    A::Foo @ foo => ..., // `foo` is of type `Foo`
    A::Bar @ bar => ..., // `bar` is of type `Bar`
    A::Baz @ baz => ..., // `baz` is of type `Baz` == `Bar`
}

// If-let pattern matching with destructuring.
if let A::Foo { ... } = a { ... }

// If-let pattern matching with binding.
if let A::Foo @ foo = a { ... } // `foo` is of type `Foo`

which should be equivalent to the following in today’s Rust:

struct Foo { ... }
struct Bar(...);
type Baz = Bar;

enum A {
    Foo(Foo),
    Bar(Bar),
    Baz(Baz),
}

// Construct a value of type `A`.
let a1 = A::Foo(Foo { ... });
let a2 = A::Bar(Bar(...));
let a3 = A::Baz(Bar(...)); // today a type alias (`Baz`) cannot be used as a tuple-struct constructor

// Pattern matching a value of type `A` with destructuring.
match a {
    A::Foo(Foo { ... }) => ...,
    A::Bar(Bar(...)) => ...,
    A::Baz(Bar(...)) => ..., // likewise, the alias `Baz(...)` is not accepted as a pattern today
}

// Pattern matching a value of type `A` with binding.
match a {
    A::Foo(foo) => ..., // `foo` is of type `Foo`
    A::Bar(bar) => ..., // `bar` is of type `Bar`
    A::Baz(baz) => ..., // `baz` is of type `Baz` == `Bar`
}

// If-let pattern matching with destructuring.
if let A::Foo(Foo { ... }) = a { ... }

// If-let pattern matching with binding.
if let A::Foo(foo) = a { ... } // `foo` is of type `Foo`

Unfortunately, enum variants are not presently treated as actual types. (I believe this has gone through the RFC stage before but not developed, for whatever reason.) While that proposal would mesh nicely with this one, it is not strictly necessary, and as should be clear from the above, this feature can be implemented purely as syntactical sugar for tagging the variant type with a name identical to the type (or type alias’s) name, as is the common practice today.
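For concreteness, here is a small compilable version of the desugared form in today's Rust. The field layouts of `Foo` and `Bar` and the `describe` helper are made up purely for illustration. Note that the `Baz` variant has to be built with `Bar(...)`, since a type alias cannot currently be used as a tuple-struct constructor.

```rust
// Illustrative payload types; the fields are assumptions, not part of the proposal.
#[derive(Debug, PartialEq)]
struct Foo {
    x: i32,
}

#[derive(Debug, PartialEq)]
struct Bar(i32);

type Baz = Bar;

// What `enum A { type Foo, type Bar, type Baz }` would be sugar for.
#[derive(Debug, PartialEq)]
enum A {
    Foo(Foo),
    Bar(Bar),
    Baz(Baz),
}

// Hypothetical helper showing both destructuring and binding of the payload.
fn describe(a: &A) -> String {
    match a {
        A::Foo(Foo { x }) => format!("Foo with x = {}", x),
        A::Bar(Bar(n)) => format!("Bar({})", n),
        A::Baz(baz) => format!("Baz({})", baz.0),
    }
}
```

The repeated name in `A::Foo(Foo { x })` and `A::Bar(Bar(n))` is exactly the duplication the proposed sugar is meant to remove.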

As for concrete use cases, one example of where I believe it could really cut down on boilerplate and improve clarity is in the syn crate (and likewise other code involving parsers and expression trees). See this file for example. Additionally, I think this feature would encourage the reuse of variants between various enums, and make it far more ergonomic to do.

Note: this proposal was discussed briefly on IRC with @Centril and @mbrubeck, who suggested it could work in principle. The only possible implementation issue we raised was memory alignment of direct vs. indirect struct & tuple-struct variants, but this would probably not be a major one.

Note that in the Syn source code you linked to the enum variant name generally differs from the nested type name.

pub enum Expr {
    Array(ExprArray),
    Call(ExprCall),
    Tuple(ExprTuple),
    Binary(ExprBinary),
    Unary(ExprUnary),
    Lit(ExprLit),
    /* ... */

It would not be possible to rename the nested types ExprArray to Array and ExprTuple to Tuple because we also need there to be TypeArray and TypeTuple following the same pattern.

If we keep ExprArray and ExprTuple then we would not want to use:

pub enum Expr {
    type ExprArray,
    type ExprTuple,
    /* ... */

as this would be exposed as Expr::ExprArray etc which is repetitive.

Maybe show in the RFC some ways to fit this use case better.

Prior art: postponed RFC 1450 – Types for enum variants.

pub enum Expr {
    Array {
        /* ... */
    },
    Call {
        /* ... */
    },
    /* ... */

Under that RFC Expr::Array would constitute its own struct type equivalent to the current ExprArray struct. It’s not quite the same use case that you described because in that proposal an enum can’t pull in arbitrary external structs as variants, and a struct cannot be pulled in as a variant of multiple different enums.

Right, though this RFC would complement mine nicely, I think, as mentioned in the above post. Semantically it would make the most sense if both features existed simultaneously.

Fair point. The best thing I can suggest here is adding additional optional syntax to modify the tag (variant name). This would be a desirable feature regardless.

Maybe something akin to type aliasing syntax, since it would basically work like a type alias local to the enum scope. Additionally, type aliases would already be supported, per above.

enum Expr {
    type Array = ExprArray,
    type Call = ExprCall,
    ...
}

(Whilst retaining the simpler syntax type Foo if you don’t want to rename the variant relative to the type.)
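As a rough sketch of what the renaming form would presumably expand to, here is the equivalent in today's Rust. The payload structs are simplified placeholders, not the real syn definitions, and `arity` is a hypothetical helper.

```rust
// Placeholder payload types; the real syn structs carry far more data.
#[derive(Debug, PartialEq)]
struct ExprArray {
    elems: Vec<i64>,
}

#[derive(Debug, PartialEq)]
struct ExprCall {
    args: Vec<i64>,
}

// Assumed expansion of:
//     enum Expr { type Array = ExprArray, type Call = ExprCall }
#[derive(Debug, PartialEq)]
enum Expr {
    Array(ExprArray),
    Call(ExprCall),
}

// Hypothetical helper: counts elements or arguments, depending on the variant.
fn arity(e: &Expr) -> usize {
    match e {
        Expr::Array(ExprArray { elems }) => elems.len(),
        Expr::Call(call) => call.args.len(),
    }
}
```

With the sugar, the first match arm could hypothetically shrink to `Expr::Array { elems }`, with the variant name standing in for the struct name.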

Anyway, I still think the general use case of expression trees / ASTs is a good one, just obfuscated slightly by the name clashes in syn, perhaps.

I’m also interested in this kind of extension because it enables re-using the same variants in different enums; that in turn enables representing subsets of enums, possibly with automatically derived widenings and (checked) narrowings, which allows static guarantees to be expressed more precisely. (I know that this Pre-RFC suggests no such thing, but it’s a step in that direction.)

Indeed, I had the initial part of that use case in mind when I came up with this, but somehow forgot to mention it in the original post. I don’t want to side-track too much, but would you care to explain briefly what you mean by automatically derived widening and checked narrowings?

I mean that in the future, as the variants would be nominally of the same type, there could be a feature that allows deriving conversion methods between different enums. Depending on whether the conversion always succeeds (from subset to superset) or not (from superset to subset, or between subsets), it would be an .into() style method or a .try_into() style method. Or possibly, going further, some kind of subtyping-without-inheritance style feature.

For what it's worth, there's a thread on possible "embedding" sugars where "enum embedding" came up as a thing that might do that. To borrow the strawman syntax from this comment:

enum Foo {
    Var1,
    Var2(String),
}

enum Bar {
    use Foo;
    Var3,
}

// Could compile to something like:

enum Foo {
    Var1,
    Var2(String),
}

enum Bar {
    Var1,
    Var2(String),
    Var3,
}

I don't think anyone in that thread specifically suggested that "enum embedding" might automagically provide the obvious Bar: From<Foo> and Foo: TryFrom<Bar> impls we want, but that's what came to my mind when I caught up on this thread.
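For reference, this is roughly what those impls look like when written by hand in today's Rust, using the desugared `Foo` and `Bar` from the strawman. Choosing `Bar` itself as the `TryFrom` error type is just one possible design, picked here so the caller gets the unconverted value back.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, PartialEq)]
enum Foo {
    Var1,
    Var2(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Bar {
    Var1,
    Var2(String),
    Var3,
}

// Widening: every `Foo` variant has a counterpart in `Bar`, so this cannot fail.
impl From<Foo> for Bar {
    fn from(foo: Foo) -> Bar {
        match foo {
            Foo::Var1 => Bar::Var1,
            Foo::Var2(s) => Bar::Var2(s),
        }
    }
}

// Narrowing: `Bar::Var3` has no counterpart in `Foo`, so it is handed back as the error.
impl TryFrom<Bar> for Foo {
    type Error = Bar;

    fn try_from(bar: Bar) -> Result<Foo, Bar> {
        match bar {
            Bar::Var1 => Ok(Foo::Var1),
            Bar::Var2(s) => Ok(Foo::Var2(s)),
            other @ Bar::Var3 => Err(other),
        }
    }
}
```

Both impls are purely mechanical, which is what makes them plausible candidates for automatic generation.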

Oh yes, that would be a nice feature. It would probably belong as a derive attribute directive, right?

Interesting! That made me realize that there are actually two valid ways of “extending” enum types: embedding, which defines a new downstream type whose variants are a superset of the upstream type’s, and narrowing, which defines a new type whose variants are a subset of the upstream type’s.

If the compiler is aware of the relationships between enums, it can also assign matching tag values across the types, which would allow conversions between references to the types (immutable only, because mutable references are invariant), as they could be made binary-compatible.

@alexreg I originally imagined it as a derive attribute, but the problem is that derive macros can’t inspect code in any way other than syntactically. That means they wouldn’t have any information about the other subset/superset enum.

I'm a bit split on the idea.

On one side, I've actually hit this annoyance and wondered whether it could be solved somehow.

On the other hand, it's also yet another alien-looking syntax extension one needs to learn and makes the language larger. And I'm not 100% sure it makes the code more readable or if it only makes it shorter.

So I'd have two questions to ask here:

  • Do we have an idea how prevalent this problem is in existing source code? What percentage of crates could benefit from this, and how often per million lines of code could it be used, or something like that?
  • Does it need to be a syntax extension, or could something like a procedural macro do it? If not, could the right way be to extend procedural macros so they could, possibly enabling a whole bunch of other use cases? (I have the feeling similar things would be needed, for example, for a proc macro to delegate arbitrary traits to the field of a 1-element tuple struct.) That is, a more general approach instead of a special case for enums.
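As a partial data point on the macro question, a declarative macro can already generate the wrapping enum plus `From` impls when the variant name equals the type name. The sketch below is only an illustration: `type_enum!` is a made-up name, and it handles only plain identifiers (no generics or paths).

```rust
// Hypothetical macro: each listed type becomes a variant of the same name.
macro_rules! type_enum {
    (
        $(#[$meta:meta])*
        $vis:vis enum $name:ident { $($ty:ident),* $(,)? }
    ) => {
        $(#[$meta])*
        $vis enum $name {
            $($ty($ty)),*
        }

        // One `From` impl per wrapped type, so values convert with `.into()`.
        $(
            impl From<$ty> for $name {
                fn from(value: $ty) -> Self {
                    $name::$ty(value)
                }
            }
        )*
    };
}

// Illustrative payload types.
#[derive(Debug, PartialEq)]
struct Foo {
    x: i32,
}

#[derive(Debug, PartialEq)]
struct Bar(i32);

type_enum! {
    #[derive(Debug, PartialEq)]
    enum A { Foo, Bar }
}
```

This shortens the declaration and lets construction go through `.into()`, but patterns still have to be written as `A::Foo(Foo { .. })`, so it does not remove the duplication at match sites.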

I think this'll break if you have two independent enums coming from independent extern crates and want to embed them into a common super-enum.

Are you replying to my original post, or @GolDDranks’s tag-on suggestion? It’s not immediately clear.

Actually, generally to both proposals :innocent:. I’m just suggesting that whatever the goals, it should first be clear there’s a large-enough need for that to warrant the feature and that it gets created with as minimal cost as possible.

Probably worth linking to https://github.com/rust-lang/rfcs/pull/2363, in which the ability to set discriminant values on any enum (not just C-like enums) is proposed with the explicit motivation of making it safe to assume that a trivial transmute/pointer cast is a sound implementation of conversion from an enum to a subset enum. Apparently Servo badly needs that.

I’m guessing that guaranteeing a transmute-only From impl for the subset-to-superset conversion ought to be a requirement for any “enum subset” feature worth adding to the core language. I’m also guessing a proc macro cannot provide that since proving the transmute valid means proving the compiler won’t apply different layout optimizations to the subset and the superset enums. Hopefully there’s a real bit-twiddler here who can confirm that.
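To make the transmute idea concrete, here is a minimal sketch restricted to fieldless enums, where explicit discriminants and `#[repr(u8)]` are already available today. The enum names are made up, and the soundness argument in the comment relies on both enums sharing the same `repr` and on every subset discriminant also being a valid superset discriminant. Enums with fields are the harder case the linked RFC is about, and are not covered here.

```rust
// Subset enum: explicit discriminants with a fixed one-byte representation.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Small {
    A = 0,
    B = 1,
}

// Superset enum: same representation, and it reuses the subset's discriminant values.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Large {
    A = 0,
    B = 1,
    C = 2,
}

impl From<Small> for Large {
    fn from(s: Small) -> Large {
        // SAFETY: both enums are `#[repr(u8)]`, so they have the same size and layout,
        // and every discriminant of `Small` (0, 1) is a valid discriminant of `Large`.
        unsafe { std::mem::transmute::<Small, Large>(s) }
    }
}
```

Nothing checks that the two discriminant lists stay in sync, which is the kind of guarantee a language-level feature could provide and a hand-written `unsafe` block cannot.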

Fair enough. The syntax of my suggestion isn’t really adding anything new though: it just changes where it can be put. I find it pretty intuitive. :slight_smile:

Any more thoughts on this, folks?

A slightly different version of the idea: What about anonymous enums?

So you’d have the type enum(Foo|Bar|Baz), say, which could be matched with patterns like match y { x: Foo => ..., x: Bar => ... }. And it would magically implement any trait implemented by all of its variants (other than things like Default, I guess).

That would enable things like type IpAddrAlt = enum(Ipv4Addr | Ipv6Addr);, or a function that’s -> Result<i32, enum(NegativeInputError|OverflowError)>.

Obviously this couldn’t let you do things like type Foo<T, U> = enum(T|U);, but my intuition is that the rules needed for that are the same as the overlap rules for trait impls: that example doesn’t work for the same reason that impl<T> Foo for T and impl<U> Foo for U cannot both exist.
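For comparison, the error-handling example needs roughly the following in today's Rust, which is the boilerplate an anonymous enum would stand in for. `DoubleError`, `check_non_negative`, and `double` are names invented for this sketch; only the two error type names come from the example above.

```rust
#[derive(Debug, PartialEq)]
struct NegativeInputError;

#[derive(Debug, PartialEq)]
struct OverflowError;

// Named stand-in for `enum(NegativeInputError|OverflowError)`.
#[derive(Debug, PartialEq)]
enum DoubleError {
    NegativeInput(NegativeInputError),
    Overflow(OverflowError),
}

impl From<NegativeInputError> for DoubleError {
    fn from(e: NegativeInputError) -> Self {
        DoubleError::NegativeInput(e)
    }
}

impl From<OverflowError> for DoubleError {
    fn from(e: OverflowError) -> Self {
        DoubleError::Overflow(e)
    }
}

fn check_non_negative(n: i32) -> Result<i32, NegativeInputError> {
    if n < 0 {
        Err(NegativeInputError)
    } else {
        Ok(n)
    }
}

// `?` converts each concrete error into `DoubleError` through the `From` impls.
fn double(n: i32) -> Result<i32, DoubleError> {
    let n = check_non_negative(n)?;
    let doubled = n.checked_mul(2).ok_or(OverflowError)?;
    Ok(doubled)
}
```

The enum declaration and both `From` impls carry no information beyond the list of member types, which is the redundancy the anonymous form would remove.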

I am not a huge fan of the ascription-style syntax, since it sort of suggests that it’s a general phenomenon (for example, the syntax you suggest is how Scala gives access to Java’s instanceof). I think that, for symmetry with tuples, we should have a “desugar” of

type Foo = Bar | Baz; // the `enum` bit is unnecessary given current grammar
// into
enum Foo {
    0(Bar), 1(Baz)
}

match foo {
    0(x) => ...,
    1(y) => ...,
}

As for traits, I think they should be implemented manually (like Clone et al. are in core for tuples) by macro (and eventually variadic generics). To take the analogy with structs further, I imagine “sum enums” (if we call these sum types) in analogy to “tuple structs”:

enum Foo(Bar | Baz); // generates the same enum with the same digit variants.

let x = Foo::0(Bar);

This syntax also allows us to write Foo | Foo as a valid type.

I'm not a fan because that means I need to remember whether it's Bar | Baz or Baz | Bar, for which my memory is not good enough. I think it's an important feature of enums and (non-tuple) structs that you can reorder their variants whenever you want without* affecting functionality.

* Well, unless you're using #[derive(Ord)], but I consider that an anti-pattern on things that aren't tuple structs.