Last updated 2020-05-08. See edit history (the orange pencil) for the edit history. Short changelog at the bottom.
Forked from Ideas around anonymous enum types, one non-anonymous potential new enum form that I think could be exceedingly useful for certain design spaces.
Specifically, a proposal in the design space of "types as enum variants" (not "enum variants are types"). I am not wedded to the name of "Tuple Enum;" it is chosen in analogue to Tuple Structs, but I am open to alternate naming that doesn't get too much into the weeds of type theory.
The RFC format is followed loosely here to introduce the concept in a semiformal manner. This is not yet really a proper RFC text nor even really "pre RFC" stage; this is just to get comments on the concept and proposal generally.
Additionally, please avoid discussing anonymous/structural enumerations in this thread. Ideas around anonymous enum types is where that discussion currently lives, and has a draft RFC for that feature.
Summary
Analogous to tuple structs (struct Foo(u32, u64);
), which are product types with indexed fields, we add tuple enums (enum Foo(u32, u64);
), which are pure sum types of indexed variants.
Guide-level explanation
Example
syn, the de-facto library for Rust syntax in procedural macros, uses what it describes as a syntax tree enum to store the typed syntax tree. For example, syn::Expr
is roughly defined as
pub enum Expr {
Array(ExprArray),
Cast(ExprCast),
If(ExprIf),
MethodCall(ExprMethodCall),
Tuple(ExprTuple),
// many variants omitted
}
where each variant of the enum is just a tag around a separately defined type—what serde calls a newtype variant (as opposed to a tuple variant or a struct variant). The idiom for use is to match and rebind with a refined type for further processing:
let expr: Expr = /* ... */;
match expr {
Expr::Cast(expr) => /* ... */,
Expr::If(expr) => /* ... */,
Expr::MethodCall(expr) => /* ... */,
/* ... */
}
As syn's syntax tree enums are "pure" sum types, in that each variant is a newtype variant, it could instead be written as a tuple enum to avoid the need to name each variant:
pub enum Expr (
ExprArray,
ExprCast,
ExprIf,
ExprMethodCall,
ExprTuple,
// many variants omitted
);
Matching on an Expr
is now done by index:
let expr: Expr = /* ... */;
match expr {
// NB: indexes are examples
Expr::1(expr) => /* expr: ExprCast */,
Expr::2(expr) => /* expr: ExprIf */,
Expr::3(expr) => /* expr: ExprMethodCall */,
/* ... */
}
More usefully, type ascription can be used in the pattern to infer which variant is meant:
let expr: Expr = /* ... */;
match expr {
Expr::_(expr: ExprCast) => /* ... */,
Expr::_(expr: ExprIf) => /* ... */,
Expr::_(expr: ExprMethodCall) => /* ... */,
/* ... */
}
When would I use this?
Any time you'd have an enum
today with exclusively "newtype variants" of distinct types is a good canditate to be a tuple enum. It's a similar tradeoff as a regular struct versus a tuple struct; either would probably do in any case, but each one has certain cases where it can make more sense.
This is not "variant types"
In the variant types proposal, you might instead lift the ExprVariant
types to be Expr::Variant
variant types. This could look similar to
pub enum Expr {
Array { attrs: Vec<Attribute>, /* ... */ },
Cast { attrs: Vec<Attribute>, /* ... */ },
If { attrs: Vec<Attribute>, /* ... */ },
MethodCall { attrs: Vec<Attribute>, /* ... */ },
Tuple { attrs: Vec<Attribute>, /* ... */ },
// many variants omitted
}
However, there is one big key difference between this proposal, "types as enum variants," and "enum variants are types." In "enum variants are types," e.g. Expr::Array
is a type representing exactly the variant Array
of Expr
. This means that the size and alignment of each variant type are the same as the full enum, as the variant type is just the enum type with a known variant. (In other words, transmute::<&Enum::Variant, &Enum>
is always sound.) With "types as enum variants," each type (e.g. ExprArray
) are still their own types with their own size and alignment which can differ from that of the enum's.
This is not "type sets" / "type unions"
Simply, if you write enum Foo(u32, u32)
, this is distinct from enum Foo(u32)
and from enum Foo(u32, u32, u32)
. When there are more than one variant of a given type, the type-refinement match syntax is not usable (as it would be ambiguous) and instead the indexed match syntax must be used.
Reference-level explanation
A brief type theory overview
A type is a set of values with that type. Additionally, any value has only one type. (E.g. the value 0u8
has the type u8
, but is a distinct value from 0u16
, which has type u16
.)
A product type is the type that results from taking the product of two sets. Given type P with |P| potential values and type Q with |Q| potential values, the product type P×Q has |P| × |Q| potential values.
In Rust, a struct is a nominal product type (it is identified by its name) with named fields, a tuple struct is a nominal product type with indexed fields, and a tuple is a structural product type (it is identified by it structure) with indexed fields. (Structural records would be the fourth kind of type in the family; a structural product type with named fields.)
A sum type (or disjoint union or coproduct) is the type that results from taking the sum of two sets. Again given type P and type Q, the sum type P+Q has |P| + |Q| potential values.
In Rust, an enum is a sum type, but it is also more than a sum type. Instead, it's a sum type of anonymous unnamed product types for each variant. This is what that "variant types" proposal is exposing: the extra product type layer in the enum
definition. In contrast, a tuple enum is just a sum type of existing types, and does not include the extra mechanics for introducing an extra product layer.
Strictly speaking, you cannot sum one set (type) with itself (e.g. u32 + u32
), as the two sets are not disjoint (thus a sum being a disjoint union). A (type theoretical, not C) union type solves this by deduplication; the set u32 ∪ u32
is identical to the set u32
. However, there is a standard solution to this problem -- just tag the values with from which set they originated.
For "struct like" enum
(the existing enum
), the difference between a sum and a union type doesn't matter, as it introduces the new product type wrapping any external types, and so every type summed is guaranteed distinct. Tuple enums follow the tagging behavior to keep the same behavior as an enum
of just newtype variants (effectively, they desugar to them) and to maintain generic hygeine. In the simple case of nongeneric tuple enums with a known set of variants of disjoint types (which the author believes is the main use case for enums with anonymous members), there is no difference between a sum type and a union type.
Generic hygeine
Basically, that given an Either
type defined as a tuple enum
enum Either<Left, Right>(Left, Right);
and processing code of
fn example<Left, Right>(
make_left: fn() -> Left,
make_right: fn() -> Right,
) -> Either<Left, Right>
{
let place: Either::<Left, Right>;
if random() {
place = Either::0(make_left());
} else {
place = Either::1(make_right());
}
match place {
Either::_(_: Left ) => println!("left"),
Either::_(_: Right) => println!("right"),
}
place
}
, this is equivalent to the rewritten
fn example<Left, Right>(
make_left: fn() -> Left,
make_right: fn() -> Right,
) -> Either<Left, Right>
{
let place: Either::<Left, Right>;
if random() {
place = Either::0(make_left());
println!("left");
} else {
place = Either::1(make_right());
println!("right");
}
place
}
as which arm is taken is based on the local type before monomorphization. This is the case no matter who constructs the enum; the arm is resolved to a specific index based solely on local type information.
As a second example, consider
enum Foo<T>(u8, T);
fn example<T>(foo: Foo<T>) {
match foo {
Foo::_(_: T) => println!("T"),
Foo::_(_: u8) => println!("u8"),
}
}
This is resolved to the indexed syntax of
enum Foo<T>(u8, T);
fn example<T>(foo: Foo<T>) {
match foo {
Foo::1(_: T) => println!("T"),
Foo::0(_: u8) => println!("u8"),
}
}
as arms are chosen based solely on local pre-monomorphization info. Even if a Foo<u8>
is provided, for which type ascription matching is unusable due to being ambiguous on which variant is meant, the arms are unambiguous for this generic function, because each type is a different local type.
Implementation details (effective desugaring)
The tuple enum enum Foo(A, B, C)
has identical layout semantics as enum Foo { 0(A), 1(B), 2(C) }
would, if integers were valid identifiers. Indexed based matching is handled as if the tuple enum were a "struct like" enum with this definition.
Index-based and type-ascription styles of matches may be mixed.
"Type ascription" matching relies on the locally expressed (potentially generic) type of each variant, an inferred variant index, and generalized type ascription in pattern position. If the variant's index is inferred, the resulting binding must* have a locally evident type (either through a pattern with known type or pattern type ascription). That type is then compared against each variant type and the type of the enum itself. If exactly one† variant type is found to match, that variant's index is used. If more than one variant's type matches, an error ("ambiguous match arm") is emitted.
* It is also possible that a simple binding in this position could be assigned a type inference variable, such that its type could be inferred from usage rather than having to be ascribed. The author thinks that this might have more potential for confusion that it is useful, but remains open to the idea.
† An alternative approach would say that given enum Foo(u32, u32)
and the pattern Foo::_(x: u32)
, the pattern matches all variants with the type of u32
.
Interactions with #[non_exhaustive]
A #[non_exhaustive]
tuple enum is treated the same way as a "struct like" enum: a fallback arm must be provided when matching over it (outside of the implementing crate) and variants may be added without breaking source compatibility. However, as an extra concern for tuple enums, any new variants must added after the previous ones (as otherwise it would shift the index of existing variants) and no new variant may have the same type as an existing variant, to avoid breaking source compatibility.
For generic tuple enums, this is unfortunately equivalent to not allowing any new variants that have a previously defined type without breaking source compatibility. For example:
#[non_exhaustive]
pub enum Example<T>(T, u32);
// consumer
match value: Example::<i32> {
value: i32 => /* ... */,
value: u32 => /* ... */,
}
it is now impossible to add a new i32
variant to Example
without breaking source compatibility.
The author of the RFC regrets this limitation but believes this is the only viable interpretation of #[non_exhaustive]
for tuple enums, as the alternative is to dissallow type-refinement syntax entirely and require the use of indexed syntax. Nongeneric #[non_exhaustive]
tuple enums continue to work as expected.
Drawbacks
- Makes the language larger, to add an indexed form to an existing nominal feature. (Would tuple structs be reasonable to add today if they weren't already in the language? The RFC author is unsure, but tentatively believes so.)
- Any new
enum
extension is going to have to deal with the sum type / union type distinction. Currentenum
cleverly avoids this problem (accidentally?) due to explicitly introducing a new "type" layer (in the theoretical sense, currently), making sure that all variants are disjoint and there is no difference between a set sum or union. - Sum types of existing types are typically nontrivial to use when they go beyond just being a disjoint union
- Any remaining unknown unknowns.
Rationale
This adds the ability to express a new kind of type in the 2×2×2 matrix of
- nominal vs. structural
- product vs. sum
- named vs. indexed members
of which three product types are currently expressable, and one sum type.
The rationale for enum tuples is therefore the same as it would be to add tuple structs to the language: adding more options for modeling data in types.
Specifically, the author believes that "types as enum variants" serves a very similar but distinct niche in data modeling to "enum variants are types," and that both are useful to data modeling of strongly typed trees such as syntax trees or other strongly typed tree-like data.
Alternatives
- Don't extend
enum
any, and just stick to the existing nominal/named member enums. They can do everything already, so any additions are just extra niceties on top. - While not strictly an alternative, the "variant types" proposal is an alternative proposal that addresses much the same use cases as tuple enums. The author believes both proposals can live alongside each other in the language, but accepting one also makes the other harder to justify purely on expressivity arguments. (Briefly, why both are useful: fine performance/behavior tuning, due to the differences in how variant types are sized/laid out.)
- Anonymous enums are technically an alternative to named tuple enums (as they are theoretically the same thing, just structural rather than nominal) but are also much more complicated of a feature, and much more likely to be confused with having union (deduplicating) semantics. Again, it's the tuple struct distinction; would we add tuple structs today, or just use type aliases of regular tuples?
Prior art
None that the author knows specifically for sum types of existing types, as most languages with "ADT enums" always name the enum variants. Typescript offers union types (e.g. string | number
) but these have union semantics and effectively work via duck typing and downcasting. In fact, every language that supports downcasting (including Rust with dyn Any
) supports union types; the general type upcast is a union of all potential downcastable subtypes.
If you know any languages that offer proper sum types of preexisting types, please share!
- C++
std::variant
(when usingstd::holds_alternative
/std::get
rather thanstd::visit
/decltype
/std::is_same_v
access)?
Unresolved questions
- Can and should type refinement syntax support matching further down beyond the tuple enum? How about when the tuple enum contains a tuple enum?
- Should inferred variant indexes use a type variable and allow type inference or require unambiguous ascription? How unambiguous must the ascription be? (E.g. is
Vec<_>
enough or must it beVec<T>
?) - Simple syntax substitutions, for example:
- Instead of the pattern
$path::_($pattern)
, use$path($pattern)
- Instead of the pattern
- A nice construction syntax should be provided for the simple disjoint union case.
- Option: coerce from values of the variant type to values of the tuple enum type.
- Option: provide a magic constructor function (similar to tuple structs).
- How exactly does a type refinement/ascription pattern interact with default binding modes?
- Unknown unknowns.
Future possibilities
This RFC proposed nominal sum types with indexed variants. After this RFC, the remaining kinds of types not expressible are
- structural product types with named fields (Structural records)
- structural sum types with named variants (unproposed)
- structural sum types with unnamed variants (Ideas around anonymous enum types)
CC
cc @Centril, author of the structural records RFC and type theorist active here
cc @robinm @Jon-Davis @lordan, active in Ideas around anonymous enum types
Changelog
- 2020-05-08
- Changed sugared match syntax from
expr: ExprArray
toExpr::_(expr: ExprArray)
. - Changed reference-level section's examples to be clearer.
- Changed sugared match syntax from